Phase 2: HGNN-MoE - Evolution to Higher-Order Expert Interactions
While Phase 1's GNN-MoE proved the power of dense expert collaboration, it revealed critical limitations: VRAM constraints prevented scaling beyond 8 experts, and traditional graph edges could only model pairwise relationships. Phase 2 addressed both challenges by evolving from Graph Neural Networks to Hypergraph Neural Networks (HGNNs).
The key insight: What if experts could form coalitions and communicate in groups, not just pairs?
From Edges to Hyperedges
Traditional GNNs connect experts through edges that link exactly two nodes. But real-world collaboration often involves multiple participants simultaneously. HGNNs use hyperedges that can connect any number of experts, enabling:
- Multi-expert coalitions: Groups of 3, 4, or more experts can collaborate directly
- Higher-order relationships: Capture complex dependencies that can't be expressed through pairwise connections
- Improved memory efficiency: Potentially better scaling characteristics than dense adjacency matrices
Think of it as evolving from a series of one-on-one conversations to enabling group discussions where multiple experts can contribute simultaneously to complex reasoning tasks.
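Concretely, in PyTorch Geometric's convention a hyperedge is just a set of node indices stored as a sparse incidence list. A minimal sketch (illustrative values, not the project's actual code):

```python
import torch

# Two hyperedges over four experts, in PyG's [2, num_memberships] format:
#   hyperedge 0 = {expert 0, expert 1}            (a pairwise edge)
#   hyperedge 1 = {expert 0, expert 2, expert 3}  (a three-expert coalition)
hyperedge_index = torch.tensor([
    [0, 1, 0, 2, 3],  # row 0: expert (node) indices
    [0, 0, 1, 1, 1],  # row 1: hyperedge index each node belongs to
])
```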
Technical Architecture
The HGNN coupler replaces the traditional GNN with PyTorch Geometric's `HypergraphConv` layers, implementing static hyperedge strategies:
"All-Pairs" Strategy: Creates hyperedges connecting every pair of experts, similar to GNN edges but with more flexible representations.
"All-Triplets" Strategy: Forms hyperedges connecting every group of three experts, enabling genuine higher-order interactions that capture emergent behaviors from expert triplets.
Learnable Hyperedge Weights: Each hyperedge can have learned importance weights, allowing the model to discover which expert coalitions are most valuable for different types of reasoning.
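To make this concrete, here is a minimal sketch of how such a coupler could be wired up with `HypergraphConv`. The class and function names are illustrative assumptions, not the project's actual API:

```python
from itertools import combinations

import torch
import torch.nn as nn
from torch_geometric.nn import HypergraphConv


def build_hyperedge_index(num_experts: int, strategy: str) -> torch.Tensor:
    """Enumerate static hyperedges: all C(N,2) pairs or all C(N,3) triplets."""
    k = {"all_pairs": 2, "all_triplets": 3}[strategy]
    nodes, edges = [], []
    for edge_id, group in enumerate(combinations(range(num_experts), k)):
        nodes.extend(group)
        edges.extend([edge_id] * k)
    # PyG incidence format: row 0 = node indices, row 1 = hyperedge indices.
    return torch.tensor([nodes, edges], dtype=torch.long)


class HGNNCoupler(nn.Module):
    """Minimal sketch of an HGNN expert coupler (names are illustrative)."""

    def __init__(self, embed_dim: int, num_experts: int, strategy: str = "all_pairs"):
        super().__init__()
        self.register_buffer("hyperedge_index",
                             build_hyperedge_index(num_experts, strategy))
        num_hyperedges = int(self.hyperedge_index[1].max()) + 1
        # One learnable importance weight per hyperedge (i.e., per coalition).
        self.hyperedge_weight = nn.Parameter(torch.ones(num_hyperedges))
        self.conv = HypergraphConv(embed_dim, embed_dim)

    def forward(self, expert_states: torch.Tensor) -> torch.Tensor:
        # expert_states: [num_experts, embed_dim] node features for one token.
        return self.conv(expert_states, self.hyperedge_index,
                         hyperedge_weight=self.hyperedge_weight)
```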
Comprehensive Development Process
Phase 2 represents the most systematically documented evolution in the project, with detailed build logs chronicling every step:
- Motivation & Goals: Clear articulation of the need for higher-order expert interactions and improved scaling
- Implementation: Step-by-step integration of PyTorch Geometric hypergraph layers
- Testing: Comprehensive validation including unit tests, integration tests, and performance benchmarks
- Optimization: Multiple rounds of performance tuning and bug fixes
- Completion: Full validation of the HGNN implementation with stable training
Key Technical Achievements
Seamless Architecture Integration: The HGNN coupler serves as a drop-in replacement for the GNN coupler, controlled by simple configuration flags. This modular design enables easy comparison between approaches.
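For instance, the swap might be driven by a small config object along these lines (field names are hypothetical, not the project's actual schema):

```python
from dataclasses import dataclass

# Hypothetical configuration showing how a drop-in coupler swap
# could be exposed through simple flags.
@dataclass
class CouplerConfig:
    coupler_type: str = "hgnn"             # "gnn" or "hgnn"
    hyperedge_strategy: str = "all_pairs"  # or "all_triplets"
    learnable_hyperedge_weights: bool = True
```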
Multi-Environment Validation: Successful testing across macOS (M3), Colab (A100), and local GPU environments demonstrates robust cross-platform compatibility.
Performance Optimization: Despite initial batching challenges with PyG layers, the final implementation achieved stable training with consistent loss reduction and expert specialization.
Hyperedge Strategy Analysis
All-Pairs Implementation: With N experts, creates C(N,2) = N(N-1)/2 hyperedges. For 4 experts, this generates 6 pairwise hyperedges, providing a foundation for expert communication.
All-Triplets Scaling: With N experts, creates C(N,3) = N(N-1)(N-2)/6 hyperedges. For 4 experts, this generates 4 triplet hyperedges, enabling genuine higher-order reasoning patterns.
The choice between strategies creates an interesting trade-off: pairs provide more direct communication channels, while triplets enable richer higher-order interactions; at four experts triplets are also the sparser option (4 hyperedges versus 6), though their count grows faster as N increases.
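These counts are easy to sanity-check against the closed forms above:

```python
from math import comb

# How hyperedge counts scale with the number of experts N.
for n in (4, 8, 16):
    print(f"N={n:2d}: pairs C(N,2)={comb(n, 2):3d}, triplets C(N,3)={comb(n, 3):4d}")
# N= 4: pairs C(N,2)=  6, triplets C(N,3)=   4
# N= 8: pairs C(N,2)= 28, triplets C(N,3)=  56
# N=16: pairs C(N,2)=120, triplets C(N,3)= 560
```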
Memory Efficiency Breakthrough
HGNN architectures potentially offer O(H·k) memory complexity, where H is the number of hyperedges and k is the average hyperedge size, compared to the O(N²) cost of a dense adjacency matrix over N experts. The advantage materializes when the hyperedge set is kept sparse (H·k ≪ N²) and grows more significant as expert counts increase.
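A back-of-envelope comparison under assumed numbers (not project measurements):

```python
# For N = 16 experts, a dense pairwise adjacency stores N*N weights, while
# a sparse incidence list stores one entry per (node, hyperedge) membership.
n = 16
dense_entries = n * n                    # 256 entries for a dense adjacency matrix
h, k = 16, 3                             # hypothetical sparse set of 16 triplet coalitions
incidence_entries = h * k                # 48 (node, hyperedge) memberships
print(dense_entries, incidence_entries)  # 256 48
```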
Early experiments showed promise for scaling to 16+ experts within VRAM limits, though full validation of large-scale configurations awaited the stability improvements of Phase 3.
Path to Adaptive Intelligence
Phase 2 established the hypergraph foundation but revealed a new challenge: while experts could form rich communication patterns, there was no guarantee they would specialize in non-redundant ways. Multiple experts might learn similar features, wasting the expanded capacity.
This observation led directly to Phase 3's focus on enforcing expert specialization through orthogonality constraints, ensuring that the sophisticated communication mechanisms would connect truly distinct expert capabilities.
Implementation Details
The complete Phase 2 implementation includes:
- Configurable hyperedge strategies through simple boolean flags
- Learnable vs. fixed hyperedge weights for ablation studies
- Comprehensive test suites validating hyperedge generation and communication (a sketch of one such check follows this list)
- Performance monitoring for VRAM usage and training speed comparison
- Detailed build logs documenting the complete development process
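As an example of the kind of check such a suite might contain, here is a hypothetical test (reusing the `build_hyperedge_index` sketch from earlier) that validates hyperedge generation against the closed-form counts:

```python
from math import comb

# Hypothetical unit test: generated hyperedges must match C(N,2) or C(N,3).
def test_hyperedge_counts():
    for n, strategy, k in [(4, "all_pairs", 2), (4, "all_triplets", 3)]:
        idx = build_hyperedge_index(n, strategy)  # from the coupler sketch above
        num_hyperedges = int(idx[1].max()) + 1
        assert num_hyperedges == comb(n, k)
        assert idx.shape[1] == comb(n, k) * k     # one entry per membership
```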
The thorough documentation in the build-log directory provides valuable insights into the engineering decisions and problem-solving approaches that shaped the final architecture.