Phase 2: HGNN-MoE - Evolution to Higher-Order Expert Interactions
While Phase 1's GNN-MoE proved the power of dense expert collaboration, it revealed critical limitations: VRAM constraints prevented scaling beyond 8 experts, and traditional graph edges could only model pairwise relationships. Phase 2 addressed both challenges by evolving from Graph Neural Networks to Hypergraph Neural Networks (HGNNs).
The key insight: What if experts could form coalitions and communicate in groups, not just pairs?
From Edges to Hyperedges
Traditional GNNs connect experts through edges that link exactly two nodes. But real-world collaboration often involves multiple participants simultaneously. HGNNs use hyperedges that can connect any number of experts, enabling:
- Multi-expert coalitions: Groups of 3, 4, or more experts can collaborate directly
- Higher-order relationships: Capture complex dependencies that can't be expressed through pairwise connections
- Improved memory efficiency: Potentially better scaling characteristics than dense adjacency matrices
Think of it as evolving from a series of one-on-one conversations to enabling group discussions where multiple experts can contribute simultaneously to complex reasoning tasks.
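Concretely, in PyTorch Geometric's convention a hyperedge is just a set of node indices stored as a sparse incidence list. A minimal sketch (illustrative values, not the project's actual code):

```python
import torch

# Two hyperedges over four experts, in PyG's [2, num_memberships] format:
#   hyperedge 0 = {expert 0, expert 1}            (a pairwise edge)
#   hyperedge 1 = {expert 0, expert 2, expert 3}  (a three-expert coalition)
hyperedge_index = torch.tensor([
    [0, 1, 0, 2, 3],  # row 0: expert (node) indices
    [0, 0, 1, 1, 1],  # row 1: hyperedge index each node belongs to
])
```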
Technical Architecture
The HGNN coupler replaces the traditional GNN with PyTorch Geometric's `HypergraphConv` layers, implementing static hyperedge strategies:
"All-Pairs" Strategy: Creates hyperedges connecting every pair of experts, similar to GNN edges but with more flexible representations.
"All-Triplets" Strategy: Forms hyperedges connecting every group of three experts, enabling genuine higher-order interactions that capture emergent behaviors from expert triplets.
Learnable Hyperedge Weights: Each hyperedge can have learned importance weights, allowing the model to discover which expert coalitions are most valuable for different types of reasoning.
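To make this concrete, here is a minimal sketch of how such a coupler could be wired up with `HypergraphConv`. The class and function names are illustrative assumptions, not the project's actual API:

```python
from itertools import combinations

import torch
import torch.nn as nn
from torch_geometric.nn import HypergraphConv


def build_hyperedge_index(num_experts: int, strategy: str) -> torch.Tensor:
    """Enumerate static hyperedges: all C(N,2) pairs or all C(N,3) triplets."""
    k = {"all_pairs": 2, "all_triplets": 3}[strategy]
    nodes, edges = [], []
    for edge_id, group in enumerate(combinations(range(num_experts), k)):
        nodes.extend(group)
        edges.extend([edge_id] * k)
    # PyG incidence format: row 0 = node indices, row 1 = hyperedge indices.
    return torch.tensor([nodes, edges], dtype=torch.long)


class HGNNCoupler(nn.Module):
    """Minimal sketch of an HGNN expert coupler (names are illustrative)."""

    def __init__(self, embed_dim: int, num_experts: int, strategy: str = "all_pairs"):
        super().__init__()
        self.register_buffer("hyperedge_index",
                             build_hyperedge_index(num_experts, strategy))
        num_hyperedges = int(self.hyperedge_index[1].max()) + 1
        # One learnable importance weight per hyperedge (i.e., per coalition).
        self.hyperedge_weight = nn.Parameter(torch.ones(num_hyperedges))
        self.conv = HypergraphConv(embed_dim, embed_dim)

    def forward(self, expert_states: torch.Tensor) -> torch.Tensor:
        # expert_states: [num_experts, embed_dim] node features for one token.
        return self.conv(expert_states, self.hyperedge_index,
                         hyperedge_weight=self.hyperedge_weight)
```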
Comprehensive Development Process
Phase 2 represents the most systematically documented evolution in the project, with detailed build logs chronicling every step:
- Motivation & Goals: Clear articulation of the need for higher-order expert interactions and improved scaling
- Implementation: Step-by-step integration of PyTorch Geometric hypergraph layers
- Testing: Comprehensive validation including unit tests, integration tests, and performance benchmarks
- Optimization: Multiple rounds of performance tuning and bug fixes
- Completion: Full validation of the HGNN implementation with stable training
Key Technical Achievements
Seamless Architecture Integration: The HGNN coupler serves as a drop-in replacement for the GNN coupler, controlled by simple configuration flags. This modular design enables easy comparison between approaches.
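For instance, the swap might be driven by a small config object along these lines (field names are hypothetical, not the project's actual schema):

```python
from dataclasses import dataclass

# Hypothetical configuration showing how a drop-in coupler swap
# could be exposed through simple flags.
@dataclass
class CouplerConfig:
    coupler_type: str = "hgnn"             # "gnn" or "hgnn"
    hyperedge_strategy: str = "all_pairs"  # or "all_triplets"
    learnable_hyperedge_weights: bool = True
```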
Multi-Environment Validation: Successful testing across macOS (M3), Colab (A100), and local GPU environments demonstrates robust cross-platform compatibility.
Performance Optimization: Despite initial batching challenges with PyG layers, the final implementation achieved stable training with consistent loss reduction and expert specialization.
Hyperedge Strategy Analysis
All-Pairs Implementation: With N experts, creates C(N,2) = N(N-1)/2 hyperedges. For 4 experts, this generates 6 pairwise hyperedges, providing a foundation for expert communication.
All-Triplets Scaling: With N experts, creates C(N,3) = N(N-1)(N-2)/6 hyperedges. For 4 experts, this generates 4 triplet hyperedges, enabling genuine higher-order reasoning patterns.
The choice between strategies creates an interesting trade-off: pairs provide more direct communication channels, while triplets enable richer higher-order interactions; at four experts triplets are also the sparser option (4 hyperedges versus 6), though their count grows faster as N increases.
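These counts are easy to sanity-check against the closed forms above:

```python
from math import comb

# How hyperedge counts scale with the number of experts N.
for n in (4, 8, 16):
    print(f"N={n:2d}: pairs C(N,2)={comb(n, 2):3d}, triplets C(N,3)={comb(n, 3):4d}")
# N= 4: pairs C(N,2)=  6, triplets C(N,3)=   4
# N= 8: pairs C(N,2)= 28, triplets C(N,3)=  56
# N=16: pairs C(N,2)=120, triplets C(N,3)= 560
```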
Memory Efficiency Breakthrough
HGNN architectures potentially offer O(H·k) memory complexity, where H is the number of hyperedges and k is the average hyperedge size, compared to the O(N²) cost of a dense adjacency matrix over N experts. The advantage materializes when the hyperedge set is kept sparse (H·k ≪ N²) and grows more significant as expert counts increase.
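A back-of-envelope comparison under assumed numbers (not project measurements):

```python
# For N = 16 experts, a dense pairwise adjacency stores N*N weights, while
# a sparse incidence list stores one entry per (node, hyperedge) membership.
n = 16
dense_entries = n * n                    # 256 entries for a dense adjacency matrix
h, k = 16, 3                             # hypothetical sparse set of 16 triplet coalitions
incidence_entries = h * k                # 48 (node, hyperedge) memberships
print(dense_entries, incidence_entries)  # 256 48
```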
Early experiments showed promise for scaling to 16+ experts within VRAM limits, though full validation of large-scale configurations awaited the stability improvements of Phase 3.
Path to Adaptive Intelligence
Phase 2 established the hypergraph foundation but revealed a new challenge: while experts could form rich communication patterns, there was no guarantee they would specialize in non-redundant ways. Multiple experts might learn similar features, wasting the expanded capacity.
This observation led directly to Phase 3's focus on enforcing expert specialization through orthogonality constraints, ensuring that the sophisticated communication mechanisms would connect truly distinct expert capabilities.
Implementation Details
The complete Phase 2 implementation includes:
- Configurable hyperedge strategies through simple boolean flags
- Learnable vs. fixed hyperedge weights for ablation studies
- Comprehensive test suites validating hyperedge generation and communication (a sketch of one such check follows this list)
- Performance monitoring for VRAM usage and training speed comparison
- Detailed build logs documenting the complete development process
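As an example of the kind of check such a suite might contain, here is a hypothetical test (reusing the `build_hyperedge_index` sketch from earlier) that validates hyperedge generation against the closed-form counts:

```python
from math import comb

# Hypothetical unit test: generated hyperedges must match C(N,2) or C(N,3).
def test_hyperedge_counts():
    for n, strategy, k in [(4, "all_pairs", 2), (4, "all_triplets", 3)]:
        idx = build_hyperedge_index(n, strategy)  # from the coupler sketch above
        num_hyperedges = int(idx[1].max()) + 1
        assert num_hyperedges == comb(n, k)
        assert idx.shape[1] == comb(n, k) * k     # one entry per membership
```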
The thorough documentation in the build-log directory provides valuable insights into the engineering decisions and problem-solving approaches that shaped the final architecture.