Phase 2: HGNN-MoE - Evolution to Higher-Order Expert Interactions

While Phase 1's GNN-MoE proved the power of dense expert collaboration, it revealed critical limitations: VRAM constraints prevented scaling beyond 8 experts, and traditional graph edges could only model pairwise relationships. Phase 2 addressed both challenges by evolving from Graph Neural Networks to Hypergraph Neural Networks (HGNNs).

The key insight: What if experts could form coalitions and communicate in groups, not just pairs?

From Edges to Hyperedges

Traditional GNNs connect experts through edges that link exactly two nodes, but real-world collaboration often involves several participants at once. HGNNs use hyperedges, each of which can connect any number of experts.

Think of it as evolving from a series of one-on-one conversations to enabling group discussions where multiple experts can contribute simultaneously to complex reasoning tasks.

Technical Architecture

The HGNN coupler replaces the traditional GNN with PyTorch Geometric's `HypergraphConv` layers, implementing static hyperedge strategies:

"All-Pairs" Strategy: Creates hyperedges connecting every pair of experts, similar to GNN edges but with more flexible representations.

"All-Triplets" Strategy: Forms hyperedges connecting every group of three experts, enabling genuine higher-order interactions that capture emergent behaviors from expert triplets.

Learnable Hyperedge Weights: Each hyperedge can have learned importance weights, allowing the model to discover which expert coalitions are most valuable for different types of reasoning.
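To show where hyperedge weights enter the computation, here is a minimal pure-Python sketch of one weighted propagation step, following a common formulation of hypergraph convolution, X' = D^-1 H W B^-1 H^T X Θ. The learnable projection Θ is omitted, the function name `hypergraph_propagate` is hypothetical, and the weights are passed in as plain numbers rather than trained. It is an illustration of the idea, not the project's coupler code.

```python
def hypergraph_propagate(x, hyperedges, weights):
    """One weighted propagation step X' = D^-1 H W B^-1 H^T X (Theta omitted).

    x          : list of per-expert feature vectors (lists of floats)
    hyperedges : list of tuples of expert indices
    weights    : one importance weight per hyperedge (learned in practice)
    """
    num_nodes, dim = len(x), len(x[0])

    # Step 1 (B^-1 H^T X): each hyperedge averages the features of its members.
    edge_feats = [
        [sum(x[i][d] for i in members) / len(members) for d in range(dim)]
        for members in hyperedges
    ]

    # Step 2 (D^-1 H W ...): each expert takes a weight-normalized average
    # over the hyperedges it belongs to.
    out = []
    for node in range(num_nodes):
        incident = [e for e, members in enumerate(hyperedges) if node in members]
        degree = sum(weights[e] for e in incident)  # weighted node degree D_ii
        if degree == 0:
            # An expert in no hyperedge (or with zero total weight) gets zeros.
            out.append([0.0] * dim)
            continue
        out.append(
            [sum(weights[e] * edge_feats[e][d] for e in incident) / degree
             for d in range(dim)]
        )
    return out


# Three experts with scalar features, two hyperedges, the second weighted 3x.
features = [[1.0], [2.0], [3.0]]
result = hypergraph_propagate(features, [(0, 1), (1, 2)], [1.0, 3.0])
print(result)
```

Expert 1 sits in both hyperedges, so its output is pulled toward the more heavily weighted coalition (1, 2). With uniform weights the step reduces to plain averaging over incident hyperedges.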

Comprehensive Development Process

Phase 2 represents the most systematically documented evolution in the project, with detailed build logs chronicling every step.

Key Technical Achievements

Seamless Architecture Integration: The HGNN coupler serves as a drop-in replacement for the GNN coupler, controlled by simple configuration flags. This modular design enables easy comparison between approaches.
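A configuration-driven switch of this kind might look like the sketch below. Every name in it (`CouplerConfig`, `coupler_type`, `hyperedge_strategy`, `build_coupler`, and the two placeholder classes) is hypothetical and chosen for illustration; the project's actual flags and class names are not given in this write-up.

```python
from dataclasses import dataclass


@dataclass
class CouplerConfig:
    # Hypothetical flag names, chosen for illustration only.
    coupler_type: str = "gnn"               # "gnn" or "hgnn"
    hyperedge_strategy: str = "all_pairs"   # read only by the HGNN coupler
    num_experts: int = 4


class GNNCoupler:
    """Placeholder standing in for the Phase 1 pairwise coupler."""

    def __init__(self, cfg: CouplerConfig):
        self.num_experts = cfg.num_experts


class HGNNCoupler:
    """Placeholder standing in for the Phase 2 hypergraph coupler."""

    def __init__(self, cfg: CouplerConfig):
        self.num_experts = cfg.num_experts
        self.strategy = cfg.hyperedge_strategy


def build_coupler(cfg: CouplerConfig):
    """Select the coupler class from the config so callers never branch on type."""
    couplers = {"gnn": GNNCoupler, "hgnn": HGNNCoupler}
    if cfg.coupler_type not in couplers:
        raise ValueError(f"unknown coupler_type: {cfg.coupler_type!r}")
    return couplers[cfg.coupler_type](cfg)


baseline = build_coupler(CouplerConfig())
hyper = build_coupler(
    CouplerConfig(coupler_type="hgnn", hyperedge_strategy="all_triplets")
)
print(type(baseline).__name__, type(hyper).__name__)
```

Keeping the selection in one factory function means an A/B comparison between couplers only requires changing one config field.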

Multi-Environment Validation: Successful testing across macOS (M3), Colab (A100), and local GPU environments demonstrates robust cross-platform compatibility.

Performance Optimization: Despite initial batching challenges with PyG layers, the final implementation achieved stable training with consistent loss reduction and expert specialization.

Hyperedge Strategy Analysis

All-Pairs Implementation: With N experts, creates C(N,2) = N(N-1)/2 hyperedges. For 4 experts, this generates 6 pairwise hyperedges, providing a foundation for expert communication.

All-Triplets Scaling: With N experts, creates C(N,3) = N(N-1)(N-2)/6 hyperedges. For 4 experts, this generates 4 triplet hyperedges, enabling genuine higher-order reasoning patterns.
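Both static strategies can be enumerated with `itertools.combinations`. The sketch below is an assumption about how such hyperedges could be built, not the project's code; the helper names `build_hyperedges` and `to_incidence_index` are hypothetical. The two-row output follows the sparse incidence layout that PyTorch Geometric's `HypergraphConv` takes as `hyperedge_index` (row 0 holds node indices, row 1 holds hyperedge indices); converting it to a tensor is left out so the example runs with the standard library alone.

```python
from itertools import combinations


def build_hyperedges(num_experts, strategy):
    """Enumerate static hyperedges as tuples of expert indices."""
    sizes = {"all_pairs": 2, "all_triplets": 3}
    if strategy not in sizes:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return list(combinations(range(num_experts), sizes[strategy]))


def to_incidence_index(hyperedges):
    """Flatten hyperedges into two parallel rows: [expert ids, hyperedge ids]."""
    node_row, edge_row = [], []
    for edge_id, members in enumerate(hyperedges):
        for node in members:
            node_row.append(node)
            edge_row.append(edge_id)
    return [node_row, edge_row]


pairs = build_hyperedges(4, "all_pairs")
triplets = build_hyperedges(4, "all_triplets")
print(len(pairs), pairs)        # 6 pairwise hyperedges
print(len(triplets), triplets)  # 4 triplet hyperedges
print(to_incidence_index(triplets))
```

For 4 experts this reproduces the counts above: 6 pairs and 4 triplets.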

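The choice between strategies creates an interesting trade-off. At the 4-expert scale, pairs provide more direct communication channels (6 hyperedges), while triplets enable richer but sparser interaction patterns (4 hyperedges). Because C(N,3) grows faster than C(N,2), the two counts are equal at N = 5 and triplets outnumber pairs for larger N.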
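The counts behind this trade-off can be tabulated directly from the two formulas. This is a small arithmetic sketch using `math.comb`; the helper name `hyperedge_counts` and the chosen expert counts are illustrative.

```python
from math import comb


def hyperedge_counts(num_experts):
    """Number of hyperedges each exhaustive strategy creates for N experts."""
    return {
        "pairs": comb(num_experts, 2),     # N(N-1)/2
        "triplets": comb(num_experts, 3),  # N(N-1)(N-2)/6
    }


for n in (4, 5, 8, 16):
    counts = hyperedge_counts(n)
    print(f"N={n:>2}  pairs={counts['pairs']:>3}  triplets={counts['triplets']:>3}")
# N= 4  pairs=  6  triplets=  4
# N= 5  pairs= 10  triplets= 10
# N= 8  pairs= 28  triplets= 56
# N=16  pairs=120  triplets=560
```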

Memory Efficiency Breakthrough

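HGNN architectures potentially offer O(H·k) memory complexity, where H is the number of hyperedges and k is the average hyperedge size, compared to the O(N²) complexity of a dense GNN adjacency matrix over N experts. The improvement depends on the hyperedge strategy: it becomes more significant as expert counts increase only when H·k grows more slowly than N², which a selective hyperedge set can achieve but the exhaustive all-pairs and all-triplets strategies do not guarantee.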
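To make the comparison concrete, the sketch below counts stored entries only: H·k for a sparse incidence list versus the square of the expert count for a dense adjacency matrix. It ignores activations, gradients, and optimizer state, which typically dominate real VRAM use, so it should be read as a rough guide rather than a measurement. The function names are hypothetical, and the "selective" case of 16 hyperedges is an arbitrary example, not a configuration from the project.

```python
from math import comb


def incidence_entries(num_hyperedges, avg_size):
    """Entries in a sparse incidence list: one per (expert, hyperedge) membership."""
    return num_hyperedges * avg_size


def dense_adjacency_entries(num_experts):
    """Entries in a dense expert-by-expert adjacency matrix."""
    return num_experts ** 2


n = 16
dense = dense_adjacency_entries(n)
all_pairs = incidence_entries(comb(n, 2), 2)
all_triplets = incidence_entries(comb(n, 3), 3)
selective = incidence_entries(16, 3)  # arbitrary example: 16 chosen triplets

print(dense, all_pairs, all_triplets, selective)
# 256 240 1680 48
```

At 16 experts, exhaustive all-pairs is roughly on par with the dense matrix, exhaustive all-triplets is several times larger, and a selective set of hyperedges is where the O(H·k) form pays off.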

Early experiments showed promise for scaling to 16+ experts within VRAM limits, though full validation of large-scale configurations awaited the stability improvements of Phase 3.

Path to Adaptive Intelligence

Phase 2 established the hypergraph foundation but revealed a new challenge: while experts could form rich communication patterns, there was no guarantee they would specialize in non-redundant ways. Multiple experts might learn similar features, wasting the expanded capacity.

This observation led directly to Phase 3's focus on enforcing expert specialization through orthogonality constraints, ensuring that the sophisticated communication mechanisms would connect truly distinct expert capabilities.

Implementation Details

The complete Phase 2 implementation includes:

The thorough documentation in the build-log directory provides valuable insights into the engineering decisions and problem-solving approaches that shaped the final architecture.