MoE Research Hub - User Manual

1. Introduction

Welcome to the User Manual for the MoE Research Hub. This guide provides detailed instructions on how to use the interactive Command-Line Interface (app.py) to manage the entire lifecycle of your research experiments.

This manual is a companion to the main README.md, which gives a high-level overview of the project's research goals and architectural evolution. This document focuses on the practical "how-to" of using the software.

1.1 Getting Started

All functionality is accessed through the main application script. To begin, run the following command from the root of the project directory:

python3 app.py

2. The Interactive Application (app.py)

The MoE Research Hub is a menu-driven application designed to be intuitive and powerful. Below is a detailed walkthrough of each menu and its capabilities.

2.1 Main Menu

Upon launching the app, you are greeted with the Main Menu. This is the central navigation point.

============================================================
🧠 MoE Research Hub: Main Menu
============================================================

--- Model Status ---
No model loaded.
--------------------

1. Train New Model
2. Load Model from Checkpoint
3. Exit
>

2.2 Training a New Model

Selecting 1. Train New Model starts the Configuration Wizard, the tool you use to craft your experiments.

The Configuration Wizard

The wizard allows you to define every aspect of your training run. It is designed with a "simple by default, powerful when needed" philosophy.

Current Configuration:
1. Architecture: ghost
2. Run Name: moe_run
3. Batch Size: 8
4. Num Experts: 4
5. Num Ghost Experts: 2
6. Dataset: huggingface -> wikitext
7. Advanced Configuration...

[S] Start Training with these settings
[E] Exit to Main Menu
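
For reference, the wizard's choices shown above map onto a configuration roughly like the following Python dict. Apart from architecture_mode (discussed in Section 3), the field names here are illustrative, not the project's exact schema:

config = {
    "architecture_mode": "ghost",  # see Section 3 for the available modes
    "run_name": "moe_run",
    "batch_size": 8,
    "num_experts": 4,
    "num_ghost_experts": 2,
    "dataset": {"source": "huggingface", "name": "wikitext"},
}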

Selecting a Dataset (Option 6)

This sub-menu allows you to specify the data source for your experiment.

Select Dataset Source:
1. Hugging Face Hub
2. Local File (.txt, .json, .jsonl)
>

2.3 Loading an Existing Model

Selecting 2. Load Model from Checkpoint from the Main Menu restores a previously saved model from its checkpoint file.

2.4 The Model Menu

After successfully loading a model, you are taken to the Model Menu, which provides contextual actions for the loaded model.

--- Model Status ---
Loaded Model: my_lambda_calculus_run
Architecture: ghost
Parameters: 1.98M
Checkpoint: checkpoints/my_lambda_calculus_run/checkpoint.pt
--------------------

1. Run Inference
2. Continue Training
3. View Full Configuration
4. Generate Analysis Plots
5. Return to Main Menu
>
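
If you want to peek inside a saved checkpoint outside the app, a minimal sketch looks like this (assuming the .pt file is a standard PyTorch checkpoint; its exact contents are project-specific):

import torch

# Load on CPU and list the checkpoint's top-level entries.
ckpt = torch.load("checkpoints/my_lambda_calculus_run/checkpoint.pt",
                  map_location="cpu")
print(list(ckpt.keys()))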

3. A Deep Dive into the Architectures

The architecture_mode parameter in the configuration wizard is the most important setting for defining your experiment. It controls which combination of modules and loss functions is active. Here is a breakdown of each mode:

Architecture Modes

gnn

This is the simplest architecture. It uses a standard Transformer block for each expert but does not enable any communication between them. It serves as a baseline to measure the benefit of more complex coordination strategies.

hgnn

This mode activates the HGNNExpertCoupler module. After each expert processes the input, their outputs are fed into a Hypergraph Neural Network. The HGNN allows the experts to exchange and refine their representations based on learned group relationships before their final outputs are combined.
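
To make the idea concrete, here is a deliberately simplified sketch of hypergraph-style coupling with a single hyperedge connecting all experts. The real HGNNExpertCoupler is more sophisticated; this only illustrates the exchange-and-refine pattern:

import torch
import torch.nn as nn

class ToyHypergraphCoupler(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.edge_proj = nn.Linear(d_model, d_model)  # nodes -> hyperedge
        self.node_proj = nn.Linear(d_model, d_model)  # hyperedge -> nodes

    def forward(self, expert_outputs):
        # expert_outputs: (num_experts, batch, d_model)
        edge_msg = self.edge_proj(expert_outputs.mean(dim=0))  # aggregate experts
        return expert_outputs + self.node_proj(edge_msg)       # refine each expert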

orthogonal

This mode builds directly on hgnn by adding an orthogonality loss to the training objective. The loss encourages the weight matrices of the different experts to be dissimilar, or "orthogonal," which discourages "expert collapse" and promotes a diverse, specialized set of experts.
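
One common way to write such a penalty (the project's exact formulation may differ) is the squared Frobenius norm of each pairwise product W_i^T W_j, which is zero exactly when the experts' weight columns are mutually orthogonal:

import torch

def orthogonality_penalty(expert_weights):
    # expert_weights: list of (out_dim, in_dim) tensors, one per expert.
    loss = 0.0
    for i in range(len(expert_weights)):
        for j in range(i + 1, len(expert_weights)):
            # ||W_i^T W_j||_F^2 vanishes when the column spaces are orthogonal.
            loss = loss + (expert_weights[i].T @ expert_weights[j]).pow(2).sum()
    return loss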

ghost

This advanced architecture combines all previous features with the Ghost Expert mechanism. It uses HGNN coupling and orthogonality loss, but also includes a secondary pool of "ghost" experts that are dynamically activated when primary experts reach saturation.
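
The activation logic can be pictured as a saturation-driven gate. The sketch below is conceptual; the saturation signal and the blending rule are assumptions, not the project's actual mechanism:

def ghost_mix(primary_out, ghost_out, saturation, threshold=0.8):
    # saturation: value in [0, 1] estimated from primary expert load.
    # Below the threshold the ghosts stay dormant; above it they blend
    # in proportionally until they fully share the work.
    alpha = max(0.0, min(1.0, (saturation - threshold) / (1.0 - threshold)))
    return (1 - alpha) * primary_out + alpha * ghost_out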

🚀 geometric

This implements the world's first Geometric Constrained Learning system, representing a fundamental paradigm shift in machine learning. Instead of adjusting model weights to fit the data, geometric training keeps the experts' orthogonal geometry fixed and learns theta rotation parameters that adjust how the data is presented to each expert.

Key Results: 46% improvement in total loss and 96% improvement in expert specialization on lambda calculus reasoning tasks.
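
As a rough illustration of the idea, the toy sketch below assumes "theta rotation" means learned planar rotations applied to the features an expert sees; the expert's own weights stay frozen while only the angles receive gradients. The actual system is more elaborate:

import torch
import torch.nn as nn

class ThetaRotation(nn.Module):
    def __init__(self, num_planes=4):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(num_planes))  # learned angles

    def forward(self, x):
        # Rotate coordinate pair (2i, 2i+1) by angle theta[i];
        # requires x.shape[-1] >= 2 * num_planes.
        out = x.clone()
        for i, t in enumerate(self.theta):
            a, b = x[..., 2 * i], x[..., 2 * i + 1]
            out[..., 2 * i] = torch.cos(t) * a - torch.sin(t) * b
            out[..., 2 * i + 1] = torch.sin(t) * a + torch.cos(t) * b
        return out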

4. Hyperparameter Glossary

The following is a reference for all available parameters in the "Advanced Configuration" menu of the training wizard.

Core Parameters

Model Architecture

HGNN Parameters (hgnn)

Ghost Expert Parameters (ghost)

Training Parameters

Dataset Parameters

5. A Guide to Datasets

The framework supports loading data from both the Hugging Face Hub and local files.

Hugging Face Datasets
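
When you select the Hugging Face Hub as a source, loading a dataset like the wizard's wikitext example presumably amounts to something like this sketch using the datasets library (the framework's actual loader is not shown here, and the config name is an example):

from datasets import load_dataset

# Download wikitext from the Hub and inspect the first training sample.
ds = load_dataset("wikitext", "wikitext-2-raw-v1")
print(ds["train"][0]["text"])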

Local Datasets

The framework can load local text data from .txt, .json, or .jsonl files. The data loader automatically splits the data into a 90% training set and a 10% validation set.
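
The split behaves like this sketch (the file name is an example, and the real loader also handles .json and .jsonl):

# Read one sample per line, then split 90/10 into train and validation.
with open("data.txt") as f:
    samples = [line.strip() for line in f if line.strip()]

split = int(0.9 * len(samples))
train, val = samples[:split], samples[split:]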

.txt Format

Each line is treated as a separate sample:

This is the first document. It contains valuable text.
This is the second document, which will be a separate sample.
And a third.

.json or .jsonl Format

Each line should be a JSON object containing a text field:

{"text": "This is the first document."}
{"text": "This is the second document."}

Example (GRPO/Instruction Format):

{
  "question": "((λx.(λy.(x y))) a) b",
  "reasoning": "Apply outer function...",
  "answer": "a b"
}

In this case, the loader will combine the values into a single text sample: Question: ((λx...))\nReasoning: Apply outer...\nAnswer: a b.
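
For GRPO-style records, the combination step behaves like this sketch (the prompt template is taken from the description above):

record = {
    "question": "((λx.(λy.(x y))) a) b",
    "reasoning": "Apply outer function...",
    "answer": "a b",
}
text = (f"Question: {record['question']}\n"
        f"Reasoning: {record['reasoning']}\n"
        f"Answer: {record['answer']}")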

Ready to Start Your Research?

Launch the MoE Research Hub and begin experimenting:

python3 app.py