ACTIVE Research

Universe-Routing

A 465M-parameter fine-tuned router that classifies queries into 7 distinct epistemic universes (frequentist, Bayesian, classical, quantum, etc.) before LLM reasoning — achieving 97.25% test accuracy, 83.93% OOD generalization, 12.9ms latency, and near-zero catastrophic forgetting. Accepted at LLA@ICLR 2026 (arXiv:2603.14799).

status ACTIVE
type Research
started 2024-08-01
stack ['Python', 'Qwen-1.5-0.5B', 'HuggingFace Transformers', 'Fisher Information', 'LoRA Fine-tuning']

// DESCRIPTION

The Problem: LLMs Cannot Tell Which Universe a Question Belongs To

Imagine asking an LLM: 'What is the probability that this coin comes up heads?' A frequentist statistician would answer: 'It is either 0 or 1 — we need to run the experiment.' A Bayesian would answer: 'Given your prior belief and any available evidence, the probability is 0.5 unless updated.' A quantum physicist, if the coin were a quantum system, might answer in terms of superposition states. All three answers are internally consistent within their respective frameworks, yet they contradict each other. The problem is not that the question is ambiguous — it is that the question implicitly lives in a specific epistemic universe, and answering it correctly requires first identifying which universe applies.

Current LLMs regularly confuse these frameworks, producing answers that blend frequentist and Bayesian vocabulary incoherently, or that apply classical reasoning to inherently quantum problems. This is not a knowledge deficit — the models contain all relevant knowledge — it is a routing failure: the model does not know which epistemic lens to apply before generating its response. Universe Routing proposes a hard architectural solution: a small, fast classifier that routes every query to the correct epistemic context before any reasoning begins.

Situation: Seven Universes, One Confused Monolith

The research identified 7 distinct epistemic universes that represent fundamentally incompatible reasoning frameworks:

  • U1 — Frequentist Statistics: probability as long-run frequency
  • U2 — Bayesian Inference: probability as subjective belief updated by evidence
  • U3 — Classical Physics: deterministic Newtonian mechanics
  • U4 — Quantum Mechanics: probabilistic wave-function collapse
  • U5 — Formal Logic: binary truth values, propositional/predicate calculus
  • U6 — Fuzzy/Paraconsistent Logic: degrees of truth, contradiction tolerance
  • U7 — Game-Theoretic Reasoning: strategic equilibria, mechanism design

A benchmark of 685 carefully augmented samples was constructed, covering all 7 universes with adversarial paraphrases, domain transfer examples, and edge cases designed to trick keyword-based classifiers. The dataset was split into a training set of 576 samples and a test set of 109 held-out samples, with a separate out-of-distribution (OOD) evaluation set drawn from academic papers and real user queries outside the training distribution.

Task: Train a 465M Router That Beats 70B Models at Classification

The central challenge was: how do you train a small model (Qwen-1.5-0.5B, 465M parameters) to be a reliable hard router for 7 epistemically distinct universes, such that it generalizes well out-of-distribution, resists adversarial rephrasing, and does not degrade its general language capabilities (catastrophic forgetting)? The router must also be fast enough to add negligible overhead in a production inference pipeline.

Action: Fisher-Regularized LoRA Fine-Tuning on Curated Contrastive Data

The training approach combined three innovations.

Innovation 1 — Contrastive Data Augmentation. The 685-sample dataset was constructed with deliberate contrastive pairs: for each universe, adversarial examples were created by swapping terminology across universes (e.g., using Bayesian vocabulary in a frequentist question) to teach the model to look beyond surface keywords. This is critical because a naive keyword classifier achieves only 34.25% accuracy on adversarial test inputs.
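As a rough illustration, the terminology-swap step might look like the following sketch. The vocabulary tables, the `make_adversarial` helper, and the swap rule are illustrative assumptions, not the paper's actual pipeline:

```python
# Hypothetical sketch of cross-universe terminology-swap augmentation.
# The vocabulary lists below are illustrative, not the paper's actual data.
import random

UNIVERSE_VOCAB = {
    "U1_frequentist": ["long-run frequency", "sampling distribution", "p-value"],
    "U2_bayesian": ["prior belief", "posterior update", "credible interval"],
}

def make_adversarial(question: str, true_universe: str) -> dict:
    """Inject a lure phrase from a *different* universe so the correct
    label no longer matches the surface keywords."""
    other = random.choice([u for u in UNIVERSE_VOCAB if u != true_universe])
    lure = random.choice(UNIVERSE_VOCAB[other])
    return {
        "text": f"{question} (Consider the {lure} here.)",
        "label": true_universe,  # label follows meaning, not vocabulary
    }

sample = make_adversarial(
    "Over many repeated flips, what fraction of tosses land heads?",
    "U1_frequentist",
)
```

The key design point is that the label tracks the question's underlying epistemic frame while the injected lure pulls the surface vocabulary toward a different universe.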

Innovation 2 — Fisher Information Regularization for Anti-Forgetting. To prevent fine-tuning from degrading Qwen's general language capabilities, the training loss includes an EWC-style (Elastic Weight Consolidation) regularization term weighted by the Fisher Information Matrix. Parameters that carry high Fisher information for general language tasks are penalized more strongly if they deviate from their pre-trained values, anchoring the model's general abilities while freeing low-Fisher parameters to specialize for routing.
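A minimal sketch of the anchor penalty, assuming a precomputed diagonal Fisher estimate per parameter tensor; the tensor names and toy values are illustrative, not the paper's implementation:

```python
# EWC-style quadratic penalty: lam/2 * sum_i F_i * (theta_i - theta*_i)^2.
# fisher, params, and anchors map tensor names to same-shaped arrays.
import numpy as np

def ewc_penalty(params, anchors, fisher, lam=1000.0):
    """High-Fisher parameters (important for general language ability) pay
    a large cost for drifting from their pre-trained anchor values;
    low-Fisher parameters are free to specialize for routing."""
    penalty = 0.0
    for name in params:
        diff = params[name] - anchors[name]
        penalty += float((fisher[name] * diff ** 2).sum())
    return 0.5 * lam * penalty

# Toy check: a 2x2 tensor drifted by 1.0 everywhere, unit Fisher, lam = 2
params = {"w": np.ones((2, 2))}
anchors = {"w": np.zeros((2, 2))}
fisher = {"w": np.ones((2, 2))}
loss = ewc_penalty(params, anchors, fisher, lam=2.0)  # 0.5 * 2 * 4 = 4.0
```

In training, this term is simply added to the routing classification loss, so the optimizer trades routing accuracy against drift on exactly the parameters that matter for general language tasks.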

Innovation 3 — Temperature-Calibrated Confidence Thresholding. The router outputs a softmax distribution over 7 classes. If the maximum confidence falls below a calibrated threshold (determined via Platt scaling on a held-out calibration set), the query is flagged as ambiguous and routed to a fallback 'multi-universe' prompt that asks the downstream LLM to reason under epistemic uncertainty — rather than forcing a potentially wrong hard assignment.

Query Input
    |
    v
Universe Router (Qwen-1.5-0.5B fine-tuned, 12.9ms)
    |
    +-- Confidence >= theta?  --YES-->  Hard Route to Universe Ui
    |                                       |
    |                                       v
    |                             Universe-Specific System Prompt
    |                                       |
    |                                       v
    |                                Downstream LLM (any size)
    |
    +-- Confidence < theta?   --YES-->  Multi-Universe Fallback Prompt
                                              |
                                              v
                                   Downstream LLM with uncertainty framing
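The routing rule in the diagram above can be sketched in a few lines. The temperature and threshold values here are placeholders; in the actual system they come from Platt-style calibration on the held-out set:

```python
# Sketch of temperature-scaled softmax routing with an ambiguity fallback.
import math

UNIVERSES = ["U1", "U2", "U3", "U4", "U5", "U6", "U7"]

def route(logits, temperature=1.5, theta=0.6):
    """Return (destination, confidence) for one query's 7 class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    probs = [e / sum(exps) for e in exps]
    conf = max(probs)
    if conf >= theta:
        return UNIVERSES[probs.index(conf)], conf  # hard route to universe Ui
    return "multi-universe-fallback", conf         # epistemic-uncertainty prompt
```

A peaked logit distribution takes the hard route; a flat one (confidence near 1/7) drops to the multi-universe fallback instead of forcing a potentially wrong assignment.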

Results: 97.25% Accuracy, 83.93% OOD, 1.53% Adversarial Success Rate

The fine-tuned router was evaluated across four dimensions:

  • In-distribution test accuracy: 97.25% (106/109 samples correct)
  • Out-of-distribution accuracy: 83.93% on academic papers and real user queries outside the training distribution
  • Adversarial robustness: attacks succeeded in only 1.53% of attempts, vs. a 65.75% success rate against a keyword-based classifier
  • Latency: 12.9ms per query — compared to 840–16,275ms for cloud-hosted LLM routing alternatives (up to 1,261x faster)
  • Catastrophic forgetting: 0% on standard language benchmarks (MMLU, HellaSwag) — Fisher regularization fully preserved general capabilities
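The headline speedup follows directly from the latency figures above; a quick arithmetic check:

```python
# Arithmetic check of the speedup claim: a 12.9 ms router vs.
# 840-16,275 ms for cloud-hosted LLM routing alternatives.
router_ms = 12.9
cloud_ms = (840.0, 16_275.0)
speedups = tuple(int(c / router_ms) for c in cloud_ms)  # ~65x to ~1,261x
```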

The work was accepted at the LLA@ICLR 2026 Workshop (arXiv:2603.14799), demonstrating that a carefully trained small model can outperform naive LLM-based routing on epistemic classification while adding only 12.9ms to the inference pipeline.

Publication

Accepted at LLA Workshop @ ICLR 2026

Read the paper on arXiv: 2603.14799 →



// HIGHLIGHTS

  • Accepted at LLA@ICLR 2026 Workshop (arXiv:2603.14799)
  • 97.25% test accuracy (106/109) classifying queries into 7 epistemic universes
  • 83.93% OOD generalization on academic papers and real user queries outside training distribution
  • 12.9ms routing latency — up to 1,261x faster than cloud-hosted LLM routing alternatives
  • 1.53% adversarial success rate vs. 65.75% for keyword-based classifiers — robust to rephrasing attacks
  • 0% catastrophic forgetting — Fisher Information regularization fully preserves general language capabilities
  • 465M Qwen-1.5-0.5B fine-tuned on 685 contrastive samples with EWC-style Fisher regularization
  • Temperature-calibrated confidence thresholding routes ambiguous queries to multi-universe fallback