NanoZKInference
A zero-knowledge proof system for verifiable LLM inference using Halo2 IPA: 6.9KB proof per layer (O(1) size), 22ms verification, 0.00% perplexity degradation, 3.2-minute full GPT-2 proof with 12 parallel workers, and Fisher-guided layer selection adding +10pp coverage at 50% compute budget. Submitted to ICICS 2026 and VerifAI@ICLR 2026.
// DESCRIPTION
The Problem: LLM-as-a-Service Has No Cryptographic Correctness Guarantee
When you call the OpenAI API, you receive a text response. You have no way to verify that the response was actually generated by the model you paid for (GPT-4 vs. a cheaper model) or that the model was executed correctly without tampering. This trust gap is not merely academic: high-stakes applications in healthcare, legal reasoning, financial compliance, and code generation need cryptographic proof that the model executed exactly the advertised computation — not an approximation, not a cheaper substitute, not a cached hallucination.
Zero-knowledge proofs (ZKPs) offer a theoretical solution: a prover can generate a compact cryptographic proof that a specific computation was performed correctly, which a verifier can check in milliseconds without re-running the computation. But adapting ZKP systems to neural network inference is extremely challenging: neural networks use floating-point arithmetic, non-linear activation functions (softmax, GELU, SiLU, RMSNorm), and operate at scales of billions of parameters — none of which fit naturally into the finite-field arithmetic that ZKP circuits require. NanoZKInference solves this for transformer language models at practical scale.
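To make the floating-point/finite-field mismatch concrete, here is a minimal sketch of mapping floats into a 16-bit fixed-point representation that embeds into a prime field. The scale factor and field modulus below are illustrative assumptions, not the encoding NanoZKInference actually uses:

```python
# Illustrative only: ZK circuits operate over integers in a prime field,
# so floats must be quantized to fixed-point first. The SCALE and the toy
# modulus here are hypothetical, not NanoZKInference's actual parameters.

SCALE = 1 << 8                  # assume 8 fractional bits of a 16-bit value
FIELD_MODULUS = (1 << 61) - 1   # toy prime; real systems use ~255-bit fields

def to_fixed(x: float) -> int:
    """Quantize a float to a signed 16-bit fixed-point integer."""
    q = round(x * SCALE)
    # clamp to the signed 16-bit range [-2^15, 2^15 - 1]
    return max(-(1 << 15), min((1 << 15) - 1, q))

def to_field(q: int) -> int:
    """Embed a signed fixed-point value into the field (negatives wrap)."""
    return q % FIELD_MODULUS

def from_fixed(q: int) -> float:
    """Recover the approximate float value."""
    return q / SCALE

x = 3.14159
q = to_fixed(x)
# quantization error is bounded by one unit in the last fractional place
assert abs(from_fixed(q) - x) <= 1 / SCALE
```

Every in-circuit value lives in this quantized domain, which is what makes exact lookup tables for the non-linearities possible later in the pipeline.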
Situation: The ZKP-for-ML Efficiency Chasm
Prior work on ZKP-based neural network verification faced a fundamental scalability wall: proof generation time scaled as O(n log n) or worse with model size, making even small models (125M parameters) take hours to prove. Non-linear operations like softmax and GELU were approximated with polynomials, introducing accuracy degradation. And proof sizes scaled linearly with the number of layers, making bandwidth a bottleneck for client-server deployment. NanoZKInference attacks all three bottlenecks simultaneously.
Task: Prove GPT-2 Inference Correctly in Under 5 Minutes
The concrete goal was to generate a cryptographically sound, succinct proof of correct GPT-2 (124M parameters, 12 layers) inference that: (a) introduces zero perplexity degradation on standard benchmarks, (b) fits in a small constant size per layer, (c) verifies in under 100ms, and (d) completes within a time budget acceptable for asynchronous verification in production deployments.
Action: Halo2 IPA + 16-bit Lookup Tables + Fisher-Guided Layer Selection
The system architecture combines four technical contributions.
Contribution 1 — Halo2 IPA Backend. NanoZKInference uses the Halo2 proof system with the Inner Product Argument (IPA) polynomial commitment scheme. Halo2 IPA was chosen for its O(1) proof size (independent of circuit depth), its native support for custom gates that map cleanly to transformer operations, and its Rust implementation that enables tight integration with the inference engine.
Contribution 2 — 16-bit Fixed-Point Lookup Tables for Non-Linearities. Rather than approximating softmax, GELU, SiLU, and RMSNorm with polynomials (which introduces accuracy degradation), NanoZKInference encodes these functions as exact lookup tables over a 16-bit fixed-point domain. Each activation value is quantized to 16-bit precision, looked up in a precomputed table, and the lookup is verified in-circuit using Halo2's Plonkish lookup argument. This achieves 0.00% perplexity degradation on WikiText-2 while remaining fully provable in the ZK circuit.
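The lookup-table idea can be sketched in a few lines: enumerate every representable 16-bit input once, tabulate the quantized function value, and look values up at inference time. In the real system the (input, output) pair is checked in-circuit by Halo2's Plonkish lookup argument; this sketch (with an assumed 8-fractional-bit scale) only demonstrates why the approach is exact, since the table reproduces the quantized function bit-for-bit with no polynomial approximation:

```python
# Sketch, not the project's code: an exact GELU lookup table over the full
# 16-bit fixed-point domain (65,536 entries). The SCALE is an assumption.
import math

SCALE = 1 << 8  # hypothetical: 8 fractional bits

def gelu(x: float) -> float:
    """Exact GELU via the Gauss error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantize(x: float) -> int:
    """Float -> signed 16-bit fixed point, clamped."""
    return max(-(1 << 15), min((1 << 15) - 1, round(x * SCALE)))

# One entry per representable 16-bit input.
GELU_LUT = {q: quantize(gelu(q / SCALE)) for q in range(-(1 << 15), 1 << 15)}

def gelu_fixed(q: int) -> int:
    # Exact by construction: the table IS the quantized function,
    # so no error beyond the quantization itself is introduced.
    return GELU_LUT[q]

q = quantize(1.5)
assert gelu_fixed(q) == quantize(gelu(q / SCALE))
```

The same construction applies to softmax's exponential, SiLU, and the reciprocal-square-root inside RMSNorm; the lookup argument then proves table membership without revealing which row was used.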
Contribution 3 — SHA-256 Layer Commitment Chains. Model weights are committed using SHA-256 hashes organized in a layerwise chain: each layer's hash is computed over the weights and the previous layer's hash, creating a tamper-evident commitment chain from the first layer to the last. The prover includes this chain in the proof, allowing the verifier to check both computational correctness and weight integrity in a single pass.
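The commitment chain is standard hash chaining and can be sketched directly with the standard library (the genesis value and serialization are illustrative assumptions): each layer's commitment hashes the serialized weights together with the previous commitment, so tampering with any layer changes every subsequent hash.

```python
# Sketch of a layerwise SHA-256 commitment chain. Genesis value and byte
# serialization are illustrative; the real system commits actual weights.
import hashlib

def commit_chain(layer_weights: list[bytes]) -> list[bytes]:
    """Return the running SHA-256 commitment after each layer."""
    chain = []
    prev = b"\x00" * 32  # assumed genesis commitment before layer 0
    for w in layer_weights:
        # each commitment binds this layer's weights AND all prior layers
        prev = hashlib.sha256(prev + w).digest()
        chain.append(prev)
    return chain

weights = [b"layer0-weights", b"layer1-weights", b"layer2-weights"]
honest = commit_chain(weights)
tampered = commit_chain([weights[0], b"swapped!", weights[2]])

assert honest[0] == tampered[0]   # layers before the edit agree
assert honest[1] != tampered[1]   # the edit is detected...
assert honest[2] != tampered[2]   # ...and propagates to the final hash
```

Because the final hash commits to the entire weight stack, the verifier only needs that one value to check weight integrity alongside the computational proof.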
Contribution 4 — Fisher Information Layer Selection. Not all transformer layers contribute equally to output quality. NanoZKInference uses Fisher Information to rank layers by their importance to model performance, enabling a partial-proof mode where only the most important layers are proven. At 50% of the full proof budget, Fisher-guided selection achieves +10 percentage points higher benchmark coverage than uniform layer sampling.
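The selection policy itself is simple once per-layer Fisher scores exist: rank layers by score and prove the top fraction that fits the budget. A minimal sketch, with made-up scores for a 12-layer model (the real system estimates Fisher Information from data; these numbers are purely illustrative):

```python
# Hypothetical sketch of Fisher-guided partial proving: given per-layer
# importance scores, prove only the highest-scoring layers within a
# compute budget. Scores below are invented for illustration.

def select_layers(fisher_scores: list[float],
                  budget_fraction: float) -> list[int]:
    """Pick the most important layer indices under a proof-compute budget."""
    k = max(1, int(len(fisher_scores) * budget_fraction))
    ranked = sorted(range(len(fisher_scores)),
                    key=lambda i: fisher_scores[i], reverse=True)
    return sorted(ranked[:k])  # return selected layers in network order

# 12 GPT-2 layers with illustrative Fisher Information scores.
scores = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.2, 0.5, 0.4, 0.95, 0.85]
chosen = select_layers(scores, budget_fraction=0.5)

assert len(chosen) == 6               # half the layers at a 50% budget
assert 10 in chosen and 0 in chosen   # the top-scoring layers are kept
```

Uniform sampling would instead take every other layer regardless of importance; concentrating the budget on high-Fisher layers is what yields the +10pp coverage gain reported below.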
NanoZKInference Pipeline
Input Text
|
v
GPT-2 Forward Pass (16-bit fixed-point)
|
+-- Per-layer: weight hash -> SHA-256 chain commitment
|
+-- Activation LUT verification (softmax/GELU/SiLU/RMSNorm)
| via Plonkish lookup argument
|
+-- Halo2 IPA circuit -> Proof (6.9 KB/layer, O(1))
|
v
Verifier (22ms)
|
+-- ACCEPT / REJECT
Results: 6.9KB/Layer, 22ms Verification, 0.00% Perplexity Degradation
- Proof size: 6.9KB per layer — O(1) with respect to model depth (does not grow as you add layers)
- Verification time: 22ms per full proof — well within interactive latency budgets for real-time applications
- Perplexity degradation: 0.00% on WikiText-2 benchmark — the 16-bit fixed-point quantization and lookup table approach preserves full model accuracy
- Full GPT-2 proof time: 3.2 minutes with 12 parallel worker processes — acceptable for asynchronous production verification
- Fisher layer selection: +10pp coverage at 50% proof budget — enables practical partial-proof deployment for latency-sensitive applications
NanoZKInference has been submitted to both ICICS 2026 and the VerifAI@ICLR 2026 Workshop, targeting the intersection of cryptographic verifiability and practical neural network deployment.
// HIGHLIGHTS
- Submitted to ICICS 2026 and VerifAI@ICLR 2026 Workshop
- 6.9KB proof per layer — O(1) size, independent of model depth (Halo2 IPA backend)
- 22ms verification time — interactive latency for real-time cryptographic checks
- 0.00% perplexity degradation — 16-bit fixed-point lookup tables preserve full GPT-2 accuracy
- 3.2-minute full GPT-2 proof with 12 parallel workers — practical for async production deployment
- Fisher-guided layer selection: +10pp coverage at 50% proof budget vs. uniform sampling
- SHA-256 layerwise commitment chains provide weight integrity alongside computational correctness proofs
- Rust (Halo2 + PyO3) + Python harness — complete cryptographic stack purpose-built for transformer inference