Formal mathematics verification has traditionally been the domain of experts. With the advent of powerful LLMs, we can now bridge the gap between natural language reasoning and formal proofs.
The Challenge
Theorem provers like Lean 4 require precise syntax and logical steps. LLMs excel at pattern recognition but often generate syntactically incorrect code. The key insight is to use LLMs as tactic generators rather than as complete proof writers: each proposed tactic is a small, independently checkable step.
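To make the granularity concrete, here is a small Lean 4 proof built one tactic at a time. Each line after `by` is a single tactic that the verifier can accept or reject on its own, which is exactly the unit an LLM is asked to generate:

```lean
theorem add_comm' (a b : Nat) : a + b = b + a := by
  induction a with
  | zero => simp                                 -- 0 + b = b + 0
  | succ n ih => rw [Nat.succ_add, ih, Nat.add_succ]  -- push succ through
```

If a generated tactic fails, only that step needs to be regenerated; the proof state up to that point remains valid.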
Architecture
ProofCraft uses a multi-agent architecture:
- Analyzer Agent: Understands the goal state and available hypotheses
- Strategy Agent: Decides which proof approach to use
- Tactic Generator: Produces Lean 4 tactics
- Verifier: Checks if the tactic is valid
Results
On the miniF2F benchmark, ProofCraft achieves a 28.3% success rate, competitive with state-of-the-art systems while being significantly more cost-efficient thanks to our caching system.
Lessons Learned
- Caching LLM responses is crucial for cost management
- Multiple search strategies (DFS, BFS, Best-First) should be combined
- Error messages from Lean provide valuable feedback for retry logic
The code is available on GitHub, and I welcome contributions!