HOME / BLOG / Building an LLM-Assisted Theorem Prover

> Building an LLM-Assisted Theorem Prover

|
~5min READ
How I built ProofCraft - a tool that combines GPT-4 and Claude with Lean 4 for automated theorem proving.

Formal mathematics verification has traditionally been the domain of experts. With the advent of powerful LLMs, we can now bridge the gap between natural language reasoning and formal proofs.

The Challenge

Theorem provers like Lean 4 require precise syntax and logical steps. LLMs excel at pattern recognition but can generate syntactically incorrect code. The key insight is to use LLMs as a tactic generator rather than a complete proof writer.

Architecture

ProofCraft uses a multi-agent architecture:

  • Analyzer Agent: Understands the goal state and available hypotheses
  • Strategy Agent: Decides which proof approach to use
  • Tactic Generator: Produces Lean 4 tactics
  • Verifier: Checks if the tactic is valid

Results

On the miniF2F benchmark, ProofCraft achieves 28.3% success rate, competitive with state-of-the-art systems while being significantly more cost-efficient due to our caching system.

Lessons Learned

  1. Caching LLM responses is crucial for cost management
  2. Multiple search strategies (DFS, BFS, Best-First) should be combined
  3. Error messages from Lean provide valuable feedback for retry logic

The code is available on GitHub, and I welcome contributions!