Formal mathematics verification has traditionally been the domain of experts. With the advent of powerful LLMs, we can now bridge the gap between natural language reasoning and formal proofs.
The Challenge
Theorem provers like Lean 4 require precise syntax and logical steps. LLMs excel at pattern recognition but often generate syntactically incorrect code. The key insight is to use LLMs as tactic generators rather than as complete proof writers: each proposed tactic is a small, independently checkable step.
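To make the granularity concrete, here is a small Lean 4 proof built one tactic at a time. Each line after `by` is a single tactic that the verifier can accept or reject on its own, which is exactly the unit an LLM is asked to generate:

```lean
theorem add_comm' (a b : Nat) : a + b = b + a := by
  induction a with
  | zero => simp                                 -- 0 + b = b + 0
  | succ n ih => rw [Nat.succ_add, ih, Nat.add_succ]  -- push succ through
```

If a generated tactic fails, only that step needs to be regenerated; the proof state up to that point remains valid.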
Architecture
ProofCraft uses a multi-agent architecture:
- Analyzer Agent: Understands the goal state and available hypotheses
- Strategy Agent: Decides which proof approach to use
- Tactic Generator: Produces Lean 4 tactics
- Verifier: Checks if the tactic is valid
Results
On the miniF2F benchmark, ProofCraft achieves a 28.3% success rate, competitive with state-of-the-art systems while being significantly more cost-efficient thanks to our caching system.
Lessons Learned
- Caching LLM responses is crucial for cost management
- Multiple search strategies (DFS, BFS, Best-First) should be combined
- Error messages from Lean provide valuable feedback for retry logic
The code is available on GitHub, and I welcome contributions!