Gate News message, April 24: DeepSeek has published formal mathematical reasoning evaluation results for V4, which achieved a perfect score of 120/120 on Putnam-2025, tying with Axiom for first place.
In the practical regime, which uses LeanExplore and constrained sampling, V4-Flash-Max scored 81.00 on the Putnam-200 Pass@8 benchmark, well ahead of Seed-2.0-Prover (35.50), Gemini 3 Pro (26.50), and Seed-1.5-Prover (26.50). In the frontier regime, V4 finished ahead of Seed-1.5-Prover (110/120) and Aristotle (100/120).
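For readers unfamiliar with the metric, Pass@8 is commonly read as the share of problems for which at least one of 8 sampled attempts is verified as correct. The sketch below assumes that reading and assumes the score is reported as a percentage; the report's exact scoring procedure is not described here, and the function name and data layout are illustrative only.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Percentage of problems with at least one verified proof in the first k attempts.

    `attempts` maps a problem ID to per-attempt verification outcomes
    (True means the proof checker accepted that attempt). This layout is
    an assumption made for illustration, not the benchmark's own format.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("no problems to score")
    # A problem counts as solved if any of its first k attempts succeeded.
    solved = sum(any(results[:k]) for results in attempts.values())
    return 100.0 * solved / len(attempts)


# Toy data, not benchmark results.
toy_attempts = {
    "problem_a": [False, True],
    "problem_b": [False, False],
    "problem_c": [True],
    "problem_d": [False] * 8,
}
toy_score = pass_at_k(toy_attempts, k=8)
```

Under this reading, raising k can only keep the score the same or increase it, which is why Pass@8 figures are not directly comparable to single-attempt accuracy.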
V4 employs a hybrid formal-informal reasoning approach: informal reasoning generates candidate solutions in natural language, self-verification filters those candidates, and a formal agent completes rigorous proofs in Lean. The frontier results relied on large-scale compute, while the practical-regime scores better reflect standard deployment capabilities.
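The three stages described above can be sketched as a simple control flow. This is a minimal illustration under stated assumptions, not DeepSeek's implementation: `hybrid_prove` and the three callables passed to it are hypothetical placeholders for the informal generator, the self-verification filter, and the formal Lean agent.

```python
from typing import Callable, Iterable, Optional


def hybrid_prove(
    problem: str,
    generate_candidates: Callable[[str], Iterable[str]],
    self_verify: Callable[[str, str], bool],
    formalize: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return a formal proof for `problem`, or None if no candidate yields one.

    All three callables are hypothetical stand-ins:
      generate_candidates: informal reasoning, yields natural language solutions
      self_verify:         filter that rejects candidates judged unsound
      formalize:           formal agent, returns Lean source or None on failure
    """
    for candidate in generate_candidates(problem):
        # Stage 2: drop candidates that fail self-verification.
        if not self_verify(problem, candidate):
            continue
        # Stage 3: attempt a rigorous proof guided by the surviving candidate.
        proof = formalize(problem, candidate)
        if proof is not None:
            return proof
    return None


# Toy stubs so the sketch runs end to end; they do no real reasoning.
def toy_generate(problem: str) -> Iterable[str]:
    return ["flawed sketch", "sound sketch"]


def toy_verify(problem: str, candidate: str) -> bool:
    return candidate.startswith("sound")


def toy_formalize(problem: str, candidate: str) -> Optional[str]:
    return f"-- Lean proof guided by: {candidate}"


toy_result = hybrid_prove("toy problem", toy_generate, toy_verify, toy_formalize)
```

Filtering before formalization reflects the ordering in the description: the cheaper informal check runs first, so the formal agent only spends effort on candidates that survived it.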