Gate News message, April 24 — V4 has publicly disclosed internal dogfooding data for its V4-Pro model. The company collected approximately 200 real-world engineering tasks from over 50 engineers, covering feature development, bug fixes, refactoring, and diagnostics across tech stacks including PyTorch, CUDA, Rust, and C++. After rigorous filtering, 30 tasks were retained for the benchmark evaluation.
V4-Pro-Max achieved a 67% coding pass rate, significantly outperforming Sonnet 4.5 at 47% and approaching Opus 4.5 at 70%. However, it trails Opus 4.5 Thinking (73%) and Opus 4.6 Thinking (80%), while substantially exceeding Haiku 4.5 at 13%.
In an internal survey of 85 respondents, all participants reported using V4-Pro for agentic coding in their daily workflows. 52% endorsed V4-Pro as their default primary coding model, 39% leaned toward approval, and the remaining 9% expressed disapproval. Reported issues included low-level errors, misinterpretation of ambiguous prompts, and occasional over-thinking behavior.
Related News
OpenAI launches GPT-5.5: a 12M context window, the top spot on the AA index, and a new Terminal-Bench agent record at 82.7%
Google Jules releases a new version, repositioning it as an end-to-end product development platform
Google Expands Wiz Cloud Security Across AWS, Azure, and Google Cloud