StepFun's StepAudio 2.5 Realtime Tops Five Voice AI Benchmarks, Beats GPT Realtime 1.5

Shanghai-based AI lab StepFun this week released StepAudio 2.5 Realtime, an end-to-end real-time voice model supporting Chinese and English. According to StepFun's own testing in April 2026, the model ranked first on all five voice AI benchmarks evaluated, outperforming OpenAI's GPT Realtime 1.5 and Google's Gemini Live.

On the paralinguistic comprehension benchmark, which measures perception of acoustic features such as emotion and speaking rate on a 0–100 scale, StepAudio 2.5 Realtime scored 82.18, versus 80.46 for GPT Realtime 1.5 and 58.05 for Gemini Live. In human evaluation, it scored 80.41, compared with 68.01 for GPT Realtime 1.5 and 67.16 for Gemini Live. StepFun trained the model on a million-scale persona dataset, using roleplay-specific reinforcement learning to maintain character consistency during extended conversations.
