OpenAI Releases Three Voice Models in Realtime API; GPT-Realtime-2 Features 128K Context Window

According to Beating, OpenAI released three voice models in its Realtime API: GPT-Realtime-2 for voice conversation with reasoning, GPT-Realtime-Translate for real-time translation, and GPT-Realtime-Whisper for streaming transcription. GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-level reasoning capability, expanding context window from 32K to 128K tokens, supporting up to 1-2 hours of dense conversation.

GPT-Realtime-2 improved 15.2% on Big Bench Audio benchmark and 13.8% on Audio MultiChallenge compared to GPT-Realtime-1.5. GPT-Realtime-Translate supports 70+ input languages translating to 13 output languages. Pricing: GPT-Realtime-2 at $32/million input tokens and $64/million output tokens; Translate at $0.034/minute; Whisper at $0.017/minute.

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.

Related Articles

U.S. and China Set to Launch Official AI Safety Dialogue, Led by Treasury Officials

According to reports, the United States and China are preparing to launch an official AI safety dialogue aimed at establishing crisis management mechanisms for their technology competition. The U.S. delegation will be led by Treasury Secretary Scott Bessent, while China will be represented by Vice

GateNews23m ago

RLWRLD Releases RLDX-1 AI Model for Industrial Robotic Hands

RLWRLD, a robotics AI startup backed by LG Electronics, unveiled RLDX-1, a foundation model designed for five-finger robotic hands in industrial applications, according to RLWRLD. The company released the model's weights, code, and technical documents on GitHub and Hugging Face. Model

CryptoFrontier43m ago

DeepMind AlphaEvolve cross-domain performance: 4×4 matrix multiplication refreshes the 1969 Strassen record, Gemini training is 1% faster

Google DeepMind on May 7 (U.S. time) released a report on AlphaEvolve’s cross-domain breakthroughs. DeepMind’s official blog summarizes AlphaEvolve’s concrete progress since its launch: it found a 4×4 complex matrix multiplication method better than the Strassen 1969 algorithm (48 pure scalar multiplications), collaborated with mathematicians such as Terence Tao to solve multiple Erdős mathematical problems, saved 0.7% of global computing resources for Google data centers, increased the speed of the key kernels trained with Gemini by 23%, and reduced overall Gemini training time by 1%. Architecture: Gemini Flash wide exploration + Gemini

ChainNewsAbmedia55m ago

OpenAI Codex launches a Chrome extension: can test Web Apps in the browser, pull Context across pages, and run in parallel

On May 7, OpenAI (U.S. time) released Codex’s Chrome extension, enabling Codex coding agents to run directly inside Chrome browsers on macOS and Windows. OpenAI’s official Codex documentation explains that the extension allows Codex to test web apps without taking over the user’s browsing, gather context across multiple tabs, use Chrome DevTools, and carry out other tasks in parallel. OpenAI also announced that Codex’s weekly active users exceed 4 million, up 8x since the beginning of the year. Things that can be done in the browser: test web apps, get context across pages, use DevTools Chrome extension

ChainNewsAbmedia59m ago

OpenAI launches GPT-Realtime-2: brings GPT-5 reasoning into voice agents, with context up to 128K

OpenAI on May 7 (U.S. time) announced three new Realtime voice models at its developer conference: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. All are available to developers via the Realtime API. In its official announcement, OpenAI said GPT-Realtime-2 is the first OpenAI model to have GPT-5

ChainNewsAbmedia1h ago

On-site visit to China’s AI laboratories: Researchers reveal that the “chip and data gap” is the key to the China-U.S. divide

Nathan Lambert, who conducted in-depth visits to multiple AI labs across China, pointed out that China’s core advantages lie in its culture, talent, and pragmatic mindset. Research prioritizes improving model quality, with students serving as key contributors, and organizations experiencing less internal infighting; however, there are gaps in chips, data, and creativity. External computing power is constrained by U.S. regulations, and low data quality has driven the push to build in-house training environments. Companies open-source but retain core technologies for their own fine-tuning; if the U.S. tightens the open ecosystem, it could affect China’s global leading position.

ChainNewsAbmedia1h ago
Comment
0/400
No comments