
In a bid to stand out in the crowded voice AI sector, OpenAI has introduced its latest model, gpt-realtime. The model is designed to follow intricate instructions and to generate voices that OpenAI describes as more natural and expressive. As demand grows for realistic-sounding AI voices, particularly for customer service and real-time translation, OpenAI aims to capture a significant share of this evolving market.

The gpt-realtime model is available through the Realtime API, which OpenAI has now made generally available. The update adds two new voices, Cedar and Marin, and updates the existing voices so they work with the latest features. During a recent livestream, OpenAI said gpt-realtime was developed in collaboration with customers building voice applications, so that the model is tuned to real-world scenarios such as customer support and academic tutoring.

The model is a speech-to-speech system: it understands spoken prompts and responds vocally. That makes it suited to real-time interactions, such as a customer calling a service line to ask about returning a product. In the livestream, T-Mobile showcased an AI voice agent that helps users find new phones, while Zillow demonstrated an agent that helps users choose a neighborhood that fits their needs.

OpenAI asserts that gpt-realtime is its most advanced and production-ready voice model. It can switch languages mid-sentence and follow complex instructions, including a request such as “speak emphatically in a French accent.” However, it will face stiff competition from established players like ElevenLabs, which recently launched Conversational AI 2.0, and Hume, known for its EVI 3 model that creates personalized AI voice replicas.

As enterprises continue to explore applications for voice AI, other general model providers are also making their mark. For instance, Mistral introduced its Voxtral model, aimed at real-time translation, while Google is enhancing its audio capabilities with a NotebookLM feature that converts research notes into podcasts.

OpenAI claims that gpt-realtime displays improved intelligence and a better understanding of audio nuances, including non-verbal cues such as laughter or sighs. The model scored 82.8% on the Big Bench Audio evaluation, up from its predecessor's 65.6%. OpenAI did not disclose figures for competing models. It did emphasize improved instruction following: gpt-realtime scored 30.5% on the MultiChallenge audio benchmark.

To support the deployment of gpt-realtime in enterprise applications, OpenAI has added several features to the Realtime API. These include support for Model Context Protocol (MCP) and the ability to accept image inputs, which lets users get responses about visual information in real time. The Realtime API also supports Session Initiation Protocol (SIP), which connects applications to telephony systems and opens new possibilities for contact center integrations.

As initial reactions to the model emerge, early testers report significant improvements in audio quality and responsiveness. Some concerns remain, such as the absence of options for custom voices and the model's cost relative to traditional text-to-speech solutions. Nonetheless, OpenAI has reduced the price of gpt-realtime by 20%, setting it at $32 per million audio input tokens and $64 per million audio output tokens, which it presents as competitive within the market.
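To make the pricing concrete, here is a minimal Python sketch of the arithmetic, using only the two audio rates and the 20% reduction quoted above. The function names and the session token counts are hypothetical, and the sketch assumes the 20% cut applies equally to both rates; the earlier prices it backs out are implied by that assumption, not figures reported in this article. It ignores any non-audio or cached-token charges.

```python
# Prices quoted in the article, in USD per one million audio tokens.
AUDIO_INPUT_USD_PER_MILLION = 32.0
AUDIO_OUTPUT_USD_PER_MILLION = 64.0
PRICE_CUT = 0.20  # the 20% reduction mentioned in the article


def audio_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the audio-token cost of a session at the quoted rates."""
    return (
        input_tokens * AUDIO_INPUT_USD_PER_MILLION
        + output_tokens * AUDIO_OUTPUT_USD_PER_MILLION
    ) / 1_000_000


def price_before_cut(current_price: float, cut: float = PRICE_CUT) -> float:
    """Back out the earlier price implied by a fractional reduction.

    Assumption: the cut was applied uniformly to this rate.
    """
    return current_price / (1.0 - cut)


if __name__ == "__main__":
    # Hypothetical session: these token counts are made up for illustration.
    cost = audio_cost_usd(input_tokens=50_000, output_tokens=20_000)
    print(f"Estimated audio cost: ${cost:.2f}")

    implied_in = price_before_cut(AUDIO_INPUT_USD_PER_MILLION)
    implied_out = price_before_cut(AUDIO_OUTPUT_USD_PER_MILLION)
    print(f"Implied earlier rates: ${implied_in:.2f} in, ${implied_out:.2f} out")
```

Because output tokens cost twice as much as input tokens, a deployment where the agent does most of the talking would cost noticeably more than one where the caller does.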