OpenAI releases new real-time audio models for voice agents

OpenAI has launched new real-time audio models for conversational, task-capable voice agents. The release includes three models: GPT-Realtime-2 for speech-to-speech reasoning, GPT-Realtime-Translate for streaming language translation, and GPT-Realtime-Whisper for real-time transcription. The models target applications in customer service, education, media, and content creation, and incorporate safeguards against spam and fraud. They are accessible through the OpenAI Realtime API, which lets voice agents listen, reason, and perform tasks during an interaction.

OpenAI: OpenAI is an AI research organization that develops advanced foundation models and APIs, which power applications such as ChatGPT for enterprise and developer use across various domains. Recently, it has advanced its multimodal capabilities and enterprise integrations while updating its partnership with Microsoft for long-term AI scaling. In this release, OpenAI introduced new real-time audio models, including GPT-Realtime-2, to power conversational voice agents capable of live reasoning, translation, and transcription.

```json
{
  "Use Cases": "The new models aim to enhance applications in customer service, education, and media, focusing on improving conversational capabilities.",
  "Voice Models": "This release features models such as GPT-Realtime-2 for speech-to-speech reasoning, GPT-Realtime-Translate for language translation, and GPT-Realtime-Whisper for transcription tasks.",
  "API Integration": "The models are accessible through the OpenAI Realtime API, enabling integration into voice agents for dynamic conversations."
}
```