xAI’s Grok Voice ranks first in agentic performance benchmark

xAI’s Grok Voice Think Fast 1.0 has emerged as the leading speech-to-speech (S2S) model, scoring 52.1% on the 𝜏-Voice benchmark, which evaluates multi-turn instruction following and tool use in realistic customer service scenarios across the airline, retail, and telecom sectors. Launched in late April 2026, Grok Voice handles voice-specific challenges such as accents and background noise, with interactions averaging 5.6 minutes. OpenAI’s GPT-Realtime-2 and GPT-Realtime-1.5 follow Grok Voice in the ranking, as S2S models work to close the performance gap with text-based agents.

xAI: xAI is an artificial intelligence company focused on building the Grok family of models to accelerate human scientific discovery and understand the universe. It develops advanced AI capabilities including voice agents available through its API. xAI’s Grok Voice Think Fast 1.0 topped the recent Artificial Analysis agentic performance benchmark for speech-to-speech models.
Elon Musk: Elon Musk is the founder and leader of xAI, overseeing development of its Grok AI models alongside his roles at Tesla and SpaceX. He guides xAI’s mission toward advancing scientific discovery. He received congratulations from Artificial Analysis for xAI’s Grok Voice achieving the top ranking in the 𝜏-Voice benchmark.
Grok Voice: Grok Voice Think Fast 1.0 is xAI’s state-of-the-art speech-to-speech model optimized for complex multi-step workflows, real-time responses, and handling real-world audio challenges like noise, accents, and interruptions. Launched in late April 2026 via API, it supports voice agents for customer service and tool integration. In the Artificial Analysis 𝜏-Voice benchmark, it leads in agentic performance across realistic customer service scenarios.
𝜏-Voice: 𝜏-Voice is a benchmark extending 𝜏²-bench to evaluate full-duplex speech-to-speech models on realistic customer service tasks with audio complexities like accents, noise, and packet loss. Developed by researchers including Ray, Dhandhania, Barres, and Narasimhan in 2026, it covers airline, retail, and telecom scenarios. Artificial Analysis applied it to compare leading voice agents, highlighting performance gaps relative to text-based systems.
GPT-Realtime-2: GPT-Realtime-2 is OpenAI’s advanced voice model in the Realtime API, designed for production-ready voice agents with enhanced reasoning, interruption handling, and multi-turn conversations. Released in early May 2026, it supports live voice interactions and tool use. In the Artificial Analysis 𝜏-Voice benchmark, it ranks behind Grok Voice in agentic customer service performance.
GPT-Realtime-1.5: GPT-Realtime-1.5 is an earlier OpenAI speech-to-speech model in the Realtime API series, focused on real-time voice processing for agentic tasks. It precedes GPT-Realtime-2 in OpenAI’s voice intelligence lineup. The 𝜏-Voice benchmark by Artificial Analysis shows it performing competitively but trailing Grok Voice in resolving customer service scenarios.
Artificial Analysis: Artificial Analysis is an independent platform providing benchmarks and leaderboards for evaluating AI models across intelligence, performance, and specialized tasks. It maintains the Intelligence Index aggregating multiple challenging evaluations. It recently introduced agentic performance benchmarking for speech-to-speech models using 𝜏-Voice and ranked xAI’s Grok Voice as the leader.
Gemini 3.1 Flash Live Preview: Gemini 3.1 Flash Live Preview is Google’s low-latency audio-to-audio model optimized for real-time dialogue in voice-first AI applications, supporting multimodal inputs and acoustic nuances. Released in preview earlier in 2026, it enables conversational agents via API. It placed close behind leading models in the Artificial Analysis 𝜏-Voice benchmark for speech-to-speech agent performance.

Benchmark Focus: 𝜏-Voice tests voice agents on multi-turn instruction following, tool use, and complete customer interactions in airline, retail, and telecom domains under realistic audio conditions.
Recent Launches: xAI released Grok Voice Think Fast 1.0 in late April 2026 as its most capable voice agent for API deployment, while OpenAI introduced GPT-Realtime-2 enhancements in early May.
Voice Challenges: Unlike text agents, speech-to-speech models must cope with accents, background noise, and packet loss while delivering fast, consistent responses over long conversations.
