Gemini 3.1 Flash TTS rolls out with new audio tags for enhanced speech control

Google has launched Gemini 3.1 Flash TTS, a text-to-speech model designed to make AI-generated audio more controllable and expressive. The model introduces Audio Tags, which let users direct vocal style, delivery, and pace from within the text input, giving developers finer creative control. Gemini 3.1 Flash TTS supports more than 70 languages, enabling localized experiences for a global audience. All outputs also carry a SynthID watermark, which helps reliably detect AI-generated audio and counter misinformation.

Google: Google develops the Gemini series of multimodal AI models, powering tools for developers and enterprises via platforms like Google AI Studio and Vertex AI. In recent weeks, Google has advanced its audio AI capabilities with releases like Gemini 3.1 Flash Live for real-time voice agents. The launch of Gemini 3.1 Flash TTS continues this momentum by enhancing expressive speech generation for global applications.
Gemini 3.1 Flash TTS: Gemini 3.1 Flash TTS is Google’s latest text-to-speech model emphasizing controllability, expressivity, and natural-sounding speech generation. It features audio tags that enable precise control over vocal style, pace, and delivery through natural language commands in text inputs. This model is now rolling out in preview for developers via the Gemini API and Google AI Studio, enterprises on Vertex AI, and general users through Google Vids.
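
To make the idea of steering delivery through the text input concrete, here is a minimal Python sketch that assembles a tagged script before it would be sent to a TTS model. It rests on assumptions throughout: the square-bracket inline syntax, the tag names (`cheerful`, `whispering`, `slow`), and the `Segment` and `render_prompt` helpers are all hypothetical illustrations, not the documented Gemini 3.1 Flash TTS format. The sketch makes no API call and only builds the string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """One span of script text plus the delivery hints to apply to it."""
    text: str
    tags: tuple[str, ...] = ()  # hypothetical tag names, for illustration only


def render_prompt(segments: list[Segment]) -> str:
    """Join segments into one text input, prefixing each with its tags.

    ASSUMPTION: tags are written inline as [tag] before the text they affect.
    The real syntax may differ; consult the official documentation.
    """
    parts = []
    for seg in segments:
        prefix = "".join(f"[{tag}] " for tag in seg.tags)
        parts.append(prefix + seg.text.strip())
    return " ".join(parts)


script = [
    Segment("Welcome back to the show.", tags=("cheerful",)),
    Segment("Tonight's story starts quietly.", tags=("whispering", "slow")),
    Segment("Then everything changes."),
]
prompt = render_prompt(script)
print(prompt)
```

Keeping tags as structured data and rendering them in one place means that only `render_prompt` would need to change if the actual tag syntax turns out to differ from this assumption.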

```json
{
  "Audio Tags": "Introduces intuitive audio tags for directing vocal style, delivery, and pace with granular creative control.",
  "Safety Measures": "Embeds SynthID watermarking in all outputs for reliable detection of AI-generated audio to combat misinformation.",
  "Multilingual Support": "Delivers high-fidelity speech and precise accent control across more than 70 languages for localized experiences."
}
```