Gemini Omni enables video creation from text, audio, and images

Google DeepMind has introduced Gemini Omni, an AI model designed to generate and edit high-quality video from text, audio, and image inputs. The launch marks a significant step in the development of multimodal Gemini models, which aim to reason consistently across multiple formats. Gemini Omni Flash, the first version of the model, will be integrated into consumer products such as the Gemini app and YouTube Shorts, reflecting an ongoing industry trend toward foundation models with stronger video generation and editing capabilities.

Gemini Omni: Gemini Omni is Google DeepMind’s latest model in the multimodal Gemini family, designed to take text, images, audio, and video as input and to generate or edit high-quality video content. In this announcement, it is introduced as a conversational video creation and editing system that lets users mix different media types to produce coherent videos, and it was showcased as a highlight of Google I/O.
Demis Hassabis: Demis Hassabis is the cofounder and CEO of Google DeepMind, leading Google’s advanced AI research and product efforts across the Gemini model family. In this context, he appears as the key spokesperson explaining how users can interact with Gemini Omni using natural language and mixed media inputs to generate and edit videos.

Product: Google DeepMind describes Gemini Omni as a major step toward fully multimodal Gemini models that can reason across text, images, audio, and video to produce consistent video output.
Industry trend: The launch of Gemini Omni continues a broader industry shift toward foundation models that unify understanding and generation across multiple modalities, with a particular focus on richer, more controllable video generation and editing.
Platform integration: Google is rolling out the first Gemini Omni model, Gemini Omni Flash, to consumer-facing products such as the Gemini app, YouTube Shorts, and creative tooling, positioning it as a core engine for video creation and remixing.