Google announces launch of Gemini Omni to enhance AI capabilities

Google launches Gemini Omni to expand Gemini’s AI-powered content generation capabilities
The new model enables users to create fully integrated videos using multiple input formats, including text, images, audio, and video.
“Gemini Omni” combines Gemini’s reasoning capabilities with advanced visual generation, going beyond realistic visuals to deliver deeper contextual and motion understanding.

Google has unveiled its new AI model, “Gemini Omni,” marking a major step in expanding Gemini’s capabilities from content understanding and analysis to full-scale video generation using multimodal inputs, including text, images, audio, and video.

The new model represents an advanced phase in the generative AI race, allowing users to create and edit videos through natural conversation without relying on traditional editing software, while maintaining consistency in characters, scenes, motion, and visual dynamics throughout the content.

“Gemini Omni” builds on Google’s earlier generative AI developments, including the “Nano Banana” model focused on image creation and editing.

The company is now extending these capabilities to video production and editing through conversational commands, paired with a deeper contextual understanding of movement, physics, and real-world environments.

Google says users will be able to refine and modify videos progressively through dialogue with the model, which can remember previous edits and reconstruct scenes while preserving visual details and stylistic consistency. The company presents this as turning video production into a continuous, interactive experience.

The model also enables users to transform original videos into entirely new scenes by adding characters, visual effects, or altering camera movements and cinematic styles, effectively turning video into a “continuously reproducible environment” rather than a static media file.

Gemini Omni combines Gemini’s reasoning and knowledge capabilities with advanced visual generation, enabling not only realistic-looking scenes but also a deeper understanding of gravity, motion, energy, and cultural or scientific context. Google aims to use these capabilities to create more coherent and logically structured content, particularly for educational, cinematic, and complex visual storytelling applications.

The model supports video generation from any combination of inputs, including text prompts, images, video clips, and audio, with plans to expand support for more advanced audio generation.

Google also introduced a new “Avatars” feature, allowing users to create digital versions of themselves using their own voices and images to generate personalized AI-powered videos that mimic their appearance and speaking style.

The company has already begun rolling out the first version of the new series, called “Gemini Omni Flash,” through the Gemini app and YouTube Shorts. The rollout comes as competition intensifies among AI companies developing generative video technologies for media, advertising, entertainment, education, and content creation.

The launch of Gemini Omni reflects a broader shift in the AI industry, where models are evolving from intelligent assistants into fully integrated production platforms capable of generating sophisticated visual content entirely through conversation.
