
Gemini Omni Explained: Google's New 'Talk-to-it' Video AI, for Non-Technical Readers (May 2026)
Gemini Omni is Google's new 'talk-to-it' video AI, announced at I/O 2026 on 2026-05-19. This guide explains how it differs from Sora 2 and Veo 3, what's bundled into the new $100 Google AI Ultra plan, the real limits of SynthID watermarking, and how business teams should pair video AI with sales-outreach tools like Sales Claw — all written for non-technical readers.

中澤 圭志
@keishi_nakazawa, Sales Claw maintainer

Key Facts
| Item | Details |
|---|---|
| Announced | 2026-05-19 (Google I/O 2026 keynote by Sundar Pichai) |
| First model | Gemini Omni Flash (video output) |
| Entry points | Gemini app / Google Flow / YouTube Shorts Remix / YouTube Create App |
| Plans | Google AI Plus / Pro / Ultra (YouTube Shorts is free for users 18+) |

"You keep hearing about Gemini Omni — but what actually is it? How is it different from Sora? And does that $100/month plan have anything to do with regular people?" — this article reads Google's official blog posts, the Gemini Apps release notes, and the Google AI subscriptions page as primary sources, and lays out what non-technical readers should understand before bringing this technology into their personal life or workplace. As of May 2026 the first available model is Gemini Omni Flash; image and audio output, plus a developer-facing API, are on the published roadmap. Getting the big picture now means you can react quickly when the heavier-duty releases land.
Primary sources for this article: Google Blog "Introducing Gemini Omni" (2026-05-19), Google Blog "100 things we announced at Google I/O 2026" (2026-05-20), Google Blog "Everything new in our Google AI subscriptions" (2026-05-20), the Gemini Apps Release Notes, and the official Gemini subscriptions page. For the wider Google I/O 2026 context and the relationship to Antigravity and Gemini Spark, read our Google I/O 2026 roundup alongside this piece. For the OpenAI-side comparison of image generation, see the gpt-image-2 practical guide. For the general "explainer for non-tech users" pattern applied to a different product, see our GitHub Copilot 2026 features & pricing explainer and the ChatGPT Atlas general-reader guide.
1. What Gemini Omni really is — "the video AI you can talk to"

Gemini Omni ("Omni" for short) is the new AI model family announced by Google and Google DeepMind on 2026-05-19. [Official] Google's official page positions Omni as a model that can "create anything from any input", with Gemini Omni Flash as the first member of the family going live that same day.
For ordinary users, the biggest difference shows up in the editing loop. Where previous video AIs felt like a vending machine — you put in a prompt, you take what comes out — Omni feels more like an assistant who remembers what you've been editing and lets you iterate by talking. Concretely:
- "Make the orange in the sunset a bit stronger" — adjusts the color without rebuilding the clip from scratch
- "Put the same character in a different scene" — preserves the character's look in a brand-new shot
- "Quieter background music" — touches only the audio without redoing the visuals
This "video AI you iterate on by talking to it" design is what makes Omni structurally different from prior video models. Earlier video AIs were "change the prompt, regenerate" tools. Omni is built as a generation tool that also edits, and Google describes that fusion as the fundamental philosophical difference between Omni and every other video model in the race.
[Author's view] The real significance of Omni is the shift from "video AI = prompt gacha" to "video AI = an assistant you talk to until you're happy". As of May 2026, single-shot output quality on third-party leaderboards still favors Seedance 2 and Sora 2 Pro, but the "don't throw it all away every time you tweak something" experience is what makes generative video usable for non-experts. That's what Omni gets right.
2. What it can and can't do today — input and output mapped

The four kinds of material you can feed in
[Official] Google's official blog says Omni accepts "a combination of images, audio, video and text" as input and returns high-resolution video and audio. That multimodal pipeline is the cleanest differentiator versus Sora 2 and Veo 3. Some example combinations:
| Example input combination | What kind of video you get |
|---|---|
| Text only: "a dog running through the sunset, cinematic" | ~10s high-resolution clip |
| One photo + "turn this dog into a running shot" | Clip that preserves the dog's look from the photo |
| An existing short clip + "refresh the BGM and add a logo at the end" | Same clip with new audio and a fresh ending |
| An audio track + "build a 10s opening that fits this music" | Opening clip synchronized to the music |
What it can't do yet
On the other side, the May 2026 Omni Flash still has some honest limitations. On the third-party Artificial Analysis Video Arena leaderboard, Seedance 2.0 ranked first in both text-to-video and image-to-video in May 2026, with Omni Flash, Veo 3.1, and Sora 2 sitting below it. [Third-party benchmark, not official]
- Long-form video: capped at roughly 10 seconds per clip (Google says this is a deployment choice, not a model constraint)
- Narrative video with very strict character consistency: Sora 2 still leads on cross-scene character fidelity
- Cinematic camera work: deliberate film-style pans, dollies, and slow lens control are still stronger on Veo 3
- Image output and audio output: on the roadmap, not yet shipping (output today is video only)
3. Omni vs. Sora 2 vs. Veo 3 vs. Seedance 2 — picking the right model
"Which one should I actually use?" is the question most regular people care about. Below is a use-case breakdown based on the May 2026 state of the Artificial Analysis Video Arena leaderboard and each vendor's official documentation.

| Item | Gemini Omni Flash | OpenAI Sora 2 |
|---|---|---|
| Biggest strength | Iterate on the same clip by talking to it | Character consistency and storytelling |
| Input modalities | text + image + audio + video | text + image (audio input is weak) |
| Max single-clip length | around 10s | up to 25s on Sora 2 Pro |
| Subscription gate | Google AI Plus / Pro / Ultra | ChatGPT Plus / Pro |
| Free entry point | YouTube Shorts Remix (18+) | Limited free tier |
| Watermarking | SynthID (DeepMind standard) | OpenAI proprietary watermark |
Use-case picker
- Crank out social / YouTube Shorts clips → Gemini Omni Flash (free via YouTube Shorts Remix)
- Keep the same character across multiple scenes → Sora 2 (best character consistency)
- Need cinematic camera work / film-style shots → Google Veo 3.1 (still alive alongside Omni, accessed via Google Flow)
- Want the highest single-shot quality, full stop → Seedance 2.0 (tops the third-party leaderboard, though distribution in Japan is limited)
- Animate product photos / iterate on an existing clip by talking → Gemini Omni Flash (image-to-video plus the conversational edit loop)
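
For readers who like lookup tables, the picker above can be sketched as a small Python mapping. The key names (such as `social_shorts`) and the `pick_model` helper are hypothetical labels invented for this illustration; the recommendations are simply the bullets above restated, not an official tool from any vendor.

```python
# Hypothetical use-case keys mapped to the models suggested in the picker above.
USE_CASE_PICKER = {
    "social_shorts": "Gemini Omni Flash",
    "same_character_across_scenes": "Sora 2",
    "cinematic_camera_work": "Google Veo 3.1",
    "highest_single_shot_quality": "Seedance 2.0",
    "animate_product_photos": "Gemini Omni Flash",
}


def pick_model(use_case: str) -> str:
    """Return the suggested model for a use case, or raise if the key is unknown."""
    try:
        return USE_CASE_PICKER[use_case]
    except KeyError:
        known = ", ".join(sorted(USE_CASE_PICKER))
        raise ValueError(
            f"Unknown use case {use_case!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    print(pick_model("cinematic_camera_work"))
```

The mapping is deliberately flat: each use case points to one model, which mirrors how the article frames the choice as "pick the right model for the job".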
4. Pricing — the new $100 Google AI Ultra plan vs. ChatGPT and Claude

Concurrent with I/O 2026, Google overhauled its AI subscription lineup. [Official] The newly added "Google AI Ultra $100/month" tier is, in the words of Google VP Shimrit Ben-Yair, designed for "developers, technical leads, knowledge workers and advanced creators".

What you get for $100/month on AI Ultra
[Official] According to the Google blog, the $100/month AI Ultra plan includes:
- 5x higher usage limits than Pro across the Gemini app and Google Antigravity
- Access to Gemini Omni and Gemini 3.5 Flash
- Priority access to Google Antigravity (the new AI IDE)
- 20TB of cloud storage
- YouTube Premium individual plan bundled in
- Access to Gemini Spark (the new 24/7 AI agent)
How it compares to ChatGPT and Claude plans
| Item | Google AI Ultra ($100/mo) | ChatGPT Pro ($200/mo) |
|---|---|---|
| Video generation | Gemini Omni Flash + Veo 3.1 | Sora 2 + Sora 2 Pro |
| Image generation | Nano Banana | gpt-image-2 |
| Reasoning model | Gemini 3.5 Flash | GPT-5.5 Pro |
| Storage | 20TB | (separate, e.g. Google Drive) |
| Media perk | YouTube Premium individual | None |
| Coding integration | Priority access to Antigravity | Codex |
[Author's view] If you want video generation, YouTube Premium, and 20TB of cloud storage all in one bundle — or you're already deeply in the Google ecosystem — the $100 AI Ultra is simply a good deal. On the other hand, if you skip video and only care about high-quality text reasoning, the $20 plans on ChatGPT Plus or Claude Pro will be plenty for many people. The real deciding factor is how much video AI you'll actually use per month.
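
To make "how much video AI you'll actually use per month" concrete, here is a minimal Python sketch of the arithmetic. The monthly prices are the ones quoted in this article; the `clips_per_month` input is a hypothetical number you supply yourself. The sketch assumes, for simplicity, that the whole subscription fee is attributed to video, which ignores the storage and YouTube Premium perks and any usage caps.

```python
# Monthly list prices quoted in this article (USD).
MONTHLY_PRICE_USD = {
    "Google AI Ultra": 100,
    "ChatGPT Pro": 200,
    "ChatGPT Plus": 20,
    "Claude Pro": 20,
}


def annual_cost(plan: str) -> int:
    """Twelve months at the quoted monthly price."""
    return MONTHLY_PRICE_USD[plan] * 12


def cost_per_clip(plan: str, clips_per_month: int) -> float:
    """Monthly price divided by the number of clips you expect to make.

    Simplifying assumption: the entire fee is counted against video,
    so bundled perks (storage, YouTube Premium) are treated as free extras.
    """
    if clips_per_month <= 0:
        raise ValueError("clips_per_month must be a positive number")
    return MONTHLY_PRICE_USD[plan] / clips_per_month


if __name__ == "__main__":
    print(annual_cost("Google AI Ultra"))        # 1200
    print(cost_per_clip("Google AI Ultra", 40))  # 2.5
    print(cost_per_clip("Google AI Ultra", 2))   # 50.0
```

At two clips a month the effective cost is $50 per clip, while at forty clips a month it falls to $2.50, which is why usage volume, rather than the sticker price, tends to decide whether the plan is worth it.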
5. How to start — three entry points (Gemini app / Google Flow / YouTube Shorts)

Route 1: try it through the Gemini app
- Sign in to gemini.google.com or the Gemini app
- Subscribe to Google AI Plus, Pro, or Ultra (the free AI Free tier does not include Omni)
- In the chat box, type something like "Make a short video: a dog running through the sunset, cinematic"
- Wait 10-30 seconds for the clip; then iterate by chat ("a bit more orange in the sky, please")
Route 2: get serious in Google Flow
- Open Google Flow (available on Google AI Plus and up)
- Drop in source material (text instructions + photos + existing clips + audio tracks)
- Generate and edit scene by scene, conversationally
- Stitch multiple scenes together while preserving character consistency
Route 3: try the free YouTube Shorts Remix
- Open the Shorts section of the YouTube app
- Pick a Short you like, hit Remix, and ask Omni Flash to generate material (18+ only)
- The output is sized for short-form social
- The YouTube Create app (free starting this week) also supports it
6. Risks — fake video, copyright, and what SynthID does (and doesn't) cover
Fake-video risk and the limits of SynthID
[Official] Google says every Omni-generated video carries an invisible SynthID digital watermark, and that the watermark can be checked from the Gemini app or a dedicated tool to confirm whether a piece of footage is AI-generated.
That said, SynthID is "a way to detect AI-made content" — it is not "a way to prevent misuse". Specifically:
- The SynthID watermark can break under aggressive re-encoding
- It only matters if the receiving side actually checks it (most social platforms don't check automatically)
- Adversarial watermark-removal techniques can keep evolving faster than the defenders
Absolute don'ts, even for casual use
Additional considerations for business use
- Internal guidelines: document the bounds within which employees may use generative video AI
- Copyright review workflow: visually inspect outputs for accidental similarity to existing IP
- "AI generated" disclosure: when shipping to customers, disclose that the video is AI-generated
- Industry regulation: healthcare, finance, education, and public-sector use cases have additional rules — get legal involved
7. Business uses — the Sales Claw perspective
Realistic use cases
- Social marketing: weekly batches of 10-second clips for X / Instagram / Threads / TikTok
- Recruiting: AI-generated "life at the company" ambience clips (without using real employees' faces)
- Service marketing: hero clips for the landing page showing "what using the product feels like"
- Internal training: drop short "scenario" clips into otherwise text-only training materials
- Event teasers: short teasers for webinars and conferences
From the Sales Claw side
Sales Claw is open-source software designed to reduce mis-send and policy-violation risk through policy control, pre-send automated checks, sales-banned-content detection, CAPTCHA-stop behavior, send-rate limiting, audit logging, and automated kill-switch conditions. The relationship between video AI like Gemini Omni and Sales Claw is cleanest if you split them this way:
- Video AI = inbound material generation: produces the clips that live on your social and landing pages
- Sales Claw = outbound contact-form motion: sends carefully drafted messages through prospects' contact forms
- Attaching video to a B2B contact-form submission is typically counterproductive; keep them on different channels
- AI-generated content belongs on your LP / social / recruiting site; Sales Claw drafts and sends the messages that drive people to those surfaces
8. Wrapping up — meeting "the era of conversational video AI" well

Gemini Omni marks a meaningful shift in the video AI race: from "how good is the single shot?" to "how easily can you iterate on it?" Rather than replacing Sora 2 or Veo 3, it pushes the entire market into an era where you pick the right model for the job. That's the May 2026 status quo.
For ordinary users, the practical question is "how often will I actually use video AI?" If a couple of clips a month, the free YouTube Shorts Remix is enough. If you're generating multiple social clips a week, Google AI Plus or Pro becomes reasonable. If you're bringing video AI into business workflows, the new $100 AI Ultra plan is the realistic baseline. And if video isn't on your radar at all, the $20 ChatGPT Plus or Claude Pro tiers will keep covering text reasoning just fine.
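
The rule of thumb above can be written as a short decision function. This is only a sketch: the article says "a couple of clips a month" and "multiple social clips a week" without giving exact numbers, so the cutoff `OCCASIONAL_MAX_CLIPS = 2` and the `suggest_plan` helper are assumptions made for illustration.

```python
# Assumption: "a couple of clips a month" is read as at most 2 clips.
# The article gives no exact cutoff, so adjust this to taste.
OCCASIONAL_MAX_CLIPS = 2


def suggest_plan(clips_per_month: int, business_use: bool = False) -> str:
    """Map expected usage to the tier suggested in the wrap-up paragraph."""
    if clips_per_month < 0:
        raise ValueError("clips_per_month cannot be negative")
    if business_use:
        # Bringing video AI into business workflows.
        return "Google AI Ultra ($100/month)"
    if clips_per_month == 0:
        # Video is not on your radar; text reasoning only.
        return "ChatGPT Plus or Claude Pro ($20/month)"
    if clips_per_month <= OCCASIONAL_MAX_CLIPS:
        return "YouTube Shorts Remix (free)"
    # Anything more frequent is treated as regular social-clip production.
    return "Google AI Plus or Pro"


if __name__ == "__main__":
    for clips in (0, 2, 8):
        print(clips, "->", suggest_plan(clips))
    print("business ->", suggest_plan(8, business_use=True))
```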
Next steps: the lowest-risk way to feel out Omni is to make one clip from the free YouTube Shorts Remix. If you're bringing this into a business, do the legal / internal-policy work first. If you also want to automate the form-outreach side, you can start with a free download of Sales Claw. For related context, the Google I/O 2026 roundup is a useful companion piece.
Note: This is the English version. The Japanese original is available at /blog/2026-05-23-gemini-omni-video-ai-general-readers-guide.
Frequently asked questions
What is Gemini Omni, in one paragraph?
How much does it cost? What is the new $100 Google AI Ultra plan?
How does Gemini Omni compare to Sora 2 or Veo 3?
What is the SynthID watermark? Does it matter for regular users?
Can I use Gemini Omni for commercial / business purposes? What should I watch out for?
Is Gemini Omni available in Japan? Any restrictions?
About the author

中澤 圭志
Sales Claw maintainer
Designs and develops Sales Claw. Writes from the field on B2B sales automation and applied AI.


