Imagine standing in a busy train station in Tokyo and hearing a guide speak in rapid Japanese — and, a few seconds later, your earbud whispers the same meaning back to you in English with the original speaker’s rhythm intact. That’s the promise behind Google’s newly announced Gemini 3.5 Live Translate: a speech-to-speech model built to translate continuously, in more than 70 languages, with low latency and a lifelike delivery.
A near real-time interpreter
Unlike older “turn-by-turn” systems that wait for a speaker to pause, Gemini 3.5 Live Translate processes audio continuously and balances the trade-off between context and speed. The system deliberately stays a few seconds behind the speaker so translations flow naturally without awkward halts — preserving intonation, pacing and pitch so the rendered voice sounds more human than robotic.
Google says the model auto-detects multilingual inputs, filters background noise, and can handle dozens of language combinations inside a single conversation. Behind the scenes, developers can access the capability via the public preview of the Gemini Live API and Google AI Studio, letting partners focus on user experience instead of the plumbing of real‑time audio streaming.
Where you’ll hear it first
Gemini 3.5 Live Translate is rolling out in stages:
- Developers: public preview via the Gemini Live API and Google AI Studio.
- Google Meet: private preview for select Workspace customers this month, with a wider rollout slated for later in the year.
- Google Translate app: Live Translate is rolling out globally on Android and iOS.
If you’ve used Meet’s translation before, this is a big leap — the service previously supported only a handful of languages and typically translated to and from English. Gemini 3.5 raises the ceiling to 70+ languages and unlocks more than 2,000 language-pair combinations in a single meeting.
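As a quick sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes exactly 70 languages (the article only says "70+"), and Google has not said how it counts pairs, so both conventions are shown. The function name is illustrative only.

```python
from math import comb, perm


def language_pairs(n_languages: int) -> tuple:
    """Return (unordered, ordered) counts of distinct language pairs.

    Assumption: every language can be paired with every other one.
    """
    unordered = comb(n_languages, 2)  # A<->B counted once
    ordered = perm(n_languages, 2)    # A->B and B->A counted separately
    return unordered, ordered


unordered, ordered = language_pairs(70)
print(unordered, ordered)  # 2415 4830
```

Counting each pair once gives 2,415, which fits the "more than 2,000" claim. Counting each translation direction separately would give 4,830.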
You won’t need Pixel Buds to use it. The Translate app supports any headphones, and Android users get an extra trick: a new “listening mode” that streams translated audio into the phone’s earpiece if you don’t have headphones handy. That makes quick, private translation possible simply by holding the phone to your ear, the way you would for a call. For more on how Google expanded Live Translate beyond Android, see this earlier writeup on the app’s wider rollout (/news/google-live-translate-ios-expansion).
Partners, demos and real-world tests
Google is already working with developer platforms and partners, including Agora, LiveKit, Fishjam, Grab and media companies like CJ ENM, to integrate the model into calls, broadcasts and customer service flows. Grab, for example, is trialing the tech to let drivers and riders communicate across languages in near real time, across millions of calls per month.
Early reactions from partners quoted by Google emphasize low latency and a step up in translation quality. Those demos, recorded in controlled conditions, sound promising; the real test will be unpredictable, noisy, cross-accented conversations in the wild.
Voices, watermarks and safety
One important safety detail: all audio generated by Gemini 3.5 Live Translate includes an imperceptible SynthID watermark embedded in the waveform. Google says the watermark is intended to help detect AI-generated audio and discourage misuse such as impersonation or misinformation. You can read the company’s explanation and model card for more on safety and design choices in the official blog post on Gemini 3.5 Live Translate.
Privacy and regional availability remain complex as Google expands Gemini features globally; the company has recently rolled other Gemini capabilities outward with some location-based caveats, which is worth keeping in mind when new features appear in your region (/news/google-gemini-personal-intelligence-global-rollout).
What this feels like — and why it matters
Technically, this is an evolution rather than a revolution: speech translation has been a long-running Google project. Practically, the combination of continuous streaming, wide language coverage and low-latency output could change how people handle travel, customer service, international meetings and multilingual broadcasts. The “voice” you hear will sound more natural, and the experience aims to be more conversational than clunky.
There are still open questions: how well the model handles rare dialects, how recognition errors cascade into translated speech, and how smoothly Meet and other apps will surface the feature for real users. Google has also hinted that builds of different capacities, including higher-capacity “Pro” variants in its Gemini family, may arrive later, which could shape how the tech is offered to enterprises versus consumers.
If you want to try it or build with it, the easiest entry points are the Translate app and the Gemini Live API preview; enterprises can apply for the Meet preview. Either way, the near-real-time interpreter is moving from demo stages into tools you can use, and that shift tends to surface both delightful surprises and thorny edge cases quickly.




