
Gemini's Improved Text-to-Speech
Google showcased significant advances in Gemini's text-to-speech (TTS) capabilities at its recent I/O developer conference. The new feature, built on native audio output, promises a more natural and expressive conversational experience.
Seamless Multilingual Conversation
A key highlight is the system's ability to switch among more than 24 languages using a single, consistent voice. Demonstrations showed the AI transitioning smoothly between English and Hindi while keeping the same voice, which reinforces the impression of a single "speaker".
Beyond Words: Expressive Nuances
Google emphasizes the greater expressiveness and more nuanced delivery of the new TTS. The AI voice sounds considerably less robotic, using subtler inflections and tones to create a more engaging listening experience. The demonstration also included a whisper mode, though its implementation warrants further scrutiny given user feedback.
Accessibility and Availability
This enhanced TTS technology is now available through the Gemini API. At the same time, a preview of the Gemini Live API with native audio dialogue is available. These advancements highlight Google's continued commitment to improving its AI services, making them more powerful and user-friendly.

Source: Engadget