- Google’s Gemini AI assistant now supports audio file uploads.
- The AI will transcribe, summarize, and extract key information from recordings.
- The feature turns up to 10 minutes of audio, such as voice memos, meetings, lectures, and interviews, into searchable documents.
Google Gemini has just learned how to listen and make sense of what it hears. You can now upload audio files to the AI assistant on the web or through the mobile apps and get transcriptions, summaries, and key details.
For anyone who’s ever let a voice memo rot in their phone or dreaded the task of rewatching a meeting recording, this update could be the AI equivalent of hiring a personal note-taker.
That said, it can only handle 10 minutes of audio at a time, so no long meetings just yet. You can upload audio files directly by selecting them from the usual file upload options. What sets this apart from the earlier Gemini Live voice feature is that you aren’t just speaking to the AI in real time.
Gemini Live is useful for casual commands, but this is more about getting the AI to process a recording the way it does other file formats. Notably, audio file uploading has apparently been the most requested feature from users, according to Google’s VP of Gemini Josh Woodward.
AI audio
✅ Papercut fixed: You can now upload any file to @GeminiApp. Including the #1 request: audio files are now supported! pic.twitter.com/4Te3xwLC6W (September 8, 2025)
I tested it by uploading a couple of sketches from old comedy albums and a phone conversation with a friend. The AI successfully transcribed everything said in each case, with a couple of small name-related errors. It was also good at pulling out key points and items suited to a to-do list.
The demand for audio and Google's response hint at how AI tools are evolving to match how we save information in audio logs and voice memos. Turning that into something searchable has usually meant using external transcription software. Gemini’s new feature collapses that process into a single step.
What makes the addition feel particularly timely is the way it dovetails with other recent Gemini improvements. Google has already integrated Gemini into other apps, begun testing a card-based visual interface, and significantly expanded Gemini’s personalization options. The ability to process audio continues that trend.
The audio option isn't unique to Gemini among AI assistants, but it can at least match some of what ChatGPT can do with OpenAI's Whisper transcription model. In fact, in my testing, I preferred Google's offering.
Anthropic’s Claude also handles audio in some developer tools, and Perplexity can extract data from YouTube videos. But Gemini’s execution is more focused on everyday use cases.
And the output isn’t just a dumb transcription. You can ask Gemini to simplify the language, extract speaker-specific comments, generate questions based on the content, or create a study guide from a classroom discussion. Of course, the 10-minute limit restricts how far it can become part of everyday life. Free-tier users also face daily usage limits.
Google hasn’t released a formal pricing breakdown for high-volume audio processing, but it's part of the regular Gemini quota, so anyone planning to feed it a dozen hours of legal depositions should pace themselves.
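To put that pacing in rough numbers, here is a minimal Python sketch that works out how many separate uploads a long recording would need under the 10-minute cap. It only does the arithmetic and does not call Gemini or cut any audio. The function name `segment_bounds` is hypothetical, and treating the cap as exactly 600 seconds per upload is an assumption, since Google has not published precise details here.

```python
import math

# Assumption: the reported 10-minute cap is exactly 600 seconds per upload.
LIMIT_SECONDS = 10 * 60


def segment_bounds(total_seconds, limit_seconds=LIMIT_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording,
    with no segment longer than limit_seconds."""
    if total_seconds <= 0:
        return []
    count = math.ceil(total_seconds / limit_seconds)
    return [
        (i * limit_seconds, min((i + 1) * limit_seconds, total_seconds))
        for i in range(count)
    ]


if __name__ == "__main__":
    # A dozen hours of depositions, expressed in seconds.
    bounds = segment_bounds(12 * 60 * 60)
    print(len(bounds))   # 72 separate uploads
    print(bounds[-1])    # (42600, 43200)
```

Seventy-two uploads for twelve hours of audio would likely run into the daily usage limits mentioned above, which is why long recordings are better spread across several days or trimmed to the parts that matter.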