Gemini 2.5 Pro just got Deep Think, a new multi-hypothesis reasoning mode


Gemini 2.5, Google's most advanced model family, is getting upgrades across the board.

On Tuesday at Google I/O, the company's annual developer conference, Google announced Deep Think, an "enhanced" reasoning mode for Gemini 2.5 Pro that shows "incredible" skill at complex math and coding. The company also announced improvements to Pro and to Gemini 2.5 Flash, the second of its "most intelligent" models. Here's what's new.

Gemini 2.5 Pro updates

Most notably, Google is adding an experimental new reasoning mode to 2.5 Pro called Deep Think. Google says Deep Think "uses new research techniques [that enable] the model to consider multiple hypotheses before responding."

In terms of benchmarks, Deep Think scored 84% on MMMU, a multimodal reasoning test. The company added that Deep Think earned an "impressive score" on the 2025 United States of America Mathematical Olympiad (USAMO), a very challenging math benchmark, but did not share an exact number.

Also: 8 best AI features and tools revealed at Google I/O 2025

"We're taking extra time to conduct more frontier safety evaluations and get further input from safety experts," Google said in the announcement. "As part of that, we're going to make it available to trusted testers via the Gemini API to get their feedback before making it widely available."

In the coming weeks, Gemini 2.5 Pro will also get thinking budgets, which let developers control cost by capping how many tokens the model spends reasoning about a query, trading off latency against quality. The feature, which 2.5 Flash launched with, helps manage token usage efficiently.
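For developers, the budget surfaces as a per-request setting. Here's a minimal sketch of what that looks like, assuming the google-genai Python SDK; the model ID and budget value are illustrative, not confirmed settings from the announcement:

```python
# A minimal sketch of capping reasoning spend via a thinking budget,
# assuming the google-genai Python SDK.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model ID; 2.5 Pro support is rolling out
    contents="Summarize the trade-offs between B-trees and LSM-trees.",
    config=types.GenerateContentConfig(
        # Cap the tokens the model may spend "thinking" before answering;
        # lower budgets cut latency and cost, higher budgets favor quality.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```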

The Pro updates come just two weeks after Google improved the model's coding ability. The model currently sits in first place on the LMArena leaderboard, as well as on WebDev Arena, LMArena's more specialized offshoot. With a one-million-token context window, it also leads in learning efficacy, according to Google, after the company enhanced 2.5 Pro with LearnLM -- a family of models developed in tandem with "educational experts."

Gemini 2.5 Flash updates

Google was less specific about the improvements it has made to 2.5 Flash, its more budget-friendly model, but noted gains in reasoning, code, multimodal capability, and long context, all while the model became more efficient -- in company evaluations, it used 20% to 30% fewer tokens.

The model card, published on April 29, now notes that 2.5 Flash scored 12.1% on Humanity's Last Exam (HLE), a relatively new benchmark designed to keep pace with models that now routinely score in the top percentile on traditional industry-standard tests. That result is second only to OpenAI's o4-mini model, which hit 14.3%.

Also: With AI models clobbering every benchmark, it's time for human evaluation

Developers can access the updated 2.5 Flash in preview via Google AI Studio; enterprise users can try it out in Vertex AI, and everyone else in the Gemini app. This version of the model will be generally available for production in early June.

Audio updates  

Starting today, the Live API is hosting a preview of "audio-visual input and native audio out dialogue," which Google says lets users create conversational experiences "with a more natural and expressive Gemini." The update allows users to specify Gemini's tone, accent, and delivery; for example, you could ask Gemini to tell a story more dramatically. 
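Google didn't publish full documentation alongside the announcement, but based on the existing Live API in the google-genai Python SDK, steering delivery might look roughly like the sketch below. The preview model ID and the use of a system instruction to set tone are assumptions:

```python
# A rough sketch of a native-audio Live API session with the google-genai SDK.
# The model ID is an assumed preview name, and steering tone via a system
# instruction is an assumption about how delivery control is exposed.
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        system_instruction=types.Content(
            parts=[types.Part(text="Narrate answers dramatically, like a storyteller.")]
        ),
    )
    async with client.aio.live.connect(
        model="gemini-2.5-flash-preview-native-audio-dialog",  # assumed ID
        config=config,
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Tell me a short ghost story.")],
            )
        )
        async for message in session.receive():
            if message.data:  # chunks of raw PCM audio from the model
                pass  # stream to a speaker or append to a file

asyncio.run(main())
```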

Also: OpenAI upgrades ChatGPT with Codex - and I'm seriously impressed (so far)

Other features available today include Affective Dialogue, which lets Gemini detect and respond to the emotion in a user's voice, and Proactive Audio, which helps Gemini ignore background noise. 

Google also released new text-to-speech previews for 2.5 Pro and 2.5 Flash, available now in the Gemini API. They support multiple speakers at once across more than 24 languages, switch between those languages seamlessly, and capture subtleties like whispers.
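A minimal sketch of the multi-speaker preview, again assuming the google-genai Python SDK; the model ID, speaker labels, and voice names here are illustrative:

```python
# Multi-speaker TTS sketch: map labeled speakers in the transcript to
# prebuilt voices. Model ID, speaker names, and voices are assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def speaker(name: str, voice: str) -> types.SpeakerVoiceConfig:
    """Bind a transcript speaker label to a prebuilt voice."""
    return types.SpeakerVoiceConfig(
        speaker=name,
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name=voice)
        ),
    )

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed preview model ID
    contents=(
        "TTS the following conversation:\n"
        "Host: Welcome back to the show!\n"
        "Guest: (whispering) Thanks for having me."
    ),
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    speaker("Host", "Kore"),
                    speaker("Guest", "Puck"),
                ]
            )
        ),
    ),
)
# The response carries raw PCM audio bytes; write or play them as needed.
pcm_bytes = response.candidates[0].content.parts[0].inline_data.data
```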

Safety and security

Google noted it has "significantly increased protections" against threats including indirect prompt injections, which are directives hidden in data like documents or images that are uploaded to a multimodal AI model. 

Also: Multimodal AI poses new safety risks, creates CSEM and weapons info

The company says these improvements make the Gemini 2.5 model family its most secure to date.  

Other updates 

The Gemini API is also getting Project Mariner, Google's web-surfing AI agent, which will roll out to more developers this summer. Also in the API, Google added "native SDK support" for Model Context Protocol (MCP), the open standard Anthropic introduced for connecting AI models to external tools and data, to make open-source tool integration easier. The company announced general support for MCP last month and says it's exploring other ways to support agentic applications.
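In practice, the SDK support means a live MCP session can be handed to Gemini as a tool. A rough sketch, assuming the google-genai Python SDK's experimental MCP session support and the official mcp Python package; the server command is a hypothetical placeholder:

```python
# Sketch of wiring an MCP server into a Gemini call. The "@your/mcp-server"
# package is a hypothetical placeholder, and passing the session as a tool
# relies on the SDK's experimental automatic MCP support.
import asyncio
from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

client = genai.Client(api_key="YOUR_API_KEY")
server = StdioServerParameters(command="npx", args=["-y", "@your/mcp-server"])

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            response = await client.aio.models.generate_content(
                model="gemini-2.5-flash",  # assumed model ID
                contents="Use the available tools to answer: what's on my calendar today?",
                # The SDK discovers the server's tools and calls them
                # on the model's behalf during generation.
                config=types.GenerateContentConfig(tools=[session]),
            )
            print(response.text)

asyncio.run(main())
```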

Also: You can win $250K from OpenAI if you help solve archaeological mysteries with AI

Gemini 2.5 can also use tools now, including searching on a user's behalf. Google also said it will introduce "thought summaries" -- condensed, readable versions of the model's chain of thought -- in both the Gemini and Vertex AI APIs.
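Search-as-a-tool already exists in the Gemini API; a minimal sketch, assuming the google-genai Python SDK's GoogleSearch tool and an illustrative model ID:

```python
# Grounding a response with web search: the model decides when to issue
# searches on the user's behalf. Model ID is illustrative.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What were the headline announcements at Google I/O 2025?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```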

