Google has introduced an AI reasoning control mechanism for its Gemini 2.5 Flash model that allows developers to limit how much processing power the system expends on problem-solving.
Released on April 17, this “thinking budget” feature responds to a growing industry challenge: advanced AI models frequently overanalyse straightforward queries, consuming unnecessary computational resources and driving up operational and environmental costs.
While not revolutionary, the development represents a practical step toward addressing efficiency concerns that have emerged as reasoning capabilities become standard in commercial AI software.
The new mechanism lets developers calibrate, in advance, how much reasoning the model performs before it generates a response, potentially changing how organisations manage the financial and environmental impacts of AI deployment.
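For developers, the control surfaces as a single parameter in the Gemini API. The sketch below shows roughly how a caller might cap reasoning using the google-genai Python SDK; the preview model name, parameter names, and budget value are assumptions for illustration rather than a definitive reference.

```python
# Minimal sketch: capping Gemini 2.5 Flash's reasoning with a
# "thinking budget". Model name and exact parameter names are
# assumptions for illustration.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview model name
    contents="What is the capital of France?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=0  # zero tokens of reasoning for a trivial prompt
        )
    ),
)
print(response.text)
```

In a setup like this, a budget of zero would switch reasoning off entirely for trivial prompts, while a larger token budget would let the model work through harder problems step by step.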
“The model overthinks,” acknowledges Tulsee Doshi, Director of Product Management at Gemini. “For simple prompts, the model does think more than it needs to.”
The admission reveals the challenge facing advanced reasoning models – the equivalent of using industrial machinery to crack a walnut.
The shift toward reasoning capabilities has created unintended consequences. Where traditional large language models primarily matched patterns from training data, newer iterations attempt to work through problems logically, step by step. While this approach yields better results for complex tasks, it introduces significant inefficiency when handling simpler queries.
Balancing cost and performance
The financial implications of unchecked AI reasoning are substantial. According to Google’s technical documentation, when full reasoning is activated, generating outputs becomes approximately six times more expensive than standard processing. The cost multiplier creates a powerful incentive for fine-tuned control.
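To make the multiplier concrete, a back-of-the-envelope calculation shows how the difference compounds at scale. The base price below is purely illustrative; only the roughly six-fold multiplier comes from Google’s documentation.

```python
# Hypothetical cost comparison. BASE_PRICE is an assumed figure for
# illustration; the ~6x multiplier is the one Google's documentation cites.
BASE_PRICE_PER_MILLION_TOKENS = 0.60  # USD, assumed
REASONING_MULTIPLIER = 6

def output_cost(tokens: int, reasoning: bool) -> float:
    """Dollar cost of generating `tokens` output tokens."""
    rate = BASE_PRICE_PER_MILLION_TOKENS * (REASONING_MULTIPLIER if reasoning else 1)
    return tokens / 1_000_000 * rate

print(f"${output_cost(500_000, reasoning=False):.2f}")  # $0.30
print(f"${output_cost(500_000, reasoning=True):.2f}")   # $1.80
```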
Nathan Habib, an engineer at Hugging Face who studies reasoning models, describes the problem as endemic across the industry. “In the rush to show off smarter AI, companies are reaching for reasoning models like hammers even where there’s no nail in sight,” he explained to MIT Technology Review.
The waste isn’t merely theoretical. Habib demonstrated how a leading reasoning model, when attempting to solve an organic chemistry problem, became trapped in a recursive loop, repeating “Wait, but…” hundreds of times – effectively a computational breakdown that burned through processing resources without ever producing an answer.
Kate Olszewska, who evaluates Gemini models at DeepMind, confirmed Google’s systems sometimes experience similar issues, getting stuck in loops that drain computing power without improving response quality.