Google previews Gemini 2.5 Flash-Lite

Google has unveiled a preview of Gemini 2.5 Flash-Lite, a reasoning model optimized for cost and speed, and announced that two other Gemini models, Gemini 2.5 Pro and Gemini 2.5 Flash, are now generally available.

Google made the announcements on June 17. Gemini 2.5 models are thinking models, able to reason through their thoughts before responding, which results in enhanced performance and improved accuracy, Google said.

Gemini 2.5 Flash-Lite has the lowest cost and lowest latency in the Gemini 2.5 model family, Google said. Flash-Lite is a reasoning model that lets developers control the thinking budget dynamically via an API parameter, but because it is optimized for low latency and low cost, thinking is turned off by default. The model is “great” for high-throughput tasks such as classification or summarization at scale, Google said.

Built as an upgrade to the Gemini 1.5 Flash and 2.0 Flash models, Gemini 2.5 Flash-Lite offers better performance across most evals and a lower time to first token, while also decoding more tokens per second, according to Google. Every Gemini 2.5 model exposes control over the thinking budget, letting developers choose when and how much the model thinks before generating a response.
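To illustrate the thinking-budget control described above, here is a minimal sketch that only builds a request body and sends nothing over the network. It assumes the Gemini REST API's `generationConfig.thinkingConfig.thinkingBudget` field; the model identifier and the `build_request` helper are hypothetical names used for illustration, and the article does not specify either.

```python
import json

# Hypothetical model identifier; check Google's model list for the actual preview name.
MODEL = "gemini-2.5-flash-lite"


def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a JSON-serializable request body for a generateContent-style call.

    thinking_budget=0 mirrors the Flash-Lite default of thinking turned off;
    a positive value is assumed to cap the tokens spent on thinking.
    """
    if thinking_budget < 0:
        # This sketch only models "off" (0) or an explicit positive cap.
        raise ValueError("thinking_budget must be non-negative in this sketch")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }


if __name__ == "__main__":
    # High-throughput classification: keep thinking off for lowest latency.
    fast = build_request("Classify this review as positive or negative: ...")
    # A harder prompt: allow the model a bounded amount of thinking.
    careful = build_request("Summarize the trade-offs in this design doc: ...", 512)
    print(json.dumps(fast, indent=2))
    print(json.dumps(careful, indent=2))
```

Keeping the budget as an explicit argument makes the cost and latency trade-off visible at each call site: bulk classification can leave it at zero, while occasional harder prompts opt in to a bounded amount of thinking.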
