谷歌推出Gemini 3.1 Flash-Lite预览版，开发者可经Gemini API在AI Studio使用、企业可在Vertex AI接入，定价0.25美元/百万输入token与1.50美元/百万输出token，相比2.5 Flash首字节更快且输出提速，支持可调“思考”等级，面向高并发翻译、内容审核、界面生成与仿真等任务以更低成本实现实时规模化推理。

Get best-in-class intelligence for your highest-volume workloads.

## General summary

Gemini 3.1 Flash-Lite is now available in preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI. Priced at $0.25/1M input tokens and $1.50/1M output tokens, it's cost-efficient and faster than 2.5 Flash. Use 3.1 Flash-Lite for tasks like translation content moderation generating user interfaces and creating simulations.

Summaries were generated by Google AI. Generative AI is experimental.

## Basic explainer

Google made a new AI model called Gemini 3.1 Flash-Lite. It's super fast and cheap to use, so more people can use it. This AI is good at things like translating languages and checking content. Some companies are already using it to solve tough problems because it's both smart and efficient.

Summaries were generated by Google AI. Generative AI is experimental.

#### Explore other styles:

![Gemini 3.1 Flash Lite logo](https://storage.googleapis.com/gweb-uniblog-publish-prod/images/gemini-3.1_flash_Lite_blog_keywor.width-200.format-webp.webp)

[🎧 Gemini 3.1 Flash-Lite: Built for intelligence at scale](https://storage.googleapis.com/gweb-uniblog-publish-prod/media/tts_audio_83503_umbriel_2026_03_03_20_54_28.wav)

This content is generated by Google AI. Generative AI is experimental

3:18 minutes

Today, we're introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model. Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier.

Starting today, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in [Google AI Studio](https://aistudio.google.com/prompts/new_chat?model=gemini-3.1-flash-lite-preview) and for enterprises via [Vertex AI](https://console.cloud.google.com/vertex-ai/studio/multimodal?mode=prompt&model=gemini-3.1-flash-lite-preview).

## Cost-efficiency without compromise

Priced at just $0.25/1M input tokens and $1.50/1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models. It outperforms 2.5 Flash with a 2.5X faster Time to First Answer Token and 45% increase in output speed, according to the [Artificial Analysis benchmark](https://artificialanalysis.ai/) while maintaining similar or better quality. This low latency is needed for high-frequency workflows, making it an ideal model for developers to build responsive, real-time experiences.

Gemini 3.1 Flash-Lite outperforms 2.5 Flash in speed and quality.

3.1 Flash-Lite achieves an impressive Elo score of 1432 on the [Arena.ai Leaderboard](http://arena.ai/leaderboard) and outperforms other models of similar tier across reasoning and multimodal understanding benchmarks, including 86.9% on GPQA Diamond and 76.8% on MMMU Pro–even surpassing larger Gemini models from prior generations like 2.5 Flash.

## Adaptive intelligence at scale for developers

Beyond its raw performance, Gemini 3.1 Flash-Lite comes standard with thinking levels in AI Studio and Vertex AI, giving developers the control and flexibility to select how much the model “thinks” for a task, which is critical for managing high-frequency workloads. 3.1 Flash-Lite can tackle tasks at scale, like high-volume translation and content moderation, where cost is a priority. And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions.

[▶ category generation](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/CategoryGeneration_v4.mp4#t=0.001)

[▶ weather dashboard](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/WeatherDashboard_v5.mp4#t=0.001)

[▶ SaaS report](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/SaasReport_v3.mp4#t=0.001)

[▶ photo sorter demo](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/Photo_sorter_Demo_v2_1_small.mp4#t=0.001)

3.1 Flash-Lite can analyze and sort large numbers of content like images quickly.

Early-access developers on AI Studio and Vertex AI, and companies like Latitude, Cartwheel and Whering are already using 3.1 Flash-Lite to solve complex problems at scale. Early testers highlighted 3.1 Flash-Lite’s efficiency and reasoning capabilities, saying it can handle complex inputs with the precision of a larger-tier model, plus follow instructions and maintain adherence.

We look forward to seeing what you build with 3.1 Flash-Lite and the rest of the Gemini 3 series models.

今天，我们推出了 Gemini 3.1 Flash-Lite，速度最快且成本效益最高的 Gemini 3 系列模型。它专为高工作量的开发者提供高质量的服务。

从今天开始，3.1 Flash-Lite 将在 Google AI Studio 和 Vertex AI 中通过 Gemini API 提供预览版本给开发者，企业用户也可以通过 Vertex AI 使用。

## 成本效益高且不妥协

价格为 $0.25/1M 输入令牌和 $1.50/1M 输出令牌，3.1 Flash-Lite 提供了更高的性能和更低的成本。它比 2.5 Flash 快 2.5 倍，输出速度增加 45%，根据 [Artificial Analysis](https://artificialanalysis.ai/) 的基准测试。它的低延迟对于高频工作流程至关重要，使其成为开发者构建响应式、实时体验的理想模型。

Gemini 3.1 Flash-Lite 比 2.5 Flash 快且质量高。

3.1 Flash-Lite 在 [Arena.ai Leaderboard](http://arena.ai/leaderboard) 上获得了 1432 分 Elo 分数，并在推理和多模态理解基准测试中超过了其他同级别的模型，包括 86.9% 的 GPQA Diamond 和 76.8% 的 MMMU Pro–甚至超过了前几代的更大 Gemini 模型，如 2.5 Flash。

## 适应性智能能力在开发者中

除了其原始性能之外，Gemini 3.1 Flash-Lite 还带有 AI Studio 和 Vertex AI 中的思考层级，给开发者提供了控制和灵活性来选择模型如何“思考”一个任务，这对于管理高频工作量至关重要。3.1 Flash-Lite 可以处理高工作量的任务，如大规模翻译和内容审核，成本是关键。它也可以处理更复杂的工作量，需要更深入的推理，如生成用户界面和仪表板、创建模拟或遵循指令。

[▶ 类别生成](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/CategoryGeneration_v4.mp4#t=0.001)

[▶ 天气仪表板](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/WeatherDashboard_v5.mp4#t=0.001)

[▶ SaaS 报告](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/SaasReport_v3.mp4#t=0.001)

[▶ 图片分类器演示](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_videos/Photo_sorter_Demo_v2_1_small.mp4#t=0.001)

3.1 Flash-Lite 可以快速分析和分类大量内容，如图像。

早期访问的开发者和公司，如 Latitude、Cartwheel 和 Whering，已经开始使用 3.1 Flash-Lite 来解决复杂问题。早期测试者称赞 3.1 Flash-Lite 的效率和推理能力，说它可以处理复杂输入的精度与更高级别的模型相似，遵循指令并保持一致性。

我们期待看到您如何使用 3.1 Flash-Lite 和 Gemini 3 系列模型。

Gemini 3.1 Flash-Lite：为大规模智能而打造

内容

成本效益高且不妥协

适应性智能能力在开发者中

评论

摘要