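Google has released Gemini 3.1 Flash Live in preview via the Live API for developers building low-latency, real-time voice and vision agents. The release focuses on higher task completion rates in noisy environments, better adherence to complex instructions, more natural dialogue, and support for more than 90 languages. The takeaway is that the model is better suited to production-grade real-time conversational applications and can be integrated right away through the Gemini API and AI Studio.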

<div id="readability-page-1" class="page"><div id="jump-content" class="site-content" tabindex="-1">
            
    
    

    <article class="uni-article-wrapper">

    
    





    

    
      








<div class="article-meta__author-container" data-analytics-module="{
    &quot;module_name&quot;: &quot;Hero Menu&quot;,
    &quot;section_header&quot;: &quot;Build real\u002Dtime conversational agents with Gemini 3.1 Flash Live&quot;
  }">
  
  <div class="article-meta__author-wrapper">
      
      
        <p class="article-meta__abstract-text uni-body--large">
          Developers can now build low-latency voice experiences with the new Gemini 3.1 Flash Live model, now available in preview via the Live API.
        </p>
      
    </div>
  
  <div class="article-meta__container">
      
  
    <figure class="article-meta__author-photo">
        <picture>
            


    

    
        <source media="(max-resolution: 1.5dppx)" sizes="122px" srcset="https://storage.googleapis.com/gweb-uniblog-publish-prod/images/thor_nyc_gemini_sq.max-122x92.format-webp.webp 122w">
    
        <source media="(min-resolution: 1.5dppx)" sizes="244px" srcset="https://storage.googleapis.com/gweb-uniblog-publish-prod/images/thor_nyc_gemini_sq.max-244x184.format-webp.webp 244w">
    

    <img src="https://storage.googleapis.com/gweb-uniblog-publish-prod/images/thor_nyc_gemini_sq.max-244x184.format-webp.webp" alt="thor_nyc_gemini_sq" sizes=" 122px,  244px" srcset="https://storage.googleapis.com/gweb-uniblog-publish-prod/images/thor_nyc_gemini_sq.max-122x92.format-webp.webp 122w, https://storage.googleapis.com/gweb-uniblog-publish-prod/images/thor_nyc_gemini_sq.max-244x184.format-webp.webp 244w" data-target="image" loading="lazy">
    


        </picture>
    </figure>



<div class="article-meta__author-info">
  <p>Thor Schaeff</p>
  
    <p>
      Developer Relations Engineer, Google DeepMind
    </p>
  
  
</div>

    </div>
</div>

    

    
      










<div class="article-image-hero">
    <figure class="article-image--full-aspect article-module">
      <div class="aspect-ratio-image">
        <p><img alt="Build with Gemini 3.1 Flash Live" class="aspect-ratio-image__image" data-component="uni-progressive-image" fetchpriority="high" height="150px" src="https://storage.googleapis.com/gweb-uniblog-publish-prod/images/build_with_gemini-3.1-flash-live_.width-200.format-webp.webp" width="360px" data-component-initialized="true" srcset="https://storage.googleapis.com/gweb-uniblog-publish-prod/images/build_with_gemini-3.1-flash-live_.width-800.format-webp.webp 800w, https://storage.googleapis.com/gweb-uniblog-publish-prod/images/build_with_gemini-3.1-flash-live.width-1200.format-webp.webp 1200w, https://storage.googleapis.com/gweb-uniblog-publish-prod/images/build_with_gemini-3.1-flash-live.width-1600.format-webp.webp 1600w, https://storage.googleapis.com/gweb-uniblog-publish-prod/images/build_with_gemini-3.1-flash-live.width-2200.format-webp.webp 2200w" sizes="(max-width: 1023px) 100vw,(min-width: 1024px and max-width: 1259) 80vw, 1046px">
        </p>
      </div>
      
    </figure>
  </div>






    

    
    <div class="uni-container article-container" data-reading-time="true" data-component="uni-article-body" data-component-initialized="true">

            
  
    




















  





            
            


  
    <div class="module--text module--text__article" data-component="uni-article-paragraph" data-component-initialized="true" role="presentation" data-analytics-module="{
           &quot;module_name&quot;: &quot;Paragraph&quot;,
           &quot;section_header&quot;: &quot;Build real\u002Dtime conversational agents with Gemini 3.1 Flash Live&quot;
         }"><p data-block-key="ga9xn" class="drop-cap">Today, we’re launching <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-live">Gemini 3.1 Flash Live</a> via the <a href="https://ai.google.dev/gemini-api/docs/live" rel="noopener" target="_blank">Gemini Live API</a> in Google AI Studio. Gemini 3.1 Flash Live helps developers build real-time voice and vision agents that not only process the world around them, but also respond at the speed of conversation.</p><p data-block-key="adkl9">This is a step change in latency, reliability and naturalness of dialogue, delivering the quality needed for the next generation of voice-first AI.</p><h2 data-block-key="o1ko">Experience improved latency, reliability and quality</h2><p data-block-key="fsvpt">For real-time interactions, every millisecond of latency strips away the natural flow of conversation that users expect. The new model better understands tone, emphasis and intent, which gives agents several key improvements:</p><ul><li data-block-key="40l37"><b>Higher task completion rates in noisy, real-world environments:</b> We’ve significantly improved the model’s ability to trigger external tools and deliver information during live conversations. By better discerning relevant speech from environmental sounds like traffic or television, the model filters out background noise more effectively and remains reliable and responsive to instructions.</li><li data-block-key="3ct7e"><b>Better instruction-following:</b> Adherence to complex system instructions has improved significantly. Your agent will stay within its operational guardrails, even when conversations take unexpected turns.</li><li data-block-key="9oat4"><b>More natural and low-latency dialogue:</b> Compared to 2.5 Flash Native Audio, the latest model improves on latency and is even more effective at recognizing acoustic nuances like pitch and pace, making real-time conversations feel more fluid and natural.</li><li data-block-key="ar941"><b>Multilingual capabilities:</b> The model supports more than 90 languages for real-time multimodal conversations.</li></ul><h2 data-block-key="77qq9">See the Gemini Live API in action</h2><p data-block-key="e3al2">Developers are actively building voice agents with Gemini Flash Live models that communicate with a natural flow and pace and take actions reliably. Here are a few examples of real-world apps that use the model to power their conversational interactions:</p></div>
  

  
    











<section class="uni-component-spacing">
  <ui-carousel gap="24px" analytics-module-name="Media Carousel" analytics-section-header="Build real-time conversational agents with Gemini 3.1 Flash Live" loop="" active-index="0">
    
  <div class="ui-carousel" role="region" id="ui-carousel-2160810" data-index="0" aria-live="polite" aria-atomic="false" aria-roledescription="carousel" aria-label="carousel" data-analytics-module="{&quot;module_name&quot;:&quot;Media Carousel&quot;,&quot;section_header&quot;:&quot;Build real-time conversational agents with Gemini 3.1 Flash Live&quot;}"><ui-carousel-slide analytics-module-name="Media Carousel Slide" analytics-section-header="Build real-time conversational agents with Gemini 3.1 Flash Live" aspect-ratio="horizontal" text-alignment="center" headline="">
  

  
    
  

  
<div class=" ui-carousel-slide " data-analytics-module="{&quot;module_name&quot;:&quot;Media Carousel Slide&quot;,&quot;section_header&quot;:&quot;Build real-time conversational agents with Gemini 3.1 Flash Live&quot;}"><p data-block-key="sp7x4">Using the Gemini Live API, <a href="https://stitch.withgoogle.com/" rel="noopener" target="_blank">Stitch</a> now enables its users to vibe design with their voice. The agent can 'see' the canvas and selected screens and give design critiques, build variations and more.</p></div></ui-carousel-slide></div></ui-carousel>
</section>

  

  
    <div class="module--text module--text__article" data-component="uni-article-paragraph" data-component-initialized="true" role="presentation" data-analytics-module="{
           &quot;module_name&quot;: &quot;Paragraph&quot;,
           &quot;section_header&quot;: &quot;Build real\u002Dtime conversational agents with Gemini 3.1 Flash Live&quot;
         }"><h2 data-block-key="ga9xn">Build with an expanding ecosystem of integrations</h2><p data-block-key="bs6t5">The Live API is built for production environments, but real-world systems require handling of diverse inputs, from live video streams to on-demand phone calls.</p><p data-block-key="a7l59">For systems that require WebRTC scaling or global edge routing, we recommend exploring our partner integrations to streamline the development of real-time voice and video agents.</p></div>
  

  
    





























<uni-image-full-width alignment="full" alt-text="Compiled logos of Firebase AI Logic, LiveKit, Pipecat, software mansion, VisionAgents, and voximplant" external-image="" or-mp4-video-title="" or-mp4-video-url="" section-header="Build real\u002Dtime conversational agents with Gemini 3.1 Flash Live" custom-class="image-full-width--constrained-width uni-component-spacing" autoplay="true">
  
  
    
  
</uni-image-full-width>


  

  
    <div class="module--text module--text__article" data-component="uni-article-paragraph" data-component-initialized="true" role="presentation" data-analytics-module="{
           &quot;module_name&quot;: &quot;Paragraph&quot;,
           &quot;section_header&quot;: &quot;Build real\u002Dtime conversational agents with Gemini 3.1 Flash Live&quot;
         }"><h2 data-block-key="fdsiu">Get started with the Live API</h2><p data-block-key="d4enr"><a href="https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-live-preview" rel="noopener" target="_blank">Gemini 3.1 Flash Live</a> is available starting today via the Gemini API and in Google AI Studio. Developers can use the Gemini <a href="https://ai.google.dev/gemini-api/docs/live" rel="noopener" target="_blank">Live API</a> to integrate the model into their application.</p></div>
  

  
    
  
    




  <uni-youtube-player-article index="6" thumbnail-alt="Thor from Google DeepMind walks through the Gemini Live API, showing how to build natural, human-like voice interactions powered by Gemini’s native audio model: speech-to-speech, no text in the middle, with emotional nuance, multilingual support, and real-time tool use." video-id="XV5bhkDpL7U" video-type="video">
  <div class="article-video-special h-c-page--mobile-full-bleed " data-component="uni-article-yt-player" data-page-title="" data-video-id="XV5bhkDpL7U" data-index-id="6" data-type="video" data-component-initialized="true" data-yt-video="module" data-analytics-module="{&quot;module_name&quot;:&quot;Youtube Video&quot;}"><p class="uni-article-video__embed-container hidden"><iframe class="uni-article-video__video" id="uni-article-yt-player-XV5bhkDpL7U-6" frameborder="0" allowfullscreen="" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" title="Building Voice Agents with Gemini 3" width="640" height="360" src="https://www.youtube-nocookie.com/embed/XV5bhkDpL7U?enablejsapi=1&amp;origin=https%3A%2F%2Fblog.google&amp;widgetid=1&amp;forigin=https%3A%2F%2Fblog.google%2Finnovation-and-ai%2Ftechnology%2Fdevelopers-tools%2Fbuild-with-gemini-3-1-flash-live%2F&amp;aoriginsup=1&amp;vf=6" data-gtm-yt-inspected-2097707_35="true" data-gtm-yt-inspected-2097707_36="true"></iframe></p></div></uni-youtube-player-article>











  


  

  
    <div class="module--text module--text__article" data-component="uni-article-paragraph" data-component-initialized="true" role="presentation" data-analytics-module="{
           &quot;module_name&quot;: &quot;Paragraph&quot;,
           &quot;section_header&quot;: &quot;Build real\u002Dtime conversational agents with Gemini 3.1 Flash Live&quot;
         }"><p data-block-key="ga9xn">Explore our developer documentation to learn how you can build real-time agents:</p><ul><li data-block-key="cs0ir">Gemini <a href="https://ai.google.dev/gemini-api/docs/live?example=mic-stream" rel="noopener" target="_blank">Live API documentation</a>: Explore features like multilingual support, tool use and function calling, session management (for managing long-running conversations) and ephemeral tokens.</li><li data-block-key="crd6v">Gemini <a href="https://github.com/google-gemini/gemini-live-api-examples" rel="noopener" target="_blank">Live API examples</a>: Get inspiration for the kind of voice experiences you can build today with the model.</li><li data-block-key="cenri"><a href="https://github.com/google-gemini/gemini-skills/tree/main/skills/gemini-live-api-dev" rel="noopener" target="_blank">Gemini Live API Skill</a>: For coding agents to learn and build with the Live API.</li></ul><p data-block-key="57avb">Get started with the <a href="https://ai.google.dev/gemini-api/docs/live-api/get-started-sdk" rel="noopener" target="_blank">Google GenAI SDK</a>:<span class="tombstone"></span></p></div>
  

  
    










<uni-code-block data-analytics-module="{
    &quot;module_name&quot;: &quot;Code Block&quot;,
    &quot;section_header&quot;: &quot;Build real\u002Dtime conversational agents with Gemini 3.1 Flash Live&quot;
  }" class="uni-code-block" code="import asyncio
from google import genai

client = genai.Client(api_key=&quot;YOUR_API_KEY&quot;)

model = &quot;gemini-3.1-flash-live-preview&quot;
config = {&quot;response_modalities&quot;: [&quot;AUDIO&quot;]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print(&quot;Session started&quot;)
        # Send content...

if __name__ == &quot;__main__&quot;:
    asyncio.run(main())" description="Code Snippet" lang="py"></uni-code-block>
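<p>The tool use and function calling features mentioned in the documentation list can be layered onto the snippet above. What follows is a minimal sketch, not an official example. The <code>get_order_status</code> tool, its return value, the system instruction text and the helper names <code>build_config</code>, <code>run_tool</code> and <code>answer_tool_calls</code> are all hypothetical. It also assumes that the Google GenAI SDK accepts <code>system_instruction</code> and <code>tools</code> keys in the config dict and exposes <code>session.receive()</code>, <code>session.send_tool_response()</code> and <code>types.FunctionResponse</code>; check the Live API documentation for the current signatures.</p>

```python
# Hypothetical local tool; the name and return shape are illustrative only.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


# Maps tool names declared to the model onto local Python callables.
LOCAL_TOOLS = {"get_order_status": get_order_status}


def build_config() -> dict:
    """Build a Live API config dict: audio replies, guardrails and one tool."""
    return {
        "response_modalities": ["AUDIO"],
        # Assumption: system instructions can be passed as a plain string.
        "system_instruction": "You are a support agent. Only discuss orders.",
        "tools": [{
            "function_declarations": [{
                "name": "get_order_status",
                "description": "Look up the shipping status of an order.",
                "parameters": {
                    "type": "OBJECT",
                    "properties": {"order_id": {"type": "STRING"}},
                    "required": ["order_id"],
                },
            }]
        }],
    }


def run_tool(name: str, args: dict) -> dict:
    """Dispatch a tool call requested by the model to a local function."""
    handler = LOCAL_TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**args)


async def answer_tool_calls(session) -> None:
    """Reply to tool calls on an already open Live session."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google.genai import types

    async for message in session.receive():
        if message.tool_call:
            responses = [
                types.FunctionResponse(
                    id=call.id,
                    name=call.name,
                    response=run_tool(call.name, dict(call.args or {})),
                )
                for call in message.tool_call.function_calls
            ]
            await session.send_tool_response(function_responses=responses)
```

<p>To try it, pass <code>build_config()</code> as the <code>config</code> argument in the snippet above and call <code>await answer_tool_calls(session)</code> inside the <code>async with</code> block once input has been sent. Keeping the dispatcher separate from the session loop means the tool logic can be tested without a network connection.</p>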

  


            
            

            
              




            
          </div>
  </article>
  





  

  


</div></div>


