Google Gemini Live

Google Gemini Live provides multimodal large language model capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components. This page covers integration using the Gemini Developer API, authenticated with a Gemini API key obtained from Google AI Studio.

info

Enabling MLLM automatically disables ASR, LLM, and TTS since the MLLM handles end-to-end voice processing directly.

Sample configuration

The following example shows a starting mllm parameter configuration you can use when you Start a conversational AI agent.

"mllm": {
  "enable": true,
  "api_key": "<GOOGLE_GEMINI_API_KEY>",
  "messages": [
    {
      "role": "user",
      "content": "<HISTORY_CONTENT>"
    }
  ],
  "params": {
    "model": "gemini-3.1-flash-live-preview",
    "instructions": "You are a friendly assistant.",
    "voice": "Charon",
    "affective_dialog": false,
    "proactive_audio": false,
    "transcribe_agent": true,
    "transcribe_user": true,
    "http_options": {
      "api_version": "v1beta"
    }
  },
  "turn_detection": {
    // see details below
  },  
  "input_modalities": [
    "audio"
  ],
  "output_modalities": [
    "audio"
  ],
  "greeting_message": "Hi, how can I assist you today?",
  "failure_message": "Sorry, I encountered an issue. Please try again.",
  "vendor": "gemini"
}
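
As a rough sketch, the block above could be assembled in Python before it is sent with the request that starts the agent. The function name build_mllm_config and the GEMINI_API_KEY environment variable are hypothetical and used only for illustration; the field names and values are taken from the sample configuration.

```python
import json
import os


def build_mllm_config(api_key: str, instructions: str, voice: str = "Charon") -> dict:
    """Assemble an mllm block that mirrors the sample configuration (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key is required")
    return {
        "enable": True,
        "vendor": "gemini",
        "api_key": api_key,
        "params": {
            "model": "gemini-3.1-flash-live-preview",
            "instructions": instructions,
            "voice": voice,
            "transcribe_agent": True,
            "transcribe_user": True,
        },
        "input_modalities": ["audio"],
        "output_modalities": ["audio"],
    }


if __name__ == "__main__":
    # GEMINI_API_KEY is an assumed environment variable name, not part of the API.
    key = os.environ.get("GEMINI_API_KEY", "<GOOGLE_GEMINI_API_KEY>")
    print(json.dumps({"mllm": build_mllm_config(key, "You are a friendly assistant.")}, indent=2))
```

Reading the key from the environment keeps it out of source control; the placeholder fallback is there only so the sketch runs without credentials.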

Turn detection

For a full list of turn_detection parameters, see mllm.turn_detection. The following examples show the supported configurations for Google Gemini Live. To set up turn detection, add a turn_detection block inside the mllm object when you Start a conversational AI agent.

Server VAD

"turn_detection": {
  "mode": "server_vad",
  "server_vad_config": {
    "prefix_padding_ms": 800,
    "silence_duration_ms": 640,
    "start_of_speech_sensitivity": "START_SENSITIVITY_HIGH",
    "end_of_speech_sensitivity": "END_SENSITIVITY_HIGH"
  }
}

Agora VAD

"turn_detection": {
  "mode": "agora_vad",
  "agora_vad_config": {
    "interrupt_duration_ms": 160,
    "prefix_padding_ms": 800,
    "silence_duration_ms": 640,
    "threshold": 0.5
  }
}
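
To illustrate how a turn_detection block is attached to the mllm object, here is a minimal Python sketch. The helper with_turn_detection and the CONFIG_KEY_BY_MODE mapping are hypothetical names; the modes, config keys, and values come from the two examples above.

```python
import copy

# Each mode shown above pairs with its own config key.
CONFIG_KEY_BY_MODE = {
    "server_vad": "server_vad_config",
    "agora_vad": "agora_vad_config",
}


def with_turn_detection(mllm: dict, mode: str, config: dict) -> dict:
    """Return a copy of mllm with a turn_detection block for the given mode."""
    if mode not in CONFIG_KEY_BY_MODE:
        raise ValueError(f"mode not covered by this sketch: {mode!r}")
    updated = copy.deepcopy(mllm)  # leave the caller's dict untouched
    updated["turn_detection"] = {
        "mode": mode,
        CONFIG_KEY_BY_MODE[mode]: dict(config),
    }
    return updated


mllm = with_turn_detection(
    {"enable": True, "vendor": "gemini"},
    "agora_vad",
    {
        "interrupt_duration_ms": 160,
        "prefix_padding_ms": 800,
        "silence_duration_ms": 640,
        "threshold": 0.5,
    },
)
```

Returning a copy rather than mutating the input makes it easier to reuse one base configuration with different turn detection settings.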

Key parameters

mllm required

enable boolean nullable

Enables the MLLM module. Replaces the deprecated advanced_features.enable_mllm.

api_key string required

The Google Gemini API key used to authenticate requests. You can generate an API key in Google AI Studio.

messages array[object] nullable

An array of conversation history items passed to the model as context. Each item represents a single message in the conversation history.

role string required

The role of the message author. For example, user.

content string required

The content of the message.
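
As a small illustration, conversation history could be converted into the messages format with a helper like the one below. The name build_messages is hypothetical, and the sketch checks only what this page documents: each item needs a role string and a content string.

```python
def build_messages(history: list[tuple[str, str]]) -> list[dict]:
    """Convert (role, content) pairs into the messages array format."""
    messages = []
    for role, content in history:
        if not isinstance(role, str) or not role:
            raise ValueError("role must be a non-empty string")
        if not isinstance(content, str):
            raise ValueError("content must be a string")
        messages.append({"role": role, "content": content})
    return messages


messages = build_messages([("user", "My name is Ada.")])
```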

params object required

Configuration object for the Gemini Live model.

model string required

The Gemini Live model identifier.

instructions string nullable

System instructions that define the agent's behavior or tone.

voice string nullable

The voice identifier for audio output. For example, Aoede, Puck, Charon, Kore, Fenrir, Leda, Orus, or Zephyr.

affective_dialog boolean nullable

Whether to enable affective dialog, which allows the model to adapt its tone based on the user's emotional cues.

proactive_audio boolean nullable

When enabled, the model may choose not to respond if the user's input does not require a reply, such as background speech or incomplete requests.

transcribe_agent boolean nullable

Whether to transcribe the agent's speech in real time.

transcribe_user boolean nullable

Whether to transcribe the user's speech in real time.

http_options object nullable

HTTP request options for the Gemini Live API.

api_version string nullable

The API version to use. For example, v1beta.

turn_detection object nullable

Turn detection configuration for the MLLM module.

info

When mllm.turn_detection is defined, the top-level turn_detection object has no effect.

mode string nullable

Possible values: agora_vad, server_vad, semantic_vad

agora_vad: Agora VAD-based detection.
server_vad: Vendor-side VAD-based detection.
semantic_vad: Semantic-based detection.

agora_vad_config object nullable

Configuration for Agora VAD-based turn detection. Applicable when mode is agora_vad.

interrupt_duration_ms integer nullable

Minimum duration of speech in milliseconds required to trigger an interruption.

prefix_padding_ms integer nullable

Duration of audio in milliseconds to include before the detected speech start.

silence_duration_ms integer nullable

Duration of silence in milliseconds required to determine end of speech.

threshold number nullable

VAD sensitivity threshold. A higher value reduces false positives.

server_vad_config object nullable

Configuration for vendor-side VAD-based turn detection. Applicable when mode is server_vad. Parameters are passed through to the vendor.

prefix_padding_ms integer nullable

Duration of audio in milliseconds to include before the detected speech start.

silence_duration_ms integer nullable

Duration of silence in milliseconds required to determine end of speech.

start_of_speech_sensitivity string nullable

Possible values: START_SENSITIVITY_HIGH, START_SENSITIVITY_LOW

Sensitivity for start of speech detection.

end_of_speech_sensitivity string nullable

Possible values: END_SENSITIVITY_HIGH, END_SENSITIVITY_LOW

Sensitivity for end of speech detection.
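
The documented values for turn_detection can be checked before a request is sent. The following Python sketch is one way to do that; validate_turn_detection is a hypothetical helper, and the rule that durations must be non-negative integers is an assumption of this sketch rather than something the reference states.

```python
MODES = {"agora_vad", "server_vad", "semantic_vad"}
START_SENSITIVITIES = {"START_SENSITIVITY_HIGH", "START_SENSITIVITY_LOW"}
END_SENSITIVITIES = {"END_SENSITIVITY_HIGH", "END_SENSITIVITY_LOW"}
DURATION_KEYS = ("interrupt_duration_ms", "prefix_padding_ms", "silence_duration_ms")


def validate_turn_detection(td: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    mode = td.get("mode")
    if mode is not None and mode not in MODES:
        problems.append(f"unknown mode: {mode!r}")

    server = td.get("server_vad_config") or {}
    start = server.get("start_of_speech_sensitivity")
    if start is not None and start not in START_SENSITIVITIES:
        problems.append(f"unknown start_of_speech_sensitivity: {start!r}")
    end = server.get("end_of_speech_sensitivity")
    if end is not None and end not in END_SENSITIVITIES:
        problems.append(f"unknown end_of_speech_sensitivity: {end!r}")

    for name in ("agora_vad_config", "server_vad_config"):
        cfg = td.get(name) or {}
        for key in DURATION_KEYS:
            value = cfg.get(key)
            if value is None:
                continue
            # Assumption: durations are non-negative integers (bool is excluded).
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                problems.append(f"{name}.{key} must be a non-negative integer")
    return problems
```

Collecting problems in a list instead of raising on the first one lets a caller report every issue in a configuration at once.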

input_modalities array[string] nullable

Default: ["audio"]

Input modalities for the MLLM.

["audio"]: Audio-only input
["audio", "text"]: Accept both audio and text input

output_modalities array[string] nullable

Default: ["audio"]

Output modalities for the MLLM.

["audio"]: Audio-only output
["text", "audio"]: Combined text and audio output
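
The defaults and documented combinations for both modality fields can be expressed as a short Python sketch. The helper resolve_modalities is hypothetical. Because this page does not say whether the order of array items matters, the sketch compares combinations as sets, which is an assumption.

```python
DEFAULT_MODALITIES = ["audio"]
# Combinations listed on this page, compared without regard to order (assumption).
DOCUMENTED = {frozenset(["audio"]), frozenset(["audio", "text"])}


def resolve_modalities(mllm: dict) -> tuple[list[str], list[str]]:
    """Return (input, output) modalities, applying the documented defaults."""
    inputs = mllm.get("input_modalities") or list(DEFAULT_MODALITIES)
    outputs = mllm.get("output_modalities") or list(DEFAULT_MODALITIES)
    if frozenset(inputs) not in DOCUMENTED:
        raise ValueError(f"undocumented input_modalities: {inputs!r}")
    if frozenset(outputs) not in DOCUMENTED:
        raise ValueError(f"undocumented output_modalities: {outputs!r}")
    return inputs, outputs
```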

greeting_message string nullable

The message the agent speaks when a user joins the channel.

failure_message string nullable

The message the agent speaks when an error occurs.

vendor string required

The MLLM provider identifier. Set to "gemini" to use Google Gemini Live with the Gemini Developer API.

For a comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the Google Gemini Live API documentation.
