Configure the Voice Agent
Learn about the Voice Agent configuration options for the agent and for input and output audio.
You will need to migrate to the new V1 Agent API to continue using the Voice Agent API. Please refer to the Voice Agent API Migration Guide for more information.
To configure your Voice Agent, you’ll need to send a Settings message immediately after connection. This message configures the agent’s behavior, input/output audio formats, and various provider settings.
For more information on the Settings message, see the Voice Agent API Reference.
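As a starting point, a minimal Settings message could look like the sketch below. The model names, prompt, and greeting are illustrative values taken from the full example later on this page, and omitted fields fall back to the defaults documented in the tables that follow; consult the API Reference for the exact set of required fields.

```json
{
  "type": "Settings",
  "audio": {
    "input": { "encoding": "linear16", "sample_rate": 16000 },
    "output": { "encoding": "linear16", "sample_rate": 24000 }
  },
  "agent": {
    "listen": { "provider": { "type": "deepgram", "model": "nova-3" } },
    "think": {
      "provider": { "type": "open_ai", "model": "gpt-4o-mini" },
      "prompt": "You are a friendly, concise voice assistant."
    },
    "speak": { "provider": { "type": "deepgram", "model": "aura-2-thalia-en" } }
  }
}
```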
Settings Overview
The Settings message is a JSON object that contains the following fields:
Settings
| Parameter | Type | Description |
| --- | --- | --- |
| type | String | Must be "Settings" to indicate this is a settings configuration message. |
| experimental | Boolean | Enables experimental features. Defaults to false. |
Audio
| Parameter | Type | Description |
| --- | --- | --- |
| audio.input | Object | The speech-to-text audio media input configuration. |
| audio.input.encoding | String | The encoding format for the input audio. Defaults to linear16. |
| audio.input.sample_rate | Integer | The sample rate in Hz for the input audio. Defaults to 16000. |
| audio.output | Object | The text-to-speech audio media output configuration. |
| audio.output.encoding | String | The encoding format for the output audio. |
| audio.output.sample_rate | Integer | The sample rate in Hz for the output audio. |
| audio.output.bitrate | Integer | The bitrate in bits per second for the output audio. |
| audio.output.container | String | The container format for the output audio. Defaults to none. |
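As a concrete sketch, the audio block below accepts raw linear16 input sampled at 24000 Hz and requests containerless MP3 output at 48 kbps. The values are illustrative and mirror the full example at the end of this page:

```json
"audio": {
  "input": {
    "encoding": "linear16",
    "sample_rate": 24000
  },
  "output": {
    "encoding": "mp3",
    "sample_rate": 24000,
    "bitrate": 48000,
    "container": "none"
  }
}
```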
Agent
| Parameter | Type | Description |
| --- | --- | --- |
| agent.language | String | The language code for the agent. Defaults to en. |
| agent.listen.provider.type | String | The speech-to-text provider type. Currently only deepgram is supported. |
| agent.listen.provider.model | String | The Deepgram speech-to-text model to be used. |
| agent.listen.provider.keyterms | Array | Keyterms for which you want increased recognition. |
| agent.think.provider.type | String | The LLM provider type, e.g., open_ai, anthropic, x_ai, amazon_bedrock. |
| agent.think.provider.model | String | The LLM model to use. |
| agent.think.provider.temperature | Number | Controls the randomness of the LLM's output. Range: 0-2 for OpenAI, 0-1 for Anthropic. |
| agent.think.endpoint | Object | Optional if the LLM provider is Deepgram; required for non-Deepgram LLM providers. When present, must include a url field and a headers object. |
| agent.think.functions | Array | Functions the agent can call during the conversation. |
| agent.think.functions.endpoint | Object | The function endpoint to call. If not provided, the function is called client-side. |
| agent.think.prompt | String | The system prompt that defines the agent's behavior and personality. |
| agent.speak.provider.type | String | The TTS provider type, e.g., deepgram, eleven_labs, cartesia, open_ai. |
| agent.speak.provider.model | String | The TTS model to use for Deepgram or OpenAI. |
| agent.speak.provider.model_id | String | The TTS model ID to use for Eleven Labs or Cartesia. |
| agent.speak.provider.voice | Object | Voice configuration for the Cartesia provider. Requires mode and id. |
| agent.speak.provider.language | String | Optional language setting for the Cartesia provider. |
| agent.speak.provider.language_code | String | Optional language code for the Eleven Labs provider. |
| agent.speak.endpoint | Object | Optional if the TTS provider is Deepgram; required for non-Deepgram TTS providers. When present, must include a url field and a headers object. |
| agent.greeting | String | Optional initial message that the agent speaks when the conversation starts. |
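The sketch below shows how these fields combine when you step outside the Deepgram defaults: a non-Deepgram LLM (which makes agent.think.endpoint required), a function with no endpoint (so it is called client-side), and Cartesia for TTS (whose voice object takes mode and id). Every URL, token, model name, and voice ID here is a placeholder, not a value from the API:

```json
"agent": {
  "language": "en",
  "listen": {
    "provider": { "type": "deepgram", "model": "nova-3" }
  },
  "think": {
    "provider": {
      "type": "anthropic",
      "model": "anthropic-model-name", // placeholder model name
      "temperature": 0.5 // Anthropic accepts 0-1
    },
    "endpoint": { // required: the LLM provider is not Deepgram
      "url": "https://api.example.com/llm",
      "headers": { "authorization": "Bearer {{token}}" }
    },
    "prompt": "You are a concise customer-support agent.",
    "functions": [
      {
        "name": "transfer_call", // hypothetical function
        "description": "Hand the caller off to a human agent",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": []
        }
        // no endpoint here, so this function is called client-side
      }
    ]
  },
  "speak": {
    "provider": {
      "type": "cartesia",
      "model_id": "cartesia-model-id", // placeholder model ID
      "voice": {
        "mode": "cartesia-mode-type", // placeholder mode value
        "id": "cartesia-voice-id" // placeholder voice ID
      },
      "language": "en"
    },
    "endpoint": { // required: the TTS provider is not Deepgram
      "url": "https://api.example.com/tts",
      "headers": { "authorization": "Bearer {{token}}" }
    }
  },
  "greeting": "Hi! How can I help?"
}
```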
Full Example
Below is an in-depth example showing all the available fields for Settings.
```json
{
  "type": "Settings",
  "experimental": false,
  "mip_opt_out": false,
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 24000
    },
    "output": {
      "encoding": "mp3",
      "sample_rate": 24000,
      "bitrate": 48000,
      "container": "none"
    }
  },
  "agent": {
    "language": "en",
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3",
        "keyterms": ["hello", "goodbye"]
      }
    },
    "think": {
      "provider": {
        "type": "open_ai",
        "model": "gpt-4o-mini",
        "temperature": 0.7
      },
      "endpoint": { // Optional if the LLM provider is Deepgram; required otherwise. When present, must include url and headers
        "url": "https://api.example.com/llm",
        "headers": {
          "authorization": "Bearer {{token}}"
        }
      },
      "prompt": "You are a helpful AI assistant focused on customer service.",
      "functions": [
        {
          "name": "check_order_status",
          "description": "Check the status of a customer order",
          "parameters": {
            "type": "object",
            "properties": {
              "order_id": {
                "type": "string",
                "description": "The order ID to check"
              }
            },
            "required": ["order_id"]
          },
          "endpoint": { // If not provided, the function is called client-side
            "url": "https://api.example.com/orders/status",
            "method": "post",
            "headers": {
              "authorization": "Bearer {{token}}"
            }
          }
        }
      ]
    },
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-thalia-en", // Use for Deepgram or OpenAI
        "model_id": "1234567890", // Use for Eleven Labs or Cartesia
        "voice": {
          "mode": "Cartesia mode type", // Use for Cartesia
          "id": "Cartesia voice id" // Use for Cartesia
        },
        "language": "en", // Use for Cartesia
        "language_code": "en-US" // Use for Eleven Labs
      },
      "endpoint": { // Optional if the TTS provider is Deepgram; required otherwise
        "url": "https://api.example.com/tts",
        "headers": {
          "authorization": "Bearer {{token}}"
        }
      }
    },
    "greeting": "Hello! How can I help you today?"
  }
}
```