Configure the Voice Agent

Learn about the configuration options for the Voice Agent, covering agent behavior and both input and output audio.

You will need to migrate to the new Voice Agent API V1 to continue to use the Voice Agent API. Please refer to the Voice Agent API Migration Guide for more information.

To configure your Voice Agent, you’ll need to send a Settings message immediately after connection. This message configures the agent’s behavior, input/output audio formats, and various provider settings.

For more information on the Settings message, see the Voice Agent API Reference.
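As a quick sketch of the shape of that first message, the snippet below builds a minimal Settings payload and serializes it for sending as the first WebSocket frame. The field values here are illustrative; see the tables below and the API reference for the full set of options.

```python
import json

# Minimal Settings payload; every field shown is documented in the tables below.
settings = {
    "type": "Settings",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000},
    },
    "agent": {
        "language": "en",
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {"provider": {"type": "open_ai", "model": "gpt-4o-mini"}},
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
    },
}

# This JSON string is what you would send immediately after the
# WebSocket connection opens, before any audio.
settings_frame = json.dumps(settings)
```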

Settings Overview

The Settings message is a JSON object that contains the following fields:

Settings

| Parameter | Type | Description |
| --- | --- | --- |
| `type` | String | Must be `"Settings"` to indicate this is a settings configuration message |
| `experimental` | Boolean | Enables experimental features. Defaults to `false` |
| `mip_opt_out` | Boolean | Opts out of the Deepgram Model Improvement Program (MIP). Defaults to `false` |
| `flags.history` | Boolean | Defaults to `true`. Set to `false` to disable function call history |

Audio

| Parameter | Type | Description |
| --- | --- | --- |
| `audio.input` | Object | The speech-to-text audio media input configuration |
| `audio.input.encoding` | String | The encoding format for the input audio. Defaults to `linear16` |
| `audio.input.sample_rate` | Integer | The sample rate in Hz for the input audio. Defaults to `16000` |
| `audio.output` | Object | The text-to-speech audio media output configuration |
| `audio.output.encoding` | String | The encoding format for the output audio |
| `audio.output.sample_rate` | Integer | The sample rate in Hz for the output audio |
| `audio.output.bitrate` | Integer | The bitrate in bits per second for the output audio |
| `audio.output.container` | String | The container format for the output audio. Defaults to `none` |
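Because `linear16` is uncompressed 16-bit PCM, the input bandwidth follows directly from the sample rate (2 bytes per sample per channel). A small sketch for sizing audio buffers; the helper name is ours, not part of the API:

```python
def linear16_bytes_per_second(sample_rate: int, channels: int = 1) -> int:
    """Raw linear16 PCM uses 2 bytes (16 bits) per sample per channel."""
    return sample_rate * 2 * channels

# Default input rate (16000 Hz) versus the 24000 Hz used in the full example below:
print(linear16_bytes_per_second(16000))  # 32000 bytes/s
print(linear16_bytes_per_second(24000))  # 48000 bytes/s
```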

Agent

| Parameter | Type | Description |
| --- | --- | --- |
| `agent.language` | String | The language code for the agent. Defaults to `en` |
| `agent.context` | Object | Optional conversation context, including a history of messages and function calls |
| `agent.context.messages` | Array | Array of previous conversation messages and function calls to provide context to the agent |
| `agent.listen.provider.type` | String | The speech-to-text provider type. Currently only `deepgram` is supported |
| `agent.listen.provider.model` | String | The Deepgram speech-to-text model to use |
| `agent.listen.provider.keyterms` | Array | The keyterms you want increased recognition for |
| `agent.think.provider.type` | String | The LLM provider type, e.g. `open_ai`, `anthropic`, `x_ai` |
| `agent.think.provider.model` | String | The LLM model to use |
| `agent.think.provider.temperature` | Number | Controls the randomness of the LLM's output. Range: 0-2 for OpenAI, 0-1 for Anthropic |
| `agent.think.endpoint` | Object | Optional if the LLM provider is Deepgram; required for non-Deepgram LLM providers. When present, must include a `url` field and a `headers` object |
| `agent.think.functions` | Array | Array of functions the agent can call during the conversation |
| `agent.think.functions.endpoint` | Object | The function endpoint to call. If not provided, the function is called client-side |
| `agent.think.prompt` | String | The system prompt that defines the agent's behavior and personality |
| `agent.think.context_length` | Integer or String | The number of characters retained in context across user messages, agent responses, and function calls. Only configurable when a custom think endpoint is used. Use `max` for the maximum context length |
| `agent.speak.provider.type` | String | The TTS provider type, e.g. `deepgram`, `eleven_labs`, `cartesia`, `open_ai`, `aws_polly` |
| `agent.speak.provider.model` | String | The TTS model to use for Deepgram or OpenAI |
| `agent.speak.provider.model_id` | String | The TTS model ID to use for Eleven Labs or Cartesia |
| `agent.speak.provider.voice` | Object or String | Voice configuration. For Cartesia: an object with `mode` and `id`. For OpenAI: a string value |
| `agent.speak.provider.language` | String | Optional language setting for the Cartesia provider |
| `agent.speak.provider.language_code` | String | Optional language code for the Eleven Labs provider |
| `agent.speak.provider.engine` | String | Optional engine for the AWS Polly provider |
| `agent.speak.provider.credentials` | Object | Optional credentials for the AWS Polly provider. When present, must include `type`, `region`, `access_key_id`, `secret_access_key`, and `session_token` if STS is used |
| `agent.speak.endpoint` | Object | Optional if the TTS provider is Deepgram; required for non-Deepgram TTS providers. When present, must include a `url` field and a `headers` object |
| `agent.greeting` | String | Optional initial message that the agent speaks when the conversation starts |

agent.language

  • Choose your language setting based on your use case:
    • If you know your input language, specify it directly for the best recognition accuracy.
    • If you expect multiple languages or are unsure, use multi for flexible language support.
    • Currently multi is only supported with Eleven Labs TTS.
    • Refer to our supported languages to ensure you’re using the correct model (Nova-2 or Nova-3) for your selected language.

agent.think.context_length

  • Using max will set the context length to the maximum allowed based on the LLM provider you use. If the total context exceeds the model’s maximum, truncation is handled by the LLM provider.
  • Increasing the context length may help preserve multi-turn conversation history, especially when verbose function calls inflate the total context.
  • All characters sent to the LLM count toward the context limit, including fully serialized JSON messages, function call arguments, and responses. System messages are excluded and managed separately via agent.think.prompt.
  • The default context length set by Deepgram is optimized for cost and latency. It is not recommended to change this setting unless there’s a clear need.
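Since all characters sent to the LLM count toward the limit, you can roughly estimate context usage by measuring the serialized history entries before deciding whether to raise `context_length`. A sketch under that assumption; the helper is illustrative, not part of the API:

```python
import json

def estimate_context_chars(messages: list) -> int:
    """Approximate context usage as the length of each fully serialized entry."""
    return sum(len(json.dumps(m)) for m in messages)

history = [
    {"type": "History", "role": "user", "content": "What's my order status?"},
    {"type": "History", "role": "assistant", "content": "It shipped yesterday."},
]

# Compare the estimate against the context_length you plan to configure.
print(estimate_context_chars(history))
```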

agent.context

  • The agent.context object allows you to provide conversation history to the agent when starting a new session. This is useful for continuing conversations or providing background context.
  • The agent.context.messages array contains conversation history entries, which can be either conversational messages or function calls.
  • Conversational messages have the format: {"type": "History", "role": "user" | "assistant", "content": "message text"}
  • Function call messages have the format: {"type": "History", "function_calls": [{"id": "unique_id", "name": "function_name", "client_side": true/false, "arguments": "json_string", "response": "response_text"}]}
  • Use this feature to maintain conversation continuity across sessions or to provide the agent with relevant background information.
  • To disable function call history, set settings.flags.history to false in the Settings message.
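The two entry shapes above can be generated with small builder helpers; this is a sketch (the helper names are ours), but the dictionaries they return match the documented `History` formats, including `arguments` being a JSON-encoded string rather than a nested object:

```python
import json

def history_message(role: str, content: str) -> dict:
    """A conversational History entry; role is 'user' or 'assistant'."""
    return {"type": "History", "role": role, "content": content}

def history_function_call(call_id: str, name: str, arguments: dict,
                          response: str, client_side: bool = True) -> dict:
    """A function-call History entry; arguments are serialized to a JSON string."""
    return {
        "type": "History",
        "function_calls": [{
            "id": call_id,
            "name": name,
            "client_side": client_side,
            "arguments": json.dumps(arguments),
            "response": response,
        }],
    }

# Assemble the agent.context object to include in Settings.
context = {
    "messages": [
        history_message("user", "What's my order status?"),
        history_function_call("fc_1", "check_order_status",
                              {"order_id": "ORD-123456"},
                              "Order #123456 status: Shipped"),
        history_message("assistant", "Your order has shipped."),
    ]
}
```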

Full Example

Below is an in-depth example showing all the available fields for Settings, including the optional provider-specific settings.

```json
{
  "type": "Settings",
  "experimental": false,
  "mip_opt_out": false,
  "flags": {
    "history": true
  },
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 24000
    },
    "output": {
      "encoding": "mp3",
      "sample_rate": 24000,
      "bitrate": 48000,
      "container": "none"
    }
  },
  "agent": {
    "language": "en",
    "context": {
      "messages": [
        {
          "type": "History",
          "role": "user",
          "content": "What's my order status?"
        },
        {
          "type": "History",
          "function_calls": [
            {
              "id": "fc_12345678-90ab-cdef-1234-567890abcdef",
              "name": "check_order_status",
              "client_side": true,
              "arguments": "{\"order_id\": \"ORD-123456\"}",
              "response": "Order #123456 status: Shipped - Expected delivery date: 2024-03-15"
            }
          ]
        },
        {
          "type": "History",
          "role": "assistant",
          "content": "Your order #123456 has been shipped and is expected to arrive on March 15th, 2024."
        }
      ]
    },
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3",
        "keyterms": ["hello", "goodbye"]
      }
    },
    "think": {
      "provider": {
        "type": "open_ai",
        "model": "gpt-4o-mini",
        "temperature": 0.7
      },
      "endpoint": { // Optional for non-Deepgram LLM providers. When present, must include url field and headers object
        "url": "https://api.example.com/llm",
        "headers": {
          "authorization": "Bearer {{token}}"
        }
      },
      "prompt": "You are a helpful AI assistant focused on customer service.",
      "context_length": 15000, // Optional; only configurable when a custom think endpoint is used. Use "max" for maximum context length
      "functions": [
        {
          "name": "check_order_status",
          "description": "Check the status of a customer order",
          "parameters": {
            "type": "object",
            "properties": {
              "order_id": {
                "type": "string",
                "description": "The order ID to check"
              }
            },
            "required": ["order_id"]
          },
          "endpoint": { // If not provided, function is called client-side
            "url": "https://api.example.com/orders/status",
            "method": "post",
            "headers": {
              "authorization": "Bearer {{token}}"
            }
          }
        }
      ]
    },
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-thalia-en", // For Deepgram or OpenAI providers
        "model_id": "1234567890", // For Eleven Labs or Cartesia providers
        "voice": {
          "mode": "id", // For Cartesia provider only
          "id": "a167e0f3-df7e-4d52-a9c3-f949145efdab" // For Cartesia provider only
        },
        "language": "en", // For Cartesia provider only
        "language_code": "en-US", // For Eleven Labs provider only
        "engine": "standard", // For AWS Polly provider only
        "credentials": { // For AWS Polly provider only
          "type": "IAM", // Must be "IAM" or "STS"
          "region": "us-east-1",
          "access_key_id": "{{access_key_id}}",
          "secret_access_key": "{{secret_access_key}}",
          "session_token": "{{session_token}}" // Required for STS only
        }
      },
      "endpoint": { // Optional if TTS provider is Deepgram. When present, must include url field and headers object
        "url": "https://api.example.com/tts",
        "headers": {
          "authorization": "Bearer {{token}}"
        }
      }
    },
    "greeting": "Hello! How can I help you today?"
  }
}
```