Configure the Voice Agent

Learn about the Voice Agent configuration options for the agent and for input and output audio.

Voice Agent

You will need to migrate to the new V1 Agent API to continue using the Voice Agent API. Please refer to the Voice Agent API Migration Guide for more information.

To configure your Voice Agent, you’ll need to send a Settings message immediately after connection. This message configures the agent’s behavior, input/output audio formats, and various provider settings.

For more information on the Settings message, see the Voice Agent API Reference.
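Putting this together, here is a minimal sketch of opening the connection and sending Settings as the first message. The WebSocket URL, the `Token` authorization scheme, and the use of the Python `websockets` package are illustrative assumptions; confirm the exact endpoint and headers in the API reference.

```python
# Minimal sketch (assumed endpoint and auth scheme -- verify both against
# the Voice Agent API Reference).
import json
from websockets.sync.client import connect

AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"  # assumed V1 URL

settings = {
    "type": "Settings",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000},
    },
    "agent": {
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
            "prompt": "You are a helpful AI assistant.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
    },
}

with connect(
    AGENT_URL,
    additional_headers={"Authorization": "Token YOUR_DEEPGRAM_API_KEY"},
) as ws:
    ws.send(json.dumps(settings))  # Settings must be the first message sent
```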

Settings Overview

The Settings message is a JSON object that contains the following fields:

Settings

| Parameter | Type | Description |
| --- | --- | --- |
| `type` | String | Must be `"Settings"` to indicate this is a settings configuration message |
| `experimental` | Boolean | Enables experimental features. Defaults to `false` |
| `mip_opt_out` | Boolean | Opts out of the Deepgram Model Improvement Program. Defaults to `false` |

Audio

| Parameter | Type | Description |
| --- | --- | --- |
| `audio.input` | Object | The speech-to-text audio media input configuration |
| `audio.input.encoding` | String | The encoding format for the input audio. Defaults to `linear16` |
| `audio.input.sample_rate` | Integer | The sample rate in Hz for the input audio. Defaults to `16000` |
| `audio.output` | Object | The text-to-speech audio media output configuration |
| `audio.output.encoding` | String | The encoding format for the output audio |
| `audio.output.sample_rate` | Integer | The sample rate in Hz for the output audio |
| `audio.output.bitrate` | Integer | The bitrate in bits per second for the output audio |
| `audio.output.container` | String | The container format for the output audio. Defaults to `none` |
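Once the connection is configured, input audio is streamed to the agent as binary WebSocket frames that must match the declared `audio.input` encoding and sample rate. Below is a rough sketch of chunking `linear16` audio from a file; the chunk size, the file name, and the reuse of the `ws` connection from the earlier sketch are all illustrative assumptions.

```python
# Sketch: stream 16 kHz, 16-bit mono PCM (linear16) as binary frames.
# Assumes `ws` is the open connection from the earlier sketch and that
# "caller_audio.raw" is a hypothetical headerless PCM capture.
SAMPLE_RATE = 16000        # must match audio.input.sample_rate
BYTES_PER_SAMPLE = 2       # linear16 = 16-bit signed PCM
CHUNK_MS = 100             # send ~100 ms of audio per frame

chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 3200 bytes

with open("caller_audio.raw", "rb") as audio:
    while chunk := audio.read(chunk_bytes):
        ws.send(chunk)  # bytes -> binary frame; JSON strings -> text frames
```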

Agent

| Parameter | Type | Description |
| --- | --- | --- |
| `agent.language` | String | The language code for the agent. Defaults to `en` |
| `agent.listen.provider.type` | String | The speech-to-text provider type. Currently only `deepgram` is supported |
| `agent.listen.provider.model` | String | The Deepgram speech-to-text model to use |
| `agent.listen.provider.keyterms` | Array | Keyterms you want increased recognition for |
| `agent.think.provider.type` | String | The LLM provider type, e.g. `open_ai`, `anthropic`, `x_ai`, `amazon_bedrock` |
| `agent.think.provider.model` | String | The LLM model to use |
| `agent.think.provider.temperature` | Number | Controls the randomness of the LLM's output. Range: 0-2 for OpenAI, 0-1 for Anthropic |
| `agent.think.endpoint` | Object | Optional if the LLM provider is Deepgram; required for non-Deepgram LLM providers. When present, must include a `url` field and a `headers` object |
| `agent.think.functions` | Array | Array of functions the agent can call during the conversation |
| `agent.think.functions.endpoint` | Object | The endpoint to call for the function. If not provided, the function is called client-side (see the sketch below) |
| `agent.think.prompt` | String | The system prompt that defines the agent's behavior and personality |
| `agent.speak.provider.type` | String | The TTS provider type, e.g. `deepgram`, `eleven_labs`, `cartesia`, `open_ai` |
| `agent.speak.provider.model` | String | The TTS model to use for Deepgram or OpenAI |
| `agent.speak.provider.model_id` | String | The TTS model ID to use for Eleven Labs or Cartesia |
| `agent.speak.provider.voice` | Object | Voice configuration for the Cartesia provider. Requires `mode` and `id` |
| `agent.speak.provider.language` | String | Optional language setting for the Cartesia provider |
| `agent.speak.provider.language_code` | String | Optional language code for the Eleven Labs provider |
| `agent.speak.provider.endpoint` | Object | Optional if the TTS provider is Deepgram; required for non-Deepgram TTS providers. When present, must include a `url` field and a `headers` object |
| `agent.greeting` | String | Optional initial message that the agent will speak when the conversation starts |
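When a function omits `endpoint`, the agent hands the call back to your client over the WebSocket. The sketch below shows one way to answer such a request; the `FunctionCallRequest` and `FunctionCallResponse` message shapes are assumptions modeled on the V1 event naming, so verify the exact fields in the API reference.

```python
import json

def handle_function_call(ws, raw_message):
    """Answer a client-side function call request (assumed V1 message shapes)."""
    event = json.loads(raw_message)
    if event.get("type") != "FunctionCallRequest":  # assumed event name
        return
    for call in event.get("functions", []):
        if call.get("name") != "check_order_status":
            continue
        args = json.loads(call["arguments"])        # arguments assumed JSON-encoded
        result = {"order_id": args["order_id"], "status": "shipped"}  # stub lookup
        ws.send(json.dumps({
            "type": "FunctionCallResponse",         # assumed response type
            "id": call["id"],
            "name": call["name"],
            "content": json.dumps(result),
        }))
```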

Full Example

Below is an in-depth example showing all the available fields for Settings.

```json
{
  "type": "Settings",
  "experimental": false,
  "mip_opt_out": false,
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 24000
    },
    "output": {
      "encoding": "mp3",
      "sample_rate": 24000,
      "bitrate": 48000,
      "container": "none"
    }
  },
  "agent": {
    "language": "en",
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3",
        "keyterms": ["hello", "goodbye"]
      }
    },
    "think": {
      "provider": {
        "type": "open_ai",
        "model": "gpt-4o-mini",
        "temperature": 0.7
      },
      "endpoint": { // Optional for non-Deepgram LLM providers. When present, must include url field and headers object
        "url": "https://api.example.com/llm",
        "headers": {
          "authorization": "Bearer {{token}}"
        }
      },
      "prompt": "You are a helpful AI assistant focused on customer service.",
      "functions": [
        {
          "name": "check_order_status",
          "description": "Check the status of a customer order",
          "parameters": {
            "type": "object",
            "properties": {
              "order_id": {
                "type": "string",
                "description": "The order ID to check"
              }
            },
            "required": ["order_id"]
          },
          "endpoint": { // If not provided, function is called client-side
            "url": "https://api.example.com/orders/status",
            "method": "post",
            "headers": {
              "authorization": "Bearer {{token}}"
            }
          }
        }
      ]
    },
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-thalia-en", // Use for Deepgram or OpenAI
        "model_id": "1234567890", // Use for Eleven Labs or Cartesia
        "voice": {
          "mode": "Cartesia mode type", // Use for Cartesia
          "id": "Cartesia voice id" // Use for Cartesia
        },
        "language": "en", // Use for Cartesia
        "language_code": "en-US" // Use for Eleven Labs
      },
      "endpoint": { // Optional if TTS provider is Deepgram. Required for non-Deepgram TTS providers
        "url": "https://api.example.com/tts",
        "headers": {
          "authorization": "Bearer {{token}}"
        }
      }
    },
    "greeting": "Hello! How can I help you today?"
  }
}
```
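Once Settings like the above is accepted, the server replies with a mix of JSON text frames (lifecycle and conversation events) and binary frames carrying TTS audio in the format requested under `audio.output`. A rough receive loop might look like the following; the `speaker` sink and the specific event names checked here are assumptions to verify against the event documentation.

```python
import json
from websockets.exceptions import ConnectionClosed

def receive_loop(ws, speaker):
    """Dispatch server frames: bytes are TTS audio, text frames are JSON events."""
    while True:
        try:
            frame = ws.recv()
        except ConnectionClosed:
            break
        if isinstance(frame, bytes):
            speaker.play(frame)               # hypothetical audio sink
            continue
        event = json.loads(frame)
        if event.get("type") == "Error":      # assumed error event name
            raise RuntimeError(event)
        print("event:", event.get("type"))    # e.g. SettingsApplied (assumed)
```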