Configure the Voice Agent

Learn about the configuration options for the voice agent, including both input and output audio.

Voice Agent

To configure the voice agent, you'll need to send a `SettingsConfiguration` message immediately after opening the connection. Below is a detailed explanation of the available configuration options.
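
For example, here is a minimal sketch in Python, using the `websockets` package, of opening the agent websocket and sending the settings as the first message. The endpoint URL, the authorization scheme, the provider type, and the model names are illustrative assumptions; check the Deepgram API reference for the exact values.

Python
import asyncio
import json

import websockets

AGENT_URL = "wss://agent.deepgram.com/agent"  # assumed endpoint; verify against the API reference
API_KEY = "YOUR_DEEPGRAM_API_KEY"

# A filled-in settings message; see the parameter table and full example below.
SETTINGS = {
    "type": "SettingsConfiguration",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000},
    },
    "agent": {
        "listen": {"model": "nova-3"},
        "think": {
            "provider": {"type": "open_ai"},  # assumed provider identifier
            "model": "gpt-4o-mini",           # assumed model name
            "instructions": "You are a concise, friendly assistant.",
        },
        "speak": {"model": "aura-asteria-en"},
    },
}

async def main():
    # Open the connection, then send the SettingsConfiguration immediately,
    # before streaming any audio.
    async with websockets.connect(
        AGENT_URL,
        extra_headers={"Authorization": f"Token {API_KEY}"},  # 'additional_headers' in websockets >= 14
    ) as ws:
        await ws.send(json.dumps(SETTINGS))
        # ... stream input audio and read agent events from here on ...

asyncio.run(main())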

Configuration Parameters

| Parameter | Description |
| --- | --- |
| `audio.input` | The speech-to-text audio media input you wish to send. The audio must be uncontainerized and use one of the following encodings: `linear16`, `alaw`, or `mulaw`. |
| `audio.output` | The text-to-speech audio media output you wish to receive. See options. |
| `agent.listen.model` | The Deepgram speech-to-text model to be used. See options. |
| `agent.listen.keyterms` | Keyterms you want increased recognition for. See more. |
| `agent.think.model` | Defines the LLM model to be used. See options. |
| `agent.think.provider.type` | Defines the LLM provider. See options. |
| `agent.think.instructions` | Defines the system prompt for the LLM. |
| `agent.think.functions` | Functions to pass to the agent; these will be called throughout the conversation if and when needed, as described per function. See options. |
| `agent.speak.model` | The Deepgram text-to-speech model to be used. See options. |
| `context.messages` | Used to restore an existing conversation if the websocket connection breaks (see the sketch after this table). |
| `context.replay` | Used to replay the last message, if it is an assistant message. |
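
For example, if the websocket drops mid-conversation you could reconnect and include the prior exchange in `context.messages` when you resend the settings. The sketch below assumes chat-style `role`/`content` message objects, which is an assumption; confirm the expected message shape in the API reference.

Python
# Minimal sketch of a settings payload that restores earlier conversation turns.
resume_settings = {
    "type": "SettingsConfiguration",
    # ... "audio" and "agent" blocks as shown in the full example below ...
    "context": {
        "messages": [  # assumed shape: {"role", "content"} objects
            {"role": "user", "content": "What are your store hours?"},
            {"role": "assistant", "content": "We're open 9am to 6pm, Monday through Saturday."},
        ],
        "replay": True,  # re-speak the last assistant message after reconnecting
    },
}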

Full Example

Below is an in-depth example showing all the available fields for `SettingsConfiguration`, highlighting default values and optional properties.

JSON
{
  "type": "SettingsConfiguration",
  "audio": {
    "input": { // optional. defaults to 16kHz linear16
      "encoding": "",
      "sample_rate": 24000 // defaults to 16000
    },
    "output": { // optional. see table below for defaults and allowed combinations
      "encoding": "",
      "sample_rate": 24000, // defaults to 24k
      "bitrate": 0,
      "container": ""
    }
  },
  "agent": {
    "listen": {
      "model": "", // optional. defaults to 'nova-3'
      "keyterms": [] // optional, only available on nova-3 models
    },
    "think": {
      "provider": {
        "type": "" // see `LLM providers and models` table below
      },
      "model": "", // see `LLM providers and models` table below
      "instructions": "", // optional (this is the LLM system prompt)
      "functions": [
        {
          "name": "", // function name
          "description": "", // tells the agent what the function does, and how and when to use it
          "url": "", // the endpoint where your function will be called
          "headers": [ // optional. HTTP headers to pass when calling the function. Only supports 'authorization'
            {
              "key": "authorization",
              "value": ""
            }
          ],
          "method": "post", // the HTTP method to use when calling the function
          "parameters": {
            "type": "object",
            "properties": {
              "item": { // the name of the input property
                "type": "string", // the type of the input
                "description": "" // the description of the input so the agent understands what it is
              }
            },
            "required": ["item"] // the list of required input properties for this function to be called
          }
        }
      ]
    },
    "speak": {
      "model": "" // defaults to 'aura-asteria-en'; for other providers see the TTS Models documentation
    }
  },
  "context": {
    "messages": [], // LLM message history (e.g. to restore an existing conversation if the websocket connection breaks)
    "replay": false // whether to replay the last message, if it is an assistant message
  }
}
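
If you configure `agent.think.functions`, each function's `url` should point at an HTTP endpoint you host. The sketch below uses Flask and assumes the agent sends the declared parameters as a JSON request body; the route, argument handling, and response shape are hypothetical and only illustrate the idea, so check the function calling documentation for the exact contract.

Python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical endpoint used as the function "url" in the settings above.
@app.post("/functions/add-item")
def add_item():
    args = request.get_json(force=True) or {}  # assumed: arguments arrive as a JSON body
    item = args.get("item", "unknown item")    # matches the "item" property declared in "parameters"
    # Do the real work here (database write, downstream API call, etc.),
    # then return JSON the agent can relay back to the caller.
    return jsonify({"status": "ok", "message": f"Added {item} to the order."})

if __name__ == "__main__":
    app.run(port=8080)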