Settings Configuration
Configure the voice agent and set the input and output audio formats.
What is the SettingsConfiguration Message
The SettingsConfiguration message is a JSON command that serves as an initialization step. It sets up both the behavior of the voice agent and the input and output audio formats before any voice data is exchanged over the WebSocket connection.
Configuration Parameters
Parameter | Description |
---|---|
agent.listen.model | The Deepgram speech-to-text model to be used. See options |
agent.speak.model | The Deepgram text-to-speech model to be used. See options |
agent.think.model | Together with agent.think.provider.type, defines the LLM to be used. See options |
agent.think.functions | Functions the agent can call during the conversation, if and when needed, according to each function's description. See options |
audio.input | The speech-to-text audio media input you wish to send. See options |
audio.output | The text-to-speech audio media output you wish to receive. See options |
context.messages | Used to restore an existing conversation if the WebSocket connection breaks. |
context.replay | Whether to replay the last message, if it is an assistant message. |
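To illustrate the agent.think.functions row, here is a minimal Python sketch that builds one function entry. The field names (name, description, url, headers, method, parameters) are taken from the SettingsConfiguration template in the Sending section; the helper name `make_function_definition`, the example function `add_item`, and the `example.com` URL are hypothetical and only for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def make_function_definition(
    name: str,
    description: str,
    url: str,
    properties: Dict[str, Dict[str, str]],
    required: List[str],
    authorization: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one entry for the agent.think.functions list (hypothetical helper)."""
    # Every required input must be described under "properties".
    missing = [key for key in required if key not in properties]
    if missing:
        raise ValueError(f"required inputs without a property definition: {missing}")

    definition: Dict[str, Any] = {
        "name": name,
        "description": description,
        "url": url,
        "method": "post",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
    # The template marks headers as optional and lists only 'authorization'.
    if authorization is not None:
        definition["headers"] = [{"key": "authorization", "value": authorization}]
    return definition


# Hypothetical function and placeholder endpoint, for illustration only.
example = make_function_definition(
    name="add_item",
    description="Add an item to the customer's order.",
    url="https://example.com/add_item",
    properties={"item": {"type": "string", "description": "The item to add."}},
    required=["item"],
)
print(json.dumps(example, indent=2))
```

The resulting dictionary can be placed in the functions list of the message. The check on required inputs is a convenience of this sketch, not something the source describes the server doing.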
Sending SettingsConfiguration
The client should send a SettingsConfiguration message immediately after opening the WebSocket and before sending any audio. To do so, send the following JSON message to the server:
{
  "type": "SettingsConfiguration",
  "audio": {
    "input": { // optional. defaults to 16kHz linear16
      "encoding": "",
      "sample_rate": 24000 // defaults to 24k
    },
    "output": { // optional. see table below for defaults and allowed combinations
      "encoding": "",
      "sample_rate": 24000, // defaults to 24k
      "bitrate": 0,
      "container": ""
    }
  },
  "agent": {
    "listen": {
      "model": "" // optional. default 'nova-2'
    },
    "think": {
      "provider": {
        "type": "" // see `LLM providers and models` table below
      },
      "model": "", // see `LLM providers and models` table below
      "instructions": "", // optional (this is the LLM System prompt)
      "functions": [
        {
          "name": "", // function name
          "description": "", // tells the agent what the function does, and how and when to use it
          "url": "", // the endpoint where your function will be called
          "headers": [ // optional. HTTP headers to pass when calling the function. Only supports 'authorization'
            {
              "key": "authorization",
              "value": ""
            }
          ],
          "method": "post", // the HTTP method to use when calling the function
          "parameters": {
            "type": "object",
            "properties": {
              "item": { // the name of the input property
                "type": "string", // the type of the input
                "description": "" // the description of the input so the agent understands what it is
              }
            },
            "required": ["item"] // the list of required input properties for this function to be called
          }
        }
      ]
    },
    "speak": {
      "model": "" // default 'aura-asteria-en'
    }
  },
  "context": {
    "messages": [], // LLM message history (e.g. to restore existing conversation if websocket connection breaks)
    "replay": false // whether to replay the last message, if it is an assistant message
  }
}
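The template above is annotated with // comments, which standard JSON does not allow, so a client would normally build the message programmatically and serialize it. The following is a minimal Python sketch under stated assumptions: the helper name `build_settings` is hypothetical, the listen and speak defaults ('nova-2', 'aura-asteria-en') and the linear16 / 24000 input values come from the template's comments, and the provider type and model are placeholders that must be replaced with values from the LLM providers and models table. The optional audio.output and context sections are left out.

```python
import json
from typing import Any, Dict, List, Optional


def build_settings(
    provider_type: str,
    think_model: str,
    instructions: str = "",
    functions: Optional[List[Dict[str, Any]]] = None,
    listen_model: str = "nova-2",
    speak_model: str = "aura-asteria-en",
    input_encoding: str = "linear16",
    input_sample_rate: int = 24000,
) -> Dict[str, Any]:
    """Assemble a SettingsConfiguration message as a plain dict (hypothetical helper)."""
    think: Dict[str, Any] = {
        "provider": {"type": provider_type},
        "model": think_model,
    }
    # Optional fields are only included when the caller supplies them.
    if instructions:
        think["instructions"] = instructions
    if functions:
        think["functions"] = functions

    return {
        "type": "SettingsConfiguration",
        "audio": {
            "input": {"encoding": input_encoding, "sample_rate": input_sample_rate},
        },
        "agent": {
            "listen": {"model": listen_model},
            "think": think,
            "speak": {"model": speak_model},
        },
    }


# Placeholder provider and model: substitute real values from the providers table.
settings = build_settings(
    provider_type="<provider-type>",
    think_model="<model-name>",
    instructions="You are a helpful voice assistant.",
)
payload = json.dumps(settings)  # strict JSON text, ready to send as the first message
print(payload)
```

The serialized `payload` string is what a client would send as a text message right after the connection opens; the choice of WebSocket library is left open here.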
SettingsConfiguration Confirmation
Upon receiving the SettingsConfiguration message, the server will process all remaining audio data and return the following:
{
  "type": "SettingsApplied"
}
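A client can recognize this confirmation by parsing incoming text messages and checking the type field. The sketch below is a minimal Python example; the function name `is_settings_applied` is hypothetical, and treating binary frames as non-JSON audio is an assumption of this sketch rather than something stated above.

```python
import json
from typing import Union


def is_settings_applied(raw_message: Union[str, bytes, bytearray]) -> bool:
    """Return True if the incoming frame is the SettingsApplied confirmation."""
    # Assumption: binary frames carry audio, so they are never the confirmation.
    if isinstance(raw_message, (bytes, bytearray)):
        return False
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        # Not valid JSON, so it cannot be the confirmation message.
        return False
    return isinstance(message, dict) and message.get("type") == "SettingsApplied"
```

In a receive loop, a client might call this on each incoming message and log or react once it returns True.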
Conclusion
The SettingsConfiguration message is an initialization command that establishes both the behavior of the voice agent and the audio transmission formats before voice data is exchanged. By configuring the agent and the input and output audio formats up front, it prepares the WebSocket connection for the voice interaction that follows.