# Configure the Voice Agent

Learn about the configuration options for the voice agent, covering the agent itself as well as both input and output audio.

To configure the voice agent, you'll need to send a Settings Configuration message immediately after opening the connection. Below is a detailed explanation of the available configuration options.
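
As a concrete sketch, here is how you might open the websocket and send the settings as your first message, using Python's `websockets` library. The endpoint URL, auth header format, and the specific model values shown are illustrative assumptions; substitute the values from your own Deepgram setup.

```python
import asyncio
import json

import websockets  # pip install websockets

# Illustrative values -- the endpoint URL and auth header format are
# assumptions; check the Deepgram docs for the exact connection details.
AGENT_URL = "wss://agent.deepgram.com/agent"
API_KEY = "YOUR_DEEPGRAM_API_KEY"

SETTINGS = {
    "type": "SettingsConfiguration",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000},
    },
    "agent": {
        "listen": {"model": "nova-2"},
        "think": {
            "provider": {"type": "open_ai"},  # example provider type
            "model": "gpt-4o-mini",           # example model
            "instructions": "You are a friendly, concise voice assistant.",
        },
        "speak": {"model": "aura-asteria-en"},
    },
}

async def main():
    # `extra_headers` is the keyword in websockets < 14; newer releases
    # use `additional_headers` instead.
    async with websockets.connect(
        AGENT_URL, extra_headers={"Authorization": f"Token {API_KEY}"}
    ) as ws:
        # The Settings Configuration message must be the first thing sent.
        await ws.send(json.dumps(SETTINGS))
        # ...then stream input audio and handle messages from the server.

asyncio.run(main())
```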

## Configuration Parameters

| Parameter | Description |
| --- | --- |
| `audio.input` | The speech-to-text audio media input you wish to send. See options. |
| `audio.output` | The text-to-speech audio media output you wish to receive. See options. |
| `agent.listen.model` | The Deepgram speech-to-text model to be used. See options. |
| `agent.think.model` | Defines the LLM model to be used. See options. |
| `agent.think.provider.type` | Defines the LLM provider. See options. |
| `agent.think.instructions` | Defines the system prompt for the LLM. |
| `agent.think.functions` | Passes functions to the agent that will be called throughout the conversation if/when needed, as described per function. See options. |
| `agent.speak.model` | The Deepgram text-to-speech model to be used. See options. |
| `context.messages` | Used to restore an existing conversation if the websocket connection breaks (see the reconnection sketch after this table). |
| `context.replay` | Whether to replay the last message, if it is an assistant message. |
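
If the connection drops mid-conversation, you can reconnect and include the earlier turns in `context.messages` of the new Settings Configuration message. Below is a hedged sketch extending the `SETTINGS` payload from the earlier example; the `role`/`content` shape of each message is an assumption for illustration, so match it to the history your application actually recorded.

```python
# Hypothetical turns your application recorded before the websocket broke.
# The "role"/"content" message shape is an assumption made for this sketch.
SETTINGS["context"] = {
    "messages": [
        {"role": "user", "content": "What are your opening hours?"},
        {"role": "assistant", "content": "We're open 9 to 5, Monday through Friday."},
    ],
    # Replay the last message on reconnect since it came from the assistant.
    "replay": True,
}
# Reconnect and send SETTINGS again as the first message, as shown above.
```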

## Full Example

Below is an in-depth example showing all the available fields for SettingsConfiguration, highlighting default values and optional properties.

```json
{
  "type": "SettingsConfiguration",
  "audio": {
    "input": { // optional. defaults to 16kHz linear16
      "encoding": "",
      "sample_rate": 24000 // defaults to 24k 
    },
    "output": { // optional. see table below for defaults and allowed combinations
      "encoding": "",
      "sample_rate": 24000, // defaults to 24k
      "bitrate": 0,
      "container": ""
    }
  },
  "agent": {
    "listen": {
      "model": "" // optional. default 'nova-2'
    },
    "think": {
      "provider": {
        "type": "" // see `LLM providers and models` table below
      },
      "model": "", // see `LLM providers and models` table below
      "instructions": "", // optional (this is the LLM System prompt)
      "functions": [
        {
          "name": "", // function name
          "description": "", // tells the agent what the function does, and how and when to use it
          "url": "", // the endpoint where your function will be called
          "headers": [ // optional. Http headers to pass when calling the function. Only supports 'authorization'
            {
              "key": "authorization",
              "value": ""
            }
          ],
          "method": "post", // the http method to use when calling the function
          "parameters": {
            "type": "object",
            "properties": {
              "item": { // the name of the input property
                "type": "string", // the type of the input
                "description":"" // the description of the input so the agent understands what it is
              }
            },
            "required": ["item"] // the list of required input properties for this function to be called
          }
        }
      ]
    },
    "speak": {
      "model": "" // default 'aura-asteria-en' for other providers see TTS Models documentation
    }
  },
  "context": {
    "messages": [], // LLM message history (e.g. to restore existing conversation if websocket connection breaks)
    "replay": false // whether to replay the last message, if it is an assistant message
  }
}
```
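
The `functions` block above tells the agent to call your `url` with the configured HTTP `method`, passing arguments that match the `parameters` schema. Below is a minimal sketch of such an endpoint using Flask; the exact request body shape and the convention that your JSON response is fed back to the LLM as the function result are assumptions made for illustration.

```python
from flask import Flask, jsonify, request  # pip install flask

app = Flask(__name__)

# Pairs with a hypothetical function configured as:
#   "name": "check_stock", "url": "https://example.com/check_stock",
#   "method": "post", with a required string parameter named "item".
@app.post("/check_stock")
def check_stock():
    args = request.get_json(force=True)
    item = args["item"]  # required by the "parameters" schema above
    # Real code would query inventory; a canned result keeps this runnable.
    result = {"item": item, "in_stock": True, "quantity": 12}
    # Assumption: the JSON body returned here is what the agent hands
    # back to the LLM as the function's result.
    return jsonify(result)

if __name__ == "__main__":
    app.run(port=8080)
```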