Settings Configuration

Configures the voice agent and sets the input and output audio formats.

What is the SettingsConfiguration Message

The SettingsConfiguration message is a JSON command sent as the initialization step of a session. It sets up both the behavior of the voice agent and the audio transmission formats before any voice data is exchanged over the WebSocket connection.

Configuration Parameters

| Parameter | Description |
| --- | --- |
| agent.listen.model | The Deepgram speech-to-text model to be used. See options. |
| agent.speak.model | The Deepgram text-to-speech model to be used. See options. |
| agent.think.model | The LLM model to be used. See options. |
| agent.think.provider.type | The LLM provider. See options. |
| agent.think.instructions | The system prompt for the LLM. |
| agent.think.functions | Functions passed to the agent, which it calls during the conversation if and when needed, as described for each function. See options. |
| audio.input | The format of the speech-to-text audio you send. See options. |
| audio.output | The format of the text-to-speech audio you wish to receive. See options. |
| context.messages | The LLM message history, used to restore an existing conversation if the WebSocket connection breaks. |
| context.replay | Whether to replay the last message, if it is an assistant message. |
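
The dotted parameter names above map onto nested objects in the JSON message: agent.listen.model is the model field inside listen, which sits inside agent. The following is a minimal Python sketch of that mapping. The helper name set_param is hypothetical and is not part of any Deepgram SDK; the model names are the defaults quoted in the message template.

```python
import json


def set_param(settings: dict, dotted_name: str, value) -> dict:
    """Set a value in a nested dict using a dotted parameter name."""
    keys = dotted_name.split(".")
    node = settings
    for key in keys[:-1]:
        # Create intermediate objects as needed, e.g. "agent" then "listen".
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return settings


settings = {"type": "SettingsConfiguration"}
set_param(settings, "agent.listen.model", "nova-2")
set_param(settings, "agent.speak.model", "aura-asteria-en")
set_param(settings, "context.replay", False)

print(json.dumps(settings, indent=2))
```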

Sending SettingsConfiguration

The client should send a SettingsConfiguration message immediately after opening the WebSocket and before sending any audio. The message has the following JSON structure:

{
  "type": "SettingsConfiguration",
  "audio": {
    "input": { // optional. defaults to 16kHz linear16
      "encoding": "",
      "sample_rate": 24000 // defaults to 24k 
    },
    "output": { // optional. see table below for defaults and allowed combinations
      "encoding": "",
      "sample_rate": 24000, // defaults to 24k
      "bitrate": 0,
      "container": ""
    }
  },
  "agent": {
    "listen": {
      "model": "" // optional. default 'nova-2'
    },
    "think": {
      "provider": {
        "type": "" // see `LLM providers and models` table below
      },
      "model": "", // see `LLM providers and models` table below
      "instructions": "", // optional (this is the LLM System prompt)
      "functions": [
        {
          "name": "", // function name
          "description": "", // tells the agent what the function does, and how and when to use it
          "url": "", // the endpoint where your function will be called
          "headers": [ // optional. Http headers to pass when calling the function. Only supports 'authorization'
            {
              "key": "authorization",
              "value": ""
            }
          ],
          "method": "post", // the http method to use when calling the function
          "parameters": {
            "type": "object",
            "properties": {
              "item": { // the name of the input property
                "type": "string", // the type of the input
                "description":"" // the description of the input so the agent understands what it is
              }
            },
            "required": ["item"] // the list of required input properties for this function to be called
          }
        }
      ]
    },
    "speak": {
      "model": "" // default 'aura-asteria-en'
    }
  },
  "context": {
    "messages": [], // LLM message history (e.g. to restore existing conversation if websocket connection breaks)
    "replay": false // whether to replay the last message, if it is an assistant message
  }
}
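
The template above carries explanatory comments, which standard JSON parsers reject, so the message actually sent must be comment-free. The sketch below builds one such message in Python under several assumptions: the input format follows the "16kHz linear16" comment in the template, audio.output is left out so the server defaults apply, functions is left out on the assumption that it is optional, and YOUR_PROVIDER and YOUR_MODEL are placeholders to be replaced with a valid pair from the LLM providers and models table. The function name build_settings is hypothetical.

```python
import json
from typing import Optional


def build_settings(instructions: str, history: Optional[list] = None) -> str:
    """Return a SettingsConfiguration message serialized as comment-free JSON."""
    message = {
        "type": "SettingsConfiguration",
        "audio": {
            # Input format taken from the template comment (16 kHz linear16).
            "input": {"encoding": "linear16", "sample_rate": 16000},
            # "output" is omitted here so that the server defaults apply.
        },
        "agent": {
            "listen": {"model": "nova-2"},
            "think": {
                # Placeholders (assumptions): substitute a real provider/model pair.
                "provider": {"type": "YOUR_PROVIDER"},
                "model": "YOUR_MODEL",
                "instructions": instructions,
            },
            "speak": {"model": "aura-asteria-en"},
        },
        "context": {
            # Prior LLM messages, e.g. when reconnecting after a dropped socket.
            "messages": history or [],
            "replay": False,
        },
    }
    return json.dumps(message)


payload = build_settings("You are a helpful voice assistant.")
print(payload)
```

Building the message as a dictionary and serializing it with json.dumps avoids hand-editing JSON text and guarantees that the output parses.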

SettingsConfiguration Confirmation

Upon receiving the SettingsConfiguration message, the server will process all remaining audio data and return the following:

{
    "type": "SettingsApplied"
}
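
The exchange described here (send the settings first, then look for the confirmation) can be sketched independently of any particular WebSocket library. In the Python sketch below, send and recv are hypothetical callables standing in for a client's send and receive methods, and the max_messages limit and the tolerance for other message types arriving first are assumptions rather than documented behavior.

```python
import json
from typing import Callable


def is_settings_applied(raw: str) -> bool:
    """Return True if a server frame is the SettingsApplied confirmation."""
    try:
        message = json.loads(raw)
    except ValueError:
        # Not JSON (for example, a binary audio frame): not a confirmation.
        return False
    return isinstance(message, dict) and message.get("type") == "SettingsApplied"


def configure(send: Callable[[str], None], recv: Callable[[], str],
              settings: dict, max_messages: int = 10) -> bool:
    """Send the settings, then read up to max_messages frames for confirmation."""
    send(json.dumps(settings))
    for _ in range(max_messages):
        if is_settings_applied(recv()):
            return True
    return False


# Usage with in-memory stand-ins for a real connection.
sent = []
replies = iter(['{"type": "Other"}', '{"type": "SettingsApplied"}'])
ok = configure(sent.append, lambda: next(replies), {"type": "SettingsConfiguration"})
print(ok)  # prints True
```

Passing the transport in as callables keeps the sketch testable without a network connection; in a real client they would wrap the library's own send and receive calls.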

Conclusion

The SettingsConfiguration message is the initialization command for a voice agent session. It establishes both the behavior of the voice agent and the input and output audio formats before any voice data is exchanged over the WebSocket connection, and the server confirms it with a SettingsApplied message.