Settings Configuration

Send a SettingsConfiguration message to configure the voice agent and set the input and output audio formats.

What is the SettingsConfiguration Message

The SettingsConfiguration message is a JSON command sent as an initialization step. It sets up both the behavior of the voice agent and the audio transmission formats before any voice data is exchanged over the WebSocket connection.

📘

For a detailed explanation of all the options available for the SettingsConfiguration message, see our documentation on how to Configure the Voice Agent.

Sending SettingsConfiguration

To configure the voice agent, send a SettingsConfiguration JSON message to the server, as in the basic example below.

🚧

The client should send a SettingsConfiguration message immediately after opening the WebSocket and before sending any audio.

Basic Example

This example uses a very basic SettingsConfiguration message to establish a connection.

{
  "type": "SettingsConfiguration",
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 24000
    },
    "output": {
      "encoding": "linear16",
      "sample_rate": 24000,
      "container": "none"
    }
  },
  "agent": {
    "listen": {
      "model": "nova-2"
    },
    "think": {
      "provider": {
        "type": "open_ai"
        // ... additional provider options available
      },
      "model": "gpt-4o-mini"
      // ... additional think options including instructions and functions available
    },
    "speak": {
      "model": "aura-asteria-en"
    }
  }
  // ... additional top-level options including context available
}
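As a rough illustration, the basic example above could be assembled and serialized in Python as shown here. The helper name `build_settings` and its parameters are hypothetical, not part of any SDK; the field names and values are copied from the example. Note that the `// ...` lines in the example are placeholders for omitted options: standard JSON has no comment syntax, so they should not appear in the payload actually sent.

```python
import json


def build_settings(listen_model="nova-2", think_model="gpt-4o-mini",
                   speak_model="aura-asteria-en", sample_rate=24000):
    """Return the basic SettingsConfiguration message as a dict.

    Hypothetical helper: only the fields from the basic example are set.
    """
    return {
        "type": "SettingsConfiguration",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": sample_rate},
            "output": {
                "encoding": "linear16",
                "sample_rate": sample_rate,
                "container": "none",
            },
        },
        "agent": {
            "listen": {"model": listen_model},
            "think": {"provider": {"type": "open_ai"}, "model": think_model},
            "speak": {"model": speak_model},
        },
    }


# Serialize to the JSON text that would be sent as the first message.
payload = json.dumps(build_settings())
```

Building the message as a dict and serializing it with `json.dumps` guarantees the payload is valid JSON, which a hand-edited string with leftover comments would not be.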

SettingsConfiguration Confirmation

Upon receiving the SettingsConfiguration message, the server will process all remaining audio data and return the following SettingsApplied message.

{
  "type": "SettingsApplied"
}
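The ordering described on this page (settings first, then audio) might be sketched in Python as follows. This is a minimal sketch under stated assumptions: `send` and `recv` are hypothetical callables standing in for whatever WebSocket client is in use, and waiting for SettingsApplied before streaming audio is a conservative choice made here, not a documented requirement. The sketch also assumes `recv` raises an exception if the connection closes.

```python
import json


def configure_then_stream(send, recv, settings, audio_chunks):
    """Send settings, wait for SettingsApplied, then send audio.

    `send` and `recv` are hypothetical stand-ins for a WebSocket client.
    """
    # The settings message goes out first, before any audio.
    send(json.dumps(settings))

    # Read messages until the server confirms the settings.
    while True:
        message = recv()
        if isinstance(message, bytes):
            continue  # ignore binary frames while waiting
        if json.loads(message).get("type") == "SettingsApplied":
            break

    # Only now stream the raw audio chunks.
    for chunk in audio_chunks:
        send(chunk)


# Demonstration with an in-memory fake transport (no network involved).
sent = []
incoming = iter(['{"type": "SomethingElse"}', '{"type": "SettingsApplied"}'])
configure_then_stream(
    sent.append,
    lambda: next(incoming),
    {"type": "SettingsConfiguration"},
    [b"\x00\x00"],
)
```

The `SomethingElse` message is a made-up placeholder showing that unrelated messages are skipped while waiting for the confirmation.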

Conclusion

The SettingsConfiguration message is the initialization command for a voice agent session. The client sends it immediately after opening the WebSocket and before any audio, defining the agent's behavior and the input and output audio formats, and the server confirms it with a SettingsApplied message.