Voice Agent Message Flow

Implement the correct WebSocket message sequence for Voice Agent conversations.

This guide walks you through implementing the correct message flow when building a Voice Agent client. Follow these steps to establish a connection, configure settings, and handle the conversation loop.

Establish the Connection and Receive Welcome

  1. Open a WebSocket connection to the Voice Agent endpoint.

  2. Wait for the server to send a Welcome message confirming the connection:

```json
{ "type": "Welcome", "request_id": "uuid" }
```

Do not send any messages until you receive the Welcome message.
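Enforcing this rule in client code is simpler when outgoing sends are gated on a flag set by the Welcome message. The sketch below is a minimal, hypothetical helper (`WelcomeGate`, `on_text_frame`, and `ensure_can_send` are illustrative names, not part of any SDK). It assumes your WebSocket library hands you text frames as `str` and leaves the socket itself out of scope.

```python
import json
from typing import Optional


class WelcomeGate:
    """Hypothetical helper that blocks outgoing messages until Welcome arrives."""

    def __init__(self) -> None:
        self.welcomed = False
        self.request_id: Optional[str] = None

    def on_text_frame(self, raw: str) -> dict:
        """Parse a JSON text frame and record it if it is the Welcome message."""
        msg = json.loads(raw)
        if msg.get("type") == "Welcome":
            self.welcomed = True
            self.request_id = msg.get("request_id")
        return msg

    def ensure_can_send(self) -> None:
        """Raise if the client tries to send anything before Welcome."""
        if not self.welcomed:
            raise RuntimeError("do not send messages before receiving Welcome")


# Usage: pass every incoming text frame through the gate before sending.
gate = WelcomeGate()
gate.on_text_frame('{"type": "Welcome", "request_id": "uuid"}')
gate.ensure_can_send()  # no longer raises
```

Keeping the `request_id` from Welcome can be useful when correlating client logs with a specific session.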

Configure Settings and Wait for Confirmation

  1. Send a Settings message with your audio and agent configuration:
```json
{
  "type": "Settings",
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 16000
    },
    "output": {
      "encoding": "linear16",
      "sample_rate": 24000,
      "container": "none"
    }
  },
  "agent": {
    "listen": { "model": "nova-3" },
    "think": {
      "provider": { "type": "open_ai" },
      "model": "gpt-4o-mini"
    },
    "speak": { "model": "aura-2-thalia-en" }
  }
}
```
  2. Wait for the server to send a SettingsApplied message:

```json
{ "type": "SettingsApplied" }
```

Do not send audio or inject messages until you receive SettingsApplied.
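One way to keep the Settings payload and the confirmation check in one place is a pair of small helpers. This is a sketch that mirrors the example Settings message in this guide field for field; `build_settings` and `settings_applied` are hypothetical names, and the exact schema accepted by the service should be confirmed against the current API reference.

```python
import json


def build_settings(input_rate: int = 16000, output_rate: int = 24000) -> str:
    """Serialize a Settings message matching the example configuration."""
    settings = {
        "type": "Settings",
        "audio": {
            # Raw 16-bit PCM from the microphone.
            "input": {"encoding": "linear16", "sample_rate": input_rate},
            # Raw 16-bit PCM back from the agent, with no container header.
            "output": {
                "encoding": "linear16",
                "sample_rate": output_rate,
                "container": "none",
            },
        },
        "agent": {
            "listen": {"model": "nova-3"},
            "think": {"provider": {"type": "open_ai"}, "model": "gpt-4o-mini"},
            "speak": {"model": "aura-2-thalia-en"},
        },
    }
    return json.dumps(settings)


def settings_applied(raw: str) -> bool:
    """Return True when a server text frame is the SettingsApplied confirmation."""
    return json.loads(raw).get("type") == "SettingsApplied"
```

A client would send `build_settings()` right after Welcome, then hold all audio and InjectUserMessage traffic until `settings_applied(...)` returns True for an incoming frame.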

Stream Audio and Inject Text

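  1. After receiving SettingsApplied, begin streaming binary audio data continuously to the server as raw PCM in the input encoding and sample rate you declared in Settings (linear16 at 16000 Hz in the example above).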

  2. Optionally, send text input using InjectUserMessage:

```json
{ "type": "InjectUserMessage", "content": "Hello" }
```
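Microphone audio is usually sent as a steady series of small binary frames rather than one large buffer. The sketch below splits raw linear16 PCM into fixed-duration frames and builds the InjectUserMessage payload. It assumes mono audio (2 bytes per sample), and the 20 ms frame length is an arbitrary illustrative choice; `chunk_pcm` and `inject_user_message` are hypothetical helper names.

```python
import json
from typing import Iterator

BYTES_PER_SAMPLE = 2  # linear16 = 16-bit signed PCM; assumes mono audio


def chunk_pcm(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 20) -> Iterator[bytes]:
    """Split raw linear16 PCM into fixed-duration frames to send as binary messages."""
    # 16000 samples/s * 2 bytes * 20 ms = 640 bytes per frame.
    chunk_size = sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def inject_user_message(text: str) -> str:
    """Serialize an InjectUserMessage text frame."""
    return json.dumps({"type": "InjectUserMessage", "content": text})
```

For example, one second of 16 kHz silence (`bytes(32000)`) yields 50 frames of 640 bytes each. When streaming from a file instead of a live microphone, pacing the sends at roughly real time keeps the stream close to what a live source would produce.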

Handle Server Events

  1. Process the following events as the conversation progresses:

| Event | Description |
|---|---|
| UserStartedSpeaking | The user began talking. Stop any audio playback immediately to handle barge-in. |
| ConversationText | The user’s speech has been transcribed. |
| AgentThinking | The agent is processing the user’s input. |
| ConversationText | The agent’s text response is available. |
| [binary audio] | The agent’s audio response. Play it through your audio output. |
| AgentAudioDone | The agent has finished sending audio for its current response. |
| Error / Warning | An error or warning occurred during processing. |
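Because audio arrives as binary frames and every other event arrives as a JSON text frame, a dispatcher can branch on frame type first and then on the message `type`. The sketch below is a hypothetical `AgentEventHandler` that models playback as an in-memory queue so barge-in is easy to see. It assumes the WebSocket library delivers binary frames as `bytes` and text frames as `str`, and that ConversationText carries its text in a `content` field; check both against your library and the API reference.

```python
import json
from typing import List, Union


class AgentEventHandler:
    """Hypothetical dispatcher for Voice Agent server frames.

    Playback is modeled as a queue of audio chunks; a real client would feed
    these chunks to an audio output device instead.
    """

    def __init__(self) -> None:
        self.playback_queue: List[bytes] = []  # agent audio not yet played
        self.transcript: List[str] = []  # ConversationText contents, in order
        self.problems: List[dict] = []  # Error and Warning messages
        self.audio_done = False  # set when AgentAudioDone arrives

    def handle(self, frame: Union[str, bytes]) -> None:
        if isinstance(frame, bytes):
            # Binary frame: agent audio in the output encoding from Settings.
            self.playback_queue.append(frame)
            self.audio_done = False
            return
        msg = json.loads(frame)
        kind = msg.get("type")
        if kind == "UserStartedSpeaking":
            # Barge-in: discard queued agent audio immediately.
            self.playback_queue.clear()
        elif kind == "ConversationText":
            # Assumes the text is carried in a "content" field.
            self.transcript.append(msg.get("content", ""))
        elif kind == "AgentThinking":
            pass  # e.g. show a "thinking" indicator in the UI
        elif kind == "AgentAudioDone":
            self.audio_done = True
        elif kind in ("Error", "Warning"):
            self.problems.append(msg)
```

Clearing the queue on UserStartedSpeaking is what makes barge-in feel immediate: any agent audio still waiting to play is dropped instead of talking over the user.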

Message Flow Diagram

[Figure: Voice Agent message flow diagram]

Verify the Implementation

Confirm your implementation works correctly by checking:

  • You receive a Welcome message immediately after connecting.
  • You receive a SettingsApplied message after sending your Settings.
  • The agent responds with ConversationText and binary audio when you speak or inject text.
  • Audio playback stops when you receive UserStartedSpeaking (barge-in detection).
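The first three checks can also be automated against a recorded session. The sketch below assumes you log each frame as a `(direction, kind)` pair, where `direction` is `"send"` or `"recv"` and `kind` is the message `type` or `"audio"` for a binary frame; that log format and the `check_session` name are hypothetical. Barge-in is a client-side playback behavior, so it is not covered here.

```python
from typing import List, Tuple

# (direction, kind): direction is "send" or "recv"; kind is a message "type"
# or "audio" for a binary frame. This log format is a hypothetical choice.
Event = Tuple[str, str]


def check_session(log: List[Event]) -> List[str]:
    """Return the ordering problems found in a recorded session log."""
    problems: List[str] = []
    welcomed = settings_sent = applied = False
    got_text = got_audio = False
    for direction, kind in log:
        if direction == "send":
            if not welcomed:
                problems.append(f"sent {kind} before Welcome")
            if kind == "Settings":
                settings_sent = True
            elif kind in ("audio", "InjectUserMessage") and not applied:
                problems.append(f"sent {kind} before SettingsApplied")
        else:
            if kind == "Welcome":
                welcomed = True
            elif kind == "SettingsApplied":
                if not settings_sent:
                    problems.append("SettingsApplied received before Settings was sent")
                applied = True
            elif kind == "ConversationText":
                got_text = True
            elif kind == "audio":
                got_audio = True
    if not welcomed:
        problems.append("never received Welcome")
    if not applied:
        problems.append("never received SettingsApplied")
    if not (got_text and got_audio):
        problems.append("no ConversationText and binary audio response")
    return problems
```

An empty list means the recorded session followed the expected order; each string otherwise names the step that was skipped or reordered.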

Next Steps