Voice Agent Message Flow | Deepgram's Docs

This guide walks you through implementing the correct message flow when building a Voice Agent client. Follow these steps to establish a connection, configure settings, and handle the conversation loop.

Establish the Connection and Receive Welcome

Open a WebSocket connection to the Voice Agent endpoint.
Wait for the server to send a Welcome message confirming the connection:

{ "type": "Welcome", "request_id": "uuid" }

Do not send any messages until you receive the Welcome message.
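One way to enforce this rule in a client is a small gate that rejects outgoing messages until a Welcome has been seen. The sketch below is illustrative only: `WelcomeGate`, `on_text_frame`, and `check_can_send` are hypothetical names, and the WebSocket transport itself is left out.

```python
import json
from typing import Optional


class WelcomeGate:
    """Hypothetical helper: blocks sends until the server's Welcome arrives."""

    def __init__(self) -> None:
        self.welcomed = False
        self.request_id: Optional[str] = None

    def on_text_frame(self, raw: str) -> None:
        """Feed every text frame received from the server into this method."""
        msg = json.loads(raw)
        if isinstance(msg, dict) and msg.get("type") == "Welcome":
            self.welcomed = True
            self.request_id = msg.get("request_id")

    def check_can_send(self) -> None:
        """Raise if the client tries to send before the Welcome message."""
        if not self.welcomed:
            raise RuntimeError("Welcome not received yet; do not send messages")


gate = WelcomeGate()
gate.on_text_frame('{"type": "Welcome", "request_id": "uuid"}')
gate.check_can_send()  # passes now that Welcome has been received
```

Keeping the `request_id` around is useful when correlating client logs with a specific session.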

Configure Settings and Wait for Confirmation

Send a Settings message with your audio and agent configuration:

{
  "type": "Settings",
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 16000
    },
    "output": {
      "encoding": "linear16",
      "sample_rate": 24000,
      "container": "none"
    }
  },
  "agent": {
    "listen": { "model": "nova-3" },
    "think": {
      "provider": { "type": "open_ai" },
      "model": "gpt-4o-mini"
    },
    "speak": { "model": "aura-2-thalia-en" }
  }
}

Wait for the server to send a SettingsApplied message:

{ "type": "SettingsApplied" }

Do not send audio or inject messages until you receive SettingsApplied.
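The two waiting steps so far (Welcome, then SettingsApplied) can be sketched as a single blocking handshake. This is a minimal sketch under stated assumptions: `recv_text` and `send_text` are hypothetical callables standing in for whatever WebSocket library you use, and raising on an `Error` message during the handshake is a design choice of this example, not documented server behavior.

```python
import json
from typing import Callable, Optional


def build_settings() -> dict:
    """Settings payload mirroring the example in this guide."""
    return {
        "type": "Settings",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": 16000},
            "output": {"encoding": "linear16", "sample_rate": 24000, "container": "none"},
        },
        "agent": {
            "listen": {"model": "nova-3"},
            "think": {"provider": {"type": "open_ai"}, "model": "gpt-4o-mini"},
            "speak": {"model": "aura-2-thalia-en"},
        },
    }


def wait_for(recv_text: Callable[[], str], expected_type: str) -> dict:
    """Read text frames until one of the expected type arrives."""
    while True:
        msg = json.loads(recv_text())
        if msg.get("type") == expected_type:
            return msg
        if msg.get("type") == "Error":
            # Assumption: treat any Error during the handshake as fatal.
            raise RuntimeError(f"error while waiting for {expected_type}: {msg}")


def handshake(recv_text: Callable[[], str], send_text: Callable[[str], None]) -> Optional[str]:
    """Welcome -> Settings -> SettingsApplied; returns the session request_id."""
    welcome = wait_for(recv_text, "Welcome")
    send_text(json.dumps(build_settings()))
    wait_for(recv_text, "SettingsApplied")
    return welcome.get("request_id")


# Usage with canned frames in place of a live connection:
frames = iter(['{"type": "Welcome", "request_id": "uuid"}', '{"type": "SettingsApplied"}'])
outbox = []
session_id = handshake(lambda: next(frames), outbox.append)
```

Only after `handshake` returns should the client start sending audio or injected messages.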

Stream Audio and Inject Text

After receiving SettingsApplied, begin streaming binary audio data (PCM) continuously to the server, using the encoding and sample rate you declared under audio.input in Settings.
Optionally, send text input using InjectUserMessage:

{ "type": "InjectUserMessage", "content": "Hello" }
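As a rough illustration of the streaming step, the sketch below splits a PCM buffer into fixed-duration chunks and builds the InjectUserMessage payload. It assumes mono audio and the `linear16` input at 16000 Hz from the Settings example. The 20 ms chunk duration and the helper names `pcm_chunks` and `inject_user_message` are assumptions of this example, not values taken from the Voice Agent documentation.

```python
import json
from typing import Iterator

SAMPLE_RATE = 16000   # matches audio.input.sample_rate in the Settings example
BYTES_PER_SAMPLE = 2  # linear16 is 16-bit PCM; mono is assumed here


def pcm_chunks(pcm: bytes, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`, each covering up to `chunk_ms` of audio."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def inject_user_message(content: str) -> str:
    """Serialize an InjectUserMessage to send as a text frame."""
    return json.dumps({"type": "InjectUserMessage", "content": content})


one_second_of_silence = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
chunks = list(pcm_chunks(one_second_of_silence))  # each chunk goes out as one binary frame
hello = inject_user_message("Hello")              # goes out as one text frame
```

Audio chunks are sent as binary frames and InjectUserMessage as a text frame, so the two can share the same connection.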

Handle Server Events

Process the following events as the conversation progresses:

Event	Description
`UserStartedSpeaking`	User began talking. Stop any audio playback immediately to handle barge-in.
`ConversationText`	User’s speech has been transcribed.
`AgentThinking`	Agent is processing the user’s input.
`ConversationText`	Agent’s text response is available.
`[binary audio]`	Agent’s audio response. Play this through your audio output.
`AgentAudioDone`	Agent finished speaking.
`Error` / `Warning`	Issues occurred during processing.
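The table above maps naturally onto a single frame handler: binary frames go to the audio output, and text frames are dispatched on their `type`. The following is a sketch, not a reference implementation. `Player` is a hypothetical stand-in for your audio output, and the `role` and `content` fields read from ConversationText are assumptions about the message shape that you should check against the server events documentation.

```python
import json
from typing import List, Tuple, Union


class Player:
    """Hypothetical stand-in for an audio output device."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.stop_count = 0

    def play(self, audio: bytes) -> None:
        self.buffer.extend(audio)  # queue audio for playback

    def stop(self) -> None:
        self.buffer.clear()        # drop queued audio immediately (barge-in)
        self.stop_count += 1


def handle_frame(frame: Union[str, bytes], player: Player, log: List[Tuple]) -> None:
    """Route one server frame: binary is agent audio, text is a JSON event."""
    if isinstance(frame, (bytes, bytearray)):
        player.play(bytes(frame))
        return
    msg = json.loads(frame)
    kind = msg.get("type")
    if kind == "UserStartedSpeaking":
        player.stop()
    elif kind == "ConversationText":
        # Assumed fields: "role" tells user and agent text apart.
        log.append((msg.get("role"), msg.get("content")))
    elif kind in ("Error", "Warning"):
        log.append((kind, msg))
    # AgentThinking and AgentAudioDone need no action in this sketch.
```

Handling UserStartedSpeaking before anything else keeps barge-in responsive: queued agent audio is discarded as soon as the user starts talking.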

Message Flow Diagram

Verify the Implementation

Confirm your implementation works correctly by checking:

You receive a Welcome message immediately after connecting.
You receive a SettingsApplied message after sending your Settings.
The agent responds with ConversationText and binary audio when you speak or inject text.
Audio playback stops when you receive UserStartedSpeaking (barge-in detection).
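If you record the frames your client receives during a test session, the first three checks can be automated roughly as follows. `check_transcript` is a hypothetical helper written for this example. It inspects server frames only, so the barge-in check (that playback actually stops) still has to be verified in the client itself.

```python
import json
from typing import List, Union


def check_transcript(frames: List[Union[str, bytes]]) -> List[str]:
    """Return a list of problems found in a recorded sequence of server frames."""
    types = [
        json.loads(f).get("type") if isinstance(f, str) else "[binary audio]"
        for f in frames
    ]
    problems = []
    if not types or types[0] != "Welcome":
        problems.append("first server frame was not Welcome")
    if "SettingsApplied" not in types:
        problems.append("SettingsApplied never arrived")
        return problems
    # Everything the agent produced after the settings were confirmed.
    after = types[types.index("SettingsApplied") + 1:]
    if "ConversationText" not in after:
        problems.append("no ConversationText after SettingsApplied")
    if "[binary audio]" not in after:
        problems.append("no binary audio after SettingsApplied")
    return problems


recorded = [
    '{"type": "Welcome", "request_id": "uuid"}',
    '{"type": "SettingsApplied"}',
    '{"type": "ConversationText"}',
    b"\x00\x00",
]
problems = check_transcript(recorded)  # an empty list means all checks passed
```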

Next Steps

See Configure the Voice Agent for detailed settings options.
See Outputs: Server Events for detailed event documentation.