
Voice Agent Message Flow

Implement the correct WebSocket message sequence for Voice Agent conversations.

This guide walks you through implementing the correct message flow when building a Voice Agent client. Follow these steps to establish a connection, configure settings, and handle the conversation loop.

Establish the Connection and Receive Welcome

  1. Open a WebSocket connection to the Voice Agent endpoint.

  2. Wait for the server to send a Welcome message confirming the connection:

```json
{ "type": "Welcome", "request_id": "uuid" }
```

Do not send any messages until you receive the Welcome message.
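The handshake gate can be sketched in Python. This assumes server messages arrive as JSON text frames and that you pass in an iterable of received frames. The function name `wait_for_welcome` is hypothetical and is not part of any Deepgram SDK. In a real client you would feed it frames from your WebSocket library's receive loop.

```python
import json
from typing import Iterable


def wait_for_welcome(frames: Iterable[str]) -> str:
    """Consume frames until the Welcome message arrives and return its request_id.

    Hypothetical helper: `frames` stands in for your WebSocket receive loop.
    """
    for raw in frames:
        msg = json.loads(raw)
        if msg.get("type") == "Welcome":
            # Keeping the request_id makes it easier to correlate client and server logs.
            return msg.get("request_id", "")
        # The first message should be Welcome; treat anything else as a protocol error.
        raise RuntimeError(f"expected Welcome, got {msg.get('type')!r}")
    raise ConnectionError("connection closed before Welcome was received")


if __name__ == "__main__":
    print(wait_for_welcome(['{"type": "Welcome", "request_id": "uuid"}']))  # uuid
```

Failing fast on an unexpected first message keeps the client from sending Settings into a connection that is not in the state it expects.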

Configure Settings and Wait for Confirmation

  1. Send a Settings message with your audio and agent configuration:
```json
{
  "type": "Settings",
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 16000
    },
    "output": {
      "encoding": "linear16",
      "sample_rate": 24000,
      "container": "none"
    }
  },
  "agent": {
    "listen": { "model": "nova-3" },
    "think": {
      "provider": { "type": "open_ai" },
      "model": "gpt-4o-mini"
    },
    "speak": { "model": "aura-2-thalia-en" }
  }
}
```
  2. Wait for the server to send a SettingsApplied message:

```json
{ "type": "SettingsApplied" }
```

Do not send audio or inject messages until you receive SettingsApplied.
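If you build the Settings payload in code, a small helper keeps the audio declarations in one place. This Python sketch reproduces the example above as a dictionary. The function names `build_settings` and `is_settings_applied` are hypothetical. The default values only mirror the example and are not recommendations.

```python
import json


def build_settings(input_rate: int = 16000, output_rate: int = 24000) -> str:
    """Return the Settings message from the example as a JSON string (hypothetical helper)."""
    settings = {
        "type": "Settings",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": input_rate},
            "output": {
                "encoding": "linear16",
                "sample_rate": output_rate,
                "container": "none",
            },
        },
        "agent": {
            "listen": {"model": "nova-3"},
            "think": {"provider": {"type": "open_ai"}, "model": "gpt-4o-mini"},
            "speak": {"model": "aura-2-thalia-en"},
        },
    }
    return json.dumps(settings)


def is_settings_applied(raw: str) -> bool:
    """True once the server confirms the configuration; only then start sending audio."""
    return json.loads(raw).get("type") == "SettingsApplied"


if __name__ == "__main__":
    payload = build_settings()
    print(json.loads(payload)["audio"]["input"]["sample_rate"])  # 16000
    print(is_settings_applied('{"type": "SettingsApplied"}'))  # True
```

Passing the sample rates as parameters helps keep the `audio.input` declaration consistent with the audio your client actually captures.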

Stream Audio and Inject Text

  1. After you receive SettingsApplied, begin continuously streaming binary audio data (raw PCM) to the server. Use the encoding and sample rate you declared in audio.input.

  2. Optionally, send text input using InjectUserMessage:

```json
{ "type": "InjectUserMessage", "content": "Hello" }
```
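Audio is sent as raw bytes in binary frames, so a client typically slices its PCM buffer into small chunks. This Python sketch assumes mono 16-bit linear PCM (linear16, 2 bytes per sample) at the 16000 Hz input rate from the Settings example, and a 20 ms chunk length. The chunk length and the helper names `pcm_chunks` and `inject_user_message` are illustrative choices, not values the API requires.

```python
import json
from typing import Iterator

BYTES_PER_SAMPLE = 2  # linear16 = 16-bit samples; mono audio is assumed


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-duration slices of a PCM buffer to send as binary frames.

    The final chunk is shorter if the buffer does not divide evenly.
    """
    # 16000 samples/s * 20 ms / 1000 * 2 bytes = 640 bytes per chunk
    chunk_size = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def inject_user_message(text: str) -> str:
    """Build an InjectUserMessage text frame (send only after SettingsApplied)."""
    return json.dumps({"type": "InjectUserMessage", "content": text})


if __name__ == "__main__":
    one_second_of_silence = bytes(16000 * BYTES_PER_SAMPLE)
    chunks = list(pcm_chunks(one_second_of_silence))
    print(len(chunks), len(chunks[0]))  # 50 640
    print(inject_user_message("Hello"))
```

Each chunk is sent as a binary frame, while InjectUserMessage is sent as a JSON text frame on the same connection.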

Handle Server Events

  1. Process the following events as the conversation progresses:
| Event | Description |
|---|---|
| UserStartedSpeaking | User began talking. Stop any audio playback immediately to handle barge-in. |
| ConversationText (user) | User’s speech has been transcribed. |
| AgentThinking | Agent is processing the user’s input. |
| ConversationText (agent) | Agent’s text response is available. |
| [binary audio] | Agent’s audio response. Play this through your audio output. |
| AgentAudioDone | Agent finished speaking. |
| Error / Warning | Issues occurred during processing. |
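One way to handle these events is a dispatcher that routes text frames by their `type` field and treats every binary frame as agent audio. In this Python sketch, the `Player` class is a hypothetical stand-in for your audio output: it only queues chunks instead of playing them. `EventHandler` is also an illustrative name and not an SDK class.

```python
import json
from typing import Dict, List, Union


class Player:
    """Hypothetical audio sink: queues agent audio and can be flushed on barge-in."""

    def __init__(self) -> None:
        self.queue: List[bytes] = []

    def enqueue(self, audio: bytes) -> None:
        self.queue.append(audio)

    def stop(self) -> None:
        # Drop queued agent audio so the agent does not talk over the user.
        self.queue.clear()


class EventHandler:
    def __init__(self, player: Player) -> None:
        self.player = player
        self.transcript: List[Dict] = []
        self.problems: List[Dict] = []

    def handle(self, frame: Union[str, bytes]) -> None:
        if isinstance(frame, bytes):
            # Binary frames carry the agent's audio response.
            self.player.enqueue(frame)
            return
        msg = json.loads(frame)
        kind = msg.get("type")
        if kind == "UserStartedSpeaking":
            self.player.stop()  # barge-in: stop playback immediately
        elif kind == "ConversationText":
            self.transcript.append(msg)  # user transcript or agent reply
        elif kind in ("Error", "Warning"):
            self.problems.append(msg)
        # AgentThinking, AgentAudioDone and unknown types need no action in this sketch.


if __name__ == "__main__":
    player = Player()
    handler = EventHandler(player)
    handler.handle(b"\x00\x01")  # agent audio arrives
    handler.handle('{"type": "UserStartedSpeaking"}')  # user barges in
    print(len(player.queue))  # 0
```

Checking for `bytes` before parsing JSON keeps audio frames off the JSON path, and ignoring unknown types lets the client tolerate events it does not yet handle.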

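Message Flow Diagram

  1. Client → Server: open the WebSocket connection.
  2. Server → Client: Welcome.
  3. Client → Server: Settings.
  4. Server → Client: SettingsApplied.
  5. Client → Server: continuous binary audio and, optionally, InjectUserMessage.
  6. Server → Client: for each turn, UserStartedSpeaking, ConversationText, AgentThinking, ConversationText, binary audio, and AgentAudioDone, plus Error or Warning whenever an issue occurs.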
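The client side of this flow can also be expressed as a small state machine that refuses to send Settings before Welcome, or audio and InjectUserMessage before SettingsApplied. In this Python sketch, `Phase`, `ClientFlow`, and the label "audio" for binary frames are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class Phase(Enum):
    CONNECTING = auto()        # socket open, waiting for Welcome
    CONFIGURING = auto()       # Welcome received; Settings may be sent
    AWAITING_APPLIED = auto()  # Settings sent, waiting for SettingsApplied
    READY = auto()             # audio and InjectUserMessage allowed


class ClientFlow:
    """Hypothetical guard that enforces the client-side ordering rules."""

    def __init__(self) -> None:
        self.phase = Phase.CONNECTING

    def on_server_message(self, msg_type: str) -> None:
        # Advance only on the message expected in the current phase.
        if msg_type == "Welcome" and self.phase is Phase.CONNECTING:
            self.phase = Phase.CONFIGURING
        elif msg_type == "SettingsApplied" and self.phase is Phase.AWAITING_APPLIED:
            self.phase = Phase.READY

    def before_send(self, msg_type: str) -> None:
        """Raise if sending `msg_type` now would break the required order."""
        if self.phase is Phase.CONNECTING:
            raise RuntimeError("wait for Welcome before sending anything")
        if msg_type == "Settings":
            if self.phase is not Phase.CONFIGURING:
                raise RuntimeError("this sketch allows Settings once, right after Welcome")
            self.phase = Phase.AWAITING_APPLIED
        elif msg_type in ("audio", "InjectUserMessage") and self.phase is not Phase.READY:
            raise RuntimeError(f"send {msg_type} only after SettingsApplied")
```

Call `before_send` immediately before every outgoing frame and `on_server_message` for every text frame you receive. Ordering bugs then surface as exceptions during development instead of as silent server-side failures.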

Verify the Implementation

Confirm your implementation works correctly by checking:

  • You receive a Welcome message immediately after connecting.
  • You receive a SettingsApplied message after sending your Settings.
  • The agent responds with ConversationText and binary audio when you speak or inject text.
  • Audio playback stops when you receive UserStartedSpeaking (barge-in detection).
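To automate the first two checks, you can record every frame your client sends and receives, then validate the order afterwards. This Python sketch assumes a log of `(direction, type)` pairs, with binary audio recorded under the hypothetical type "audio". The log format and the `check_message_order` name are illustrative.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (direction: "sent" or "received", message type)


def check_message_order(log: List[Entry]) -> List[str]:
    """Return the ordering violations found in a recorded session (empty if none)."""
    problems: List[str] = []
    if not log or log[0] != ("received", "Welcome"):
        problems.append("first message was not a received Welcome")
    welcome_seen = applied_seen = settings_sent = False
    for direction, kind in log:
        if direction == "received":
            welcome_seen = welcome_seen or kind == "Welcome"
            applied_seen = applied_seen or kind == "SettingsApplied"
            continue
        # Everything below checks frames the client sent.
        if not welcome_seen:
            problems.append(f"sent {kind} before Welcome")
        elif kind == "Settings":
            settings_sent = True
        elif kind in ("audio", "InjectUserMessage") and not applied_seen:
            problems.append(f"sent {kind} before SettingsApplied")
    if settings_sent and not applied_seen:
        problems.append("Settings was sent but SettingsApplied never arrived")
    return problems


if __name__ == "__main__":
    session = [("received", "Welcome"), ("sent", "Settings"),
               ("received", "SettingsApplied"), ("sent", "audio")]
    print(check_message_order(session))  # []
```

An empty result means the recorded session followed the required sequence. The barge-in check still needs a listening test, because it depends on how your audio output responds to UserStartedSpeaking.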

Next Steps

  • See Configure the Voice Agent for detailed settings options.
  • See Outputs: Server Events for detailed event documentation.