Voice Agent Message Flow
This guide walks you through implementing the correct message flow when building a Voice Agent client. Follow these steps to establish a connection, configure settings, and handle the conversation loop.
Establish the Connection and Receive Welcome
- Open a WebSocket connection to the Voice Agent endpoint.
- Wait for the server to send a Welcome message confirming the connection.
Do not send any messages until you receive the Welcome message.
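As a minimal sketch of this waiting step, the helper below consumes incoming frames until a control message of the expected type arrives. It is transport-agnostic: it assumes your WebSocket library can hand you received frames as an iterable of str (text) and bytes (binary) values. The function name wait_for_message is hypothetical, and the assumption that each control message is a JSON object with a "type" field is not confirmed by this guide.

```python
import json
from typing import Iterable, Union

Frame = Union[str, bytes]


def wait_for_message(frames: Iterable[Frame], expected_type: str) -> dict:
    """Consume frames until a JSON text frame of the expected type arrives.

    Assumption: control messages are JSON objects carrying a "type" field.
    """
    for frame in frames:
        if isinstance(frame, bytes):
            continue  # binary frames carry audio, not control messages
        message = json.loads(frame)
        if message.get("type") == expected_type:
            return message
    raise ConnectionError(f"stream ended before {expected_type!r} arrived")
```

A client would call wait_for_message(incoming_frames, "Welcome") right after connecting and send nothing until it returns. The same helper can be reused later to wait for SettingsApplied.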
Configure Settings and Wait for Confirmation
- Send a Settings message with your audio and agent configuration.
- Wait for the server to send a SettingsApplied message.
Do not send audio or inject messages until you receive SettingsApplied.
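One way to enforce this ordering in a client is a small state machine that refuses out-of-order sends. The sketch below is illustrative only: SessionGate, Phase, and the method names are hypothetical, and the top-level "audio" and "agent" keys in the Settings payload are assumptions based on the description above, not a confirmed schema. The contents of both configuration objects are left to the caller.

```python
import json
from enum import Enum


class Phase(Enum):
    CONNECTED = 1  # socket open, nothing received yet
    WELCOMED = 2   # Welcome received, Settings may be sent
    READY = 3      # SettingsApplied received, audio and text may be sent


class SessionGate:
    """Tracks the handshake phase and rejects out-of-order sends."""

    def __init__(self) -> None:
        self.phase = Phase.CONNECTED

    def on_server_message(self, message_type: str) -> None:
        # Advance only along the expected path; ignore anything else.
        if message_type == "Welcome" and self.phase is Phase.CONNECTED:
            self.phase = Phase.WELCOMED
        elif message_type == "SettingsApplied" and self.phase is Phase.WELCOMED:
            self.phase = Phase.READY

    def build_settings(self, audio: dict, agent: dict) -> str:
        # Assumed payload shape: {"type": "Settings", "audio": ..., "agent": ...}
        if self.phase is Phase.CONNECTED:
            raise RuntimeError("wait for Welcome before sending Settings")
        return json.dumps({"type": "Settings", "audio": audio, "agent": agent})

    def require_ready(self) -> None:
        # Call this before sending audio or injecting a message.
        if self.phase is not Phase.READY:
            raise RuntimeError("wait for SettingsApplied before sending audio or text")
```

Calling require_ready() in front of every audio or text send turns a protocol violation into an immediate local error instead of an unexplained server-side failure.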
Stream Audio and Inject Text
- After receiving SettingsApplied, begin streaming binary audio data (PCM) continuously to the server.
- Optionally, send text input using InjectUserMessage.
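The two steps above can be sketched as a pair of helpers: one slices raw PCM into fixed-duration chunks suitable for sending as binary frames, the other builds the text message. Both function names are hypothetical. The sketch assumes 16-bit mono PCM and a 20 ms chunk size, which are illustrative choices and not requirements stated in this guide, and the "content" field of InjectUserMessage is an assumed name.

```python
import json
from typing import Iterator


def chunk_pcm(pcm: bytes, sample_rate: int, chunk_ms: int = 20,
              bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration slices of mono PCM; the last slice may be shorter."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * bytes_per_sample
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def inject_user_message(text: str) -> str:
    # Assumed shape: the user text travels in a "content" field.
    return json.dumps({"type": "InjectUserMessage", "content": text})
```

Each chunk would be sent as one binary WebSocket frame, while the string returned by inject_user_message would be sent as a text frame.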
Handle Server Events
- Process server events as the conversation progresses, including ConversationText, UserStartedSpeaking (barge-in), and binary agent audio.
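A minimal sketch of event handling, limited to the events this guide names: binary frames are queued for playback, ConversationText is recorded, and UserStartedSpeaking clears queued audio so playback stops on barge-in. AgentSession is a hypothetical class, and the "role" and "content" fields read from ConversationText are assumed names; check the server event documentation for the actual schema.

```python
import json
from collections import deque
from typing import Union


class AgentSession:
    """Routes incoming frames: audio to a playback queue, JSON events by type."""

    def __init__(self) -> None:
        self.playback = deque()  # agent audio chunks waiting to be played
        self.transcript = []     # (role, content) pairs from ConversationText

    def handle_frame(self, frame: Union[str, bytes]) -> None:
        if isinstance(frame, bytes):
            self.playback.append(frame)  # binary frame: agent audio
            return
        message = json.loads(frame)
        kind = message.get("type")
        if kind == "UserStartedSpeaking":
            self.playback.clear()  # barge-in: drop audio not yet played
        elif kind == "ConversationText":
            # "role" and "content" are assumed field names.
            self.transcript.append((message.get("role"), message.get("content")))
        # Any other event type is ignored in this sketch.
```

A real client would also need to stop audio already handed to the output device; clearing the queue only prevents further chunks from being played.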
Message Flow Diagram
Verify the Implementation
Confirm your implementation works correctly by checking:
- You receive a Welcome message immediately after connecting.
- You receive a SettingsApplied message after sending your Settings.
- The agent responds with ConversationText and binary audio when you speak or inject text.
- Audio playback stops when you receive UserStartedSpeaking (barge-in detection).
Next Steps
- See Configure the Voice Agent for detailed settings options.
- See Outputs: Server Events for detailed event documentation.