In this guide, you’ll connect Amazon Connect inbound calls to a Deepgram-powered voice agent through a Bot Media Gateway that streams call audio over WebSockets.
The Deepgram Voice Agent will:
In this architecture:
The Voice Agent API operates over a bidirectional WebSocket connection, allowing clients to continuously stream audio and receive responses in real time.
You will need:
Your gateway will:
Create a Contact Flow that routes callers to your AI agent.
Typical flow:
Use either:
This sends the caller to the telephony endpoint hosting your voice agent gateway.
The Bot Media Gateway bridges telephony audio and the Deepgram Voice Agent WebSocket.
Typical responsibilities:
Deployment options include:
Example architecture of the media gateway:
The Bot Media Gateway opens a WebSocket connection to the Deepgram Voice Agent endpoint.
Example endpoint:
For EU data processing, use wss://api.eu.deepgram.com/v1/agent/converse. See Regional Endpoints for details.
Once the connection opens:
The Welcome message confirms the WebSocket connection is established.
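The connection step above can be sketched as a few helpers that a gateway might call before and just after opening the socket. This is a minimal sketch, not Deepgram's client code: the helper names are hypothetical, the default host is an assumption to confirm against the endpoint this guide lists, and the `Authorization: Token ...` header assumes standard Deepgram API key authentication. The EU URL is the one quoted above.

```python
import json

# Assumed default host; confirm it against the endpoint listed in this guide.
DEFAULT_AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"
# EU endpoint quoted in this guide.
EU_AGENT_URL = "wss://api.eu.deepgram.com/v1/agent/converse"


def agent_endpoint(region: str = "default") -> str:
    """Return the Voice Agent WebSocket URL for the chosen region."""
    return EU_AGENT_URL if region.lower() == "eu" else DEFAULT_AGENT_URL


def auth_headers(api_key: str) -> dict[str, str]:
    """Build handshake headers (assumes Deepgram token authentication)."""
    if not api_key:
        raise ValueError("A Deepgram API key is required")
    return {"Authorization": f"Token {api_key}"}


def is_welcome(raw_message: str) -> bool:
    """Return True if a text frame from the server is the Welcome message."""
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    return isinstance(message, dict) and message.get("type") == "Welcome"
```

A gateway would pass `agent_endpoint()` and `auth_headers(...)` to whichever WebSocket client it uses, then wait until `is_welcome(...)` returns true for the first text frame before sending anything else.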
Before sending audio, configure the voice agent using a Settings message.
The Settings message initializes the agent and defines audio formats and behavior.
Example:
After sending settings, the server responds with:
This confirms that the configuration was loaded successfully.
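To make the configuration step concrete, here is a hedged sketch of building a Settings message for a telephony call. The field layout (`audio.input`, `audio.output`, `agent.listen`, `agent.think`, `agent.speak`, `greeting`), the model names, the 8 kHz mu-law format, and the `SettingsApplied` acknowledgement type are all assumptions for illustration; check each against the Settings reference before relying on it.

```python
import json


def build_settings(prompt: str, greeting: str, sample_rate: int = 8000) -> str:
    """Serialize a Settings message (all field names and values are assumed)."""
    settings = {
        "type": "Settings",
        "audio": {
            # Assumed telephony-style audio: 8 kHz mu-law in both directions.
            "input": {"encoding": "mulaw", "sample_rate": sample_rate},
            "output": {
                "encoding": "mulaw",
                "sample_rate": sample_rate,
                "container": "none",
            },
        },
        "agent": {
            # Provider and model names below are placeholders.
            "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
            "think": {
                "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
                "prompt": prompt,
            },
            "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
            "greeting": greeting,
        },
    }
    return json.dumps(settings)


def is_settings_applied(raw_message: str) -> bool:
    """Return True if the server acknowledged the Settings message (assumed type)."""
    try:
        return json.loads(raw_message).get("type") == "SettingsApplied"
    except (json.JSONDecodeError, AttributeError):
        return False
```

The gateway would send `build_settings(...)` as a text frame and hold back audio until the acknowledgement arrives.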
Once the agent is initialized, the gateway begins streaming audio frames.
The Voice Agent API expects raw binary audio frames sent over the WebSocket connection.
Example message type:
Deepgram processes the audio and emits conversation events as the interaction progresses.
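The streaming step can be illustrated with a small framing helper that splits raw caller audio into fixed-duration binary frames. This is a sketch under assumptions: the 20 ms frame duration and the function names are illustrative choices, not requirements stated by the API.

```python
from typing import Iterator


def frame_size_bytes(sample_rate: int, bytes_per_sample: int, frame_ms: int = 20) -> int:
    """Number of bytes in one frame of the given duration."""
    return sample_rate * bytes_per_sample * frame_ms // 1000


def iter_frames(audio: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive frames; the final frame may be shorter than `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(audio), size):
        yield audio[start:start + size]
```

For 8 kHz, 1-byte-per-sample audio, `frame_size_bytes(8000, 1)` gives 160 bytes per 20 ms frame. Each yielded frame would be sent as a binary WebSocket message.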
During the conversation, the server sends real-time events describing the interaction.
Examples include:
• UserStartedSpeaking
• AgentThinking
• AgentStartedSpeaking
• ConversationText
These events help the client manage audio playback and conversational state.
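One way a gateway might use these events is to route every incoming WebSocket frame to an action. The sketch below assumes binary frames carry agent audio and text frames carry JSON events with a `type` field; the action labels (`stop_playback`, `log_transcript`, and so on) are hypothetical names for gateway-side behavior.

```python
import json
from typing import Union

# Hypothetical mapping from event type to a gateway-side action label.
EVENT_ACTIONS = {
    "UserStartedSpeaking": "stop_playback",
    "AgentThinking": "wait",
    "AgentStartedSpeaking": "start_playback",
    "ConversationText": "log_transcript",
}


def route_frame(frame: Union[str, bytes]) -> tuple[str, object]:
    """Classify a WebSocket frame as agent audio or a JSON event."""
    if isinstance(frame, (bytes, bytearray)):
        # Binary frames are treated as synthesized audio to play to the caller.
        return ("audio", bytes(frame))
    event = json.loads(frame)
    # Unknown event types are ignored rather than treated as errors.
    return (EVENT_ACTIONS.get(event.get("type"), "ignore"), event)
```

Keeping the mapping in a dictionary makes it straightforward to add handling for further event types without touching the routing logic.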
Example event:
The Deepgram Voice Agent can call backend systems using function calling.
When the agent decides it needs external data, it sends a FunctionCallRequest.
Example:
Review our function calling docs for more details.
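As a rough illustration of the gateway side of function calling, the sketch below dispatches a FunctionCallRequest to a local handler and builds the reply. The request shape (a `functions` list with `id`, `name`, and JSON-encoded `arguments`), the `FunctionCallResponse` reply shape, and the `lookup_order_status` backend are all assumptions; the function calling docs remain the source of truth for the real schema.

```python
import json


def lookup_order_status(order_id: str) -> dict:
    """Hypothetical backend call; a real gateway would query its own system."""
    return {"order_id": order_id, "status": "shipped"}


# Registry of functions the agent is allowed to call (hypothetical).
HANDLERS = {"lookup_order_status": lookup_order_status}


def handle_function_call(request: dict) -> list[str]:
    """Run each requested function and return serialized responses."""
    responses = []
    for call in request.get("functions", []):
        handler = HANDLERS.get(call["name"])
        arguments = json.loads(call.get("arguments") or "{}")
        if handler is None:
            result = {"error": f"unknown function: {call['name']}"}
        else:
            result = handler(**arguments)
        responses.append(json.dumps({
            "type": "FunctionCallResponse",  # assumed reply message type
            "id": call["id"],
            "name": call["name"],
            "content": json.dumps(result),
        }))
    return responses
```

Returning an error payload for unknown function names, rather than raising, lets the agent explain the failure to the caller instead of dropping the conversation.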
When the agent generates speech, the Voice Agent API streams synthesized audio back to the client.
Your gateway:
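Because the WebSocket connection streams audio continuously, playback can begin as soon as the first audio chunk arrives rather than after the full response has been synthesized, which reduces the latency the caller perceives.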
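A gateway could model this with a small first-in, first-out buffer that hands chunks to the telephony leg as they arrive and discards whatever is queued when the caller interrupts. The class below is a sketch; `PlaybackBuffer` and its methods are hypothetical names, and flushing on interruption is a design assumption rather than documented API behavior.

```python
from collections import deque
from typing import Optional


class PlaybackBuffer:
    """FIFO of agent audio chunks waiting to be played to the caller."""

    def __init__(self) -> None:
        self._chunks: deque[bytes] = deque()

    def push(self, chunk: bytes) -> None:
        """Queue a chunk as soon as it arrives; empty chunks are skipped."""
        if chunk:
            self._chunks.append(chunk)

    def pop(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def flush(self) -> int:
        """Drop all queued audio (for example, when the caller starts speaking)."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped
```

Playing chunks in arrival order keeps the first audio flowing to the caller immediately, while `flush()` prevents stale speech from continuing after an interruption.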
If the voice agent cannot resolve a request, it can transfer the caller back to Amazon Connect.
Typical escalation flow:
Context from the AI conversation can be stored in a CRM or ticketing system before the transfer.
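The context handoff might look like the sketch below, which collects ConversationText events into a compact JSON note that a CRM or ticketing integration could store before the transfer. The `role` and `content` field names, the `build_handoff_note` helper, and the note layout are assumptions for illustration; the storage call itself is left to the target system.

```python
import json


def build_handoff_note(contact_id: str, events: list[dict], reason: str) -> str:
    """Summarize the AI conversation as JSON for the receiving human agent."""
    turns = [
        {"role": event.get("role", "unknown"), "content": event.get("content", "")}
        for event in events
        if event.get("type") == "ConversationText"  # skip non-transcript events
    ]
    return json.dumps({
        "contact_id": contact_id,
        "escalation_reason": reason,
        "transcript": turns,
    })
```

Storing the transcript alongside the escalation reason means the human agent does not need to ask the caller to repeat what they already told the voice agent.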