Amazon Connect and Deepgram Voice Agent
In this guide, you'll connect inbound Amazon Connect calls to a Deepgram-powered voice agent through a Bot Media Gateway that streams call audio over WebSockets.
The Deepgram Voice Agent will:
- Process speech in real time
- Manage the conversation
- Invoke backend APIs using function calling
- Generate spoken responses streamed back to the caller
Overview
In this architecture:
- A caller dials an Amazon Connect phone number
- The Contact Flow performs routing and initial prompts
- The call is transferred to an external bot endpoint
- A Bot Media Gateway streams call audio to the Deepgram Voice Agent API
- The Deepgram Voice Agent processes the conversation and calls backend APIs when needed
- Audio responses are streamed back to the caller
The Voice Agent API operates over a bidirectional WebSocket connection, allowing clients to continuously stream audio and receive responses in real time.
Reference Architecture
Before You Begin
You will need:
- An Amazon Connect instance
- A Deepgram API key
- A server capable of handling SIP, RTP, or WebRTC telephony
- A Bot Media Gateway service (Node.js, Python, or Go recommended)
Your gateway will:
- Terminate the phone call
- Open a WebSocket connection to Deepgram
- Stream audio between the call and the Voice Agent
Step 1 - Configure Amazon Connect
Create a Contact Flow that routes callers to your AI agent.
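A typical flow plays any initial prompts, performs routing, and then transfers the caller to the external bot endpoint. For the transfer, use either: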
- Transfer to phone number
- Quick Connect
This sends the caller to the telephony endpoint hosting your voice agent gateway.
Step 2 - Build the Bot Media Gateway
The Bot Media Gateway bridges telephony audio and the Deepgram Voice Agent WebSocket.
Typical responsibilities:
- Accept incoming SIP or RTP streams
- Convert audio into the required format
- Forward audio frames to Deepgram
- Play synthesized audio responses back to the caller
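The "convert audio" step depends on what your telephony leg carries and which encoding you declare in the Settings message. Telephone audio is commonly G.711 mu-law at 8 kHz; if you instead tell the agent to expect 16-bit linear PCM, the gateway has to decode each frame first. A minimal sketch of standard G.711 mu-law decoding using only the Python standard library (resampling, if you need a different rate, is a separate step not shown; both function names are illustrative):

```python
import struct


def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear PCM sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the magnitude, then remove the encoder's bias of 0x84 (132).
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_frame_to_pcm16(frame: bytes) -> bytes:
    """Convert a mu-law frame to little-endian 16-bit PCM (linear16)."""
    samples = [ulaw_to_linear(b) for b in frame]
    return struct.pack(f"<{len(samples)}h", *samples)
```

The output keeps the input sample rate; only the sample encoding changes, and each 1-byte input sample becomes 2 bytes.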
Deployment options include:
- AWS ECS
- AWS Fargate
- Kubernetes
- Containerized microservice
Example architecture of the media gateway:
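A minimal sketch of that bridge, assuming a Python asyncio gateway: two concurrent relays, caller to agent and agent to caller. The `Receive`/`Send` hooks are hypothetical stand-ins for your SIP/RTP stack and WebSocket client, and `None` is used here as an end-of-stream marker; `demo()` drives the bridge with in-memory queues so the example runs without a phone call.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical transport hooks: in a real gateway these would wrap the
# telephony leg (SIP/RTP) on one side and the Deepgram WebSocket on the other.
Receive = Callable[[], Awaitable[Optional[bytes]]]
Send = Callable[[bytes], Awaitable[None]]


async def pump(receive: Receive, send: Send) -> int:
    """Relay frames from one side to the other until receive() yields None."""
    relayed = 0
    while (frame := await receive()) is not None:
        await send(frame)
        relayed += 1
    return relayed


async def bridge(call_rx: Receive, call_tx: Send,
                 agent_rx: Receive, agent_tx: Send) -> Tuple[int, int]:
    """Run the caller->agent and agent->caller relays concurrently."""
    upstream, downstream = await asyncio.gather(
        pump(call_rx, agent_tx),   # caller audio to Deepgram
        pump(agent_rx, call_tx),   # synthesized audio back to the caller
    )
    return upstream, downstream


async def demo() -> Tuple[Tuple[int, int], list, list]:
    """Drive the bridge with in-memory queues instead of real transports."""
    caller_q: asyncio.Queue = asyncio.Queue()
    agent_q: asyncio.Queue = asyncio.Queue()
    for f in (b"c1", b"c2", None):
        caller_q.put_nowait(f)
    for f in (b"a1", None):
        agent_q.put_nowait(f)
    to_agent: list = []
    to_caller: list = []

    async def send_to_agent(f: bytes) -> None:
        to_agent.append(f)

    async def send_to_caller(f: bytes) -> None:
        to_caller.append(f)

    counts = await bridge(caller_q.get, send_to_caller, agent_q.get, send_to_agent)
    return counts, to_agent, to_caller


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real gateway, one side ending (a hangup or a closed socket) should also cancel the other relay; this sketch simply waits for both to finish.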
Step 3 - Connect to the Voice Agent API
The Bot Media Gateway opens a WebSocket connection to the Deepgram Voice Agent endpoint.
Example endpoint:
Once the connection opens:
- Wait for the Welcome message
- Send a Settings message
- Begin streaming audio
The Welcome message confirms the WebSocket connection is established.
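The startup sequence above can be sketched as a small state machine that the gateway feeds with each text message from the server. The endpoint URL, the `Authorization: Token` header, and the `SettingsApplied` confirmation name reflect our understanding of the V1 Voice Agent API and are assumptions to verify against the Deepgram API reference; `Handshake` and `auth_headers` are illustrative names.

```python
import json
from typing import Optional

# Assumed V1 endpoint; confirm it in the Deepgram API reference.
AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"


def auth_headers(api_key: str) -> dict:
    """Headers for the WebSocket upgrade request (assumed Token auth scheme)."""
    return {"Authorization": f"Token {api_key}"}


class Handshake:
    """Tracks Welcome -> Settings -> confirmation before audio may flow."""

    def __init__(self, settings: dict) -> None:
        self.settings = settings
        self.state = "awaiting_welcome"

    def on_text(self, raw: str) -> Optional[str]:
        """Feed one server text message; returns a message to send, if any."""
        msg_type = json.loads(raw).get("type")
        if self.state == "awaiting_welcome" and msg_type == "Welcome":
            self.state = "awaiting_confirmation"
            return json.dumps(self.settings)  # send Settings only after Welcome
        # "SettingsApplied" is the assumed name of the confirmation message.
        if self.state == "awaiting_confirmation" and msg_type == "SettingsApplied":
            self.state = "streaming"
        return None

    @property
    def ready(self) -> bool:
        """True once it is safe to start sending binary audio frames."""
        return self.state == "streaming"
```

With the third-party `websockets` package, for example, you would pass `auth_headers(key)` when connecting (the keyword is `additional_headers` in recent releases and `extra_headers` in older ones), then call `on_text` for each text frame and send whatever it returns.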
Step 4 - Send Voice Agent Settings
Before sending audio, configure the voice agent using a Settings message.
The Settings message initializes the agent and defines audio formats and behavior.
Example:
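As an illustration, the following sketch builds a Settings message for an 8 kHz mu-law telephony leg. The field layout (`audio.input`, `audio.output`, `agent.listen`, `agent.think`, `agent.speak`) follows our reading of the V1 schema, and the provider and model names are placeholders; check both against the Settings reference before use. `build_settings` and the `lookup_order` function definition are hypothetical.

```python
import json


def build_settings(prompt: str, greeting: str, functions: list) -> dict:
    """Build a Settings message for an 8 kHz mu-law telephony call.

    Field names are an assumption based on the V1 schema; model names are
    placeholders to replace with ones your account supports.
    """
    telephony_audio = {"encoding": "mulaw", "sample_rate": 8000}
    return {
        "type": "Settings",
        "audio": {
            "input": telephony_audio,
            # Assumed: "container": "none" requests raw frames with no WAV header.
            "output": {**telephony_audio, "container": "none"},
        },
        "agent": {
            "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
            "think": {
                "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
                "prompt": prompt,
                "functions": functions,
            },
            "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
            "greeting": greeting,
        },
    }


# Hypothetical backend function exposed to the agent (JSON Schema parameters).
LOOKUP_ORDER = {
    "name": "lookup_order",
    "description": "Look up the status of a customer's order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

if __name__ == "__main__":
    msg = build_settings("You are a helpful support agent.", "Hi, how can I help?", [LOOKUP_ORDER])
    print(json.dumps(msg, indent=2))
```

Matching the declared input encoding to what the phone leg already carries avoids a conversion step in the gateway.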
After you send the settings, the server responds with a confirmation message indicating that the configuration has been loaded successfully.
Step 5 - Stream Call Audio
Once the agent is initialized, the gateway begins streaming audio frames.
The Voice Agent API expects raw binary audio frames sent over the WebSocket connection.
Example message type:
Deepgram processes the audio and emits conversation events as the interaction progresses.
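How large each binary frame should be is a gateway choice that this guide does not specify. G.711 telephony audio is commonly packetized in 20 ms units, so one simple approach is to forward frames of that duration. A sketch of the arithmetic and re-framing, assuming mono audio (`frame_bytes` and `frames` are illustrative helpers):

```python
from typing import Iterator


def frame_bytes(sample_rate: int, bytes_per_sample: int, frame_ms: int = 20) -> int:
    """Size in bytes of one mono frame lasting frame_ms milliseconds."""
    return sample_rate * bytes_per_sample * frame_ms // 1000


def frames(audio: bytes, size: int) -> Iterator[bytes]:
    """Split a buffer into size-byte frames; the final frame may be shorter."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


# 8 kHz mu-law: 1 byte per sample, so 20 ms is 160 bytes.
MULAW_20MS = frame_bytes(8000, 1)
# 16 kHz linear16: 2 bytes per sample, so 20 ms is 640 bytes.
LINEAR16_16K_20MS = frame_bytes(16000, 2)
```

Each frame then goes out as a binary WebSocket message; with the `websockets` package, passing a `bytes` object to `send` produces a binary frame.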
Step 6 - Handle Voice Agent Events
During the conversation, the server sends real-time events describing the interaction.
Examples include:
- UserStartedSpeaking
- AgentThinking
- AgentStartedSpeaking
- ConversationText
These events help the client manage audio playback and conversational state.
Example event:
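One way to act on these events is a single dispatcher that treats binary WebSocket messages as agent audio and text messages as JSON events. The sketch below, built around a hypothetical `EventHandler` class, clears queued agent audio when `UserStartedSpeaking` arrives (a common barge-in policy) and records `ConversationText` turns, assuming those events carry `role` and `content` fields.

```python
import json
from typing import Union


class EventHandler:
    """Routes Voice Agent messages: binary frames are audio, text frames are JSON events."""

    def __init__(self) -> None:
        self.playback = bytearray()   # agent audio waiting to be played
        self.transcript: list = []    # (role, content) pairs
        self.agent_speaking = False

    def handle(self, message: Union[bytes, str]) -> str:
        """Process one message and return its kind ("audio" or the event type)."""
        if isinstance(message, (bytes, bytearray)):
            self.playback.extend(message)
            return "audio"
        event = json.loads(message)
        kind = event.get("type", "")
        if kind == "UserStartedSpeaking":
            # Barge-in: drop queued agent audio so the caller is not talked over.
            self.playback.clear()
            self.agent_speaking = False
        elif kind == "AgentStartedSpeaking":
            self.agent_speaking = True
        elif kind == "ConversationText":
            # Assumed fields: "role" and "content".
            self.transcript.append((event.get("role"), event.get("content")))
        return kind
```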
Step 7 - Function Calling
The Deepgram Voice Agent can call backend systems using function calling.
When the agent decides it needs external data, it sends a FunctionCallRequest.
Example:
Review our function calling docs for more details.
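A sketch of the gateway side of function calling: look up each requested function in a local registry, run it, and reply with a `FunctionCallResponse`. The message fields used here (`functions`, `id`, `name`, `arguments` as a JSON string, `content`) are our assumption about the V1 message shape, and `lookup_order` is a hypothetical backend call; the function calling docs are the authority on the exact format.

```python
import json
from typing import Any, Callable, Dict, List


def lookup_order(order_id: str) -> Dict[str, Any]:
    """Hypothetical backend call; replace with your real order or CRM API."""
    return {"order_id": order_id, "status": "shipped"}


# Maps function names the agent may request to local implementations.
REGISTRY: Dict[str, Callable[..., Any]] = {"lookup_order": lookup_order}


def handle_function_call_request(event: Dict[str, Any]) -> List[str]:
    """Run each requested function and return FunctionCallResponse messages.

    Assumed V1 shape: event["functions"] is a list of objects with "id",
    "name", and "arguments" (a JSON-encoded string).
    """
    responses: List[str] = []
    for call in event.get("functions", []):
        func = REGISTRY.get(call["name"])
        if func is None:
            result: Any = {"error": f"unknown function: {call['name']}"}
        else:
            args = json.loads(call.get("arguments") or "{}")
            result = func(**args)
        responses.append(json.dumps({
            "type": "FunctionCallResponse",
            "id": call["id"],               # echo the request id so the agent can match it
            "name": call["name"],
            "content": json.dumps(result),  # result returned as a string
        }))
    return responses
```

Each returned string is sent back as a text WebSocket message. Because the agent waits on the reply, a production gateway would also put a timeout around slow backend calls.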
Step 8 - Audio Playback
When the agent generates speech, the Voice Agent API streams synthesized audio back to the client.
Your gateway:
- Receives audio frames
- Buffers them
- Sends them to the caller
Because the audio arrives as a continuous stream, playback can begin as soon as the first frames arrive instead of waiting for the complete response, which reduces latency.
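The buffering step can be sketched as a playout buffer that accepts agent audio in whatever chunk sizes it arrives and hands the telephony side fixed-size frames. The default of 160 bytes assumes 20 ms of 8 kHz mu-law; `PlayoutBuffer` is an illustrative name.

```python
from typing import Optional


class PlayoutBuffer:
    """Accumulates agent audio and releases fixed-size frames for playback."""

    def __init__(self, frame_size: int = 160) -> None:
        # 160 bytes = 20 ms of 8 kHz mu-law (1 byte per sample).
        self.frame_size = frame_size
        self._buf = bytearray()

    def push(self, chunk: bytes) -> None:
        """Append audio as it arrives from the WebSocket (any chunk size)."""
        self._buf.extend(chunk)

    def pop_frame(self) -> Optional[bytes]:
        """Return the next full frame, or None if not enough audio is buffered."""
        if len(self._buf) < self.frame_size:
            return None
        frame = bytes(self._buf[: self.frame_size])
        del self._buf[: self.frame_size]
        return frame

    def clear(self) -> None:
        """Discard queued audio, e.g. when the caller barges in."""
        self._buf.clear()

    def __len__(self) -> int:
        return len(self._buf)
```

In practice the gateway would call `pop_frame` on a 20 ms timer to pace outgoing packets, sending silence when it returns `None`, and call `clear` when the caller starts speaking.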
Step 9 - Escalate to a Human Agent
If the voice agent cannot resolve a request, it can transfer the caller back to Amazon Connect.
Typical escalation flow:
Context from the AI conversation can be stored in a CRM or ticketing system before the transfer.
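Before transferring, the gateway can package what it has already collected, for example the `ConversationText` turns, into a record for your CRM or ticketing system. A minimal sketch; `build_handoff_record` and every field name in the record are hypothetical and should be mapped to your own system's schema. The transfer itself is not shown.

```python
import json
from datetime import datetime, timezone
from typing import List, Tuple


def build_handoff_record(call_id: str, transcript: List[Tuple[str, str]], reason: str) -> str:
    """Package the AI conversation for a CRM or ticket before a human transfer.

    All field names here are hypothetical; map them to your CRM's schema.
    """
    return json.dumps({
        "call_id": call_id,
        "escalation_reason": reason,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "turns": [{"role": role, "text": text} for role, text in transcript],
        # Most recent thing the caller said, handy for the human agent's screen.
        "last_user_utterance": next(
            (text for role, text in reversed(transcript) if role == "user"), None
        ),
    })
```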