Amazon Connect and Deepgram Voice Agent

In this guide, you'll connect Amazon Connect inbound calls to a Deepgram-powered voice agent through a Bot Media Gateway that streams call audio over WebSockets.

The Deepgram Voice Agent will:

  • Process speech in real time
  • Manage the conversation
  • Invoke backend APIs using function calling
  • Generate spoken responses streamed back to the caller

Overview

In this architecture:

  1. A caller dials an Amazon Connect phone number
  2. The Contact Flow performs routing and initial prompts
  3. The call is transferred to an external bot endpoint
  4. A Bot Media Gateway streams call audio to the Deepgram Voice Agent API
  5. The Deepgram Voice Agent processes the conversation and calls backend APIs when needed
  6. Audio responses are streamed back to the caller

The Voice Agent API operates over a bidirectional WebSocket connection, allowing clients to continuously stream audio and receive responses in real time.


Reference Architecture

Caller (PSTN)
|
v
Amazon Connect
(Inbound Contact Flow)
|
|-- Transfer to phone number / Quick Connect -->
v
External Bot Endpoint
|
v
Bot Media Gateway
- telephony termination
- media streaming bridge
- WebSocket session with Deepgram
|
v
Deepgram Voice Agent API
- real-time voice orchestration
- function calling to backend systems
- generates spoken responses
|
+---------------------------> Business Tools / APIs
| - CRM
| - ticketing
| - order status
| - knowledge base
| - scheduling
|
v
Audio response back to Bot Media Gateway
|
v
Caller
|
v
(optional) transfer back to Amazon Connect queue/agent

Before You Begin

You will need:

  • An Amazon Connect instance
  • A Deepgram API key
  • A server capable of handling SIP, RTP, or WebRTC telephony
  • A Bot Media Gateway service (Node.js, Python, or Go recommended)

Your gateway will:

  • Terminate the phone call
  • Open a WebSocket connection to Deepgram
  • Stream audio between the call and the Voice Agent

Step 1 – Configure Amazon Connect

Create a Contact Flow that routes callers to your AI agent.

Typical flow:

Inbound call
↓
Greeting / IVR
↓
Transfer to external number
↓
Bot endpoint

Use either:

  • Transfer to phone number
  • Quick Connect

This sends the caller to the telephony endpoint hosting your voice agent gateway.

Step 2 – Build the Bot Media Gateway

The Bot Media Gateway bridges telephony audio and the Deepgram Voice Agent WebSocket.

Typical responsibilities:

  • Accept incoming SIP or RTP streams
  • Convert audio into the required format
  • Forward audio frames to Deepgram
  • Play synthesized audio responses back to the caller
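Telephony legs reaching an external SIP endpoint commonly carry G.711 µ-law audio at 8 kHz, while the Settings example in Step 4 uses linear16 at 24 kHz, so the gateway's format conversion typically includes a µ-law decode. A minimal sketch of that decode step (resampling from 8 kHz to the configured sample rate is also required but not shown here):

```python
def ulaw_to_linear16(ulaw_bytes: bytes) -> bytes:
    """Decode G.711 mu-law samples to 16-bit signed little-endian PCM."""
    out = bytearray()
    for b in ulaw_bytes:
        b = ~b & 0xFF                      # mu-law bytes are stored inverted
        sign = b & 0x80
        exponent = (b >> 4) & 0x07
        mantissa = b & 0x0F
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        if sign:
            sample = -sample
        out += sample.to_bytes(2, "little", signed=True)
    return bytes(out)
```

A pure-Python loop like this is fine for a sketch; a production gateway would use a vectorized or native codec path.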

Deployment options include:

  • AWS ECS
  • AWS Fargate
  • Kubernetes
  • Containerized microservice

Example architecture of the media gateway:

RTP Audio (phone call)
⇅
Bot Media Gateway
⇅
Deepgram Voice Agent WebSocket

Step 3 – Connect to the Voice Agent API

The Bot Media Gateway opens a WebSocket connection to the Deepgram Voice Agent endpoint.

Example endpoint:

wss://agent.deepgram.com/v1/agent/converse

Once the connection opens:

  1. Wait for the Welcome message
  2. Send a Settings message
  3. Begin streaming audio

The Welcome message confirms the WebSocket connection is established.
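The connect-and-wait handshake can be sketched as follows. This uses the third-party `websockets` package; the `additional_headers` keyword and the `Token` authorization scheme are assumptions to verify against your Deepgram account docs.

```python
import json

DG_URL = "wss://agent.deepgram.com/v1/agent/converse"

def is_welcome(raw) -> bool:
    """True if a text frame is the Welcome handshake message."""
    try:
        return json.loads(raw).get("type") == "Welcome"
    except (json.JSONDecodeError, AttributeError):
        return False

async def open_agent_session(api_key: str):
    # Third-party client: pip install websockets. Recent versions take
    # `additional_headers` (older releases used `extra_headers`), and the
    # "Token" auth scheme is an assumption -- confirm both before use.
    import websockets
    ws = await websockets.connect(
        DG_URL,
        additional_headers={"Authorization": f"Token {api_key}"},
    )
    # Per the handshake above: wait for Welcome before sending Settings.
    while True:
        msg = await ws.recv()
        if isinstance(msg, str) and is_welcome(msg):
            return ws
```

Run it from the gateway's event loop, e.g. `ws = await open_agent_session(api_key)` inside an `asyncio` task.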

Step 4 – Send Voice Agent Settings

Before sending audio, configure the voice agent using a Settings message.

The Settings message initializes the agent and defines audio formats and behavior.

Example:

{
  "type": "Settings",
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 24000
    },
    "output": {
      "encoding": "linear16",
      "sample_rate": 24000,
      "container": "none"
    }
  },
  "agent": {
    "instructions": "You are a helpful customer support assistant."
  }
}

After sending settings, the server responds with:

SettingsApplied

This confirms the configuration has been successfully loaded.
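A small helper that serializes the Settings payload shown above. Only the fields from the example are included; real configurations usually also select models and voices, which are omitted here.

```python
import json

def build_settings(instructions: str, sample_rate: int = 24000) -> str:
    """Serialize a Settings message matching the example payload."""
    return json.dumps({
        "type": "Settings",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": sample_rate},
            "output": {
                "encoding": "linear16",
                "sample_rate": sample_rate,
                "container": "none",
            },
        },
        "agent": {"instructions": instructions},
    })
```

The gateway sends this as a text frame immediately after Welcome, then blocks until SettingsApplied arrives before streaming audio.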

Step 5 – Stream Call Audio

Once the agent is initialized, the gateway begins streaming audio frames.

The Voice Agent API expects raw binary audio frames sent over the WebSocket connection.

Example message type:

AgentV1Media (binary audio)

Deepgram processes the audio and emits conversation events as the interaction progresses.
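The streaming loop can be sketched like this. The 960-byte frame size (20 ms of 16-bit mono audio at 24 kHz) is a tuning choice for illustration, not a documented requirement.

```python
def frame_chunks(pcm: bytes, frame_bytes: int = 960):
    """Split raw PCM into fixed-size frames.

    960 bytes = 20 ms of 16-bit mono audio at 24 kHz
    (24000 samples/s * 0.02 s * 2 bytes/sample).
    """
    for i in range(0, len(pcm), frame_bytes):
        yield pcm[i:i + frame_bytes]

async def pump_audio(ws, pcm_source):
    """Forward caller audio to the agent as raw binary WebSocket frames."""
    for chunk in pcm_source:           # pcm_source yields linear16 audio
        for frame in frame_chunks(chunk):
            await ws.send(frame)       # binary frame, no JSON wrapper
```

`pcm_source` here stands in for whatever the telephony leg produces after format conversion (see Step 2).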

Step 6 – Handle Voice Agent Events

During the conversation, the server sends real-time events describing the interaction.

Examples include:

  • UserStartedSpeaking
  • AgentThinking
  • AgentStartedSpeaking
  • ConversationText

These events help the client manage audio playback and conversational state.

Example event:

{
  "type": "ConversationText",
  "role": "assistant",
  "content": "Sure — I can help with that."
}
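These events typically drive barge-in handling and transcript logging in the gateway. A minimal dispatcher sketch; the returned action names are illustrative, not part of the API:

```python
import json

def handle_event(raw: str) -> str:
    """Route a server event to a gateway action (action names are
    illustrative labels, not Deepgram API values)."""
    event = json.loads(raw)
    etype = event.get("type")
    if etype == "UserStartedSpeaking":
        # Barge-in: drop any queued agent audio so the caller isn't
        # talked over.
        return "flush_playback"
    if etype == "ConversationText":
        # Log the transcript turn (role + content).
        return f"log:{event.get('role')}"
    if etype in ("AgentThinking", "AgentStartedSpeaking"):
        return "noop"
    return "ignore"
```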

Step 7 – Function Calling

The Deepgram Voice Agent can call backend systems using function calling.

When the agent decides it needs external data, it sends a FunctionCallRequest.

Example:

{
  "type": "FunctionCallRequest",
  "functions": [
    {
      "name": "get_order_status",
      "arguments": {
        "order_id": "12345"
      },
      "client_side": false
    }
  ]
}

Review our function calling docs for more details.
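A sketch of handling such a request on the gateway side. The `get_order_status` backend is hypothetical, and the FunctionCallResponse field names (`id`, `name`, `content`) are assumptions; confirm the exact response schema against the function calling docs.

```python
import json

# Hypothetical backend lookup, used for illustration only.
def get_order_status(order_id: str) -> str:
    return f"Order {order_id} is out for delivery."

FUNCTIONS = {"get_order_status": get_order_status}

def answer_function_call(raw: str) -> list:
    """Build one FunctionCallResponse message per requested function.
    Field names are assumptions -- verify against Deepgram's docs."""
    request = json.loads(raw)
    out = []
    for call in request.get("functions", []):
        result = FUNCTIONS[call["name"]](**call["arguments"])
        out.append(json.dumps({
            "type": "FunctionCallResponse",
            "id": call.get("id"),      # assumed field
            "name": call["name"],
            "content": result,
        }))
    return out
```

Each resulting message would be sent back over the same WebSocket as a text frame.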

Step 8 – Audio Playback

When the agent generates speech, the Voice Agent API streams synthesized audio back to the client.

Your gateway:

  1. Receives audio frames
  2. Buffers them
  3. Sends them to the caller

Because the WebSocket connection streams audio continuously, playback can begin immediately, reducing latency.
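The buffering step can be as simple as a FIFO of frames, with a flush operation for barge-in (dropping queued agent audio when the caller starts speaking). A minimal sketch:

```python
from collections import deque

class PlaybackBuffer:
    """FIFO of synthesized audio frames headed back to the caller."""

    def __init__(self):
        self._frames = deque()

    def push(self, frame: bytes) -> None:
        """Queue a frame received from the Voice Agent WebSocket."""
        self._frames.append(frame)

    def next_frame(self):
        """Pop the next frame for the telephony leg, or None if empty."""
        return self._frames.popleft() if self._frames else None

    def flush(self) -> int:
        """Drop all queued audio (barge-in); returns frames dropped."""
        dropped = len(self._frames)
        self._frames.clear()
        return dropped
```

The gateway's playback task would call `next_frame()` on a fixed cadence and pace frames to the caller in real time.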

Step 9 – Escalate to a Human Agent

If the voice agent cannot resolve a request, it can transfer the caller back to Amazon Connect.

Typical escalation flow:

Voice Agent detects escalation
↓
Bot Media Gateway initiates transfer
↓
Amazon Connect queue
↓
Human agent

Context from the AI conversation can be stored in a CRM or ticketing system before the transfer.

Additional Resources