Amazon Connect and Deepgram Voice Agent

In this guide, you'll connect Amazon Connect inbound calls to a Deepgram-powered voice agent through a Bot Media Gateway that streams call audio over WebSockets.

The Deepgram Voice Agent will:

  • Process speech in real time
  • Manage the conversation
  • Invoke backend APIs using function calling
  • Generate spoken responses streamed back to the caller

Overview

In this architecture:

  1. A caller dials an Amazon Connect phone number
  2. The Contact Flow performs routing and initial prompts
  3. The call is transferred to an external bot endpoint
  4. A Bot Media Gateway streams call audio to the Deepgram Voice Agent API
  5. The Deepgram Voice Agent processes the conversation and calls backend APIs when needed
  6. Audio responses are streamed back to the caller

The Voice Agent API operates over a bidirectional WebSocket connection, allowing clients to continuously stream audio and receive responses in real time.


Reference Architecture

Caller (PSTN)
|
v
Amazon Connect
(Inbound Contact Flow)
|
|-- Transfer to phone number / Quick Connect -->
v
External Bot Endpoint
|
v
Bot Media Gateway
- telephony termination
- media streaming bridge
- WebSocket session with Deepgram
|
v
Deepgram Voice Agent API
- real-time voice orchestration
- function calling to backend systems
- generates spoken responses
|
+---------------------------> Business Tools / APIs
| - CRM
| - ticketing
| - order status
| - knowledge base
| - scheduling
|
v
Audio response back to Bot Media Gateway
|
v
Caller
|
v
(optional) transfer back to Amazon Connect queue/agent

Before You Begin

You will need:

  • An Amazon Connect instance
  • A Deepgram API key
  • A server capable of handling SIP, RTP, or WebRTC telephony
  • A Bot Media Gateway service (Node.js, Python, or Go recommended)

Your gateway will:

  • Terminate the phone call
  • Open a WebSocket connection to Deepgram
  • Stream audio between the call and the Voice Agent

Step 1 – Configure Amazon Connect

Create a Contact Flow that routes callers to your AI agent.

Typical flow:

Inbound call
↓
Greeting / IVR
↓
Transfer to external number
↓
Bot endpoint

Use either:

  • Transfer to phone number
  • Quick Connect

This sends the caller to the telephony endpoint hosting your voice agent gateway.

Step 2 – Build the Bot Media Gateway

The Bot Media Gateway bridges telephony audio and the Deepgram Voice Agent WebSocket.

Typical responsibilities:

  • Accept incoming SIP or RTP streams
  • Convert audio into the required format
  • Forward audio frames to Deepgram
  • Play synthesized audio responses back to the caller
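Telephony legs reaching an external SIP endpoint commonly carry G.711 µ-law audio at 8 kHz, while the Settings example in Step 4 uses linear16 at 24 kHz, so the gateway's format conversion typically includes a µ-law decode. A minimal sketch of that decode step (resampling from 8 kHz to the configured sample rate is also required but not shown here):

```python
def ulaw_to_linear16(ulaw_bytes: bytes) -> bytes:
    """Decode G.711 mu-law samples to 16-bit signed little-endian PCM."""
    out = bytearray()
    for b in ulaw_bytes:
        b = ~b & 0xFF                      # mu-law bytes are stored inverted
        sign = b & 0x80
        exponent = (b >> 4) & 0x07
        mantissa = b & 0x0F
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        if sign:
            sample = -sample
        out += sample.to_bytes(2, "little", signed=True)
    return bytes(out)
```

A pure-Python loop like this is fine for a sketch; a production gateway would use a vectorized or native codec path.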

Deployment options include:

  • AWS ECS
  • AWS Fargate
  • Kubernetes
  • Containerized microservice

Example architecture of the media gateway:

RTP Audio (phone call)
⇅
Bot Media Gateway
⇅
Deepgram Voice Agent WebSocket

Step 3 – Connect to the Voice Agent API

The Bot Media Gateway opens a WebSocket connection to the Deepgram Voice Agent endpoint.

Example endpoint:

wss://agent.deepgram.com/v1/agent/converse

Once the connection opens:

  1. Wait for the Welcome message
  2. Send a Settings message
  3. Begin streaming audio

The Welcome message confirms the WebSocket connection is established.
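The connect-and-wait handshake can be sketched as follows. This uses the third-party `websockets` package; the `additional_headers` keyword and the `Token` authorization scheme are assumptions to verify against your Deepgram account docs.

```python
import json

DG_URL = "wss://agent.deepgram.com/v1/agent/converse"

def is_welcome(raw) -> bool:
    """True if a text frame is the Welcome handshake message."""
    try:
        return json.loads(raw).get("type") == "Welcome"
    except (json.JSONDecodeError, AttributeError):
        return False

async def open_agent_session(api_key: str):
    # Third-party client: pip install websockets. Recent versions take
    # `additional_headers` (older releases used `extra_headers`), and the
    # "Token" auth scheme is an assumption -- confirm both before use.
    import websockets
    ws = await websockets.connect(
        DG_URL,
        additional_headers={"Authorization": f"Token {api_key}"},
    )
    # Per the handshake above: wait for Welcome before sending Settings.
    while True:
        msg = await ws.recv()
        if isinstance(msg, str) and is_welcome(msg):
            return ws
```

Run it from the gateway's event loop, e.g. `ws = await open_agent_session(api_key)` inside an `asyncio` task.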

Step 4 – Send Voice Agent Settings

Before sending audio, configure the voice agent using a Settings message.

The Settings message initializes the agent and defines audio formats and behavior.

Example:

{
  "type": "Settings",
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 24000
    },
    "output": {
      "encoding": "linear16",
      "sample_rate": 24000,
      "container": "none"
    }
  },
  "agent": {
    "instructions": "You are a helpful customer support assistant."
  }
}

After sending settings, the server responds with:

SettingsApplied

This confirms the configuration has been successfully loaded.
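A small helper that serializes the Settings payload shown above. Only the fields from the example are included; real configurations usually also select models and voices, which are omitted here.

```python
import json

def build_settings(instructions: str, sample_rate: int = 24000) -> str:
    """Serialize a Settings message matching the example payload."""
    return json.dumps({
        "type": "Settings",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": sample_rate},
            "output": {
                "encoding": "linear16",
                "sample_rate": sample_rate,
                "container": "none",
            },
        },
        "agent": {"instructions": instructions},
    })
```

The gateway sends this as a text frame immediately after Welcome, then blocks until SettingsApplied arrives before streaming audio.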

Step 5 – Stream Call Audio

Once the agent is initialized, the gateway begins streaming audio frames.

The Voice Agent API expects raw binary audio frames sent over the WebSocket connection.

Example message type:

AgentV1Media (binary audio)

Deepgram processes the audio and emits conversation events as the interaction progresses.
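The streaming loop can be sketched like this. The 960-byte frame size (20 ms of 16-bit mono audio at 24 kHz) is a tuning choice for illustration, not a documented requirement.

```python
def frame_chunks(pcm: bytes, frame_bytes: int = 960):
    """Split raw PCM into fixed-size frames.

    960 bytes = 20 ms of 16-bit mono audio at 24 kHz
    (24000 samples/s * 0.02 s * 2 bytes/sample).
    """
    for i in range(0, len(pcm), frame_bytes):
        yield pcm[i:i + frame_bytes]

async def pump_audio(ws, pcm_source):
    """Forward caller audio to the agent as raw binary WebSocket frames."""
    for chunk in pcm_source:           # pcm_source yields linear16 audio
        for frame in frame_chunks(chunk):
            await ws.send(frame)       # binary frame, no JSON wrapper
```

`pcm_source` here stands in for whatever the telephony leg produces after format conversion (see Step 2).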

Step 6 – Handle Voice Agent Events

During the conversation, the server sends real-time events describing the interaction.

Examples include:

  • UserStartedSpeaking
  • AgentThinking
  • AgentStartedSpeaking
  • ConversationText

These events help the client manage audio playback and conversational state.

Example event:

{
  "type": "ConversationText",
  "role": "assistant",
  "content": "Sure — I can help with that."
}
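These events typically drive barge-in handling and transcript logging in the gateway. A minimal dispatcher sketch; the returned action names are illustrative, not part of the API:

```python
import json

def handle_event(raw: str) -> str:
    """Route a server event to a gateway action (action names are
    illustrative labels, not Deepgram API values)."""
    event = json.loads(raw)
    etype = event.get("type")
    if etype == "UserStartedSpeaking":
        # Barge-in: drop any queued agent audio so the caller isn't
        # talked over.
        return "flush_playback"
    if etype == "ConversationText":
        # Log the transcript turn (role + content).
        return f"log:{event.get('role')}"
    if etype in ("AgentThinking", "AgentStartedSpeaking"):
        return "noop"
    return "ignore"
```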

Step 7 – Function Calling

The Deepgram Voice Agent can call backend systems using function calling.

When the agent decides it needs external data, it sends a FunctionCallRequest.

Example:

{
  "type": "FunctionCallRequest",
  "functions": [
    {
      "name": "get_order_status",
      "arguments": {
        "order_id": "12345"
      },
      "client_side": false
    }
  ]
}

Review our function calling docs for more details.
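A sketch of handling such a request on the gateway side. The `get_order_status` backend is hypothetical, and the FunctionCallResponse field names (`id`, `name`, `content`) are assumptions; confirm the exact response schema against the function calling docs.

```python
import json

# Hypothetical backend lookup, used for illustration only.
def get_order_status(order_id: str) -> str:
    return f"Order {order_id} is out for delivery."

FUNCTIONS = {"get_order_status": get_order_status}

def answer_function_call(raw: str) -> list:
    """Build one FunctionCallResponse message per requested function.
    Field names are assumptions -- verify against Deepgram's docs."""
    request = json.loads(raw)
    out = []
    for call in request.get("functions", []):
        result = FUNCTIONS[call["name"]](**call["arguments"])
        out.append(json.dumps({
            "type": "FunctionCallResponse",
            "id": call.get("id"),      # assumed field
            "name": call["name"],
            "content": result,
        }))
    return out
```

Each resulting message would be sent back over the same WebSocket as a text frame.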

Step 8 – Audio Playback

When the agent generates speech, the Voice Agent API streams synthesized audio back to the client.

Your gateway:

  1. Receives audio frames
  2. Buffers them
  3. Sends them to the caller

Because the WebSocket connection streams audio continuously, playback can begin immediately, reducing latency.
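The buffering step can be as simple as a FIFO of frames, with a flush operation for barge-in (dropping queued agent audio when the caller starts speaking). A minimal sketch:

```python
from collections import deque

class PlaybackBuffer:
    """FIFO of synthesized audio frames headed back to the caller."""

    def __init__(self):
        self._frames = deque()

    def push(self, frame: bytes) -> None:
        """Queue a frame received from the Voice Agent WebSocket."""
        self._frames.append(frame)

    def next_frame(self):
        """Pop the next frame for the telephony leg, or None if empty."""
        return self._frames.popleft() if self._frames else None

    def flush(self) -> int:
        """Drop all queued audio (barge-in); returns frames dropped."""
        dropped = len(self._frames)
        self._frames.clear()
        return dropped
```

The gateway's playback task would call `next_frame()` on a fixed cadence and pace frames to the caller in real time.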

Step 9 – Escalate to a Human Agent

If the voice agent cannot resolve a request, it can transfer the caller back to Amazon Connect.

Typical escalation flow:

Voice Agent detects escalation
↓
Bot Media Gateway initiates transfer
↓
Amazon Connect queue
↓
Human agent

Context from the AI conversation can be stored in a CRM or ticketing system before the transfer.

Additional Resources