Build a Voice Agent
Learn how to build a real-time voice agent using Deepgram’s Agent API.
Deepgram’s Voice Agent API uses a single WebSocket connection to handle the entire conversational loop. The API integrates speech-to-text, a large language model (LLM), and text-to-speech into one stream.
How it works
Building a voice agent involves four main steps over a WebSocket:
- Open a connection: Connect to the Deepgram Agent endpoint using a supported SDK or a WebSocket client.
- Configure the agent: Send a Settings message to define the models, voices, and behavior.
- Stream audio: Send raw audio data to the agent.
- Handle events: Listen for transcripts, agent responses, and audio output.
The Voice Agent API is available on the EU endpoint at wss://api.eu.deepgram.com/v1/agent/converse. See Configuring Custom Endpoints for details.
Choose your language
Select a language to start building your voice agent. Each tutorial provides a complete, end-to-end implementation.
- Python: Build a voice agent using the Deepgram Python SDK.
- JavaScript: Build a voice agent using the Deepgram JavaScript SDK.
- .NET: Build a voice agent using the Deepgram .NET SDK.
- Go: Build a voice agent using the Deepgram Go SDK.
Next steps
Once you understand the basics, you can explore more advanced configurations:
- Browser Agent Overview: Add voice AI to your web applications.
- Configure the Voice Agent: Learn about all available settings for models, voices, and audio formats.
- API Reference: View the full WebSocket protocol specification.
Implementation examples
Check out these repositories for more complex voice agent implementations:
Rate limits
For information on concurrency limits, refer to the API Rate Limits documentation.
Usage tracking
Deepgram calculates usage based on WebSocket connection time: one hour of open connection time counts as one hour of API usage, regardless of how much audio is streamed.