Build a Voice Agent
Learn how to build a real-time voice agent using Deepgram’s Agent API.
Deepgram’s Voice Agent API uses a single WebSocket connection to handle the entire conversational loop. The API integrates speech-to-text, a large language model (LLM), and text-to-speech into one stream.
How it works
Building a voice agent involves four main steps over a WebSocket:
- Open a connection: Connect to the Deepgram Agent endpoint using a supported SDK or a WebSocket client.
- Configure the agent: Send a Settings message to define the models, voices, and behavior.
- Stream audio: Send raw audio data to the agent.
- Handle events: Listen for transcripts, agent responses, and audio output.
The Voice Agent API is available on the EU endpoint at wss://api.eu.deepgram.com/v1/agent/converse. See Configuring Custom Endpoints for details.
Choose your language
Select a language to start building your voice agent. Each tutorial provides a complete, end-to-end implementation.
- Python: Build a voice agent using the Deepgram Python SDK.
- JavaScript: Build a voice agent using the Deepgram JavaScript SDK.
- .NET: Build a voice agent using the Deepgram .NET SDK.
- Go: Build a voice agent using the Deepgram Go SDK.
Next steps
Once you understand the basics, you can explore more advanced configurations:
- Browser Agent Overview: Add voice AI to your web applications.
- Configure the Voice Agent: Learn about all available settings for models, voices, and audio formats.
- API Reference: View the full WebSocket protocol specification.
Implementation examples
Check out these repositories for more complex voice agent implementations:
Rate limits
For information on concurrency limits, refer to the API Rate Limits documentation.
Usage tracking
Deepgram calculates usage based on WebSocket connection time: one hour of open connection time counts as one hour of API usage, regardless of how much audio is streamed.