Build a Voice Agent

Learn how to build a real-time voice agent using Deepgram’s Agent API.

Deepgram’s Voice Agent API uses a single WebSocket connection to handle the entire conversational loop. The API integrates speech-to-text, a large language model (LLM), and text-to-speech into one stream.

How it works

Building a voice agent involves four main steps over a WebSocket:

  1. Open a connection: Connect to the Deepgram Agent endpoint using a supported SDK or a WebSocket client.
  2. Configure the agent: Send a Settings message to define the models, voices, and behavior.
  3. Stream audio: Send raw audio data to the agent.
  4. Handle events: Listen for transcripts, agent responses, and audio output.
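The four steps above can be sketched in Python. This is a minimal, illustrative skeleton, not a complete client: the endpoint URL, the model names, and the exact Settings fields are assumptions to verify against the API reference, and it uses the third-party `websockets` package rather than a Deepgram SDK.

```python
import asyncio
import json
import os

# Assumed default (non-EU) Agent endpoint; see the endpoint docs to confirm.
DG_AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"


def build_settings() -> dict:
    """Step 2: assemble the Settings message.

    Field names and model choices here are illustrative; check the
    Settings reference for the authoritative schema.
    """
    return {
        "type": "Settings",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": 16000},
            "output": {"encoding": "linear16", "sample_rate": 24000},
        },
        "agent": {
            "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
            "think": {"provider": {"type": "open_ai", "model": "gpt-4o-mini"}},
            "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        },
    }


async def run_agent(audio_chunks) -> None:
    """Steps 1, 3, and 4: open the connection, stream audio, handle events.

    Sending and receiving are shown sequentially for clarity; a real agent
    runs them concurrently (e.g. with asyncio.gather).
    """
    import websockets  # third-party: pip install websockets

    # `additional_headers` is the parameter name in recent websockets
    # releases; older versions call it `extra_headers`.
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    async with websockets.connect(DG_AGENT_URL, additional_headers=headers) as ws:
        await ws.send(json.dumps(build_settings()))  # step 2: configure

        for chunk in audio_chunks:  # step 3: raw audio bytes
            await ws.send(chunk)

        async for message in ws:  # step 4: events from the agent
            if isinstance(message, bytes):
                pass  # synthesized speech (TTS audio) to play back
            else:
                event = json.loads(message)
                print(event.get("type"))  # e.g. a transcript or agent response
```

To try it, call `asyncio.run(run_agent(chunks))` with an iterable of raw audio byte chunks and a valid `DEEPGRAM_API_KEY` in the environment.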

The Voice Agent API is also available on the EU endpoint at wss://api.eu.deepgram.com/v1/agent/converse. See Configuring Custom Endpoints for details.
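If your application needs to switch regions, a small helper can select the endpoint at startup. The EU URL comes from this guide; the non-EU default shown is an assumption to confirm against the endpoint documentation.

```python
# Agent endpoint URLs. The EU URL is documented above; the default
# (non-EU) URL is an assumption -- verify it in the endpoint docs.
DEFAULT_AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"
EU_AGENT_URL = "wss://api.eu.deepgram.com/v1/agent/converse"


def agent_url(region: str = "default") -> str:
    """Return the Agent WebSocket URL for the given region."""
    return EU_AGENT_URL if region == "eu" else DEFAULT_AGENT_URL
```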

Choose your language

Select a language to start building your voice agent. Each tutorial provides a complete, end-to-end implementation.

Next steps

Once you understand the basics, you can explore more advanced configurations.

Implementation examples

Check out these repositories for more complex voice agent implementations:

| Use case | Runtime / Language | Repo |
| --- | --- | --- |
| Basic demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Demo |
| Medical assistant | Node, TypeScript, JavaScript | Medical Assistant Demo |
| Twilio integration | Python | Twilio Voice Agent Demo |
| Text input demo | Node, TypeScript, JavaScript | Conversational AI Demo |
| Azure OpenAI | Python | Voice Agent with OpenAI Azure |
| Function calling | Python / Flask | Flask Agent Function Calling Demo |

Rate limits

For information on concurrency limits, refer to the API Rate Limits documentation.

Usage tracking

Deepgram calculates usage based on WebSocket connection time. One hour of connection time equals one hour of API usage.
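Since usage is a direct function of connection time, estimating it is simple arithmetic, sketched below with a hypothetical helper:

```python
def billed_hours(connection_seconds: float) -> float:
    """Usage equals WebSocket connection time:
    3600 seconds of connection = 1 hour of API usage."""
    return connection_seconds / 3600


# A 90-minute connection counts as 1.5 hours of usage.
print(billed_hours(5400))  # -> 1.5
```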