Deploy Voice Agent
Self-Hosted Deployment with Kubernetes and Helm
This guide covers deploying Deepgram’s Voice Agent API in a self-hosted Kubernetes environment using the Deepgram Helm chart. The Voice Agent API orchestrates Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS) into a single WebSocket-based conversational pipeline.
The Voice Agent API is served by the same self-hosted-api container used for STT and TTS. The /v1/agent/converse WebSocket endpoint becomes available when STT and TTS Engine services are running alongside the API.
Prerequisites
1. Get Configuration Files
Voice Agent configuration files are available in the self-hosted-resources repository.
Two files are provided for AWS deployments:
- `05-voice-agent-aws.cluster-config.yaml` - EKS cluster configuration with dedicated node groups for API, Engine (GPU), and License Proxy workloads.
- `05-voice-agent-aws.values.yaml` - Helm values file with Voice Agent enabled, including STT, TTS, and end-of-turn Engine replicas.
2. Get Deepgram Models
Obtain your Voice Agent model links from your Deepgram Account Representative. The recommended models for Voice Agent deployments include:
STT
- If using Flux:
  - Flux model (e.g. `flux-general-en.*.dg`)
- If using Nova-3:
  - Nova-3 model (e.g. `nova-3-general.en.streaming.*.dg`)
  - End-of-turn model (e.g. `end-of-turn.*.dg`)
TTS
- If using Aura-2:
  - Voice model (e.g. `aura-2.voice-pack.en.*.dg`)
  - Generator model (e.g. `aura-2.generator.en.*.dg`)
- If using Aura-1:
  - Voice model (e.g. `aura-asteria-en.*.dg`)
  - Phonemizer model (e.g. `phonemizer.en.*.dg`)
3. Deploy Kubernetes Cluster
You need a running Kubernetes cluster with GPU-enabled nodes. Choose a platform option based on your infrastructure and follow the corresponding setup guide.
The Voice Agent sample configuration (05-voice-agent-aws) provisions dedicated node groups for API, Engine, and License Proxy workloads. If you are starting from an existing cluster, ensure your node groups have the appropriate labels and GPU resources.
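For an AWS deployment using the sample files from step 1, cluster creation and chart installation can be sketched as follows. The Helm repository URL, release name, and namespace below are illustrative assumptions; follow your platform's setup guide and the chart documentation for exact commands and required license secrets.

```shell
# Create the EKS cluster from the sample cluster config
# (assumes eksctl is installed and AWS credentials are configured).
eksctl create cluster -f 05-voice-agent-aws.cluster-config.yaml

# Install the Deepgram self-hosted Helm chart with the Voice Agent values file.
helm repo add deepgram https://deepgram.github.io/self-hosted-resources
helm repo update
helm install deepgram deepgram/deepgram-self-hosted \
  -f 05-voice-agent-aws.values.yaml \
  --namespace dg-self-hosted --create-namespace
# You will also need to configure your Deepgram API key / license secret
# per the chart's documentation before the pods can start successfully.
```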
Verify the Deployment
Check that all pods are running:
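For example, assuming the chart was installed into a `dg-self-hosted` namespace (adjust to your release):

```shell
# List pods; expect API, Engine, and (if enabled) License Proxy pods
# to reach the Running state.
kubectl get pods -n dg-self-hosted
```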
Testing Your Deployment
To confirm that your self-hosted Voice Agent deployment is properly configured and running, verify each service and make a few sample requests.
Port-Forward the API Service
Forward the API service to test locally:
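A sketch of the port-forward, assuming the namespace and service name from an illustrative `dg-self-hosted` release; use `kubectl get svc` to find the actual API service name in your deployment:

```shell
# Forward local port 8080 to the API service so requests to
# localhost:8080 reach the self-hosted API.
kubectl port-forward -n dg-self-hosted svc/deepgram-self-hosted-api 8080:8080
```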
Networking Considerations
Unless you have HTTPS or TLS running on your API instance, construct your Deepgram API endpoint with http://, not https://, and ws://, not wss:// (for instance, ws://localhost:8080/v1/agent/converse).
Test STT
Test the Speech-to-Text service with a sample audio file.
- Download a sample audio file from Deepgram, or supply your own.
- Send your audio file to your local Deepgram setup for transcription.
If you're using your own file, replace bueller.wav with the name of your audio file.
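The steps above can be run as follows. The sample-file URL and the query parameters are illustrative assumptions; use the STT model you actually deployed (e.g. Nova-3 or Flux):

```shell
# Download Deepgram's sample audio file (skip if using your own).
curl -L -O https://dpgr.am/bueller.wav

# Send the file to the self-hosted /v1/listen endpoint over the
# port-forward from the previous step.
curl -X POST \
  -H "Content-Type: audio/wav" \
  --data-binary @bueller.wav \
  "http://localhost:8080/v1/listen?model=nova-3&smart_format=true"
```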
You should receive a JSON response with the transcription and associated metadata.
Test TTS
Test the Text-to-Speech service with a sample speak request.
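A sample request might look like the following. The model name is an illustrative Aura-2 voice; substitute a voice you actually deployed:

```shell
# Send a text payload to the /v1/speak endpoint and save the audio output.
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from your self-hosted Deepgram deployment."}' \
  -o output.mp3 \
  "http://localhost:8080/v1/speak?model=aura-2-thalia-en"
```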
You should receive a response with the audio output. You can play the file locally to evaluate the synthesized speech.
Initializing the Voice Agent
The Voice Agent Getting Started guide walks through building a voice agent with Deepgram’s hosted API. For self-hosted deployments, the only difference is how you initialize the DeepgramClient — you point it at your self-hosted endpoint instead of the default api.deepgram.com.
Hosted (default):
Self-hosted — set the baseUrl to your port-forwarded or load-balanced endpoint:
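A minimal sketch in JavaScript, assuming a JS SDK version whose `createClient` options accept a global URL override. The exact option shape varies across SDK versions (some expose it as `baseUrl`), so check your SDK's self-hosted configuration docs; `http://localhost:8080` assumes the port-forward above with no TLS:

```javascript
import { createClient } from "@deepgram/sdk";

// Hosted (default): the client targets api.deepgram.com.
const hosted = createClient(process.env.DEEPGRAM_API_KEY);

// Self-hosted: override the base URL to point at your deployment.
// The option name here is a sketch; verify it against your SDK version.
const selfHosted = createClient(process.env.DEEPGRAM_API_KEY, {
  global: { url: "http://localhost:8080" },
});
```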
Once connected, follow the same steps as the Voice Agent Getting Started guide to configure and interact with your agent. For additional SDK configuration details, see Using SDKs with Self-Hosted.
What’s Next
Now that you have a Voice Agent deployment working, explore advanced configuration and integration options: