Deploy Text-to-Speech (TTS) Services
Self-Hosted Deployment with Docker/Podman
This guide covers deploying Deepgram’s Text-to-Speech (TTS) services for conversational AI voice synthesis with ultra-low latency and high-quality natural speech generation.
Looking to deploy Speech-to-Text (STT) services instead? See the Deploy STT Services guide.
Deepgram TTS services use the same container images (quay.io/deepgram/self-hosted-api and quay.io/deepgram/self-hosted-engine) as STT deployments. However, Deepgram strongly recommends dedicating each node to a single service type, either STT or TTS, for optimal performance and resource utilization.
You need to download and deploy these images from a container image repository, along with TTS-specific configuration files and environment variables that will be provided by Deepgram.
Latency is the time delay between when a TTS API request is submitted and when you receive the first byte of audio data that can be played back to the end user.
Throughput refers to the number of TTS requests that can be processed successfully within a given time period.
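There is a technical tradeoff between these metrics: serving more concurrent requests on the same hardware generally raises throughput but increases the latency of each request.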
Finding the correct balance is essential for your application’s success and to limit hardware costs. Your Deepgram Account Representative will assist with optimization.
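The two metrics defined above can be measured with a small helper. This is a minimal sketch, not Deepgram tooling: the function names are hypothetical, and it assumes you already have the response body available as an iterable of byte chunks (for example, from a streaming HTTP client).

```python
import time
from typing import Iterable, Tuple


def time_to_first_byte(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Consume an audio stream and report latency-style timings.

    Returns (seconds until the first non-empty chunk,
             seconds until the stream ended,
             total bytes received).
    """
    start = time.perf_counter()
    first_byte_at = None
    total_bytes = 0
    for chunk in chunks:
        # Latency, as defined above, ends at the first playable byte.
        if chunk and first_byte_at is None:
            first_byte_at = time.perf_counter() - start
        total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if first_byte_at is None:
        raise ValueError("stream ended without producing any audio bytes")
    return first_byte_at, elapsed, total_bytes


def requests_per_second(completed_requests: int, window_seconds: float) -> float:
    """Throughput: successfully completed requests divided by the time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return completed_requests / window_seconds
```

Recording both numbers while you vary the number of concurrent requests gives you concrete data to share with your Deepgram Account Representative when tuning the balance.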
For optimal TTS performance, dedicate separate GPUs specifically for TTS traffic. If you’re already running STT services, route TTS requests to dedicated TTS-optimized hardware rather than sharing GPUs between STT and TTS workloads.
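This dedicated approach is critical for avoiding the resource contention and unpredictable latency that mixed STT and TTS workloads can cause.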
Before you begin, you will need to complete the Deployment Environments guide, as well as all sub-guides to complete your environment configuration.
You will also need to complete the Self Service Licensing & Credentials guide to authenticate your products with Deepgram’s licensing servers and pull Deepgram container images from Quay.
TTS self-hosted deployments have specific resource requirements:
TTS hardware requirements are higher than standard STT deployments due to the computational complexity of high-quality speech synthesis. Consult your Deepgram Account Representative for hardware recommendations optimized for TTS latency vs. throughput based on your specific use case.
Aura-2 supports both English and Spanish with superior quality and performance. You can choose to deploy either language on its own or both side by side.
For multiple service deployments (same language or different languages), you’ll need to:
Previous generation Aura models may have different configuration requirements. Contact your Deepgram Account Representative for guidance on legacy model deployment.
Use the image repository credentials you generated in the Self Service Licensing & Credentials guide to log in to Quay on your deployment environment. Once your credentials are cached locally, you should not have to log in again until you manually log out.
TTS services can be deployed using multiple container orchestration platforms:
Docker Compose: Suitable for development, testing, and simple production deployments.
Podman Compose: Alternative to Docker with similar functionality and compose file compatibility.
Kubernetes/Helm: Recommended for production deployments requiring scaling, high availability, and advanced orchestration.
Configuration files are available in the self-hosted-resources repository.
Create your configuration directory:
Download the appropriate files for your deployment method:
Docker Compose:
docker/docker-compose.aura-2.yml - Reference configuration with both English and Spanish
Podman Compose:
podman/podman-compose.aura-2.yml - Reference configuration with both English and Spanish
Kubernetes/Helm:
charts/deepgram-self-hosted/samples/04-aura-2-setup.yaml - Helm values example
Language-specific Configuration Files:
common/standard_deploy/api.aura-2-en.toml
common/standard_deploy/engine.aura-2-en.toml
common/standard_deploy/api.aura-2-es.toml
common/standard_deploy/engine.aura-2-es.toml
The reference compose files show both languages deployed side-by-side. You can modify these to deploy only the services you need.
Once you have downloaded all provided files to your deployment machine, you need to update your configuration for your specific deployment environment.
You will need to have an environment variable DEEPGRAM_API_KEY exported with your self-hosted API key secret. See our Self Service Licensing & Credentials guide for instructions on generating a self-hosted API key for use in this section.
Per the guide linked above, you create your self-hosted API key in the API Key tab of the Deepgram Console. These keys are not created in the “Self-Hosted” tab, which is reserved for creating distribution credentials.
The Docker Compose configuration files use the standard Deepgram self-hosted container images:
quay.io/deepgram/self-hosted-api:release-250814
quay.io/deepgram/self-hosted-engine:release-250814
While these are the same container images used for STT deployments, Deepgram strongly recommends dedicating each node to either STT or TTS workloads. Running mixed workloads on the same node can lead to suboptimal performance, resource contention, and unpredictable latency.
TTS functionality is enabled through specific configuration files, environment variables, and GPU assignments detailed below.
Make sure to export your self-hosted API key secret in your deployment environment.
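A pre-flight check can catch a missing key before any containers start. The following is a minimal sketch: only the variable name DEEPGRAM_API_KEY comes from this guide, while the helper name and error message are hypothetical.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DEEPGRAM_API_KEY, or raise if it is missing or blank."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; export it before starting the containers"
        )
    return key
```

Calling `require_api_key()` at the top of a deployment script fails fast with a readable message instead of a licensing error later in the container logs.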
TTS deployments use service-specific configuration files and environment variables:
Configuration Files by Language (Aura-2):
api.aura-2-en.toml - English API configuration pointing to English Engine
engine.aura-2-en.toml - English Engine configuration
api.aura-2-es.toml - Spanish API configuration pointing to Spanish Engine
engine.aura-2-es.toml - Spanish Engine configuration
Required Environment Variables by Language (Aura-2):
English Services:
Spanish Services:
Polyglot Services (Dutch, German, French, Italian, Japanese):
Important Notes:
CUDA_VISIBLE_DEVICES should be set to pin each service to specific GPUs
For legacy Aura models or other TTS configurations, consult your Deepgram Account Representative for the appropriate configuration files and environment variables.
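The GPU pinning described in the notes above can be planned before you edit the compose file. This is a sketch under stated assumptions: the service names used in the example (`engine-en`, `engine-es`) and the helper itself are hypothetical, and it assumes one GPU per service. Only the CUDA_VISIBLE_DEVICES variable comes from this guide.

```python
from typing import Dict, List


def pin_services_to_gpus(
    services: List[str], gpu_ids: List[int]
) -> Dict[str, Dict[str, str]]:
    """Assign one distinct GPU index per service via CUDA_VISIBLE_DEVICES."""
    if len(set(services)) != len(services):
        raise ValueError("service names must be unique")
    if len(set(gpu_ids)) != len(gpu_ids):
        raise ValueError("GPU indices must be unique")
    if len(services) > len(gpu_ids):
        raise ValueError("not enough GPUs to give every service its own device")
    # zip() pairs services with GPUs in order; surplus GPUs stay unassigned.
    return {
        name: {"CUDA_VISIBLE_DEVICES": str(gpu)}
        for name, gpu in zip(services, gpu_ids)
    }
```

For example, `pin_services_to_gpus(["engine-en", "engine-es"], [0, 1])` yields one environment mapping per service, which you can copy into the matching `environment` section of your compose file.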
To make sure your Deepgram self-hosted TTS deployment is properly configured and running, you will want to run the containers and make a sample request.
Now that your configuration files are set up and in the correct location to be used by the containers, use Docker Compose to run the containers:
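If you drive the deployment from a script, the start-up step might look like the sketch below. It assumes the reference compose file names listed earlier in this guide, that `docker compose` (or `podman-compose`) is installed and on your PATH, and that you want the stack to run detached. The helper names are hypothetical.

```python
import subprocess
from typing import List


def compose_up_command(compose_file: str, tool: str = "docker") -> List[str]:
    """Build the argument list that starts the stack in detached mode."""
    if tool == "docker":
        return ["docker", "compose", "-f", compose_file, "up", "-d"]
    if tool == "podman":
        return ["podman-compose", "-f", compose_file, "up", "-d"]
    raise ValueError(f"unsupported tool: {tool!r}")


def start_stack(compose_file: str, tool: str = "docker") -> None:
    """Run the compose command and raise CalledProcessError on failure."""
    subprocess.run(compose_up_command(compose_file, tool), check=True)
```

Calling `start_stack("docker-compose.aura-2.yml")` from your configuration directory is then equivalent to typing the compose command by hand.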
If you get an error similar to the following, you may not have the minimum NVIDIA driver version required for Deepgram services to run properly. Please see Drivers and Containerization Platforms for instructions on installing/upgrading to the latest driver version.
You can then view the running containers with the container process status command, and optionally view the logs of each container to verify their status.
Replace the placeholder CONTAINER_ID with the Container ID of each container whose logs you would like to inspect more completely.
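The status and log checks described above can also be scripted. This sketch assumes the Docker CLI is available; it relies on the `--format` option of `docker ps` and the `--tail` option of `docker logs`. The helper names and the choice of columns are illustrative.

```python
import subprocess
from typing import Dict, List

# Go-template format: one tab-separated line per running container.
PS_FORMAT = "{{.ID}}\t{{.Image}}\t{{.Status}}"


def parse_ps_output(output: str) -> List[Dict[str, str]]:
    """Turn tab-separated `docker ps` lines into dictionaries."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        container_id, image, status = line.split("\t", 2)
        rows.append({"id": container_id, "image": image, "status": status})
    return rows


def running_containers() -> List[Dict[str, str]]:
    """List running containers using the format defined above."""
    result = subprocess.run(
        ["docker", "ps", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout)


def container_logs(container_id: str, tail: int = 100) -> str:
    """Return the last `tail` log lines (stdout and stderr) of one container."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), container_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout + result.stderr
```

Pass each `id` returned by `running_containers()` to `container_logs()` in place of the CONTAINER_ID placeholder.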
Port Assignments: The reference configurations use these port mappings:
For custom deployments, ensure:
Protocol Usage:
Unless you have HTTPS/TLS configured, use http:// and ws:// protocols:
http://localhost:8080/v1/speak
http://localhost:8081/v1/speak
http://localhost:8080, http://localhost:8081, etc.
Test your environment and container setup with sample TTS requests.
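The protocol and port conventions above can be captured in a small URL builder. This is a sketch: the helper is hypothetical, and passing the voice as a `model` query parameter is an assumption based on Deepgram's hosted speak API, so confirm it against your deployment.

```python
from urllib.parse import urlencode


def speak_url(
    port: int, model: str = "", host: str = "localhost", scheme: str = "http"
) -> str:
    """Build a /v1/speak URL for a self-hosted TTS instance."""
    if scheme not in ("http", "https", "ws", "wss"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Omit the query string entirely so the server's default voice applies.
    query = "?" + urlencode({"model": model}) if model else ""
    return f"{scheme}://{host}:{port}/v1/speak{query}"
```

Use the `https` and `wss` schemes only once HTTPS/TLS is configured for your deployment.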
Test your TTS deployment with speak requests:
Test English Service (Aura-2):
Test Spanish Service (Aura-2, if deployed):
Test Multiple English Instances (if deployed):
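Each of the tests above amounts to a POST against the instance's /v1/speak endpoint. The sketch below uses only the Python standard library. It assumes a JSON body with a `text` field, as in Deepgram's hosted speak API, and sends no authentication header; adjust both if your deployment differs. The helper names and output path are hypothetical.

```python
import json
import urllib.request


def build_speak_request(url: str, text: str) -> urllib.request.Request:
    """Create a POST request whose JSON body carries the text to synthesize."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(
    url: str, text: str, out_path: str, timeout: float = 30.0
) -> int:
    """Send the request, stream the audio to out_path, return bytes written."""
    request = build_speak_request(url, text)
    written = 0
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(out_path, "wb") as out:
        while True:
            chunk = response.read(8192)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

For example, `synthesize_to_file("http://localhost:8080/v1/speak", "Hello from self-hosted TTS.", "test-english.wav")` exercises the first instance; point the URL at port 8081 to exercise the second. A non-zero return value indicates that audio bytes were received.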
If you do not specify a model, the default voice model aura-asteria-en will be used. You can find all of our available voices here.
You should receive a response with the audio output. You can copy this file locally to manually evaluate the synthesized speech. Congratulations - your self-hosted TTS setup is working!
What’s Next
Now that you have a basic TTS setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.