Deployment Environments | Deepgram's Docs

Learn more about provisioning a deployment environment with your container management system of choice. The typical architecture for a basic self-hosted deployment includes:

Compute servers with attached NVIDIA GPUs, hosted in one of the following:
- Virtual Private Cloud (VPC)
- Dedicated Cloud (DC)
- Bare-metal server
A customer-provided proxy (for example, NGINX, HAProxy, Apache) to handle TLS termination
Models and configuration files delivered by Deepgram as appropriate for your use case

Deepgram only supports NVIDIA GPUs at this time. We recommend modern NVIDIA GPUs that are compatible with recent NVIDIA drivers and the NVIDIA Container Toolkit.

Servers

Architecture

Deepgram services are compatible with Linux on x86-64/amd64 only. If you want to run Deepgram services on a different processor/architecture, please contact Support.

Hardware Specifications

Hardware requirements vary significantly depending on which Deepgram services you plan to deploy. Before proceeding with infrastructure planning, you should decide whether you need:

Speech-to-Text (STT) services - For transcription and real-time speech recognition
Text-to-Speech (TTS) services - For conversational AI voice synthesis

While Deepgram uses the same container images for both STT and TTS deployments, we strongly recommend dedicating each node to a single service type. Configure nodes specifically for either STT or TTS workloads to ensure optimal performance, avoid resource contention, and maintain predictable latency.

A system that meets the baseline requirements below will provide the maximum speedup possible with Deepgram models, provided you scale according to the number of requests you are servicing.

Please contact Support so we can work with you to create a customized hardware recommendation.

Engine

Hardware requirements differ substantially between STT and TTS deployments:

Speech-to-Text (STT) Engine

For each machine running an STT Engine container, we recommend a machine with the following minimum specifications:

1 NVIDIA GPU
- Minimum compute capability: 7.0
- 16 GB GPU RAM
- Recommended on the Cloud: NVIDIA L4 GPUs (powerful balance between price and performance)
  - AWS - g6.2xlarge instances
  - GCP - g2-standard-8 instances
- Other commonly used cloud GPUs: NVIDIA T4, NVIDIA A10
  - Azure - Standard_NV36ads_A10_v5 instances (NVIDIA A10)
4 CPU cores
32 GB system RAM
- You may be able to run Deepgram services with only 16 GB of system RAM if you are only using a small number of models. Please contact Support to see if this is possible for your use case.
50 GB storage
- We recommend more storage if you intend to deploy multiple STT models.
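
The CPU, RAM, and storage minimums above can be checked against a candidate machine before installation. The following is a minimal Python sketch, not a Deepgram tool: the names `HostSpec`, `STT_MINIMUM`, `shortfalls`, and `read_local_host` are hypothetical, and the host reader assumes a Linux machine with `/proc/meminfo`. GPU checks are not covered here.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    cpu_cores: int
    system_ram_gb: float
    storage_gb: float


# Minimums for one STT Engine machine, copied from the list above.
STT_MINIMUM = HostSpec(cpu_cores=4, system_ram_gb=32, storage_gb=50)


def shortfalls(actual, minimum=STT_MINIMUM):
    """Return one human-readable line per resource that is below the minimum."""
    problems = []
    if actual.cpu_cores < minimum.cpu_cores:
        problems.append(f"CPU cores: {actual.cpu_cores} < {minimum.cpu_cores}")
    if actual.system_ram_gb < minimum.system_ram_gb:
        problems.append(
            f"System RAM: {actual.system_ram_gb:.1f} GB < {minimum.system_ram_gb} GB"
        )
    if actual.storage_gb < minimum.storage_gb:
        problems.append(
            f"Storage: {actual.storage_gb:.1f} GB < {minimum.storage_gb} GB"
        )
    return problems


def read_local_host(path="/"):
    """Measure the current Linux host (assumes /proc/meminfo is readable)."""
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            if line.startswith("MemTotal:"):
                mem_kib = int(line.split()[1])  # value is reported in KiB
                break
        else:
            raise RuntimeError("MemTotal not found in /proc/meminfo")
    return HostSpec(
        cpu_cores=os.cpu_count() or 0,
        system_ram_gb=mem_kib / 1024**2,
        storage_gb=shutil.disk_usage(path).total / 1024**3,
    )
```

Calling `shortfalls(read_local_host())` returns an empty list when every minimum is met. The operating system usually reports slightly less memory than the nominal size of the machine, so treat a small RAM shortfall as a prompt to look closer and not as a hard failure.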

Text-to-Speech (TTS) Engine

For each machine running a TTS Engine container, we recommend a machine with the following minimum specifications:

2 NVIDIA GPUs (dual GPU requirement for TTS services)
- Minimum compute capability: 7.0
- 16 GB GPU RAM per GPU (32 GB total)
- Recommended on the Cloud: NVIDIA L4 GPUs (powerful balance between price and performance)
  - AWS - g6.12xlarge instances (4 L4 GPUs, allows running multiple Engine instances for additional capacity)
  - GCP - g2-standard-24 or g2-standard-48 instances
- Other commonly used cloud GPUs: NVIDIA A10, NVIDIA A100
  - Azure - Standard_NV72ads_A10_v5 instances (NVIDIA A10)
8 CPU cores (increased for TTS workloads and running multiple Engine instances)
64 GB system RAM (increased for dual GPU configuration)
50 GB storage
- We recommend more storage if you intend to deploy multiple TTS models.
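
The dual-GPU requirement above can be checked by parsing `nvidia-smi` output. The following Python sketch is illustrative only: the function names are hypothetical, it assumes an NVIDIA driver recent enough to expose the `compute_cap` query field, and the memory threshold is an assumed stand-in for "16 GB" because `nvidia-smi` reports usable memory in MiB, which is usually a little below the nominal size.

```python
import subprocess

MIN_GPU_COUNT = 2                  # TTS Engine requirement from the list above
MIN_COMPUTE_CAPABILITY = (7, 0)    # from the list above
MIN_GPU_MEMORY_MIB = 15000         # assumption: loose threshold for a nominal 16 GB

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,compute_cap",
    "--format=csv,noheader,nounits",
]


def parse_gpus(csv_text):
    """Parse nvidia-smi CSV rows into (name, memory_mib, (major, minor)) tuples."""
    gpus = []
    for row in csv_text.strip().splitlines():
        # Split from the right so a comma inside a GPU name cannot break parsing.
        name, memory, capability = (f.strip() for f in row.rsplit(",", 2))
        major_minor = tuple(int(part) for part in capability.split("."))
        gpus.append((name, int(memory), major_minor))
    return gpus


def qualifying_gpus(gpus):
    """Keep only GPUs that meet both the memory and compute capability minimums."""
    return [
        gpu for gpu in gpus
        if gpu[1] >= MIN_GPU_MEMORY_MIB and gpu[2] >= MIN_COMPUTE_CAPABILITY
    ]


def meets_tts_gpu_minimum(csv_text):
    """True when at least MIN_GPU_COUNT GPUs qualify."""
    return len(qualifying_gpus(parse_gpus(csv_text))) >= MIN_GPU_COUNT


def query_local_gpus():
    """Run nvidia-smi on this host and return its raw CSV output."""
    return subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
```

On a GPU host, `meets_tts_gpu_minimum(query_local_gpus())` gives a quick yes or no. Setting `MIN_GPU_COUNT` to 1 adapts the same check to the single-GPU STT Engine requirement.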

API

You may run a Deepgram API container on the same machine as a Deepgram Engine container, with specifications as defined in the section above.

If you have a more complex setup with multiple API and Engine containers, you may want to run API containers on separate machines to properly load-balance across your Engine containers. If this is the case, for each machine running an API container, we recommend the following minimum specifications:

4 CPU cores
16 GB system RAM

Other Deepgram Services

If you are using the optional License Proxy container in your environment, see the License Proxy guide for hardware requirements.

Cloud-Hosted

Cloud-hosted deployments such as a Virtual Private Cloud are the most common type of deployment performed by customers who want to leverage Deepgram within their own infrastructure. Specific configurations vary, but the Cloud Service Providers that are typically used include Amazon Web Services (AWS), Oracle Cloud Infrastructure (OCI), Microsoft Azure, and Google Cloud Platform (GCP).

If you are using a cloud provider not listed above, there may not yet be a custom guide for provisioning infrastructure for self-hosting Deepgram software. The details in this documentation will still be helpful as you apply them to the specific cloud service providers you are using. Please contact Support if you need additional assistance with your cloud provider of choice.

Bare Metal

A bare-metal server is a physical server dedicated to a single user, providing direct access to all of its resources and eliminating performance interference from other tenants. For those with known compute requirements and in-house expertise in managing hardware, bare-metal servers within on-premises data centers can provide significant cost savings over comparable cloud offerings.

Supporting Infrastructure

Operating System

Deepgram’s self-hosted products run on the Linux operating system. The following distributions are officially supported:

Ubuntu Server 22.04/24.04
Red Hat Enterprise Linux (RHEL) 8/9
Oracle Linux 8/9

We recommend these distributions because we have tested our products most extensively on these systems.

Container Orchestration

Deepgram’s software is intended to be run in containers. Containers are used to isolate, secure, and deploy applications across many different host environments and operating systems. Containers are more lightweight than virtual machines but provide many of the same isolation benefits. The following container orchestration tools are officially supported:

Docker
Podman
Kubernetes

Please see the respective documentation for details on hardware and OS support for each tool. For example, Docker Engine is not supported by RHEL or Oracle Linux.

Firewall

Ingress Traffic

No ingress network traffic will ever be initiated by Deepgram’s servers into a self-hosted environment. All ingress traffic to Deepgram services can be deny-listed in your self-hosted environment.

To allow your own applications to make requests to your self-hosted Deepgram deployment, please refer to the Ingress Authentication guide.

Egress Traffic

Quay

Access to Deepgram container images is provided through Quay. If your deployment will pull images directly from Quay's servers instead of through your own private container registry, you’ll need to allow egress traffic to Quay on port 443.

Deepgram License Server

An active connection with the Deepgram License Server is required at all times for licensing and usage reporting. HTTPS network traffic to license.deepgram.com on port 443 should be allow-listed in your egress rules. Deepgram will never initiate a network connection from our servers to your deployment, so ingress rules can be configured to deny-list any traffic outside of your specific application’s needs.

The Deepgram License Server uses mTLS to secure the connection between license.deepgram.com and your self-hosted applications. Trying to connect directly with cURL or with an HTTP scanning tool (like Qualys SSL Labs) will produce spurious errors. This is expected, correct behavior and does not indicate a problem with the service itself. If your containers are not starting up, check the container logs for the following errors:

401 Unauthorized HTTP response
- Verify your Deepgram API key is properly configured in your environment.
- Verify in the Deepgram Console that the API key you are using has permissions for self-hosted products.
Request timeouts
- Verify your firewall allows egress traffic to the Deepgram License Server.
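
Because the License Server uses mTLS, a full HTTPS request from a generic client is not a useful firewall test. A plain TCP connection to port 443 is enough to tell whether egress is allowed. The following Python sketch is a hedged illustration using only the standard library: `check_egress` and its result strings are hypothetical names, and the `connect` parameter exists only so the function can be exercised without a network.

```python
import socket

LICENSE_HOST = "license.deepgram.com"  # host and port as stated above
LICENSE_PORT = 443


def check_egress(host=LICENSE_HOST, port=LICENSE_PORT, timeout=5.0,
                 connect=socket.create_connection):
    """Classify a plain TCP connection attempt; no TLS or HTTP is attempted."""
    try:
        conn = connect((host, port), timeout=timeout)
    except socket.gaierror:
        # Name resolution failed: check DNS before looking at firewall rules.
        return "dns-failure"
    except socket.timeout:
        # Packets are likely being dropped, which is typical of a firewall block.
        return "timeout"
    except OSError:
        # Connection refused, network unreachable, and similar errors.
        return "unreachable"
    conn.close()
    return "reachable"


if __name__ == "__main__":
    print(check_egress())
```

A result of `reachable` only shows that the TCP path is open. It says nothing about API key validity, so a 401 response in the container logs still needs the API key checks listed above.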

Intra-Network Traffic

Intra-network traffic (sometimes called east-west traffic in a cloud environment) should allow communication between Deepgram containers. If you run all containers within a single container orchestrator, the orchestrator will typically assume responsibility for inter-container networking. Ensure your firewall is configured with the proper permissions for your container orchestrator.

Unless configured otherwise, the Deepgram API and Engine containers listen on port 8080 and the Deepgram License Proxy listens on port 8443.
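
To confirm that intra-network rules allow traffic on these default ports, a small probe can be run from a machine inside the deployment network. This Python sketch is an assumption-laden example: the service keys, the `closed_endpoints` function, and the hostnames in the usage note are hypothetical, and the port table must be edited if your containers are configured to listen elsewhere.

```python
import socket

# Default listening ports from the paragraph above.
DEFAULT_PORTS = {"api": 8080, "engine": 8080, "license-proxy": 8443}


def closed_endpoints(hosts, ports=DEFAULT_PORTS, timeout=2.0,
                     connect=socket.create_connection):
    """Return 'service@host:port' for each endpoint that cannot be reached over TCP.

    `hosts` maps a service key from `ports` to the hostname or IP to probe.
    """
    failures = []
    for service, host in hosts.items():
        port = ports[service]
        try:
            connect((host, port), timeout=timeout).close()
        except OSError:
            failures.append(f"{service}@{host}:{port}")
    return failures
```

For example, `closed_endpoints({"api": "10.0.0.5", "engine": "10.0.0.6"})` returns an empty list when both containers accept connections, where the two addresses are placeholders for your own hosts.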

Alternate Deployment Options

In addition to deploying Deepgram services as containers with your own orchestrator, you can also deploy Deepgram using these options:

Amazon SageMaker AI Endpoint - simplified machine learning Container-as-a-Service deployment model

Creating a Deployment Environment

The next step is to learn more about provisioning a deployment environment. Deepgram has official guides for using Docker, Podman, and Kubernetes as container orchestrators. Follow each of the links below to get an overview of the pros and cons of each orchestrator and decide which is appropriate for your use case.

What’s Next

Continue with the common setup path that applies to both STT and TTS deployments. Learn more about provisioning a deployment environment with your container orchestrator of choice:

After completing your environment setup and credentials configuration, you’ll choose between STT or TTS deployment paths in the Self Service Licensing & Credentials guide.