Deployment Environments

Deepgram's self-hosted offering can be used with Virtual Private Cloud providers, such as AWS, GCP, Oracle, or Azure, as well as with on-premises/bare-metal deployments.

Learn more about provisioning a deployment environment with your container management system of choice. The typical architecture for a basic self-hosted deployment includes:

  • Compute servers with attached NVIDIA GPUs, hosted in either a:
    • Virtual Private Cloud (VPC)
    • Dedicated Cloud (DC)
    • Bare-metal server
  • A customer-provided proxy (for example, NGINX, HAProxy, Apache) to handle TLS termination
  • Models and configuration files delivered by Deepgram as appropriate for your use case

📘

Deepgram only supports NVIDIA GPUs at this time. We recommend modern NVIDIA GPUs that are compatible with recent NVIDIA drivers and the NVIDIA Container Toolkit.

Servers

Architecture

Deepgram services are compatible with Linux on x86-64/amd64 only. If you want to run Deepgram services on a different processor/architecture, please contact Support.

Hardware Specifications

A system that meets these baseline requirements will provide the maximum speedup possible with Deepgram models, provided you scale according to the number of requests you are servicing.

Please contact Support so we can work with you to create a customized hardware recommendation.

Engine

For each machine running an Engine container, we recommend a machine with the following minimum specifications:

  • A NVIDIA GPU
    • Minimum compute capability: 7.0+
    • 16 GB GPU RAM
    • Recommended on the Cloud: NVIDIA L4 GPUs (powerful balance between price and performance)
      • AWS - g6 series
      • GCP - g2 series
      • Other commonly used cloud GPUs: NVIDIA T4, NVIDIA A10
  • 4 CPU cores
  • 32 GB system RAM
    • You may be able to run Deepgram services with only 16 GB of system RAM if you are only using a small number of models. Please contact Support to see if this is possible for your use case.
  • 48 GB storage
    • We recommend more if you intend on deploying multiple models.

API

You may run a Deepgram API container on the same machine as a Deepgram Engine container, with specifications as defined in the section above.

If you have a more complex setup with multiple API and Engine containers, you may want to run API containers on separate machines to properly load-balance across your Engine containers. If this is the case, for each machine running an API container, we recommend the following minimum specifications:

  • 2 CPU cores
  • 4 GB system RAM

Other Deepgram Services

If you are using the optional License Proxy container in your environment, see the License Proxy guide for hardware requirements.

Cloud-Hosted

Cloud-hosted deployments such as a Virtual Private Cloud are the most common type of deployment performed by customers who want to leverage Deepgram within their own infrastructure. Specific configurations vary, but the Cloud Service Providers that are typically used include Amazon Web Services (AWS), Oracle Cloud Infrastructure (OCI), Microsoft Azure, and Google Cloud Platform (GCP).

If you are using a cloud provider not listed above, there may not yet be a custom guide for provisioning infrastructure for self-hosting Deepgram software. The details in this documentation will still be helpful as you apply them to the specific cloud service providers you are using. Please contact Support if you need additional assistance with your cloud provider of choice.

Bare Metal

Bare-metal servers are physical servers committed to a single user, providing direct access to all of its resources and eliminating any performance interference from shared users. For those with known compute requirements and in-house expertise on managing hardware, bare-metal servers within on-premises data centers can provide significant cost savings compared to comparable cloud offerings.

Supporting Infrastructure

Operating System

Deepgram's self-hosted products run on the Linux operating system. The following distributions are officially supported:

  • Ubuntu Server 22.04/24.04
  • Red Hat Enterprise Linux (RHEL) 8/9
  • Oracle Linux 8/9

We recommend these distributions because we have tested our products most extensively in these systems.

Container Orchestration

Deepgram's software is intended to be run in containers . Containers are used to isolate, secure, and deploy applications across many different host environments and operating systems. Containers are more lightweight than virtual machines but provide many of the same isolation benefits. The following container orchestration tools are officially supported:

Please see the respective documentation for details on hardware and OS support for each tool. For example, Docker Engine is not supported by RHEL or Oracle Linux.

Firewall

Ingress Traffic

No ingress network traffic will ever be initiated by Deepgram's servers into an self-hosted environment. All ingress traffic to Deepgram services can be deny-listed in your self-hosted environment.

To allow your own applications to make requests to your self-hosted Deepgram deployment, please refer to the Ingress Authentication guide.

Egress Traffic

Quay

Access to Deepgram container images is provided through Quay. If your deployment will pull images directly from their servers instead of through your own private container registry, you'll need to allow egress traffic to their servers on port 443.

Deepgram License Server

An active connection with the Deepgram License Server is required at all times for licensing and usage reporting. HTTPS network traffic to license.deepgram.com on port 443 should be allow-listed in your egress rules. Deepgram will never initiate a network connection from our servers to your deployment, so ingress rules can be configured to deny-list any traffic outside of your specific application's needs.

The Deepgram License Server uses mTLS to secure the connection between license.deepgram.com and your self-hosted applications. Trying to connect directly with cURL or with an HTTP scanning tool (like Qualys SSL Labs) will produce spurious errors. This is expected, correct behavior and does not indicate a problem with the service itself. If your containers are not starting up, check the container logs for the following errors:

  • 401 Unauthorized HTTP response
    • Verify your Deepgram API key is properly configured in your environment.
    • Verify in the Deepgram Console that the API key you are using has permissions for self-hosted products
  • Request timeouts
    • Verify your firewall allows egress traffic to the Deepgram License Server

Intra-Network Traffic

Intra-network traffic (sometimes called east-west traffic in a cloud environment) should allow communication between Deepgram containers. If running all containers within a single container orchestrator, the orchestrator will typically assume responsibilty for inter-container networking. Ensure your firewall is configured with the proper permissions for your container orchestrator.

The Deepgram API and Engine containers listen on port 8080 by default, and the Deepgram License Proxy lisltens on port 8443 by default, unless otherwise configured.

Creating a Deployment Environment

The next step is to learn more about provisioning a deployment environment. Deepgram has official guides for using Docker, Podman, and Kubernetes as container orchestrators. Follow each the links below to get an overview of the pros and cons of each orchestrator and decide which is appropriate for your use case.


What’s Next

Learn more about provisioning a deployment environment with your container orchestrator of choice.