Deployment Environments

Deepgram's self-hosted offering can be used with Virtual Private Cloud providers, such as AWS, GCP, Oracle, or Azure, as well as with bare-metal deployments.

The typical architecture for a basic self-hosted deployment includes:

  • Compute servers with attached NVIDIA GPUs, hosted in either a:
    • Virtual Private Cloud (VPC)
    • Dedicated Cloud (DC)
    • Bare-metal server
  • A customer-provided proxy (for example, NGINX, HAProxy, Apache) to handle TLS termination
  • Models and configuration files delivered by Deepgram as appropriate for your use case

📘

Deepgram only supports NVIDIA GPUs at this time. We recommend modern NVIDIA GPUs that are compatible with recent NVIDIA drivers and the NVIDIA Container Toolkit.

Servers

Architecture

Deepgram services are compatible with Linux on x86-64. If you want to run Deepgram services on a different processor/architecture, please contact Support.

Hardware Specifications

A system that meets these baseline requirements will provide the maximum speedup possible with Deepgram models, provided you scale according to the number of requests you are servicing.

Please contact Support so we can work with you to create a customized hardware recommendation.

Engine

For each machine running an Engine container, we recommend a machine with the following minimum specifications:

  • A NVIDIA GPU
    • T4 model or better. If available, NVIDIA L4 GPUs often provide a powerful balance between price and performance.
    • 16 GB GPU RAM
  • 4 CPU cores
  • 32 GB system RAM
    • You may be able to run Deepgram services with only 16 GB of system RAM if you are only using a small number of models. Please contact Support to see if this is possible for your use case.
  • 48 GB storage
    • We recommend more if you intend on deploying multiple models, and start at 100 GiB for a DGTools model training environment.

API

You may run a Deepgram API container on the same machine as a Deepgram Engine container, with specifications as defined in the section above.

If you have a more complex setup with multiple API and Engine containers, you may want to run API containers on separate machines to properly load-balance across your Engine containers. If this is the case, for each machine running an API container, we recommend the following minimum specifications:

  • 2 CPU cores
  • 4 GB system RAM

Other Deepgram Services

If you are using the optional License Proxy container in your environment, see the License Proxy guide for hardware requirements.

Cloud-Hosted

Cloud-hosted deployments such as a Virtual Private Cloud are the most common type of deployment performed by customers who want to leverage Deepgram within their own infrastructure. Specific configurations vary, but the Cloud Service Providers that are typically used include Amazon Web Services (AWS), Oracle Cloud Infrastructure (OCI), Microsoft Azure, and Google Cloud Platform (GCP).

If you are using a cloud provider not listed above, there may not yet be a custom guide for provisioning infrastructure for self-hosting Deepgram software. The details in this documentation will still be helpful as you apply them to the specific cloud service providers you are using. Please contact Support if you need additional assistance with your cloud provider of choice.

Bare Metal

Bare-metal servers are physical servers committed to a single user, providing direct access to all of its resources and eliminating any performance interference from shared users. For those with known compute requirements and in-house expertise on managing hardware, bare-metal servers within on-premises data centers can provide significant cost savings compared to comparable cloud offerings.

Supporting Infrastructure

Operating System

Deepgram's self-hosted products run on the Linux operating system. The following distributions are officially supported:

  • Ubuntu Server 22.04/24.04
  • Red Hat Enterprise Linux (RHEL) 8/9
  • Oracle Linux 8/9

We recommend these distributions because we have tested our products most extensively in these systems.

Container Orchestration

Deepgram's software is intended to be run in containers . Containers are used to isolate, secure, and deploy applications across many different host environments and operating systems. Containers are more lightweight than virtual machines but provide many of the same isolation benefits. The following container orchestration tools are officially supported:

Please see the respective documentation for details on hardware and OS support for each tool. For example, Docker Engine is not supported by RHEL or Oracle Linux.

Firewall

Container runtimes typically assume responsibilty for inter-container networking. Ensure your firewall is configured with the proper permissions for your container runtime.

The Deepgram API and Engine containers listen on port 8080 by default, and the Deepgram License Proxy lisltens on port 8443 by default, unless otherwise configured.

An active connection with the Deepgram License Server is required at all times for licensing and usage reporting. HTTPS network traffic to license.deepgram.com on port 443 should be allow-listed in your egress rules. Deepgram will never initiate a network connection from our servers to your deployment, so ingress rules can be configured to deny-list any traffic outside of your specific application's needs.

Access to Deepgram container images is provided through Quay. If your deployment will pull images directly from their servers instead of through your own private container registry, you'll need to allow egress traffic to their servers on port 443.


What’s Next

Learn more about provisioning a deployment environment with your container management system of choice.