Deployment Environments

Deepgram onprem deployments can run on bare-metal servers or with a Virtual Private Cloud provider such as AWS, using either NVIDIA GPU acceleration or traditional CPU compute.

The typical architecture for a basic onprem deployment of the Deepgram Engine and the Deepgram API includes:

  • Server(s) hosted on one of the following:
    • Virtual Private Cloud (VPC)
    • Dedicated Cloud (DC)
    • Bare-metal server (bare metal)
  • A customer-provided proxy (for example, NGINX, HAProxy, Apache) to handle TLS termination
  • NVIDIA GPU(s)
  • Deepgram-delivered models and configuration files appropriate for your domain

📘

Deepgram only supports NVIDIA GPUs. We recommend modern NVIDIA GPUs that are compatible with recent NVIDIA drivers and the NVIDIA Container Toolkit.

Servers

Architecture

Deepgram services are compatible with Linux on x86-64. If you want to run Deepgram services on a different processor or architecture, please contact Support.

Hardware Specifications

A system that meets the baseline requirements below will provide the maximum speedup possible with Deepgram models, provided you scale according to the number of requests you are servicing.

Please contact Support so we can work with you to create a customized hardware recommendation.

Engine

For each machine running an Engine container, we recommend a machine with the following minimum specifications:

  • An NVIDIA GPU
    • T4 model or better. If available, NVIDIA L4 GPUs often provide a powerful balance between price and performance.
    • 16 GB GPU RAM
  • 4 CPU cores
  • 32 GB system RAM
  • 48 GB storage
    • We recommend more if you intend to deploy multiple models, starting at 100 GiB for a DGTools model training environment.

📘

You may be able to run Deepgram services with only 16 GB of system RAM if you are only using a small number of models. Please contact Support to see if this is possible for your use case.
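The Engine minimums above can be checked on a candidate host before installing anything. The following is a minimal sketch, not an official Deepgram tool: the function names are hypothetical, it assumes a Linux host that exposes `/proc/meminfo`, and it does not inspect the GPU, which would require NVIDIA tooling.

```python
import os
import shutil

# Minimums copied from the Engine list above. The key names are hypothetical.
ENGINE_MINIMUMS = {"cpu_cores": 4, "system_ram_gb": 32, "storage_gb": 48}


def read_total_ram_gb(meminfo_path="/proc/meminfo"):
    """Return total system RAM in GB (10^9 bytes) as reported by Linux."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                kib = int(line.split()[1])  # MemTotal is reported in kiB
                return kib * 1024 / 1e9
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def measure_host(storage_path="/"):
    """Collect CPU, RAM, and free-storage figures for the current Linux host."""
    return {
        "cpu_cores": os.cpu_count() or 0,
        "system_ram_gb": read_total_ram_gb(),
        "storage_gb": shutil.disk_usage(storage_path).free / 1e9,
    }


def find_shortfalls(measured, minimums=ENGINE_MINIMUMS):
    """Return {name: (measured, required)} for every value below its minimum."""
    return {
        name: (measured.get(name, 0), required)
        for name, required in minimums.items()
        if measured.get(name, 0) < required
    }
```

On a Linux host, `find_shortfalls(measure_host())` returns an empty dictionary when the CPU, RAM, and storage minimums are all met. For a dedicated API machine, you could pass `minimums={"cpu_cores": 2, "system_ram_gb": 4}` instead.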

API

You may run a Deepgram API container on the same machine as a Deepgram Engine container, with specifications as defined in the section above.

If you have a more complex setup with multiple API and Engine containers, you may want to run API containers on separate machines to properly load-balance across your Engine containers. If this is the case, for each machine running an API container, we recommend the following minimum specifications:

  • 2 CPU cores
  • 4 GB system RAM

Other Deepgram Services

If you are using the optional License Proxy container in your environment, see the License Proxy guide for hardware requirements.

Cloud-Hosted

Cloud-hosted deployments such as a Virtual Private Cloud are the most common type of deployment performed by customers who want to leverage Deepgram within their own infrastructure. Specific configurations vary, but the Cloud Service Providers that are typically used include Amazon Web Services (AWS), Oracle Cloud Infrastructure (OCI), Microsoft Azure, and Google Cloud Platform (GCP).

If you are using a cloud provider not listed above, there may not yet be a custom guide for provisioning infrastructure for Deepgram onprem software. The details in this documentation should still be helpful when you apply them to your cloud service provider. Please contact Support if you need additional assistance with your cloud provider of choice.

Bare Metal

Bare-metal servers are physical servers dedicated to a single user, providing direct access to all of their resources and eliminating performance interference from other tenants. They're ideal for high-performance computing, machine learning, big data analytics, and industries with high-security requirements.

Supporting Infrastructure

Operating System

Deepgram's onprem products run on the Linux operating system. The following distributions are officially supported:

  • Ubuntu Server 22.04/24.04
  • Red Hat Enterprise Linux (RHEL) 8/9
  • Oracle Linux 8/9

We recommend these distributions because we have tested our products most extensively on these systems.
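A host can be compared against the supported list and the x86-64 requirement by reading its `/etc/os-release` file. The sketch below is illustrative only: the function names are hypothetical, and it assumes the distributions identify themselves with the `ID` values `ubuntu`, `rhel`, and `ol`, which you should confirm on your own hosts.

```python
import platform

# Assumption: ID values as reported in /etc/os-release by each distribution.
SUPPORTED = {
    "ubuntu": {"22.04", "24.04"},
    "rhel": {"8", "9"},
    "ol": {"8", "9"},
}


def parse_os_release(text):
    """Parse os-release KEY=value lines into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_supported(fields, machine):
    """Check distribution, release, and CPU architecture against the list above."""
    if machine != "x86_64":
        return False
    distro = fields.get("ID", "")
    releases = SUPPORTED.get(distro)
    if releases is None:
        return False
    version = fields.get("VERSION_ID", "")
    # Ubuntu is matched on the full release (22.04); RHEL and Oracle Linux
    # are matched on the major version only (for example, 9.3 -> 9).
    candidate = version if distro == "ubuntu" else version.split(".")[0]
    return candidate in releases


def check_this_host(path="/etc/os-release"):
    """Evaluate the machine this script is running on."""
    with open(path) as f:
        return is_supported(parse_os_release(f.read()), platform.machine())
```

A `False` result does not necessarily mean Deepgram services cannot run on the host, only that the host is outside the combinations listed above.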

Container Management System

Deepgram's software is intended to be run in containers managed by a container engine. Container management systems are used to isolate, secure, and deploy applications across many different host environments and operating systems. Containers are more lightweight than virtual machines but provide many of the same isolation benefits. Deepgram's container runtimes of choice are Docker and Podman.

If your Linux distribution of choice, such as RHEL or Oracle Linux, does not support Docker, consider Podman as your primary container runtime.
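To see which of the two runtimes is already installed on a host, you can look for their executables on the `PATH`. This is a small sketch with a hypothetical helper name. It only confirms that a binary exists, not that the daemon is running or that the NVIDIA Container Toolkit is configured.

```python
import shutil


def find_container_runtime(candidates=("docker", "podman"), which=shutil.which):
    """Return (name, path) for the first runtime found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use, call `find_container_runtime()` with no arguments.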

Firewall

Container runtimes typically take care of networking concerns, so as long as your firewall is configured to allow your container runtimes, you should have no issues.

Deepgram server containers listen on port 8080 inside the container, unless otherwise configured.

You'll need to permit outbound HTTPS network traffic to license.deepgram.com on port 443 to verify with our servers that you have an active onprem licensing agreement. Additionally, you'll need access to Quay, a container image repository platform, so you'll need to allow outbound traffic to Quay's servers on port 443.
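The outbound rules described above can be tested from the host with a plain TCP connection attempt. The sketch below uses hypothetical function names. `license.deepgram.com` comes from the text above, while `quay.io` is an assumption about the Quay hostname your image pulls use. A successful TCP connection does not prove that TLS, an HTTP proxy, or licensing itself is working.

```python
import socket

# license.deepgram.com is taken from the text above. quay.io is an assumption
# about which Quay hostname is used; adjust it to match your environment.
REQUIRED_ENDPOINTS = [("license.deepgram.com", 443), ("quay.io", 443)]


def can_connect(host, port, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_endpoints(endpoints=REQUIRED_ENDPOINTS, probe=can_connect):
    """Map each 'host:port' string to whether the probe could reach it."""
    return {f"{host}:{port}": probe(host, port) for host, port in endpoints}
```

Calling `check_endpoints()` on the deployment host returns a dictionary keyed by `host:port`. Any `False` value points to a firewall, DNS, or routing rule to review.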


What’s Next

Learn more about provisioning a deployment environment with your container management system of choice.