Designing Your Architecture

Before you deploy Deepgram, you’ll need to make effective design decisions about the components of your system, their relationships, and the interactions between components. Ideally, your architecture will meet your business needs, optimize both performance and security, and provide a strong technical foundation for future growth.

Required Components

  • onprem-api is responsible for handling ASR requests and routing them to onprem-engine
  • onprem-engine is responsible for processing ASR requests with ASR models
  • onprem-license-proxy is responsible for ensuring uptime for on-premises deployments
  • Access to the internet and the Deepgram Cloud is required for online on-prem deployments.

Basic Deployment

The typical architecture for a basic on-premises (on-prem) deployment for the Deepgram Engine and the Deepgram API includes:

  • A single server hosted on one of the following:
    • Virtual Private Cloud (VPC)
    • Dedicated Cloud (DC)
    • Bare-metal server (bare metal)
  • A customer-provided proxy (for example, NGINX, HAProxy, Apache) to handle TLS termination
  • A single NVIDIA GPU
  • Deepgram-delivered models and configuration files appropriate for your domain
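
In this layout, a client sends audio over HTTPS to the proxy, which terminates TLS and forwards the request to onprem-api, and onprem-api routes the work to onprem-engine. A minimal Python sketch of the client side builds such a request without sending it. It assumes the on-prem API exposes the same `/v1/listen` path as Deepgram's hosted API and that the proxy forwards that path unchanged; the host name `asr.example.internal` and the function `build_transcription_request` are hypothetical.

```python
import urllib.request

# Hypothetical address of the customer-provided TLS-terminating proxy.
PROXY_URL = "https://asr.example.internal"


def build_transcription_request(
    audio_bytes: bytes, content_type: str = "audio/wav"
) -> urllib.request.Request:
    """Build, but do not send, a POST request carrying raw audio.

    Assumes the proxy forwards /v1/listen unchanged to onprem-api,
    which then routes the work to onprem-engine.
    """
    return urllib.request.Request(
        url=f"{PROXY_URL}/v1/listen",
        data=audio_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)` and read the JSON response body. Keeping construction separate from sending makes it easy to check the URL and headers before pointing a client at a real deployment.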


Deepgram supports only NVIDIA GPUs. We recommend modern NVIDIA GPUs that are compatible with recent NVIDIA drivers and accessible to the NVIDIA-Docker runtime.
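
Before installing anything, it can help to confirm that the host's NVIDIA driver actually sees a GPU. The Python sketch below is an illustration, not a Deepgram tool: it calls `nvidia-smi` with its CSV query options and parses each GPU's name and total memory in MiB. The function names are hypothetical. It checks only the host driver; whether the NVIDIA-Docker runtime can reach the GPU is a separate check, for example running `nvidia-smi` inside a container started with `docker run --gpus all`.

```python
import subprocess


def parse_gpu_csv(csv_text: str) -> list[tuple[str, int]]:
    """Turn nvidia-smi CSV output (no header, no units) into (name, MiB) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so the memory column is always the final field.
        name, memory_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(memory_mib.strip())))
    return gpus


def list_nvidia_gpus() -> list[tuple[str, int]]:
    """Query the host driver; raises if nvidia-smi is missing or exits non-zero."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

An empty list, or an exception from `list_nvidia_gpus()`, suggests that the driver is not installed or that no NVIDIA GPU is visible to the host.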

For inference, which means using trained speech models to analyze speech, we recommend a system with at least:

  • an NVIDIA T4 or better GPU with 16 GB of GPU RAM
  • 4 CPU cores
  • 32 GB of RAM
  • 32 GB of storage
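
The minimums above can be expressed as a simple pre-flight check. This Python sketch compares measured host values against them. The `HostSpec` type and the `shortfalls` and `local_spec` functions are hypothetical names; "GB" is interpreted as a decimal gigabyte (10^9 bytes); and `local_spec` assumes a Linux host because it reads physical memory through `os.sysconf`.

```python
import os
import shutil
from dataclasses import dataclass, fields

# "GB" read as a decimal gigabyte. A "16 GB" T4 reports about 15360 MiB,
# which is roughly 16.1 * 10**9 bytes, so it clears the GPU minimum.
GB = 10**9


@dataclass
class HostSpec:
    gpu_memory_bytes: int
    cpu_cores: int
    ram_bytes: int
    storage_bytes: int


MINIMUM = HostSpec(
    gpu_memory_bytes=16 * GB,
    cpu_cores=4,
    ram_bytes=32 * GB,
    storage_bytes=32 * GB,
)


def shortfalls(actual: HostSpec, minimum: HostSpec = MINIMUM) -> list[str]:
    """Return one message per resource below the minimum; empty means OK."""
    problems = []
    for field in fields(minimum):
        have = getattr(actual, field.name)
        need = getattr(minimum, field.name)
        if have < need:
            problems.append(f"{field.name}: have {have}, need at least {need}")
    return problems


def local_spec(gpu_memory_mib: int, path: str = "/") -> HostSpec:
    """Measure this host (Linux only); GPU memory comes from nvidia-smi."""
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return HostSpec(
        gpu_memory_bytes=gpu_memory_mib * 2**20,
        # os.cpu_count() counts logical CPUs, which can exceed physical cores.
        cpu_cores=os.cpu_count() or 0,
        ram_bytes=ram,
        # Free space on the filesystem that holds `path`.
        storage_bytes=shutil.disk_usage(path).free,
    )
```

An empty list from `shortfalls(local_spec(gpu_memory_mib=...))` means the host meets every minimum; otherwise each string names the resource that falls short. Because `os.cpu_count()` reports logical CPUs, a hyperthreaded host can appear to have more cores than it physically has.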

A system like this typically processes audio about 100 times faster than real time on average, although the exact speedup depends on file length. An Amazon EC2 g4dn.2xlarge instance works well as a cost-effective baseline deployment.
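
The speedup figure translates directly into processing-time estimates: at a speedup of S, processing D seconds of audio takes about D / S seconds. The helper below is a hypothetical planning aid that uses the 100x average as its default; real throughput varies with file length, so treat the result as a rough estimate.

```python
def estimated_processing_seconds(audio_seconds: float, speedup: float = 100.0) -> float:
    """Approximate wall-clock time to process `audio_seconds` of audio.

    A speedup of 100 means 100 seconds of audio take about 1 second.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup
```

For example, one hour of audio (3600 seconds) at 100x comes out to about 36 seconds of processing.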

Please contact Support so we can work with you to create a customized hardware recommendation.