Deepgram On-Premises deployment supports bare-metal servers or a Virtual Private Cloud provider such as AWS, with either NVIDIA GPU-acceleration or traditional CPU compute.
The typical architecture for a basic on-premises (on-prem) deployment of the Deepgram Engine and the Deepgram API includes:
- A single server hosted on one of the following:
  - Virtual Private Cloud (VPC)
  - Dedicated Cloud (DC)
  - Bare-metal server (bare metal)
- A customer-provided proxy (for example, NGINX, HAProxy, Apache) to handle TLS termination
- A single NVIDIA GPU
- Deepgram-delivered models and configuration files appropriate for your domain
Deepgram only supports NVIDIA GPUs. We recommend modern NVIDIA GPUs that are compatible with recent NVIDIA drivers and are accessible by the NVIDIA-Docker runtime.
Deepgram services are compatible with Linux on x86-64. If you want to run Deepgram services on a different processor or architecture, please contact Support.
A system that meets these baseline requirements will provide the maximum speedup possible with Deepgram models, provided you scale according to the number of requests you are servicing.
Please contact Support so we can work with you to create a customized hardware recommendation.
For each machine running an Engine container, we recommend the following minimum specifications:
- An NVIDIA GPU
  - T4 model or better
  - 16 GB GPU RAM
- 4 CPU cores
- 32 GB system RAM
- 48 GB storage
  - We recommend more if you intend to deploy multiple models, and at least 100 GiB for a DGTools model training environment.
You may be able to run Deepgram services with only 16 GB of system RAM if you are only using a small number of models. Please contact Support to see if this is possible for your use case.
You may run a Deepgram API container on the same machine as a Deepgram Engine container, with specifications as defined in the section above.
If you have a more complex setup with multiple API and Engine containers, you may want to run API containers on separate machines to properly load-balance across your Engine containers. If this is the case, for each machine running an API container, we recommend the following minimum specifications:
- 2 CPU cores
- 4 GB system RAM
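The minimum CPU, system RAM, and storage figures listed above can be turned into a quick preflight check. The following is a minimal sketch, not a Deepgram-provided tool: the names `ENGINE_MIN`, `API_MIN`, `find_shortfalls`, and `measure_host` are hypothetical, and the 10% tolerance is an assumption added because operating systems usually report slightly less memory than the nominal size. It does not inspect the GPU.

```python
import os
import shutil

# Minimums copied from the lists above (GB figures as written in this guide).
ENGINE_MIN = {"cpu_cores": 4, "system_ram_gb": 32, "storage_gb": 48}
API_MIN = {"cpu_cores": 2, "system_ram_gb": 4}

# Assumption: allow 10% slack, since reported RAM is usually below nominal.
TOLERANCE = 0.10


def find_shortfalls(actual, minimums, tolerance=TOLERANCE):
    """Return a list of readable shortfalls; an empty list means the host passes."""
    problems = []
    for key, required in minimums.items():
        have = actual.get(key)
        if have is None:
            problems.append(f"{key}: not measured (need {required})")
        elif have < required * (1 - tolerance):
            problems.append(f"{key}: have {have:.1f}, need {required}")
    return problems


def measure_host(path="/"):
    """Collect logical CPU count, total RAM, and free disk space on a Linux host."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # physical pages installed
    return {
        # os.cpu_count() reports logical CPUs, which may exceed physical cores.
        "cpu_cores": os.cpu_count() or 0,
        "system_ram_gb": page_size * page_count / 1e9,
        "storage_gb": shutil.disk_usage(path).free / 1e9,
    }
```

Calling `find_shortfalls(measure_host(), ENGINE_MIN)` on a candidate Engine machine, or passing `API_MIN` for a dedicated API machine, returns any values that fall below the recommended minimums.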
If you are using the optional License Proxy container in your environment, see the License Proxy guide for hardware requirements.
Cloud-hosted on-premises (on-prem) deployments such as a Virtual Private Cloud are the most common type of deployment performed by customers who want to leverage Deepgram within their own infrastructure. Specific configurations vary, but the Cloud Service Providers that are typically used include Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). This guide details steps to set up a deployment to the cloud on an AWS instance.
If you are using a cloud provider other than AWS, the details in this guide will still be helpful as you apply them to the specific cloud service providers you are using.
Bare-metal servers are physical servers dedicated to a single user, providing direct access to all of their resources and eliminating performance interference from other tenants. They're ideal for high-performance computing, machine learning, big data analytics, and industries with high-security requirements.
Deepgram's software is intended to be run in containers managed by a container engine. Containerization platforms are used to isolate, secure, and deploy applications across many different host environments and operating systems. Containers are more lightweight than virtual machines but provide many of the same isolation benefits. Deepgram's container engine of choice is Docker.
Although Docker can run on a variety of host operating systems, we have several preferred Linux distributions. We recommend these distributions because we have tested our products most extensively on them.
Recommended Linux Distributions
- Ubuntu Server 22.04 LTS
  - Amazon Machine Image
- Red Hat Enterprise Linux (RHEL) 8
  - Amazon Machine Image
- Red Hat Enterprise Linux (RHEL) 9
  - Amazon Machine Image
If your Linux distribution of choice does not support Docker, such as RHEL, you may consider Podman as an alternative containerization platform.
Container engines typically take care of networking concerns, so as long as your firewall is configured to allow your container engine, you should have no issues.
Deepgram server containers listen on port 8080 inside the container, unless otherwise configured.
You'll need to permit outbound HTTPS network traffic to license.deepgram.com on port 443 so that our servers can verify that you have an active on-prem licensing agreement. Additionally, you'll need access to Quay, a container image repository platform, so you'll need to allow outbound traffic to its servers on port
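One way to confirm that the firewall permits the outbound licensing traffic described above is to attempt a TCP connection from the host. This is a minimal sketch under stated assumptions: `REQUIRED_ENDPOINTS`, `can_connect`, and `check_outbound` are hypothetical names, and only the license.deepgram.com endpoint named in this guide is listed. A successful connection shows that the port is reachable; it does not validate TLS or your license.

```python
import socket

# license.deepgram.com:443 is the endpoint named above; add other hosts as needed.
REQUIRED_ENDPOINTS = [("license.deepgram.com", 443)]


def can_connect(host, port, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    conn.close()
    return True


def check_outbound(endpoints=REQUIRED_ENDPOINTS, **kwargs):
    """Map each "host:port" string to a reachability flag."""
    return {
        f"{host}:{port}": can_connect(host, port, **kwargs)
        for host, port in endpoints
    }
```

Running `check_outbound()` on the deployment host returns a dictionary such as `{"license.deepgram.com:443": True}` when the connection succeeds. The `connect` parameter exists only so the function can be exercised without network access.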