You can deploy Deepgram using Kubernetes, which provides a scalable instance of Deepgram's API and Engine services running on your own hardware or in your Kubernetes cloud environment. In this guide, we will look at how to deploy Deepgram on-prem with Kubernetes on a system running the Ubuntu operating system.


Prior to deploying Kubernetes, you will need to ensure you have a suitable environment per our Deployment Environments guide. You will also require a set of Deepgram-specific Kubernetes deployment files, which our Support team can provide.


If you are not already familiar with Kubernetes, you should be aware of three main concepts:

  • Node: A physical computer or virtual machine used to host workloads.
  • Pod: A single container running on a node. One node can host many pods.
  • Cluster: A group of nodes and their associated pods.

Additionally, this guide refers frequently to kubectl, the command-line tool for interacting with Kubernetes clusters; kubeadm, the cluster administration tool; and kubelet, the node agent.

Installing Kubernetes


Managed Kubernetes

If you are operating in a VPC, you may want to use a managed Kubernetes service instead of installing your own. For example, you can use EKS in AWS as an alternative to the following manual installation.

Kubernetes consists of several components distributed as binaries or container images, including an API server for cluster management, a proxy server, a scheduler, controllers, and more. These components are served from registry.k8s.io, and you will require several helper tools to get up and running, including the aforementioned kubectl, kubeadm, and kubelet.

Prior to installing Kubernetes you must disable Linux swap permanently. While sudo swapoff -a will temporarily disable swap, you will need to make the change permanent in /etc/fstab or systemd.swap.
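For example, on a typical Ubuntu system you can disable swap for the current boot and comment out any swap entries in /etc/fstab in one pass. This is a sketch assuming whitespace-delimited fstab fields; review the file afterwards to confirm only the intended lines were changed:

```shell
# Disable swap immediately for the running system
sudo swapoff -a

# Comment out any swap entries in /etc/fstab so the change survives reboots
# (keeps a backup of the original file at /etc/fstab.bak)
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
```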

Install kubeadm, kubelet, and kubectl

Update your package repositories and install dependencies for the Kubernetes repository:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

Download the public signing key from Google:

curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg

Note: Distributions prior to Ubuntu 22.04 may not have the /etc/apt/keyrings folder. You can create this directory, making it world-readable but writable only by admins.
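If the directory is missing, one way to create it with those permissions is:

```shell
# Create the keyrings directory, world-readable but writable only by root
sudo mkdir -p -m 755 /etc/apt/keyrings
```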

Add the Kubernetes official repository:

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Update packages and install Kubernetes tools:

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl


Kubernetes Versions

When updating tooling you must use a kubectl version that is within one minor version difference of your cluster. For example, a v1.27 client can communicate with v1.26, v1.27, and v1.28 control planes. You must keep all tooling versions in sync manually. If you wish to pin the versions you can do so with apt-mark as follows:

sudo apt-mark hold kubelet kubeadm kubectl
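To confirm which versions are currently installed before an upgrade, each tool can report its own version (assuming the tools are on your PATH):

```shell
kubectl version --client
kubeadm version
kubelet --version
```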

Initializing a Cluster

In order to run nodes and pods you must first create a cluster. This is done using the kubeadm command:

sudo kubeadm init --ignore-preflight-errors=Swap

Kubeadm will run verification checks and report any errors, then it will download the required containerized components and initialize a control-plane. Once the control-plane is initialized you will receive instructions to store the cluster configuration and deploy a pod network. Examples below (instructions may differ based on your system):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You will also be presented with a kubeadm join command, which should be saved for later use when joining worker nodes to the cluster.

Upon completion you should now be able to query your control-plane and see the standard Kubernetes pods running:

kubectl get pod -n kube-system

Deploying a Container Network Interface

By default Kubernetes does not deploy a CNI for pod communication. Before cluster DNS will start and pods can communicate, you must install an add-on for the CNI you wish to use in your cluster as follows:

kubectl apply -f <add-on.yaml>

As an example, if you were to deploy the Calico network in your cluster you would install the add-on as follows:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

A comprehensive though not exhaustive list of common network add-ons is available in the official Kubernetes Networking and Network Policy documentation. You may use only a single CNI per cluster.

To verify the network is up and running you can check the CoreDNS pod status. When the CoreDNS pod state shows as Running you may then join nodes to the cluster.
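One way to check this is to list the CoreDNS pods in the kube-system namespace; CoreDNS pods carry the k8s-app=kube-dns label:

```shell
kubectl get pods -n kube-system -l k8s-app=kube-dns
```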

Joining Nodes

Once the master node is set up you can begin joining worker nodes to the cluster. If you copied the join command output when the cluster was initialized, this can be used on each worker node directly. In the event that you did not save the join command, you may recover it using kubeadm as follows:

kubeadm token create --print-join-command

After joining nodes to the cluster you can utilize the kubectl command to verify the status of the cluster nodes:

kubectl get nodes
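A healthy cluster reports each node with a Ready status. The output will look something like the following (node names, ages, and versions will differ in your environment):

```
NAME       STATUS   ROLES           AGE   VERSION
master     Ready    control-plane   15m   v1.27.3
worker-1   Ready    <none>          3m    v1.27.3
```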


Kubernetes supports metrics aggregation from nodes within the cluster; however, this is not set up by default upon cluster initialization. If you wish to use the Kubernetes metrics server, you may deploy the latest version using kubectl:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

After deployment you may then query the compute utilization of nodes using the top command from the CLI:

kubectl top nodes

Alternatively, you can consume node metrics using your own metrics aggregation service pointed to the metrics API.

Deploying Deepgram

Your Deepgram Account Representative should provide you with download links to customized configuration files to be used with your Kubernetes deployment. These will include Kubernetes manifest files describing config maps, deployments, services, persistent volumes, and other needed resources to setup your environment.

Downloading Model Files to your Kubernetes Node

Your Deepgram Account Representative should provide you with download links to at least one language AI model. Copy the provided model files into a dedicated directory on the host machine.

mkdir deepgram-models
cd deepgram-models
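Each model file can then be fetched into this directory with a standard download tool. The URL below is a placeholder for the link your Account Representative provides:

```shell
wget "<model-download-link>"
```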

Installing the Onprem API Key

The provided manifest files make use of a Kubernetes secret named onprem-api-key for registering each container with the Deepgram license server. This can be created as follows, replacing <id> with the appropriate API key (the literal key name, shown here as api-key, must match the key your provided manifests reference):

kubectl create secret generic onprem-api-key \
  --from-literal=api-key=<id>

Setup Quay Image Repository Access

You will need a set of distribution credentials in order to download the requisite container images. See Deepgram's self service credential documentation for details on generating these credentials.

Once you have your creds, you'll need to import them into your cluster. You can do this by importing Docker config files, or by manually setting each individual key.

docker login quay.io  

# Using Docker config files
kubectl create secret generic dg-regcred \
  --from-file=.dockerconfigjson=$HOME/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson

# Manually setting needed keys
export DOCKER_USER=<quay username>  
export DOCKER_PASSWORD=<quay token>  
export DOCKER_EMAIL=<email address>

kubectl create secret docker-registry dg-regcred \
 --docker-server=quay.io \
 --docker-username=$DOCKER_USER \
 --docker-password=$DOCKER_PASSWORD \
 --docker-email=$DOCKER_EMAIL

Applying Manifest Files

Make sure to edit any placeholder values present in your manifest files. For example, the default persistent volume file supplied by Deepgram will have a /path/to/models placeholder path, which you should change to point to the Deepgram models directory created in the previous section.

Then, apply the manifest files.

kubectl apply -f ./*

You can check the status of each deployment using kubectl:

kubectl get pods

The status will show Running if successful. Pods may take a few minutes after startup to switch to Ready status. If the status shows any other value, you can diagnose the issue further with kubectl describe pods <pod-name> or kubectl logs <pod-name>. Running the apply command again will apply any changes you have made to the deployment files.