You can deploy Deepgram with Kubernetes to run a scalable instance of Deepgram's API and Engine services on your own hardware or in your Kubernetes cloud environment. This guide shows how to deploy Deepgram on-prem with Kubernetes on a system running Ubuntu.


Before deploying Kubernetes, ensure that your environment meets the requirements in our Deployment Environments guide. You will also need a set of Deepgram-specific Kubernetes deployment files, which our Support team can provide.


If you are not overly familiar with Kubernetes, you should be aware of three main concepts:

  • Node: A physical computer or virtual machine used to host workloads.
  • Pod: The smallest deployable unit in Kubernetes, made up of one or more containers that run together on a single node. One node can host many pods.
  • Cluster: A group of nodes and their associated pods.

This guide also refers frequently to kubectl, the command-line tool for interacting with Kubernetes clusters; kubeadm, the cluster administration tool; and kubelet, the agent that runs on each node.

Installing Kubernetes

Kubernetes consists of several components distributed as binaries or container images, including an API server for cluster management, a proxy server, a scheduler, and controllers. These components are served from registry.k8s.io. To get up and running you will also need several helper tools, including the kubectl, kubeadm, and kubelet tools mentioned above.

Before installing Kubernetes, you must disable Linux swap permanently. Running sudo swapoff -a disables swap only until the next reboot, so you must also make the change permanent in /etc/fstab or in your systemd swap units (systemd.swap).
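As a minimal sketch of the /etc/fstab route, the shell function below comments out every active swap entry so that swap stays off after a reboot. The function name comment_out_swap is hypothetical, and it assumes a standard fstab layout in which the third field is the filesystem type. It keeps the original file as a .bak copy next to it.

```shell
# Hypothetical helper: comment out active swap entries in an fstab-style file.
# Assumes the standard layout where field 3 is the filesystem type ("swap").
comment_out_swap() {
  local fstab="${1:-/etc/fstab}"
  # Keep a backup of the original file.
  cp -- "$fstab" "$fstab.bak" || return
  # Prefix '#' to any non-comment line whose type field is "swap";
  # every other line (including blanks and comments) is copied unchanged.
  awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "$fstab.bak" > "$fstab"
}
```

Run sudo swapoff -a first, then define and call comment_out_swap /etc/fstab from a root shell (sudo cannot invoke a shell function directly). Review the result before rebooting.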

Install kubeadm, kubelet and kubectl

Update your package repositories and install dependencies for the Kubernetes repository:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

Download the public signing key from Google:

curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg

Note: Ubuntu releases earlier than 22.04 may not have the /etc/apt/keyrings directory. You can create it yourself, making it world-readable but writable only by administrators.
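A sketch of creating that directory, assuming GNU coreutils install is available; ensure_keyring_dir is a hypothetical helper name. Mode 0755 gives read access to everyone and write access only to the owner, which is root when run with administrator privileges.

```shell
# Hypothetical helper: create the APT keyrings directory if it is missing.
# Mode 0755 = owner read/write/execute, everyone else read/execute only.
ensure_keyring_dir() {
  local dir="${1:-/etc/apt/keyrings}"
  install -d -m 0755 -- "$dir"
}
```

The one-off equivalent is sudo install -d -m 0755 /etc/apt/keyrings.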

Add the Kubernetes official repository:

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Update packages and install Kubernetes tools:

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl


Kubernetes Versions

When updating tooling, you must use a kubectl version within one minor version of your cluster. For example, a v1.27 client can communicate with v1.26, v1.27, and v1.28 control planes. You must keep all tooling versions in sync manually. To pin the installed versions so that routine package upgrades do not change them, use apt-mark:

sudo apt-mark hold kubelet kubeadm kubectl
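The skew rule can be checked mechanically. This sketch assumes version strings of the form v1.27.3 (the leading v is optional) and compares only the major and minor numbers; kubectl_skew_ok is a hypothetical name, not part of kubectl.

```shell
# Hypothetical helper: succeed if the client version is within one minor
# version of the server (control plane) version. Example inputs: v1.27.3 v1.28.0
kubectl_skew_ok() {
  local client="${1#v}" server="${2#v}"
  local c_major c_minor s_major s_minor
  # Split "1.27.3" into major=1, minor=27 (patch is ignored).
  IFS=. read -r c_major c_minor _ <<< "$client"
  IFS=. read -r s_major s_minor _ <<< "$server"
  # Major versions must match.
  [ "$c_major" -eq "$s_major" ] || return 1
  local diff=$(( c_minor - s_minor ))
  # Allow a minor-version difference of -1, 0, or +1.
  [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
}
```

You can obtain both values from kubectl version -o json, which reports clientVersion.gitVersion and serverVersion.gitVersion.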

Initializing a Cluster

You must create a cluster before you can run nodes and pods. Initialize one with kubeadm:

sudo kubeadm init --ignore-preflight-errors Swap

kubeadm runs verification checks and reports any errors, then downloads the required containerized components and initializes a control plane. Once the control plane is initialized, kubeadm prints instructions for storing the cluster configuration and deploying a pod network. For example (the exact instructions may differ on your system):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubeadm also prints a kubeadm join command. Save it, because you will use it later to join worker nodes to the master node.

When initialization completes, you should be able to query your control plane and see the standard Kubernetes pods running:

kubectl get pod -n kube-system

Deploying a Container Network Interface (CNI)

By default, Kubernetes does not deploy a CNI for pod communication. Cluster DNS will not start, and pods will not be able to communicate, until you install the add-on for the CNI you want to use in your cluster:

kubectl apply -f <add-on.yaml>

For example, to deploy the Calico network in your cluster, install its add-on:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

A comprehensive, though not exhaustive, list of common network add-ons is available in the official Kubernetes Networking and Network Policy documentation. Use only one CNI per cluster.

To verify that the network is up and running, check the status of the CoreDNS pods. Once they show Running, you can join nodes to the cluster.
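One way to script the CoreDNS check is to parse kubectl get pods output. This sketch assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) produced with --no-headers and without -A, which would add a NAMESPACE column; all_pods_running is a hypothetical helper. It also assumes the CoreDNS pods carry the k8s-app=kube-dns label, as they typically do in kubeadm clusters.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every STATUS (column 3) is "Running".
all_pods_running() {
  awk '{ n++ } $3 != "Running" { bad++ } END { exit (n == 0 || bad > 0) }'
}
```

For example: kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | all_pods_running && echo "CoreDNS is running"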

Joining Nodes

Once the master node is set up, you can begin joining worker nodes to the cluster. If you saved the join command printed when the cluster was initialized, run it directly on each worker node. If you did not save it, you can regenerate it with kubeadm:

kubeadm token create --print-join-command

After joining nodes to the cluster, use kubectl to verify their status:

kubectl get nodes
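When joining several workers, it can help to confirm that the expected number of nodes report Ready. This sketch reads kubectl get nodes --no-headers output, assuming STATUS is the second column; nodes shown as NotReady or Ready,SchedulingDisabled are not counted. The helper name expect_ready_nodes is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin.
# Succeeds if at least N nodes have STATUS (column 2) exactly equal to "Ready".
expect_ready_nodes() {
  local expected="$1"
  awk -v want="$expected" '$2 == "Ready" { n++ } END { exit !(n + 0 >= want + 0) }'
}
```

For a master node plus three workers, for example: kubectl get nodes --no-headers | expect_ready_nodes 4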


Kubernetes can aggregate resource metrics from the nodes in a cluster, but this is not set up by default when a cluster is initialized. To use the Kubernetes Metrics Server, deploy the latest version with kubectl:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

After deployment, you can query node CPU and memory usage from the CLI with kubectl top:

kubectl top nodes

Alternatively, you can consume node metrics with your own metrics aggregation service pointed at the Metrics API.
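To act on kubectl top nodes output from a script, a small filter can flag busy nodes. This sketch assumes the column order NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY% and skips the header line; nodes_over_cpu is a hypothetical name, and a node reporting a non-numeric value such as <unknown> is treated as 0%.

```shell
# Hypothetical helper: read `kubectl top nodes` output on stdin and print the
# name of every node whose CPU% (column 3) is above the given threshold.
nodes_over_cpu() {
  local threshold="$1"
  awk -v max="$threshold" '
    $1 != "NAME" {              # skip the header line
      cpu = $3
      sub(/%$/, "", cpu)        # "91%" -> "91"
      if (cpu + 0 > max + 0) print $1
    }'
}
```

For example, kubectl top nodes | nodes_over_cpu 80 prints each node above 80% CPU.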

Deploying Deepgram

Your Deepgram Account Representative should provide download links to configuration files customized for your Kubernetes deployment. Use the engine.toml and api.toml files from this set to create ConfigMaps:

kubectl create configmap engine --from-file=/path/to/engine.toml
kubectl create configmap api --from-file=/path/to/api.toml  


Each time you change a .toml file, you must delete and then recreate the corresponding ConfigMap.
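The delete-and-recreate step can be wrapped in a helper. This is a sketch; recreate_configmap is a hypothetical name, and it assumes the ConfigMap names engine and api used above. The --ignore-not-found flag lets the delete succeed even if the ConfigMap does not exist yet.

```shell
# Hypothetical helper: delete a ConfigMap (if present) and recreate it
# from the given .toml file.
recreate_configmap() {
  local name="$1" file="$2"
  kubectl delete configmap "$name" --ignore-not-found || return
  kubectl create configmap "$name" --from-file="$file"
}
```

For example, recreate_configmap engine /path/to/engine.toml. Depending on how the provided YAML consumes the ConfigMap, you may also need to restart the affected pods before they pick up the change.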

Use the customized YAML files provided by your Deepgram Account Representative to deploy Engine and API pods:

kubectl apply -f engine.yaml
kubectl apply -f api.yaml

You can check the status of the resulting pods with kubectl:

kubectl get pods

The STATUS column shows Running when a pod has started successfully. If it shows any other value, diagnose the problem with kubectl describe pod <pod-name> or kubectl logs <pod-name>. After you edit a deployment file, run the apply command again to apply your changes.
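To find which pods need attention, this sketch filters kubectl get pods --no-headers output for any STATUS other than Running, assuming STATUS is the third column. pods_not_running is a hypothetical helper.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the name of every pod whose STATUS (column 3) is not "Running".
pods_not_running() {
  awk '$3 != "Running" { print $1 }'
}
```

Pipe kubectl get pods --no-headers into pods_not_running, then pass each printed name to kubectl describe pod or kubectl logs.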

Getting Model Files

Your Deepgram Account Representative should provide download links to at least one pre-trained AutoML™ AI model for testing. Once the API and Engine pods are up and running, copy the provided model files into the directory you chose for the models (/models in the example below):

kubectl cp model.dg <engine-pod-name>:/models
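If you have several model files, a loop over kubectl cp saves repetition. This sketch writes each file to /models/<file name> inside the given pod and assumes the Engine pod has a single container (otherwise kubectl cp needs -c); copy_models is a hypothetical name. Note that kubectl cp requires the tar binary to be present in the container image.

```shell
# Hypothetical helper: copy one or more model files into /models in a pod.
# Usage: copy_models <engine-pod-name> file1.dg [file2.dg ...]
copy_models() {
  local pod="$1"; shift
  local f
  for f in "$@"; do
    # Give an explicit destination file name inside /models.
    kubectl cp "$f" "$pod:/models/$(basename -- "$f")" || return
  done
}
```

For example, copy_models <engine-pod-name> model.dg.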

The newly added models should appear within a minute or so. To confirm that they were added, query the /v1/models endpoint.
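A sketch of that validation query, assuming curl is installed and that the Deepgram API is reachable at a base URL you supply; the host and port depend on your api.yaml and are not specified here. list_models is a hypothetical helper, and the trailing-slash trim only avoids a double slash in the request path.

```shell
# Hypothetical helper: GET <base-url>/v1/models and print the response body.
# -f fails on HTTP errors, -sS hides progress output but still shows errors.
list_models() {
  local base_url="${1%/}"   # drop one trailing slash, if any
  curl -fsS "$base_url/v1/models"
}
```

For example, after forwarding a local port to the API pod with kubectl port-forward, run list_models http://localhost:<local-port>.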


This method is intended primarily for proof-of-concept use. In production, you can use Kubernetes Volumes to provide these files to pods automatically.
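As a starting point for the Volumes approach, this sketch creates a PersistentVolumeClaim through kubectl apply -f - with an inline manifest. The claim name deepgram-models, the 50Gi size, and the ReadWriteOnce access mode are hypothetical defaults. The claim only binds if your cluster has a suitable StorageClass or pre-provisioned PersistentVolume, and the Engine Deployment in engine.yaml would still need to mount it at /models. If several Engine pods on different nodes must share the models, you would likely need a storage backend that supports ReadWriteMany.

```shell
# Hypothetical helper: create a PersistentVolumeClaim that could back /models.
# Name, size, and access mode below are illustrative defaults, not Deepgram values.
apply_models_pvc() {
  local name="${1:-deepgram-models}" size="${2:-50Gi}"
  kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}
```

For example, apply_models_pvc deepgram-models 50Gi, then check the claim with kubectl get pvc.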