You can deploy Deepgram using Kubernetes, which provides a scalable instance of Deepgram's API and Engine services running on your own hardware or in your Kubernetes cloud environment. In this guide, we will look at how to deploy Deepgram on-prem with Kubernetes on a system running the Ubuntu operating system.
Prior to deploying Kubernetes, you will need to ensure you have a suitable environment per our Deployment Environments guide. You will also require a set of Deepgram-specific Kubernetes deployment files, which our Support team can provide.
If you are not overly familiar with Kubernetes, you should be aware of three main concepts:
- Node: A physical computer or virtual machine used to host workloads.
- Pod: A group of one or more containers running on a node. One node can host many pods.
- Cluster: A group of nodes and their associated pods.
If you are operating in a VPC, you may want to use a managed Kubernetes service instead of installing your own. For example, you can use EKS in AWS as an alternative to the following manual installation.
Kubernetes consists of several components distributed as binaries or container images, including an API server for cluster management, a proxy server, a scheduler, controllers, and more. These components are served from registry.k8s.io, and you will require several helper tools to get up and running, including kubeadm, kubelet, and kubectl, which are installed below.
Prior to installing Kubernetes you must disable Linux swap permanently. While sudo swapoff -a will temporarily disable swap, you will need to make the change permanent in /etc/fstab by removing or commenting out any swap entries so swap stays off across reboots.
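The permanent change can be sketched with sed. In practice you would run this against /etc/fstab itself (with sudo); the example below operates on a scratch copy so it is safe to try anywhere:

```shell
# Stand-in for /etc/fstab so the demonstration does not touch the real file
fstab=$(mktemp)
printf '%s\n' \
  'UUID=1234-abcd /         ext4 defaults 0 1' \
  '/swapfile      none      swap sw       0 0' > "$fstab"

# Comment out every swap entry so swap is not re-enabled on reboot.
# Real-world form: sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
sed -i '/\sswap\s/ s/^/#/' "$fstab"
cat "$fstab"
```

After the edit, only the swap line is commented out; other filesystem entries are untouched.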
Update your package repositories and install dependencies for the Kubernetes repository:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
Download the public signing key from Google:
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
Note: Ubuntu releases prior to 22.04 may not have the /etc/apt/keyrings folder. You can create this directory, making it world-readable but writeable only by admins.
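The directory creation can be done in one command. The real target is /etc/apt/keyrings (run with sudo); the demonstration below uses a scratch prefix so it is safe to execute as-is:

```shell
# Real-world form: sudo mkdir -p -m 755 /etc/apt/keyrings
prefix=$(mktemp -d)                       # scratch prefix standing in for /
mkdir -p -m 755 "$prefix/etc/apt/keyrings"
stat -c '%a' "$prefix/etc/apt/keyrings"   # mode of the new directory → 755
```

Mode 755 makes the directory world-readable while only root can write to it.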
Add the Kubernetes official repository:
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
Update packages and install Kubernetes tools:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
When updating tooling you must use a kubectl version that is within one minor version of your cluster. For example, a v1.27 client can communicate with v1.26, v1.27, and v1.28 control planes. You must keep all tooling versions in sync manually. If you wish to pin the versions you can do so with:
sudo apt-mark hold kubelet kubeadm kubectl
In order to run nodes and pods you must first create a cluster. This is done using the kubeadm init command:
kubeadm init --ignore-preflight-errors Swap
Kubeadm will run verification checks and report any errors, then it will download the required containerized components and initialize a control-plane. Once the control-plane is initialized you will receive instructions to store the cluster configuration and deploy a pod network. Examples below (instructions may differ based on your system):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
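If you are operating as root, kubeadm's output also offers a shortcut that skips copying admin.conf and points kubectl at it directly:

```shell
# Root alternative to copying admin.conf into ~/.kube:
export KUBECONFIG=/etc/kubernetes/admin.conf
```

Note that this only lasts for the current shell session, whereas the copy into $HOME/.kube/config persists.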
You will also be presented with a kubeadm join command, which should be saved for later use when joining worker nodes to the master node.
Upon completion you should now be able to query your control-plane and see the standard Kubernetes pods running:
kubectl get pod -n kube-system
By default Kubernetes does not deploy a CNI for pod communication. Cluster DNS will not start, and pods will not be able to communicate, until you install an add-on for the CNI you wish to deploy in your cluster as follows:
kubectl apply -f <add-on.yaml>
As an example, if you were to deploy the Calico network in your cluster you would install the add-on as follows:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
A comprehensive though not exhaustive list of common network add-ons is available in the official Kubernetes Networking and Network Policy documentation. You may use only a single CNI per cluster.
To verify the network is up and running, you can check the CoreDNS pod status. When the CoreDNS pod state shows as Running, you may then join nodes to the cluster.
Once the master node is set up you can begin joining worker nodes to the cluster. If you copied the join command output when the cluster was initialized, it can be used on each worker node directly. If you did not save the join command, you may recover it using kubeadm as follows:
kubeadm token create --print-join-command
After joining nodes to the cluster you can utilize the
kubectl command to verify the status of the cluster nodes:
kubectl get nodes
Kubernetes supports metrics aggregation from nodes within the cluster; however, this is not set up by default upon cluster initialization. If you wish to use the Kubernetes metrics server, you may deploy the latest version using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
After deployment you may then query the compute utilization of nodes using the
top command from the CLI:
kubectl top nodes
Alternatively, you can consume node metrics using your own metrics aggregation service pointed at the metrics API.
Your Deepgram Account Representative should provide you with download links to customized configuration files to be used with your Kubernetes deployment. These will include Kubernetes manifest files describing config maps, deployments, services, persistent volumes, and other needed resources to setup your environment.
Your Deepgram Account Representative should provide you with download links to at least one language AI model. Copy the provided model files into a dedicated directory on the host machine.
The provided manifest files make use of a Kubernetes secret named onprem-api-key to register each container with the Deepgram license server. The secret can be created as follows, replacing <id> with the appropriate API key:
kubectl create secret generic onprem-api-key \
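For illustration only, a deployment consumes such a secret along the following lines. The container, image, env var, and key names here are hypothetical placeholders; your Deepgram-provided manifests define the real ones:

```yaml
# Hypothetical sketch — your supplied manifests are authoritative.
spec:
  containers:
    - name: engine                      # hypothetical container name
      image: <image-from-your-manifests>
      env:
        - name: DEEPGRAM_API_KEY        # hypothetical env var name
          valueFrom:
            secretKeyRef:
              name: onprem-api-key      # the secret created above
              key: api-key              # hypothetical key name
```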
You will need a set of distribution credentials in order to download the requisite container images. See Deepgram's self service credential documentation for details on generating these credentials.
Once you have your credentials, you'll need to import them into your cluster. You can do this by importing a Docker config file, or by setting each required key manually.
docker login quay.io
# Using Docker config files
kubectl create secret generic dg-regcred \
  --from-file=.dockerconfigjson=$HOME/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson
# Manually setting needed keys
export DOCKER_USER=<quay username>
export DOCKER_PASSWORD=<quay token>
export DOCKER_EMAIL=<email address>
kubectl create secret docker-registry dg-regcred \
  --docker-server=quay.io \
  --docker-username=$DOCKER_USER \
  --docker-password=$DOCKER_PASSWORD \
  --docker-email=$DOCKER_EMAIL
Make sure to edit any placeholder values present in your manifest files. For example, the default persistent volume file supplied by Deepgram will have a /path/to/models placeholder path, which you should change to point to the Deepgram models directory created in the previous section.
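As an illustrative sketch of where that placeholder lives (the volume name and capacity below are hypothetical; the file supplied by Deepgram is authoritative), a hostPath persistent volume looks roughly like this:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-pv          # hypothetical name; use the one in your supplied file
spec:
  capacity:
    storage: 40Gi          # hypothetical size
  accessModes:
    - ReadOnlyMany
  hostPath:
    path: /path/to/models  # replace with your Deepgram models directory
```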
Then, apply the manifest files.
kubectl apply -f ./*
You can check the status of each deployment using
kubectl get pods
The status will show Running if successful. Pods may take a few minutes after startup to switch to the Ready status. If the status is any other value, you can diagnose the issue further with kubectl describe pods <pod-name> or kubectl logs <pod-name>. Running the apply command again will apply any changes you have made to the deployment files.