You can deploy Deepgram using Kubernetes, which provides a scalable instance of Deepgram's API and Engine services running on your own hardware or in your Kubernetes cloud environment. In this guide, we will look at how to deploy Deepgram on-prem with Kubernetes on a system running Ubuntu.
Prior to deploying Kubernetes, you will need to ensure you have a suitable environment per our Deployment Environments guide. You will also require a set of Deepgram-specific Kubernetes deployment files, which our Support team can provide.
If you are not overly familiar with Kubernetes, you should be aware of three main concepts:
- Node: A physical computer or virtual machine used to host workloads.
- Pod: The smallest deployable unit in Kubernetes, made up of one or more containers running on a node. One node can host many pods.
- Cluster: A group of nodes and their associated pods.
Kubernetes consists of several components distributed as binaries or container images including an API server for cluster management, proxy server, scheduler, controllers, etc. These components are served from registry.k8s.io, and you will require several helper tools to get up and running including the aforementioned
Prior to installing Kubernetes you must disable Linux swap permanently. While
sudo swapoff -a will temporarily disable swap, you will need to make the change permanent in
Update your package repositories and install dependencies for the Kubernetes repository:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
Download the public signing key from Google:
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
Note: Distributions prior to 22.04 may not have the
/etc/apt/keyrings folder. You can create this directory, making it world-readable and writeable only by admins.
Add the Kubernetes official repository:
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
Update packages and install Kubernetes tools:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
When updating tooling you must use a kubectl version that is within one minor version difference of your cluster. For example, a v1.27 client can communicate with v1.26, v1.27, and v1.28 control planes. You must keep all tooling versions in sync manually. If you wish to pin the versions you can do so with
sudo apt-mark hold kubelet kubeadm kubectl
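The one-minor-version rule above can be checked before an upgrade. The following is a minimal sketch, not part of the Deepgram tooling: the function name within_one_minor is hypothetical, and it assumes plain version strings such as v1.27.3, comparing only the major and minor parts.

```shell
# Return 0 when two Kubernetes versions differ by at most one minor release.
# Accepts strings such as "v1.27.3" or "1.28"; only major.minor is compared.
# Hypothetical helper for illustration; not provided by kubectl or kubeadm.
within_one_minor() {
  a="${1#v}"; b="${2#v}"                    # drop a leading "v" if present
  a_major="${a%%.*}"; b_major="${b%%.*}"    # text before the first dot
  a_rest="${a#*.}";   b_rest="${b#*.}"      # text after the first dot
  a_minor="${a_rest%%.*}"; b_minor="${b_rest%%.*}"
  [ "$a_major" = "$b_major" ] || return 1
  diff=$(( a_minor - b_minor ))
  [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
}
```

For example, within_one_minor v1.27.3 v1.28.0 succeeds, while within_one_minor v1.27.0 v1.29.0 fails. The client and server versions to compare can be read from the output of kubectl version.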
In order to run nodes and pods, you must first create a cluster. This is done by running:
kubeadm init --ignore-preflight-errors Swap
Kubeadm will run verification checks and report any errors, then it will download the required containerized components and initialize a control-plane. Once the control-plane is initialized you will receive instructions to store the cluster configuration and deploy a pod network. Examples below (instructions may differ based on your system):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You will also be presented with a
kubeadm join command which should be saved for later use joining worker nodes to the master node.
Upon completion, you should be able to query your control plane and see the standard Kubernetes pods running:
kubectl get pod -n kube-system
By default, Kubernetes does not deploy a CNI for pod communication. Before cluster DNS will start and pods can communicate, you must install an add-on for the CNI you wish to deploy in your cluster as follows:
kubectl apply -f <add-on.yaml>
As an example, if you were to deploy the Calico network in your cluster you would install the add-on as follows:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
A comprehensive, though not exhaustive, list of common network add-ons is available in the official Kubernetes Networking and Network Policy documentation. You may use only a single CNI per cluster.
To verify the network is up and running you can check the CoreDNS pod status. When the CoreDNS pod state shows as
Running you may then join nodes to the cluster.
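Rather than checking the CoreDNS status by hand, the wait can be scripted. This is a sketch under stated assumptions: the function name wait_for_coredns is hypothetical, the CoreDNS pods are assumed to carry the label k8s-app=kube-dns in the kube-system namespace (as in kubeadm-built clusters), and it waits on the pod Ready condition instead of polling for the Running status.

```shell
# Block until the CoreDNS pods report Ready, or fail once the timeout expires.
# Assumption: CoreDNS pods are labelled k8s-app=kube-dns in kube-system.
# Argument: optional timeout (default 180s), in kubectl duration syntax.
wait_for_coredns() {
  timeout="${1:-180s}"
  kubectl wait pods -n kube-system -l k8s-app=kube-dns \
    --for=condition=Ready --timeout="$timeout"
}
```

Calling wait_for_coredns 300s returns a non-zero exit code if the pods are not Ready within five minutes, which makes it usable as a gate before running the join command on worker nodes.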
Once the master node is set up, you can begin joining worker nodes to the cluster. If you copied the join command output when the cluster was initialized, it can be run on each worker node directly. If you did not save the join command, you may recover it using
kubeadm as follows:
kubeadm token create --print-join-command
After joining nodes to the cluster you can utilize the
kubectl command to verify the status of the cluster nodes:
kubectl get nodes
Kubernetes supports aggregating metrics from nodes within the cluster; however, this is not set up by default upon cluster initialization. If you wish to use the Kubernetes metrics server, you may deploy the latest version using
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
After deployment you may then query the compute utilization of nodes using the
top command from the CLI:
kubectl top nodes
Alternatively, you can consume node metrics using your own metrics aggregation service pointed at the metrics API.
Your Deepgram Account Representative should provide you with download links to customized configuration files to be used with your Kubernetes deployment. With these files you will need to create configuration maps using the engine.toml and api.toml files:
kubectl create configmap engine --from-file=/path/to/engine.toml
kubectl create configmap api --from-file=/path/to/api.toml
You'll have to delete, then recreate, configuration maps each time a change is made to a .toml file.
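The delete-then-recreate step can be wrapped in a small helper so both commands always run together. This is a minimal sketch: the function name recreate_configmap is hypothetical, and it assumes each configuration map is built from a single .toml file as in the commands above.

```shell
# Delete a ConfigMap (if it exists) and recreate it from a single file.
# Arguments: <configmap-name> <path-to-file>
# Hypothetical helper; it only chains the two kubectl commands.
recreate_configmap() {
  name="$1"; file="$2"
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }
  kubectl delete configmap "$name" --ignore-not-found &&
    kubectl create configmap "$name" --from-file="$file"
}
```

For example, recreate_configmap engine /path/to/engine.toml replaces the engine configuration map after an edit. The --ignore-not-found flag lets the same call work on first use, when there is nothing to delete yet.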
Use the customized YAML files provided by your Deepgram Account Representative to deploy Engine and API pods:
kubectl apply -f engine.yaml
kubectl apply -f api.yaml
You can check the status of each deployment using
kubectl get pods
The status will show
Running if successful. If this is any other value, you can further diagnose the issue with the command
kubectl describe pods <pod-name> or
kubectl logs <pod-name>. Running the apply command again will apply the changes you made to the deployment files.
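When several pods are deployed, it can help to list only those that need attention. The sketch below is illustrative: the function name not_running_pods is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...) with headers suppressed.

```shell
# Print the names of pods whose STATUS column is anything other than "Running".
# Reads the output of `kubectl get pods --no-headers` on stdin, where the
# columns are NAME READY STATUS RESTARTS AGE.
not_running_pods() {
  awk '$3 != "Running" { print $1 }'
}
```

A typical invocation is kubectl get pods --no-headers | not_running_pods. Each name it prints can then be passed to kubectl describe pods or kubectl logs as described above.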
Your Deepgram Account Representative should provide you with download links to at least one pre-trained AutoML™ AI model for testing purposes. Once the API and Engine pods are up and running, copy the provided model files into the directory in which you decided to place the models:
kubectl cp model.dg <engine-pod-name>:/models
The newly-added models will appear within a minute or so. To validate that they were added, you can query the /v1/models endpoint.
This method is primarily for proof-of-concept. In production, Kubernetes lets you obtain these files automatically by creating Volumes.
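To validate that models were added, the /v1/models query mentioned above can be sketched as follows. The function names models_url and list_models are hypothetical, and the base URL of your API service is an assumption you must supply: it depends on how the API pod is exposed in your cluster, and your deployment may also require authentication headers that are not shown here.

```shell
# Build the URL of the models endpoint from an API base URL.
models_url() {
  base="${1%/}"            # strip one trailing slash, if present
  echo "$base/v1/models"
}

# Fetch the model list. curl's -f flag makes HTTP errors return non-zero.
# Argument: base URL of the Deepgram API service (deployment-specific).
list_models() {
  curl -fsS "$(models_url "$1")"
}
```

For example, list_models "http://localhost:8080" would query an API reachable on that address; the host and port here are placeholders, not values taken from this guide.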