Self-Managed Kubernetes
You can deploy Deepgram using Kubernetes, which provides a scalable instance of Deepgram's API and Engine services running on your own hardware or in your Kubernetes cloud environment. In this guide, we will look at how to deploy Deepgram on-premises with Kubernetes on a system running Ubuntu.
Prerequisites
Prior to deploying Kubernetes you will need to ensure you have a suitable environment per our Deployment Environments guide. You will also need access to the `deepgram-self-hosted` Helm chart.
Terminology
If you are not overly familiar with Kubernetes, you should be aware of three main concepts:
- Node: A physical computer or virtual machine used to host workloads.
- Pod: A single container running on a node. One node can host many pods.
- Cluster: A group of nodes and their associated pods.
Additionally, this guide refers frequently to `kubectl`, the command-line tool for interacting with Kubernetes clusters; `kubeadm`, the cluster administration tool; and `kubelet`, the node agent.
Installing Kubernetes
Managed Kubernetes
If you are operating in a VPC, you should use a managed Kubernetes service instead of installing your own. For example, you can use EKS in AWS as an alternative to the following manual installation.
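For illustration, a managed cluster can typically be created with the provider's CLI. The following `eksctl` invocation is a minimal sketch only; the cluster name, region, and node count are placeholder values, and your organization's networking and IAM requirements will likely call for additional configuration.

```bash
# Minimal sketch: create a managed EKS cluster with eksctl.
# The name, region, and node count below are placeholders.
eksctl create cluster \
  --name deepgram-self-hosted \
  --region us-west-2 \
  --nodes 2
```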
Kubernetes consists of several components distributed as binaries or container images, including an API server for cluster management, a proxy server, a scheduler, controllers, and so on. These components are served from registry.k8s.io, and you will require several helper tools to get up and running, including the aforementioned `kubectl`, `kubeadm`, and `kubelet`.
Prior to installing Kubernetes you must disable Linux swap permanently. While `sudo swapoff -a` will temporarily disable swap, you will need to make the change permanent in `/etc/fstab` or `systemd.swap`.
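As a minimal sketch, one common way to make the change permanent is to comment out the swap entries in `/etc/fstab`; review the file before and after editing, since the exact entries vary by system.

```bash
# Turn swap off for the current boot
sudo swapoff -a

# Comment out any swap entries in /etc/fstab so swap stays off after reboot
sudo sed -i '/\sswap\s/ s/^#*/#/' /etc/fstab
```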
Install kubeadm, kubelet, and kubectl
Update your package repositories and install dependencies for the Kubernetes repository:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
Download the public signing key from Google:
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
Note: Distributions prior to Ubuntu 22.04 may not have the `/etc/apt/keyrings` directory. You can create it, making it world-readable but writable only by admins.
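For example, the directory can be created with permissions that are world-readable but writable only by root:

```bash
# Create the apt keyrings directory if it does not already exist
sudo mkdir -p -m 755 /etc/apt/keyrings
```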
Add the Kubernetes official repository:
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
Update packages and install Kubernetes tools:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
Kubernetes Versions
When updating tooling you must use a `kubectl` version that is within one minor version difference of your cluster. For example, a v1.27 client can communicate with v1.26, v1.27, and v1.28 control planes. You must keep all tooling versions in sync manually. If you wish to pin the versions you can do so with `apt-mark` as follows:
sudo apt-mark hold kubelet kubeadm kubectl
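If you need a specific release rather than the latest, a hedged example of installing and then holding a pinned version looks like the following; the version string is illustrative only, so substitute the release that matches your cluster. You can list the versions available from the repository with `apt-cache madison kubeadm`.

```bash
# Illustrative only: install specific tool versions, then hold them.
# Replace 1.28.2-00 with the package version that matches your cluster.
sudo apt-get install -y kubelet=1.28.2-00 kubeadm=1.28.2-00 kubectl=1.28.2-00
sudo apt-mark hold kubelet kubeadm kubectl
```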
Initializing a Cluster
In order to run nodes and pods you must first create a cluster. This is done using the `kubeadm` command:
sudo kubeadm init --ignore-preflight-errors Swap
Kubeadm will run verification checks and report any errors, then it will download the required containerized components and initialize a control plane. You can see configuration options for initialization here, including how to set node taints.
Once the control-plane is initialized you will receive instructions to store the cluster configuration and deploy a pod network. Examples below (instructions may differ based on your system):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You will also be presented with a `kubeadm join` command, which you should save for later use when joining worker nodes to the cluster.
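If you intend to run workloads on the control-plane node itself, such as in a single-node test cluster, you can optionally remove the default control-plane taint. This is a sketch, not a required part of the Deepgram setup, and should be skipped for multi-node production clusters:

```bash
# Optional: allow Pods to schedule on the control-plane node (single-node clusters only)
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```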
Upon completion you should now be able to query your control plane and see the standard Kubernetes pods running:
kubectl get pod -n kube-system
Deploying a Containerized Network Interface
By default Kubernetes does not deploy a CNI for pod communication. Before cluster DNS starts and pods can communicate, you must install an add-on for the CNI you wish to deploy in your cluster, as follows:
kubectl apply -f <add-on.yaml>
As an example, if you were to deploy the Calico network in your cluster you would install the add-on as follows:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
A comprehensive though not exhaustive list of common network add-ons is available in the official Kubernetes Networking and Network Policy documentation. You may utilize only a single CNI per cluster.
To verify the network is up and running you can check the CoreDNS Pod status. When the CoreDNS Pods show a state of `Running`, you may then join nodes to the cluster.
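For example, you can watch the CoreDNS Pods directly; in standard installations they carry the `k8s-app=kube-dns` label:

```bash
# CoreDNS Pods should reach the Running state once the CNI is working
kubectl get pods -n kube-system -l k8s-app=kube-dns
```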
Joining Nodes
Once the control-plane node is set up you can begin joining worker nodes to the cluster. If you copied the join command output when the cluster was initialized, this can be used on each worker node directly. In the event that you did not save the join command, you may recover it using `kubeadm` as follows:
kubeadm token create --print-join-command
After joining nodes to the cluster you can use the `kubectl` command to verify the status of the cluster nodes:
kubectl get nodes
Metrics
Kubernetes supports aggregating metrics from nodes within the cluster; however, this is not set up by default upon cluster initialization. If you wish to utilize the Kubernetes metrics server you may deploy the latest version using `kubectl`:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
After deployment you may then query the compute utilization of nodes using the `top` command from the CLI:
kubectl top nodes
Alternatively, you can consume node metrics using your own metrics aggregation service pointed at the metrics API.
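If you go that route, the same data that powers `kubectl top` is exposed through the metrics API; for example, you can query it directly once the metrics server is running:

```bash
# Query the node metrics endpoint exposed by the metrics server
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
```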
Configure Kubernetes Secrets
Deepgram strongly recommends following best practices for configuring Kubernetes Secrets. Please refer to Securing Your Cluster for more details.
The `deepgram-self-hosted` Helm chart takes two Secret references. One is a set of distribution credentials that allow the cluster to pull images from Deepgram's container image repository. The other is your self-hosted API key that licenses each Deepgram container that is created.
- Complete the Self Service Licensing & Credentials guide to generate distribution credentials and a self-hosted API key.
- If using an external Secret store provider, configure cluster access to these two Secrets, naming them `dg-regcred` (distribution credentials) and `dg-self-hosted-api-key`.
- If not using an external Secret store provider, create the Secrets manually in your cluster.
  - Using the distribution credentials username and password generated in the Deepgram Console, create a Kubernetes Secret named `dg-regcred`.

    ```bash
    kubectl create secret docker-registry dg-regcred \
      --docker-server=quay.io \
      --docker-username=QUAY_DG_USER \
      --docker-password=QUAY_DG_PASSWORD
    ```

  - Create a Kubernetes Secret named `dg-self-hosted-api-key` to store your self-hosted API key.

    ```bash
    kubectl create secret generic dg-self-hosted-api-key \
      --from-literal=DEEPGRAM_API_KEY='YOUR_API_KEY_HERE'
    ```

    Replace the placeholder `YOUR_API_KEY_HERE` with the Deepgram API key you generated in the Self Service Licensing & Credentials guide.
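To confirm both Secrets exist before installing the Helm chart, you can list them; add `-n <namespace>` if you created them somewhere other than your current namespace:

```bash
# Both Secrets must exist in the namespace Deepgram will be deployed into
kubectl get secret dg-regcred dg-self-hosted-api-key
```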
Download Models to your K8s Node
Your Deepgram Account Representative should provide you with download links to at least one voice AI model. Copy the provided model files into a dedicated directory on the host machine.
mkdir deepgram-models
cd deepgram-models
wget DOWNLOAD_LINK_TO_DEEPGRAM_MODEL
Create a local `PersistentVolume` in your cluster using this official Kubernetes guide, and set `spec.local.path` to the absolute path of the `deepgram-models` directory you just created.
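As an illustration only, a local `PersistentVolume` for the models directory might look like the sketch below. The capacity, `storageClassName`, access mode, node name, and path are placeholder values you should replace to match your environment and the claim the Helm chart creates; the PV name matches the `deepgram-models-pv` value referenced in the Helm values example later in this guide.

```bash
# Hypothetical local PersistentVolume for the deepgram-models directory.
# Adjust capacity, storageClassName, accessModes, the node hostname, and the path as needed.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: deepgram-models-pv
spec:
  capacity:
    storage: 40Gi
  accessModes:
    - ReadOnlyMany   # adjust to match the access mode requested by the chart's PersistentVolumeClaim
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /absolute/path/to/deepgram-models
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - your-node-name
EOF
```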
Deploy Deepgram
Deepgram maintains the official `deepgram-self-hosted` Helm Chart. You can reference the source and Artifact Hub listing for more details. We'll use this Chart to facilitate deploying Deepgram services in your self-hosted environment.
- Add the Deepgram Helm repository:

  ```bash
  helm repo add deepgram https://deepgram.github.io/self-hosted-resources
  helm repo update
  ```

- Download a `values.yaml` template from Deepgram's self-hosted resources. For example, here is a template for a basic setup with a self-managed cluster.
- In your `values.yaml`, modify the `scaling.replicas.{api,engine}` values as desired.
- In your `values.yaml` file, insert the name of your local `PersistentVolume` you created in the previous section.

  ```yaml
  engine:
    modelManager:
      volumes:
        customVolumeClaim:
          enabled: true
          name: deepgram-models-pv # Replace with the name of the local PersistentVolume you have created
          modelsDirectory: "/"
  ```

- Install the Helm Chart with your `values.yaml` file.

  ```bash
  helm install deepgram deepgram/deepgram-self-hosted \
    -f my-values.yaml \
    --namespace dg-self-hosted \
    --atomic \
    --timeout 1h

  # Monitor the installation in a separate shell
  watch kubectl get all
  ```
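Once the install finishes, you can confirm that the release and the Deepgram Pods are healthy. The release and namespace names below assume the values used in the install command above:

```bash
# Check the Helm release status and the Deepgram Pods
helm status deepgram --namespace dg-self-hosted
kubectl get pods --namespace dg-self-hosted
```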
Pod Scheduling Failures
Resource limits, taints, and other constraints may prevent Pods from being scheduled. If a Pod cannot be scheduled, you can see its status and a list of associated events with `kubectl describe pod <pod-name>`.
Test Your Deepgram Setup with a Sample Request
Test your environment and container setup with a local file.
- Get the name of one of the Deepgram API Pods.

  ```bash
  API_POD_NAME=$(
    kubectl get pods \
      --selector app=deepgram-api \
      --output jsonpath='{.items[0].metadata.name}' \
      --no-headers
  )
  ```

- Launch an ephemeral container to send your test request from.

  ```bash
  kubectl debug $API_POD_NAME \
    -it \
    --image=curlimages/curl \
    -- /bin/sh
  ```

- Inside the ephemeral container, download a sample file from Deepgram (or supply your own file).

  ```bash
  wget https://dpgr.am/bueller.wav
  ```

- Send your audio file to your local Deepgram setup for transcription.

  ```bash
  curl \
    -X POST \
    --data-binary @bueller.wav \
    "http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/v1/listen?model=nova-2&smart_format=true"
  ```
If needed, adjust pieces of the above command:
- the query parameters, to match the directions from your Deepgram Account Representative
- the service name (`deepgram-api-external`)
- the namespace (`dg-self-hosted`)
You should receive a JSON response with the transcript and associated metadata. Congratulations - your self-hosted setup is working!
Next Steps
Your Deepgram services are accessible within your cluster via the `deepgram-api-external` Service that was created by the Helm Chart.
Now that you have a basic Deepgram setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.