You can deploy Deepgram using Kubernetes, which provides a scalable instance of Deepgram’s API and Engine services running on your own hardware or in your Kubernetes cloud environment. In this guide, we will look at how to deploy Deepgram on-premises with Kubernetes on a system running Ubuntu.
If you are operating in a VPC, you should use a managed Kubernetes service instead of installing your own. For example, you can use EKS in AWS as an alternative to the following manual installation.
Kubernetes consists of several components distributed as binaries or container images, including an API server for cluster management, a proxy server, a scheduler, and controllers. These components are served from registry.k8s.io. You will also need several helper tools to get up and running: kubectl, kubeadm, and kubelet. Prior to installing Kubernetes, you must disable Linux swap permanently. While sudo swapoff -a will temporarily disable swap, you will need to make the change permanent in /etc/fstab or systemd.swap.
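As a minimal sketch of making the swap change permanent, assuming GNU sed (as shipped with Ubuntu) and swap configured through /etc/fstab rather than a systemd swap unit, the hypothetical helper below comments out active swap entries and keeps a backup copy:

```shell
# Hypothetical helper: comment out active swap entries in an fstab-style file.
disable_swap_entries() {
  fstab_file="$1"
  # Keep a backup copy next to the original before editing in place.
  cp "$fstab_file" "$fstab_file.bak"
  # Prefix '#' to uncommented lines that contain a whitespace-delimited "swap" field.
  sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab_file"
}

# On a real host, run from a root shell:
#   swapoff -a && disable_swap_entries /etc/fstab
```

The function takes the file path as an argument so it can be tried on a copy of /etc/fstab before touching the real file.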
Install kubeadm, kubelet, and kubectl
Update your package repositories and install dependencies for the Kubernetes repository:
Note: Distributions prior to 22.04 may not have the /etc/apt/keyrings folder. You can create this directory, making it world-readable and writeable only by admins.
Add the Kubernetes official repository:
Shell
$
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
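The legacy apt.kubernetes.io repository has been frozen in favor of the community-owned pkgs.k8s.io repositories, so the kubernetes-xenial line may no longer resolve. A minimal sketch of the equivalent setup follows. The minor version v1.30 is only an illustrative assumption (pick the version you intend to run), and the function names are hypothetical:

```shell
# Assumption: target Kubernetes minor version; change it to the one you intend to run.
K8S_MINOR="${K8S_MINOR:-v1.30}"
KEYRING="/etc/apt/keyrings/kubernetes-apt-keyring.gpg"
REPO_URL="https://pkgs.k8s.io/core:/stable:/${K8S_MINOR}/deb/"

# Build the apt source line for the chosen minor version.
k8s_repo_line() {
  echo "deb [signed-by=${KEYRING}] ${REPO_URL} /"
}

# Install prerequisites, fetch the signing key, and register the repository.
add_k8s_apt_repo() {
  sudo apt-get update
  sudo apt-get install -y apt-transport-https ca-certificates curl gpg
  sudo mkdir -p -m 755 /etc/apt/keyrings
  curl -fsSL "${REPO_URL}Release.key" | sudo gpg --dearmor -o "$KEYRING"
  k8s_repo_line | sudo tee /etc/apt/sources.list.d/kubernetes.list
}

# Run on the target host:
#   add_k8s_apt_repo
```

Each pkgs.k8s.io repository serves a single minor version, so moving to a newer minor release means updating K8S_MINOR and re-running the setup.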
Update packages and install Kubernetes tools:
Shell
$
sudo apt-get update
$
sudo apt-get install -y kubelet kubeadm kubectl
$
sudo apt-mark hold kubelet kubeadm kubectl
Kubernetes Versions
When updating tooling you must use a kubectl version that is within one minor version difference of your cluster. For example, a v1.27 client can communicate with v1.26, v1.27, and v1.28 control planes. You must keep all tooling versions in sync manually. If you wish to pin the versions you can do so with apt-mark as follows:
sudo apt-mark hold kubelet kubeadm kubectl
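The one-minor-version rule can be checked mechanically. The sketch below uses two hypothetical helpers and assumes both versions share the same major version; the version strings would typically come from the output of kubectl version:

```shell
# Extract the minor number from a version string such as "v1.27.3" or "1.27".
minor_of() {
  v="${1#v}"       # drop a leading "v"
  v="${v#*.}"      # drop the major component
  echo "${v%%.*}"  # keep only the minor component
}

# Succeed when client and server differ by at most one minor version.
skew_ok() {
  diff=$(( $(minor_of "$1") - $(minor_of "$2") ))
  [ "${diff#-}" -le 1 ]
}

# Example: a v1.27 client against v1.28 and v1.29 control planes.
skew_ok v1.27.3 v1.28.0 && echo "v1.28 control plane: supported skew"
skew_ok v1.27.3 v1.29.1 || echo "v1.29 control plane: unsupported skew"
```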
Initializing a Cluster
In order to run nodes and pods you must first create a cluster. This is done using the kubeadm command:
Shell
$
kubeadm init --ignore-preflight-errors Swap
Kubeadm will run verification checks and report any errors, then download the required containerized components and initialize a control plane. The kubeadm init documentation describes the configuration options for initialization, including how to set node taints.
Once the control-plane is initialized you will receive instructions to store the cluster configuration and deploy a pod network. Examples below (instructions may differ based on your system):
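For a regular (non-root) user, kubeadm typically prints commands along the following lines for storing the cluster configuration. This is a sketch wrapped in a hypothetical function; treat the output printed on your own system as authoritative:

```shell
# Copy the admin kubeconfig generated by kubeadm into the current user's home.
setup_kubeconfig() {
  mkdir -p "$HOME/.kube"
  sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
  sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
}

# Run on the control-plane node after `kubeadm init` completes:
#   setup_kubeconfig
```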
You will also be presented with a kubeadm join command, which you should save for later use when joining worker nodes to the master node. Upon completion, you should be able to query your control plane and see the standard Kubernetes pods running:
Shell
$
kubectl get pod -n kube-system
Deploying a Container Network Interface
By default, Kubernetes does not deploy a Container Network Interface (CNI) plugin for pod communication. Before cluster DNS will start and pods can communicate, you must install an add-on for the CNI you wish to deploy in your cluster as follows:
Shell
$
kubectl apply -f <add-on.yaml>
As an example, if you were to deploy the Calico network in your cluster you would install the add-on as follows:
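A sketch of what that could look like, assuming the manifest-based Calico install. The release number below is an illustrative assumption, so check the Calico documentation for a current version and for any pod CIDR requirements:

```shell
# Assumption: Calico release to install; not taken from this guide.
CALICO_VERSION="${CALICO_VERSION:-v3.28.0}"
CALICO_MANIFEST="https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION}/manifests/calico.yaml"

# Apply the Calico manifest to the cluster in the current kubectl context.
install_calico() {
  kubectl apply -f "$CALICO_MANIFEST"
}

# Run from a machine with kubectl access to the cluster:
#   install_calico
```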
A comprehensive, though not exhaustive, list of common network add-ons is available in the official Kubernetes Networking and Network Policy documentation. You may use only a single CNI per cluster. To verify that the network is up and running, check the CoreDNS pod status. Once the CoreDNS pod state shows as Running, you may join nodes to the cluster.
Joining Nodes
Once the master node is set up, you can begin joining worker nodes to the cluster. If you copied the join command output when the cluster was initialized, you can run it directly on each worker node. If you did not save the join command, you can recover it using kubeadm as follows:
Shell
$
kubeadm token create --print-join-command
After joining nodes to the cluster you can utilize the kubectl command to verify the status of the cluster nodes:
Shell
$
kubectl get nodes
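For scripted checks, the node listing can be reduced to a single number. The hypothetical helper below reads the output of kubectl get nodes --no-headers on standard input and counts nodes whose STATUS column is anything other than exactly Ready (so a cordoned node reporting Ready,SchedulingDisabled is counted too):

```shell
# Count nodes whose STATUS (second column) is not exactly "Ready".
count_not_ready() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Against a live cluster:
#   kubectl get nodes --no-headers | count_not_ready
```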
Metrics
Kubernetes supports aggregating metrics from nodes within the cluster; however, this is not set up by default upon cluster initialization. If you wish to use the Kubernetes metrics server, you may deploy the latest version using kubectl:
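A minimal sketch, assuming the components.yaml manifest published with the latest metrics-server release is suitable for your cluster (the function name is hypothetical):

```shell
# Manifest published with the latest metrics-server release.
METRICS_SERVER_MANIFEST="https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"

# Apply the metrics-server manifest to the cluster in the current kubectl context.
install_metrics_server() {
  kubectl apply -f "$METRICS_SERVER_MANIFEST"
}

# Run from a machine with kubectl access, then check that metrics are served:
#   install_metrics_server
#   kubectl top nodes
```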
The deepgram-self-hosted Helm chart takes two Secret references. One is a set of distribution credentials that allow the cluster to pull images from Deepgram’s container image repository. The other is your self-hosted API key that licenses each Deepgram container that is created.
If using an external Secret store provider, configure cluster access to these two Secrets, naming them dg-regcred (distribution credentials) and dg-self-hosted-api-key.
If not using an external Secret store provider, create the Secrets manually in your cluster.
Using the distribution credentials username and password generated in the Deepgram Console, create a Kubernetes Secret named dg-regcred.
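A sketch of creating both Secrets by hand. The registry host quay.io, the Secret key name DEEPGRAM_API_KEY, and the DG_* environment variable names are assumptions not stated in this guide, so confirm them against the chart's documentation. The commands act on the namespace of the current kubectl context, which should be the one you plan to install the chart into:

```shell
# Assumptions: quay.io registry host, DEEPGRAM_API_KEY key name, DG_* variable names.
create_deepgram_secrets() {
  # Distribution credentials used to pull Deepgram container images.
  kubectl create secret docker-registry dg-regcred \
    --docker-server=quay.io \
    --docker-username="$DG_REGISTRY_USER" \
    --docker-password="$DG_REGISTRY_PASSWORD"

  # Self-hosted API key that licenses each Deepgram container.
  kubectl create secret generic dg-self-hosted-api-key \
    --from-literal=DEEPGRAM_API_KEY="$DG_API_KEY"
}

# Export DG_REGISTRY_USER, DG_REGISTRY_PASSWORD, and DG_API_KEY, then run:
#   create_deepgram_secrets
```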
Your Deepgram Account Representative should provide you with download links to at least one voice AI model. Copy the provided model files into a dedicated directory on the host machine.
Shell
$
mkdir deepgram-models
$
cd deepgram-models
$
wget DOWNLOAD_LINK_TO_DEEPGRAM_MODEL
Create a local PersistentVolume in your cluster using this official Kubernetes guide, and set the spec.local.path to the absolute path of the deepgram-models directory you just created.
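As a sketch of such a manifest, the snippet below writes a local PersistentVolume definition to a file. The volume name, capacity, storage class, and access mode are illustrative assumptions, and the node name is assumed to match the host name of the machine holding the models:

```shell
# Assumptions: models directory location and the node that holds it.
MODELS_DIR="${MODELS_DIR:-$PWD/deepgram-models}"
NODE_NAME="${NODE_NAME:-$(uname -n)}"

# Write a local PersistentVolume manifest pinned to that node.
cat > deepgram-models-pv.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: deepgram-models-pv
spec:
  capacity:
    storage: 40Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: ${MODELS_DIR}
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ${NODE_NAME}
EOF

# Review the file, then create the volume:
#   kubectl apply -f deepgram-models-pv.yaml
```

Local volumes require the nodeAffinity block so that Pods using the volume are scheduled onto the node where the files actually live.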
Deploy Deepgram
Deepgram maintains the official deepgram-self-hosted Helm Chart. You can reference the source and Artifact Hub listing for more details. We’ll use this Chart to facilitate deploying Deepgram services in your self-hosted environment.
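A sketch of installing the Chart with Helm. The repository URL, the release name deepgram, and the values.yaml file name are assumptions not given in this guide; verify them against the Artifact Hub listing before use:

```shell
# Assumptions: chart repository URL, release name, and values file name.
DG_NAMESPACE="${DG_NAMESPACE:-dg-self-hosted}"

deploy_deepgram() {
  helm repo add deepgram https://deepgram.github.io/self-hosted-resources
  helm repo update
  # Install the chart with your own values file (first argument, default values.yaml).
  helm install deepgram deepgram/deepgram-self-hosted \
    --namespace "$DG_NAMESPACE" \
    --create-namespace \
    --values "${1:-values.yaml}"
}

# Run once the Secrets and PersistentVolume exist:
#   deploy_deepgram my-values.yaml
#   kubectl get pods --namespace "$DG_NAMESPACE" --watch
```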
Resource limits, taints, and other constraints may limit Pod scheduling. If a Pod is not able to be scheduled, you can see its status and a list of associated events with kubectl describe pod <pod-name>.
Test Your Deepgram Setup with a Sample Request
Test your environment and container setup with a local file.
Get the name of one of the Deepgram API Pods.
Shell
$
API_POD_NAME=$(
  kubectl get pods \
    --selector app=deepgram-api \
    --output jsonpath='{.items[0].metadata.name}' \
    --no-headers
)
Launch an ephemeral container to send your test request from.
Shell
$
kubectl debug $API_POD_NAME \
  -it \
  --image=curlimages/curl \
  -- /bin/sh
Inside the ephemeral container, download a sample file from Deepgram (or supply your own file).
Shell
$
wget https://dpgr.am/bueller.wav
Send your audio file to your local Deepgram setup for transcription. When building the request, adjust the query parameters to match the directions from your Deepgram Account Representative, and adjust the service name (deepgram-api-external) and the namespace (dg-self-hosted) if yours differ.
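A sketch of such a request, run inside the ephemeral container. The port 8080 and the query parameters shown are assumptions; substitute the values that apply to your deployment:

```shell
# Service and namespace as created by the Helm Chart; port and query are assumptions.
SERVICE="deepgram-api-external"
NAMESPACE="dg-self-hosted"
PORT="8080"
QUERY="model=nova-2&smart_format=true"
LISTEN_URL="http://${SERVICE}.${NAMESPACE}.svc.cluster.local:${PORT}/v1/listen?${QUERY}"

# POST a local audio file to the in-cluster Deepgram API.
transcribe() {
  curl -X POST --data-binary "@$1" "$LISTEN_URL"
}

# Inside the ephemeral container:
#   transcribe bueller.wav
```

The URL uses the cluster-internal DNS name of the Service, so it only resolves from within the cluster.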
You should receive a JSON response with the transcript and associated metadata. Congratulations - your self-hosted setup is working!
Next Steps
Your Deepgram services are accessible within your cluster via the deepgram-api-external Service that was created by the Helm Chart.
What’s Next
Now that you have a basic Deepgram setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.