Self-Managed Kubernetes

You can deploy Deepgram using Kubernetes, which provides a scalable instance of Deepgram's API and Engine services running on your own hardware or in your Kubernetes cloud environment. In this guide, we will look at how to deploy Deepgram on-premises with Kubernetes on a system running the Ubuntu operating system.

Prerequisites

Prior to deploying Kubernetes you will need to ensure you have a suitable environment per our Deployment Environments guide. You will also need access to the deepgram-self-hosted Helm chart.

Terminology

If you are not already familiar with Kubernetes, you should be aware of three main concepts:

  • Node: A physical computer or virtual machine used to host workloads.
  • Pod: The smallest deployable unit in Kubernetes, consisting of one or more containers running together on a node. One node can host many pods.
  • Cluster: A group of nodes and their associated pods.

Additionally, this guide refers frequently to kubectl, the command line tool for interacting with Kubernetes clusters; kubeadm, the cluster administration tool; and kubelet, the node agent.

Installing Kubernetes

📘

Managed Kubernetes

If you are operating in a VPC, you should use a managed Kubernetes service instead of installing your own. For example, you can use EKS in AWS as an alternative to the following manual installation.

Kubernetes consists of several components distributed as binaries or container images, including an API server for cluster management, a proxy server, a scheduler, and various controllers. These components are served from registry.k8s.io, and you will need several helper tools to get up and running, including the aforementioned kubectl, kubeadm, and kubelet.

Prior to installing Kubernetes, you must disable Linux swap permanently. While sudo swapoff -a will temporarily disable swap, you will need to make the change permanent in /etc/fstab or systemd.swap.
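
For example, one common approach on Ubuntu is to disable swap immediately and then comment out any swap entries in /etc/fstab (review the file afterward to confirm the change):

sudo swapoff -a
# Comment out any swap lines in /etc/fstab so swap stays disabled after reboot
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab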

Install kubeadm, kubelet, and kubectl

Update your package repositories and install dependencies for the Kubernetes repository:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl


Download the public signing key from Google:

curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg

Note: Distributions prior to Ubuntu 22.04 may not have the /etc/apt/keyrings directory. You can create this directory, making it world-readable but writeable only by admins.
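
For example:

sudo mkdir -p -m 755 /etc/apt/keyrings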

Add the Kubernetes official repository:

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list


Update packages and install Kubernetes tools:

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

🚧

Kubernetes Versions

When updating tooling, you must use a kubectl version that is within one minor version of your cluster. For example, a v1.27 client can communicate with v1.26, v1.27, and v1.28 control planes. You must keep all tooling versions in sync manually. If you wish to pin the versions, you can do so with apt-mark as follows:

sudo apt-mark hold kubelet kubeadm kubectl
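
To pin to a specific version before holding, you can specify package versions explicitly. The version string below is illustrative; list the versions available in your repository with apt-cache madison kubeadm:

sudo apt-get install -y kubelet=1.27.4-00 kubeadm=1.27.4-00 kubectl=1.27.4-00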

Initializing a Cluster

In order to run nodes and pods, you must first create a cluster. This is done using the kubeadm command:

sudo kubeadm init --ignore-preflight-errors Swap

Kubeadm will run verification checks and report any errors, then it will download the required containerized components and initialize a control-plane. You can see configuration options for initialization here, including how to set node taints.
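
Some CNI add-ons expect a pod network CIDR to be set at initialization time. For example, Calico (deployed later in this guide) defaults to 192.168.0.0/16, so you might instead initialize the cluster with:

sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --ignore-preflight-errors Swap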

Once the control-plane is initialized you will receive instructions to store the cluster configuration and deploy a pod network. Examples below (instructions may differ based on your system):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You will also be presented with a kubeadm join command, which you should save for later use when joining worker nodes to the cluster.

Upon completion, you should be able to query your control-plane and see the standard Kubernetes pods running:

kubectl get pod -n kube-system

Deploying a Container Network Interface

By default, Kubernetes does not deploy a CNI for pod communication. Cluster DNS will not start, and pods will not be able to communicate, until you install an add-on for the CNI you wish to deploy in your cluster as follows:

kubectl apply -f <add-on.yaml>

As an example, if you were to deploy the Calico network in your cluster you would install the add-on as follows:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

A broad, though not exhaustive, list of common network add-ons is available in the official Kubernetes Networking and Network Policy documentation. You may utilize only a single CNI per cluster.

To verify the network is up and running you can check the CoreDNS pod status. When the CoreDNS pod state shows as Running you may then join nodes to the cluster.
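
For example, in a standard kubeadm cluster the CoreDNS pods carry the k8s-app=kube-dns label:

kubectl get pods --namespace kube-system --selector k8s-app=kube-dns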

Joining Nodes

Once the master node is set up, you can begin joining worker nodes to the cluster. If you copied the join command output when the cluster was initialized, it can be used on each worker node directly. In the event that you did not save the join command, you may recover it using kubeadm as follows:

kubeadm token create --print-join-command
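
Run this on the control-plane node. The join command it prints has the following general shape; the address, token, and hash are specific to your cluster:

sudo kubeadm join <control-plane-host>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>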

After joining nodes to the cluster you can utilize the kubectl command to verify the status of the cluster nodes:

kubectl get nodes

Metrics

Kubernetes supports metrics aggregation from nodes within the cluster; however, this is not set up by default upon cluster initialization. If you wish to utilize the Kubernetes metrics server, you may deploy the latest version using kubectl:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml


After deployment you may then query the compute utilization of nodes using the top command from the CLI:

kubectl top nodes


Alternatively, you can consume node metrics using your own metrics aggregation service pointed at the metrics API.
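
For example, you can query the raw node metrics that the metrics server exposes through the metrics API:

kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes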

Configure Kubernetes Secrets

Deepgram strongly recommends following best practices for configuring Kubernetes Secrets. Please refer to Securing Your Cluster for more details.

The deepgram-self-hosted Helm chart takes two Secret references. One is a set of distribution credentials that allow the cluster to pull images from Deepgram's container image repository. The other is your self-hosted API key that licenses each Deepgram container that is created.

  1. Complete the Self Service Licensing & Credentials guide to generate distribution credentials and a self-hosted API key.

  2. If using an external Secret store provider, configure cluster access to these two Secrets, naming them dg-regcred (distribution credentials) and dg-self-hosted-api-key (self-hosted API key).

  3. If not using an external Secret store provider, create the Secrets manually in your cluster.

    1. Using the distribution credentials username and password generated in the Deepgram Console, create a Kubernetes Secret named dg-regcred.

      kubectl create secret docker-registry dg-regcred \
          --docker-server=quay.io \
          --docker-username=QUAY_DG_USER \
          --docker-password=QUAY_DG_PASSWORD
      
    2. Create a Kubernetes Secret named dg-self-hosted-api-key to store your self-hosted API key.

      kubectl create secret generic dg-self-hosted-api-key \
          --from-literal=DEEPGRAM_API_KEY='YOUR_API_KEY_HERE'
      

      💻

      Replace the placeholder YOUR_API_KEY_HERE with the Deepgram API key you generated in the Self Service Licensing & Credentials guide.
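
After creating both Secrets, you can confirm they exist in your cluster:

kubectl get secrets dg-regcred dg-self-hosted-api-key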

Download Models to your K8s Node

Your Deepgram Account Representative should provide you with download links to at least one voice AI model. Copy the provided model files into a dedicated directory on the host machine.

mkdir deepgram-models
cd deepgram-models
wget DOWNLOAD_LINK_TO_DEEPGRAM_MODEL

Create a local PersistentVolume in your cluster using this official Kubernetes guide, and set the spec.local.path to the absolute path of the deepgram-models directory you just created.
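
A minimal sketch of such a manifest is shown below. The capacity, access mode, and storage class are illustrative and follow the official local volumes example; adjust them to your environment and the chart's PVC requirements. Note that local volumes require node affinity so the scheduler knows which node holds the files. The name deepgram-models-pv matches the values.yaml example in the next section.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: deepgram-models-pv
spec:
  capacity:
    storage: 40Gi                     # illustrative; size to fit your model files
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage     # illustrative storage class name
  local:
    path: /home/ubuntu/deepgram-models   # absolute path of the directory created above
  nodeAffinity:                       # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <node-name>         # the node where the model files are stored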

Deploy Deepgram

Deepgram maintains the official deepgram-self-hosted Helm Chart. You can reference the source and Artifact Hub listing for more details. We'll use this Chart to facilitate deploying Deepgram services in your self-hosted environment.

  1. Fetch the repository info.

    helm repo add deepgram https://deepgram.github.io/self-hosted-resources
    helm repo update
    
  2. Download a values.yaml template from Deepgram's self-hosted resources. For example, here is a template for a basic setup with a self-managed cluster.

  3. In your values.yaml, modify the scaling.replicas.{api,engine} as desired.

  4. In your values.yaml file, insert the name of the local PersistentVolume you created in the previous section.

    engine:
      modelManager:
        volumes:
          customVolumeClaim:
            enabled: true
            name: deepgram-models-pv # Replace with the name of the local PersistentVolume you have created
            modelsDirectory: "/"
    
  5. Install the Helm Chart with your values.yaml file.

    helm install deepgram deepgram/deepgram-self-hosted \
        -f values.yaml \
        --namespace dg-self-hosted \
        --create-namespace \
        --atomic \
        --timeout 1h
    # Monitor the installation in a separate shell
    watch kubectl get all --namespace dg-self-hosted
    

    🚧

    Pod Scheduling Failures

    Resource limits, taints, and other constraints may limit Pod scheduling. If a Pod is not able to be scheduled, you can see its status and a list of associated events with kubectl describe pod <pod-name> --namespace dg-self-hosted.
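
    Another useful debugging step is to list recent events in the namespace, sorted by time:

    kubectl get events --namespace dg-self-hosted --sort-by=.lastTimestamp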

Test Your Deepgram Setup with a Sample Request

Test your environment and container setup with a local file.

  1. Get the name of one of the Deepgram API Pods.
    API_POD_NAME=$(
        kubectl get pods \
            --namespace dg-self-hosted \
            --selector app=deepgram-api \
            --output jsonpath='{.items[0].metadata.name}'
    )
    
  2. Launch an ephemeral container to send your test request from.
    kubectl debug $API_POD_NAME \
        -it \
        --namespace dg-self-hosted \
        --image=curlimages/curl \
        -- /bin/sh
    
  3. Inside the ephemeral container, download a sample file from Deepgram (or supply your own file).
    wget https://dpgr.am/bueller.wav
    
  4. Send your audio file to your local Deepgram setup for transcription.
    curl \
        -X POST \
        --data-binary @bueller.wav \
        "http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/v1/listen?model=nova-2&smart_format=true"
    

    📘

    If needed, adjust pieces of the above command:

    • the query parameters to match the directions from your Deepgram Account Representative
    • the service name (deepgram-api-external)
    • the namespace (dg-self-hosted)

You should receive a JSON response with the transcript and associated metadata. Congratulations - your self-hosted setup is working!

Next Steps

Your Deepgram services are accessible within your cluster via the deepgram-api-external Service that was created by the Helm Chart.
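
For quick testing from outside the cluster, one option is kubectl port-forward; this assumes the Service exposes port 8080, matching the test request above:

kubectl port-forward service/deepgram-api-external 8080:8080 --namespace dg-self-hosted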


Now that you have a basic Deepgram setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.