Google Cloud Platform

With Kubernetes

Deploying Deepgram on Google Cloud Platform (GCP) requires some preparation. In this section, you will learn how to provision a managed Kubernetes Cluster where you will deploy Deepgram products. You will need to perform some of these steps in the Google Cloud Console and some in your local terminal.

Prerequisites

Make sure you have completed the requirements in the Onprem Introduction.

GPU availability has been extremely limited across cloud providers, including GCP. You may need to request a GPU quota increase if you are not able to provision spot GPU instances in your node pools.

gcloud CLI

The gcloud CLI provides programmatic access to manage your GCP services. Certain steps in this guide are enabled by this tool, although many of the same actions can be performed manually in the Google Cloud Console.

  1. Follow the installation guide to install the CLI locally.
  2. Once installed, run gcloud init to configure the CLI with access to your GCP account and project.

📘

Choosing a Region

The templates and steps in this guide provision resources in the GCP us-west1 region.

If you would like to deploy to a different region, make sure to adjust templates and steps in this guide accordingly.

Cluster Management with gcloud container clusters

The gcloud container clusters command group allows you to create and manage GKE clusters.

Certain steps in this guide use these commands, although many of the same actions can be performed manually in the Google Cloud Console.

Kubernetes Packages with helm

Helm is the package manager for Kubernetes. A package in Kubernetes is defined by a Helm Chart, which helps you define, install, and upgrade even the most complex Kubernetes application.

We use Helm to install several components in this guide. See the installation guide for details on how to install locally.
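Before continuing, it can help to confirm that the command-line tools used in this guide are installed. The following is a minimal sketch, not part of the official setup; the helper name missing_tools is hypothetical, and kubectl is included because later steps call it.

```shell
# Print the name of each listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this guide relies on; no output means all of them were found.
missing_tools gcloud kubectl helm
```

If any names are printed, install those tools before creating the cluster.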

Creating a Cluster

Google Kubernetes Engine (GKE) is a managed Kubernetes service to run Kubernetes in GCP. In the cloud, GKE automatically manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and other key tasks.

  1. Create a new GKE cluster with gcloud, and get the zones where your cluster is created.

    CLUSTER_NAME=deepgram-self-hosted
    CLUSTER_LOCATION=us-west1
    gcloud container clusters create $CLUSTER_NAME \
        --location $CLUSTER_LOCATION \
        --num-nodes 1 \
        --enable-autoscaling \
        --machine-type n1-standard-2 \
        --addons=GcePersistentDiskCsiDriver
        
    CLUSTER_ZONES=$(
         gcloud container clusters describe $CLUSTER_NAME \
             --location $CLUSTER_LOCATION \
             --format="value(locations.join(','))"
    )
    ENGINE_NP_ZONE=$(echo "$CLUSTER_ZONES" | cut -d',' -f1)
    
  2. Create separate node pools for each Deepgram component (API, Engine, License Proxy). Adjust the machine types and node counts according to your needs. You may wish to consult your Deepgram Account Representative in planning your cluster's capacity.

    📘

    num-nodes Default Behavior

    num-nodes configures the number of nodes in the node pool in each of the cluster's zones. If your cluster is configured in 3 zones, setting num-nodes to 1 will result in 1 node per zone, or 3 nodes across the entire cluster.

    📘

    We restrict the engine-pool to one cluster zone because you can't use regional persistent disks on VMs that use G2 standard machine types. This guide uses a zonal persistent disk as a workaround, which means we must limit the nodes in engine-pool to a single zone in order to mount the disk.

    gcloud container node-pools create api-pool \
        --cluster $CLUSTER_NAME \
        --location $CLUSTER_LOCATION \
        --num-nodes 1 \
        --enable-autoscaling \
        --max-nodes 3 \
        --machine-type n1-standard-4 \
        --node-labels k8s.deepgram.com/node-type=api
      
    gcloud container node-pools create engine-pool \
        --cluster $CLUSTER_NAME \
        --location $CLUSTER_LOCATION \
        --node-locations $ENGINE_NP_ZONE \
        --num-nodes 1 \
        --enable-autoscaling \
        --max-nodes 8 \
        --machine-type g2-standard-8 \
        --accelerator=type=nvidia-l4,count=1,gpu-driver-version=latest \
        --node-labels k8s.deepgram.com/node-type=engine
      
    gcloud container node-pools create license-proxy-pool \
        --cluster $CLUSTER_NAME \
        --location $CLUSTER_LOCATION \
        --num-nodes 1 \
        --enable-autoscaling \
        --max-nodes 2 \
        --machine-type n1-standard-2 \
        --node-labels k8s.deepgram.com/node-type=license-proxy
    
  3. Create a dedicated namespace for Deepgram resources.

    kubectl create namespace dg-self-hosted
    kubectl config set-context --current --namespace=dg-self-hosted  
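To make the num-nodes behavior from step 2 concrete, the sketch below works through the per-zone arithmetic. It assumes a cluster that spans three zones; the CLUSTER_ZONES value is a hypothetical example of what the describe command in step 1 might return, and the helper names are illustrative only.

```shell
# Hypothetical example value; in the guide this comes from
# `gcloud container clusters describe`.
CLUSTER_ZONES="us-west1-a,us-west1-b,us-west1-c"

# Count the comma-separated zones in a string.
zone_count() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Total nodes in a pool = per-zone node count x number of zones.
total_nodes() {
    echo $(( $1 * $(zone_count "$2") ))
}

# A pool spread over every cluster zone, such as api-pool: prints 3
total_nodes 1 "$CLUSTER_ZONES"

# A pool restricted to the first zone, such as engine-pool: prints 1
total_nodes 1 "$(printf '%s' "$CLUSTER_ZONES" | cut -d',' -f1)"
```

This is why engine-pool, which is limited to a single zone with --node-locations, starts with one node while the other pools start with one node per zone.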
    

Configure Persistent Storage

  1. Create a Google Persistent Disk to store Deepgram model files and share them across multiple Deepgram Engine pods.

    DISK_NAME=deepgram-model-storage
    DISK_URI=$(
        gcloud compute disks create \
            $DISK_NAME \
            --size=40GB \
            --type=pd-ssd \
            --zone $ENGINE_NP_ZONE \
            --format="value(selfLink)" | \
        sed -e 's#.*/projects/#projects/#'
    )
    
  2. Create a temporary VM instance in the same zone as the disk and attach the persistent disk.

    gcloud compute instances create model-downloader \
        --machine-type=n1-standard-1 \
        --zone $ENGINE_NP_ZONE \
        --disk=name=$DISK_URI,scope=zonal,device-name=$DISK_NAME,mode=rw,boot=no
    
  3. SSH into the VM instance.

    gcloud compute ssh model-downloader \
        --zone $ENGINE_NP_ZONE
    
  4. In the VM, format and mount the disk, then download the model files provided by your Deepgram Account Representative onto the disk.

    DISK_NAME=deepgram-model-storage
    sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard \
        /dev/disk/by-id/google-$DISK_NAME
    
    MOUNT_PATH=/mnt/disks/models
    sudo mkdir -p $MOUNT_PATH
    sudo mount -o discard,defaults /dev/disk/by-id/google-$DISK_NAME $MOUNT_PATH
    sudo chmod a+w $MOUNT_PATH
    cd $MOUNT_PATH
    
    # Download each model file 
    wget https://link-to-model-1.dg
    wget https://link-to-model-2.dg  
    # ... continue for all model files
    
  5. Unmount the disk and delete the temporary VM instance.

    cd
    sudo umount $MOUNT_PATH
    exit
    
    gcloud compute instances delete model-downloader \
        --zone $ENGINE_NP_ZONE
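The sed expression in step 1 trims the disk's selfLink down to the relative resource path stored in DISK_URI. The sketch below shows the transformation on a hypothetical selfLink; the project name my-project and the zone are placeholders, and to_disk_uri is an illustrative helper name.

```shell
# Hypothetical selfLink, shaped like the value printed by
# `gcloud compute disks create --format="value(selfLink)"`.
SELF_LINK="https://www.googleapis.com/compute/v1/projects/my-project/zones/us-west1-a/disks/deepgram-model-storage"

# Strip everything before "projects/", as in step 1.
to_disk_uri() {
    printf '%s\n' "$1" | sed -e 's#.*/projects/#projects/#'
}

to_disk_uri "$SELF_LINK"
# prints: projects/my-project/zones/us-west1-a/disks/deepgram-model-storage
```

The resulting value is what you will later paste into your Helm values as the volume handle.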
    

Configure Kubernetes Secrets

Deepgram strongly recommends following best practices for configuring Kubernetes Secrets. Please refer to Securing Your Cluster for more details.

The deepgram-self-hosted Helm chart takes two Secret references. One is a set of distribution credentials that allow the cluster to pull images from Deepgram's container image repository. The other is your self-hosted API key that licenses each Deepgram container that is created.

  1. Complete the Self Service Licensing & Credentials guide to generate distribution credentials and a self-hosted API key.

  2. If using an external Secret store provider, configure cluster access to these two Secrets, naming them dg-regcred (distribution credentials) and dg-self-hosted-api-key.

  3. If not using an external Secret store provider, create the Secrets manually in your cluster.

    1. Log in to Quay with your newly created distribution credentials, then create a Kubernetes Secret named dg-regcred to store your credentials.

      docker login quay.io  
      kubectl create secret generic dg-regcred \
        --from-file=.dockerconfigjson=$HOME/.docker/config.json \
        --type=kubernetes.io/dockerconfigjson
      
    2. Create a Kubernetes Secret named dg-self-hosted-api-key to store your self-hosted API key.

      kubectl create secret generic dg-self-hosted-api-key \
          --from-literal=DEEPGRAM_API_KEY='YOUR_API_KEY_HERE'  
      

      💻

      Replace the placeholder YOUR_API_KEY_HERE with the Deepgram API key you generated in the Self Service Licensing & Credentials guide.
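One thing worth checking before creating dg-regcred from config.json: if Docker is configured with a credential helper (for example, a credsStore entry), docker login may not write the credentials into config.json, and the resulting Secret would not contain usable credentials. The sketch below is a rough check under that assumption; has_inline_auth is a hypothetical helper, and the pattern match is a heuristic rather than a full JSON parse.

```shell
# Return 0 if the given Docker config file embeds registry credentials
# inline (an "auth" entry), non-zero otherwise.
has_inline_auth() {
    grep -q '"auth"[[:space:]]*:' "$1"
}

# Docker reads its config from $DOCKER_CONFIG if set, else ~/.docker.
config="${DOCKER_CONFIG:-${HOME:-}/.docker}/config.json"
if [ -f "$config" ] && ! has_inline_auth "$config"; then
    echo "warning: $config has no inline auth entry; a credential helper may be in use" >&2
fi
```

If the warning appears, one alternative is to create the Secret with kubectl create secret docker-registry, passing --docker-server, --docker-username, and --docker-password directly.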

Deploy Deepgram

Deepgram maintains the official deepgram-self-hosted Helm Chart. You can reference the source and Artifact Hub listing for more details. We'll use this Chart to facilitate deploying Deepgram services in your self-hosted environment.

  1. Fetch the repository info.

    helm repo add deepgram https://deepgram.github.io/self-hosted-resources
    helm repo update
    
  2. Download a values.yaml template for Deepgram's self-hosted resources. For example, here is a template for a GCP deployment that includes the Deepgram License Proxy.

  3. Modify the values.yaml file:

    • Update the scaling.static.{api,engine,licenseProxy}.replicas values to match your node pool sizes.

    • Configure the engine.modelManager.volumes.gcp.gpd values to use the Google Persistent Disk you created earlier.

      echo $DISK_URI
      
      engine:
        modelManager:  
          volumes:
            gcp:
              gpd:
                enabled: true
                volumeHandle: "<your DISK_URI here>"
      
  4. Install the Helm Chart with your modified values file. The command below assumes you saved it as my-values.yaml.

    helm install deepgram deepgram/deepgram-self-hosted \
        -f my-values.yaml \
        --namespace dg-self-hosted \
        --atomic \
        --timeout 20m

    # Monitor the installation in a separate shell
    watch kubectl get all
    

    🚧

    Resource Limits

    It may take some time for GKE to resize the number of nodes in your cluster to accommodate your deployment.

    If you want to monitor the status, or your pods aren't being scheduled as you expect, you can see a pod's scheduling status with kubectl describe pod <pod-name>, which may contain details on what is preventing scheduling.
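While GKE is still adding nodes, unscheduled pods typically show a Pending status. As a quick way to spot them, the sketch below filters the output of kubectl get pods for anything that is not Running. The helper name not_running and the pod names in the sample are hypothetical; the filter relies on STATUS being the third column of the default output.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print each pod
# whose STATUS column is not "Running".
not_running() {
    awk '$3 != "Running" { print $1 ": " $3 }'
}

# Sample input with made-up pod names, in place of a live cluster.
printf '%s\n' \
    'deepgram-api-abc 1/1 Running 0 5m' \
    'deepgram-engine-xyz 0/1 Pending 0 5m' | not_running
# prints: deepgram-engine-xyz: Pending
```

Against a live cluster, you could pipe kubectl get pods --namespace dg-self-hosted --no-headers into not_running, then run kubectl describe pod on any pod it reports.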

Test Your Deepgram Setup with a Sample Request

Test your environment and container setup with a local file.

  1. Get the name of one of the Deepgram API Pods.

    API_POD_NAME=$(
        kubectl get pods \
            --selector app=deepgram-api \
            --output jsonpath='{.items[0].metadata.name}' \
            --no-headers
    )
    
  2. Launch an ephemeral container to send your test request from.

    kubectl debug $API_POD_NAME \
        -it \
        --image=curlimages/curl \
        -- /bin/sh
    
  3. Inside the ephemeral container, download a sample file from Deepgram (or supply your own file).

    wget https://dpgr.am/bueller.wav
    
  4. Send your audio file to your local Deepgram setup for transcription.

    curl \
        -X POST \
        --data-binary @bueller.wav \
        "http://localhost:8080/v1/listen?model=nova-2&smart_format=true"
    

    📘

    If needed, adjust pieces of the above command:

    • the query parameters to match the directions from your Deepgram Account Representative
    • the service name (deepgram-api-external)
    • the namespace (dg-self-hosted)

You should receive a JSON response with the transcript and associated metadata. Congratulations - your self-hosted setup is working!

Next Steps

Your Deepgram services are accessible within your cluster via the deepgram-api-external Service that was created by the Helm Chart. You may consider configuring additional ingress with a GCP Load Balancer to access your services.
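From another pod in the cluster, localhost will not reach the API, so requests go to the Service's DNS name instead. The sketch below builds that URL using the standard Kubernetes pattern of service.namespace.svc.cluster.local. It assumes the default cluster domain and that the Service listens on port 8080, matching the test request above; confirm the port with kubectl get service deepgram-api-external. The helper name listen_url is hypothetical.

```shell
# Build the in-cluster URL for the listen endpoint.
# $1 = Service name, $2 = namespace, $3 = port (assumed), $4 = query string
listen_url() {
    printf 'http://%s.%s.svc.cluster.local:%s/v1/listen?%s\n' "$1" "$2" "$3" "$4"
}

listen_url deepgram-api-external dg-self-hosted 8080 'model=nova-2&smart_format=true'
# prints: http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/v1/listen?model=nova-2&smart_format=true
```

You can pass the resulting URL to the same curl command used in the test above.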


What’s Next

Now that you have a basic Deepgram setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.