Deploying Deepgram on Google Cloud Platform (GCP) requires some preparation. In this section, you will learn how to provision a managed Kubernetes Cluster where you will deploy Deepgram products. You will need to perform some of these steps in the Google Cloud Console and some in your local terminal.
Make sure you have completed the requirements in the Self-Hosted Introduction.
GPU availability has been extremely limited across cloud providers, including GCP. If you are not able to provision spot GPU instances in your node pools, you may need to request a GPU quota increase for your project.
kubectl
The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs.
Install locally using the official Kubernetes guides.
gcloud CLI
The gcloud CLI provides programmatic access to manage your GCP services. Certain steps in this guide rely on this tool, although many of the same actions can be performed manually in the Google Cloud Console.
Run gcloud init to configure the CLI with access to your GCP account and project. This guide uses the us-west1 region. If you would like to deploy to a different region, make sure to adjust templates and steps in this guide accordingly.
gcloud container clusters
The gcloud container clusters command group allows you to create and manage GKE clusters.
Certain steps in this guide use these commands, although many of the same actions can be performed manually in the Google Cloud Console.
helm
Helm is the package manager for Kubernetes. A package in Kubernetes is defined by a Helm Chart, which helps you define, install, and upgrade even the most complex Kubernetes application.
We use Helm to install several components in this guide. See the installation guide for details on how to install locally.
Google Kubernetes Engine (GKE) is a managed Kubernetes service to run Kubernetes in GCP. In the cloud, GKE automatically manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and other key tasks.
Create a new GKE cluster with gcloud, and get the zones where your cluster is created.
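As a rough sketch of this step, the commands below create a regional cluster, fetch credentials for kubectl, and print the zones the cluster spans. The cluster name and the small default node pool (size and machine type) are placeholder assumptions, not values prescribed by this guide.

```shell
# Hypothetical names; adjust to your environment.
CLUSTER_NAME="deepgram-self-hosted-cluster"
CLUSTER_REGION="us-west1"

# Create a regional cluster. The default pool here is only an illustrative
# small pool; Deepgram workloads get their own node pools later.
gcloud container clusters create "$CLUSTER_NAME" \
  --region "$CLUSTER_REGION" \
  --num-nodes 1 \
  --machine-type e2-standard-2

# Point kubectl at the new cluster.
gcloud container clusters get-credentials "$CLUSTER_NAME" \
  --region "$CLUSTER_REGION"

# Print the zones the cluster was created in.
gcloud container clusters describe "$CLUSTER_NAME" \
  --region "$CLUSTER_REGION" \
  --format "value(locations)"
```

Note one of the printed zones; the engine node pool and the model disk both need to live in the same zone.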
Create separate node pools for each Deepgram component (API, Engine, License Proxy). Adjust the machine types and node counts according to your needs. You may wish to consult your Deepgram Account Representative in planning your cluster’s capacity.
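One possible layout is sketched below. The pool names other than engine-pool, the machine types, the GPU type, the node counts, and the node label key are all illustrative assumptions; size them with your Deepgram Account Representative and check your values.yaml for the label your chart actually selects on.

```shell
# Hypothetical names; adjust to your environment.
CLUSTER_NAME="deepgram-self-hosted-cluster"
CLUSTER_REGION="us-west1"
ENGINE_ZONE="us-west1-a"   # pick one of the zones reported for your cluster

# API nodes (CPU only), spread across all cluster zones.
gcloud container node-pools create api-pool \
  --cluster "$CLUSTER_NAME" --region "$CLUSTER_REGION" \
  --machine-type n2-standard-4 \
  --num-nodes 1 \
  --node-labels k8s.deepgram.com/node-type=api

# Engine nodes (GPU), pinned to a single zone.
gcloud container node-pools create engine-pool \
  --cluster "$CLUSTER_NAME" --region "$CLUSTER_REGION" \
  --node-locations "$ENGINE_ZONE" \
  --machine-type g2-standard-8 \
  --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest \
  --num-nodes 1 \
  --node-labels k8s.deepgram.com/node-type=engine

# License Proxy nodes (CPU only).
gcloud container node-pools create license-proxy-pool \
  --cluster "$CLUSTER_NAME" --region "$CLUSTER_REGION" \
  --machine-type e2-standard-2 \
  --num-nodes 1 \
  --node-labels k8s.deepgram.com/node-type=license-proxy
```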
num-nodes Default Behavior
num-nodes configures the number of nodes in the node pool in each of the cluster’s zones. If your cluster is configured in 3 zones, setting num-nodes to 1 will result in 1 node per zone, or 3 nodes across the entire cluster.
We restrict the engine-pool to one cluster zone because you can’t use regional persistent disks on VMs that use G2 standard machine types. This guide uses a zonal persistent disk as a workaround, which means we must limit the nodes in engine-pool to a single zone in order to mount the disk.
Create a dedicated namespace for Deepgram resources.
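For example, using the dg-self-hosted namespace that the rest of this guide refers to:

```shell
NAMESPACE="dg-self-hosted"

# Create the namespace that will hold all Deepgram resources.
kubectl create namespace "$NAMESPACE"

# Confirm it exists.
kubectl get namespace "$NAMESPACE"
```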
When deploying workloads in a non-default namespace (such as dg-self-hosted), GKE does not automatically provision quotas for the system-node-critical and system-cluster-critical priority classes in that namespace. GPU driver DaemonSets (and other node-critical system workloads) rely on these priorities to schedule correctly.
Create a ResourceQuota in your Deepgram namespace to enable these priorities:
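A minimal sketch of such a quota follows. The manifest file name, the quota name, and the pod limit of 1000 are assumptions chosen for illustration; the scope selector is what admits pods in the two priority classes.

```shell
# Write the manifest to a file so it can be reviewed before applying.
cat > critical-pods-quota.yaml <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: critical-pods            # hypothetical name
  namespace: dg-self-hosted
spec:
  hard:
    pods: "1000"                 # illustrative upper bound
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical
EOF

kubectl apply -f critical-pods-quota.yaml
```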
Next, create a Google Cloud Persistent Disk to hold the Deepgram model files. Populate the disk from inside the cluster with a one-shot Kubernetes Job, then delete the Job. The disk remains, and will later be mounted read-only into the Deepgram Engine pods when the Helm chart is installed.
The deepgram-self-hosted Helm chart mounts the Persistent Disk through a ReadOnlyMany PV/PVC, so it cannot be used to write the initial model files. Bind a separate, temporary ReadWriteOnce PV/PVC to the same underlying disk for the download, then delete it before installing the chart.
Create a Google Persistent Disk to store Deepgram model files and share them across multiple Deepgram Engine pods.
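A sketch of the disk creation is shown here. The disk name, size, and disk type are assumptions; the zone must be the single zone you chose for engine-pool.

```shell
# Hypothetical values; adjust to your environment.
DISK_NAME="deepgram-model-storage"
DISK_ZONE="us-west1-a"   # must match the engine-pool zone

# Create a zonal persistent disk for the model files.
gcloud compute disks create "$DISK_NAME" \
  --size 40GB \
  --type pd-ssd \
  --zone "$DISK_ZONE"
```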
Create a temporary writable PersistentVolume and PersistentVolumeClaim that point at the disk you just provisioned. The PV uses ReadWriteOnce and Retain so the underlying disk is preserved when you delete the PV later. The nodeAffinity block keeps the downloader pod in the same zone as the zonal Persistent Disk so the disk can attach.
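The manifest below is one way to express that pair, assuming the GKE Persistent Disk CSI driver (pd.csi.storage.gke.io). The project ID, disk name, zone, capacity, file name, and the PV/PVC names are hypothetical placeholders.

```shell
# Hypothetical values; adjust to your environment.
PROJECT_ID="my-gcp-project"
DISK_NAME="deepgram-model-storage"
DISK_ZONE="us-west1-a"

# The heredoc is unquoted so the shell variables above are substituted.
cat > model-writer-volume.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: deepgram-models-writer-pv
spec:
  capacity:
    storage: 40Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  claimRef:
    namespace: dg-self-hosted
    name: deepgram-models-writer-pvc
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/${PROJECT_ID}/zones/${DISK_ZONE}/disks/${DISK_NAME}
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.gke.io/zone
              operator: In
              values:
                - ${DISK_ZONE}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deepgram-models-writer-pvc
  namespace: dg-self-hosted
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: deepgram-models-writer-pv
  resources:
    requests:
      storage: 40Gi
EOF

kubectl apply -f model-writer-volume.yaml
```

Setting storageClassName to an empty string keeps the claim from triggering dynamic provisioning, so it binds only to the pre-created volume.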
Run a one-shot Job that mounts the writable PVC and downloads the model files provided by your Deepgram Account Representative. The CSI driver creates the ext4 filesystem on the disk the first time the volume is attached.
Replace the wget lines below with the model URLs supplied by your Deepgram Account Representative.
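A sketch of such a Job is given below. The Job name, container image, mount path, and the example.com URLs are placeholders, and the claim name assumes a temporary PVC called deepgram-models-writer-pvc; substitute the names you used.

```shell
# Quoted heredoc: nothing inside is expanded by the local shell.
cat > model-download-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: deepgram-model-download
  namespace: dg-self-hosted
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: downloader
          image: alpine:3
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -e
              cd /models
              # Placeholder URLs: replace with the links you were given.
              wget https://example.com/path/to/model-1.dg
              wget https://example.com/path/to/model-2.dg
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: deepgram-models-writer-pvc
EOF

kubectl apply -f model-download-job.yaml
```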
Wait for the Job to complete successfully.
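One way to block until the Job finishes, assuming it is named deepgram-model-download; the 30 minute timeout is an arbitrary choice that should be sized to your model downloads.

```shell
JOB_NAME="deepgram-model-download"   # hypothetical Job name
NAMESPACE="dg-self-hosted"

# Returns once the Job reports the Complete condition, or fails at the timeout.
kubectl wait --for=condition=complete "job/$JOB_NAME" \
  --namespace "$NAMESPACE" \
  --timeout=30m

# Inspect the download output if anything looks off.
kubectl logs "job/$JOB_NAME" --namespace "$NAMESPACE"
```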
Delete the Job and the temporary writable PV and PVC. Because the PV uses persistentVolumeReclaimPolicy: Retain, the underlying Google Persistent Disk is preserved with the model files intact.
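The cleanup could look like the following, assuming the hypothetical resource names deepgram-model-download, deepgram-models-writer-pvc, and deepgram-models-writer-pv, and a disk named deepgram-model-storage in us-west1-a.

```shell
NAMESPACE="dg-self-hosted"
DISK_NAME="deepgram-model-storage"
DISK_ZONE="us-west1-a"

# Remove the one-shot Job, then the temporary claim and volume.
kubectl delete job deepgram-model-download --namespace "$NAMESPACE"
kubectl delete pvc deepgram-models-writer-pvc --namespace "$NAMESPACE"
kubectl delete pv deepgram-models-writer-pv

# Confirm the underlying disk is still present.
gcloud compute disks describe "$DISK_NAME" \
  --zone "$DISK_ZONE" \
  --format "value(name,status)"
```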
Deepgram strongly recommends following best practices for configuring Kubernetes Secrets. Please refer to Securing Your Cluster for more details.
The deepgram-self-hosted Helm chart takes two Secret references. One is a set of distribution credentials that allow the cluster to pull images from Deepgram’s container image repository. The other is your self-hosted API key that licenses each Deepgram container that is created.
Complete the Self Service Licensing & Credentials guide to generate distribution credentials and a self-hosted API key.
If using an external Secret store provider, configure cluster access to these two Secrets, naming them dg-regcred (distribution credentials) and dg-self-hosted-api-key.
If not using an external Secret store provider, create the Secrets manually in your cluster.
Using the distribution credentials username and password generated in the Deepgram Console, create a Kubernetes Secret named dg-regcred.
Replace the placeholders QUAY_DG_USER and QUAY_DG_PASSWORD with the distribution credentials you generated in the Self Service Licensing & Credentials guide.
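A sketch of the command, assuming the registry host is quay.io (suggested by the placeholder names, but confirm it against your distribution credentials):

```shell
NAMESPACE="dg-self-hosted"

# Image pull Secret used to fetch Deepgram container images.
kubectl create secret docker-registry dg-regcred \
  --namespace "$NAMESPACE" \
  --docker-server=quay.io \
  --docker-username='QUAY_DG_USER' \
  --docker-password='QUAY_DG_PASSWORD'
```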
Create a Kubernetes Secret named dg-self-hosted-api-key to store your self-hosted API key.
Replace the placeholder YOUR_API_KEY_HERE with the Deepgram API key you generated in the Self Service Licensing & Credentials guide.
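For instance, as a generic Secret. The key name DEEPGRAM_API_KEY is an assumption here; check your values.yaml template for the key the chart expects.

```shell
NAMESPACE="dg-self-hosted"

# Secret holding the self-hosted API key that licenses each container.
kubectl create secret generic dg-self-hosted-api-key \
  --namespace "$NAMESPACE" \
  --from-literal=DEEPGRAM_API_KEY='YOUR_API_KEY_HERE'
```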
Deepgram maintains the official deepgram-self-hosted Helm Chart. You can reference the source and Artifact Hub listing for more details. We’ll use this Chart to facilitate deploying Deepgram services in your self-hosted environment.
Download a values.yaml template for Deepgram’s self-hosted Helm chart from here.
Modify the values.yaml file:
Update the scaling.static.{api,engine,licenseProxy}.replicas values to match your node pool sizes.
Configure the engine.modelManager.volumes.gcp.gpd values to use the Google Persistent Disk you created earlier.
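The fragment below illustrates roughly what those two changes might look like, written to a separate file so you can compare it with your template. The replica counts are illustrative, and the enabled and volumeHandle sub-keys under gpd are assumptions about the chart's schema; use whatever key names appear in the values.yaml you downloaded.

```shell
# Hypothetical overrides file; merge these settings into your values.yaml.
cat > values-gcp-overrides.yaml <<'EOF'
scaling:
  static:
    api:
      replicas: 1            # match the number of api-pool nodes
    engine:
      replicas: 1            # match the number of engine-pool nodes
    licenseProxy:
      replicas: 1            # match the number of License Proxy nodes
engine:
  modelManager:
    volumes:
      gcp:
        gpd:
          # Assumed sub-keys; verify against the chart's values.yaml.
          enabled: true
          volumeHandle: projects/my-gcp-project/zones/us-west1-a/disks/deepgram-model-storage
EOF
```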
Install the Helm Chart with your values.yaml file.
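A sketch of the install follows. The repository URL, the release name deepgram, and the timeout are assumptions; confirm the repository address on the chart's Artifact Hub listing.

```shell
NAMESPACE="dg-self-hosted"
RELEASE_NAME="deepgram"   # hypothetical release name

# Register the chart repository (URL assumed; verify on Artifact Hub).
helm repo add deepgram https://deepgram.github.io/self-hosted-resources
helm repo update

# Install the chart and wait for the workloads to become ready.
helm install "$RELEASE_NAME" deepgram/deepgram-self-hosted \
  --namespace "$NAMESPACE" \
  -f values.yaml \
  --wait --timeout 30m

# List the resulting pods.
kubectl get pods --namespace "$NAMESPACE"
```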
If you want to monitor the status, or your pods aren’t being scheduled as you expect, you can see a pod’s scheduling status with kubectl describe pod <pod-name>, which may contain details on what is preventing scheduling.
Test your environment and container setup with a local file.
Get the name of one of the Deepgram API Pods.
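One way to capture the pod name in a variable is shown here. The label selector app=deepgram-api is an assumption about how the chart labels its API pods; run kubectl get pods --show-labels to confirm.

```shell
NAMESPACE="dg-self-hosted"

# Take the first pod that matches the (assumed) API label.
API_POD="$(kubectl get pods --namespace "$NAMESPACE" \
  --selector app=deepgram-api \
  --output jsonpath='{.items[0].metadata.name}')"

echo "$API_POD"
```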
Launch an ephemeral container to send your test request from.
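For example, with kubectl debug. The curlimages/curl image is an arbitrary choice that ships with a shell, curl, and wget; the pod name is a placeholder for the name you retrieved in the previous step.

```shell
NAMESPACE="dg-self-hosted"
API_POD="REPLACE_WITH_API_POD_NAME"   # placeholder

# Attach an interactive ephemeral container to the API pod.
kubectl debug -it "$API_POD" \
  --namespace "$NAMESPACE" \
  --image=curlimages/curl \
  -- /bin/sh
```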
Inside the ephemeral container, download a sample file from Deepgram (or supply your own file).
Send your audio file to your local Deepgram setup for transcription.
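Run inside the ephemeral container, the download and the request might look like this. The sample file URL, port 8080, and the model and smart_format query parameters are assumptions; use a model you actually downloaded to the disk.

```shell
SERVICE="deepgram-api-external"
NAMESPACE="dg-self-hosted"

# In-cluster DNS name of the API Service, plus illustrative query parameters.
URL="http://${SERVICE}.${NAMESPACE}.svc.cluster.local:8080/v1/listen?model=nova-2&smart_format=true"

# Fetch a sample audio file (or copy in your own).
wget https://dpgr.am/bueller.wav

# Post the raw audio bytes to the transcription endpoint.
curl -X POST --data-binary @bueller.wav "$URL"
```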
If needed, adjust pieces of the above command:
deepgram-api-external: the name of the Service created by the Helm Chart.
dg-self-hosted: the namespace where you deployed Deepgram.
You should receive a JSON response with the transcript and associated metadata. Congratulations - your self-hosted setup is working!
Your Deepgram services are accessible within your cluster via the deepgram-api-external Service that was created by the Helm Chart.
You may consider configuring additional ingress with a GCP Load Balancer to access your services. Note that your installation will automatically load balance any received requests within the cluster to distribute load evenly. The load balancer would primarily serve as the ingress endpoint into the cluster.
What’s Next
Now that you have a basic Deepgram setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.