Amazon Web Services
With Kubernetes
Deploying Deepgram on Amazon Web Services (AWS) requires some preparation. In this section, you will learn how to provision a managed Kubernetes Cluster where you will deploy Deepgram products. You will need to perform some of these steps in the AWS Management Console and some in your local terminal.
Prerequisites
Make sure you have completed the requirements in the Self-Hosted Introduction.
kubectl
The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs.
Install kubectl locally using the official Kubernetes guides.
AWS CLI
The AWS CLI provides programmatic access to manage your AWS services. Certain steps in this guide are enabled by this tool, although many of the same actions can be performed manually in the AWS Console.
- Follow the installation guide to install the CLI locally.
- Once installed, follow the setup guide to configure the CLI with access to your AWS account. When configuring, set the default region to us-west-2.

Choosing a Region

The templates and steps in this guide provision resources in the AWS us-west-2 region. If you would like to deploy to a different region, make sure to specify your desired region when running aws configure, and adjust the templates and steps in this guide accordingly.
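If you prefer to set the region non-interactively, the following is a minimal sketch using the AWS CLI's configure subcommands (equivalent to answering the aws configure prompts):

```bash
# Set the default region and output format for the default profile
aws configure set region us-west-2
aws configure set output json

# Confirm the configured region
aws configure get region
```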
Cluster Management with eksctl
eksctl is the official CLI for Amazon EKS. It simplifies creating and managing clusters by creating subnets, managed node groups, service accounts, and other resources that integrate with your cluster.
Certain steps in this guide are enabled by this tool, although many of the same actions can be performed manually in the AWS Console. See the installation guide for details on how to install the latest version locally.
Make sure to install the latest version of eksctl. Do not use the version available through your package manager (e.g. apt, dnf), which may be an older release that is missing features used in this guide. Version >=0.192.0 is required to create EKS clusters with nodes using EKS accelerated AMIs.
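You can verify that your locally installed version meets this requirement:

```bash
# Print the installed eksctl version (should be >= 0.192.0)
eksctl version
```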
Kubernetes Packages with helm
Helm is the package manager for Kubernetes. A package in Kubernetes is defined by a Helm Chart, which helps you define, install, and upgrade even the most complex Kubernetes application.
We use Helm to install several components in this guide. See the installation guide for details on how to install locally.
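You can confirm Helm is available locally before proceeding:

```bash
# Print the installed Helm client version
helm version
```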
Creating a Cluster
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed Kubernetes service to run Kubernetes in the AWS cloud. In the cloud, Amazon EKS automatically manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and other key tasks.
- Download a ClusterConfig template from Deepgram's self-hosted resources. For example, here is a template for a basic setup on AWS.
  - Modify the cluster name and region according to your needs.
  - Modify each managed node group's desiredCapacity according to your needs. You may wish to consult your Deepgram Account Representative when planning your cluster's capacity.
- Create a new Kubernetes cluster in Amazon EKS using the ClusterConfig manifest. This eksctl command will create several AWS CloudFormation Stacks, which manage the interconnected creation of a cluster, dedicated VPC, dedicated IAM, node groups, and other necessary resources.

  ```bash
  eksctl create cluster -f PATH_TO_CLUSTER_CONFIG_YAML
  ```

  Make sure to replace the PATH_TO_CLUSTER_CONFIG_YAML placeholder with the path to the template file you downloaded to your local machine.
- Record metadata from your new cluster in shell variables for use in future steps.
  ```bash
  CLUSTER_NAME="deepgram-self-hosted-cluster"     # Replace name if modified in your ClusterConfig under `metadata.name`
  CLUSTER_REGION="us-west-2"                      # Replace name if modified in your ClusterConfig under `metadata.region`
  EFS_CSI_DRIVER_ROLE_NAME="efs-csi-driver-role"  # Replace name if modified in your ClusterConfig under `iam.serviceAccounts`
  CAS_SVC_ACCT_NAME="cluster-autoscaler-sa"       # Replace name if modified in your ClusterConfig under `iam.serviceAccounts`
  CAS_ROLE_NAME="cluster-autoscaler-role"         # Replace name if modified in your ClusterConfig under `iam.serviceAccounts`
  CLUSTER_VPC_ID=$(
    aws eks describe-cluster \
      --name $CLUSTER_NAME \
      --query 'cluster.resourcesVpcConfig.vpcId' \
      --output text
  )
  ```
- Create an Amazon Elastic File System (EFS) to store Deepgram model files and share them across multiple Deepgram Engine pods.
  ```bash
  FS_ID=$(
    aws efs create-file-system \
      --encrypted \
      --creation-token DeepgramEKSSetup \
      --tags Key=Name,Value=deepgram-cluster-resources \
             Key=associated-cluster-name,Value=$CLUSTER_NAME \
      --region $CLUSTER_REGION \
      --query 'FileSystemId' \
      --output text
  )
  ```
- Install the Amazon EFS CSI driver to allow nodes within your cluster to access the EFS you created. Use the service account role created via your ClusterConfig file, and wait until the installation is complete.

  ```bash
  csi_svc_acct_role_arn=$(
    aws iam get-role \
      --role-name $EFS_CSI_DRIVER_ROLE_NAME \
      --query 'Role.Arn' \
      --output text
  )
  eksctl create addon \
    --cluster $CLUSTER_NAME \
    --name aws-efs-csi-driver \
    --version latest \
    --service-account-role-arn $csi_svc_acct_role_arn \
    --force
  aws eks wait addon-active \
    --cluster-name $CLUSTER_NAME \
    --addon-name aws-efs-csi-driver
  ```
- eksctl automatically creates several security groups when it provisions your cluster. One of these security groups facilitates communication between AWS-managed nodes and other AWS resources. Find this security group and record its ID for the next step.

  ```bash
  ng_name=$(
    aws eks list-nodegroups \
      --cluster-name $CLUSTER_NAME \
      --query 'nodegroups[0]' \
      --output text
  )
  lt_id=$(
    aws eks describe-nodegroup \
      --cluster-name $CLUSTER_NAME \
      --nodegroup-name $ng_name \
      --query 'nodegroup.launchTemplate.id' \
      --output text
  )
  sg_ids=$(
    aws ec2 describe-launch-template-versions \
      --launch-template-id $lt_id \
      --query 'LaunchTemplateVersions[0].LaunchTemplateData.SecurityGroupIds[*]' \
      --output text
  )
  while read -r sg_id; do
    sg_ingress_rule=$(
      aws ec2 describe-security-groups \
        --group-ids $sg_id \
        --query 'SecurityGroups[0].IpPermissions[?contains(UserIdGroupPairs[*].GroupId, `'$sg_id'`)].IpProtocol' \
        --output text
    )
    if [[ $sg_ingress_rule == "-1" ]]; then
      MODEL_ACCESS_SG_ID=$sg_id
      break
    fi
  done <<< "$sg_ids"
  ```
- Create mount targets on the EFS with the proper security group. This gives all Deepgram Engine pods shared access to the EFS so they can read the model files that will be stored there. (A sketch for verifying the EFS and its mount targets follows this list.)
  ```bash
  subnet_ids=$(
    aws eks describe-cluster \
      --name $CLUSTER_NAME \
      --query "cluster.resourcesVpcConfig.subnetIds" | \
      jq -r '.[]'
  )
  while read -r subnet_id; do
    aws efs create-mount-target \
      --file-system-id $FS_ID \
      --subnet-id $subnet_id \
      --security-groups $MODEL_ACCESS_SG_ID \
      --no-cli-pager
  done <<< "$subnet_ids"
  ```
- Record the Role ARN that will be used later to install the Kubernetes Cluster Autoscaler, a component that automatically adjusts the size of a Kubernetes cluster so that all pods have a place to run and there are no unneeded nodes.
  ```bash
  CAS_SVC_ACCT_ROLE_ARN=$(
    aws iam get-role \
      --role-name $CAS_ROLE_NAME \
      --query 'Role.Arn' \
      --output text
  )
  ```
- Create a dedicated namespace for Deepgram resources.
  ```bash
  kubectl create namespace dg-self-hosted
  kubectl config set-context --current --namespace=dg-self-hosted
  ```
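Before moving on, you can optionally confirm the storage resources created above are ready. This is a quick sanity check, assuming the FS_ID shell variable from the earlier steps is still set:

```bash
# The file system should report an "available" lifecycle state
aws efs describe-file-systems \
  --file-system-id $FS_ID \
  --query 'FileSystems[0].LifeCycleState' \
  --output text

# Each cluster subnet should have a mount target, eventually reaching "available"
aws efs describe-mount-targets \
  --file-system-id $FS_ID \
  --query 'MountTargets[*].{Subnet:SubnetId,State:LifeCycleState}' \
  --output table
```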
Configure Kubernetes Secrets
Deepgram strongly recommends following best practices for configuring Kubernetes Secrets. Please refer to Securing Your Cluster for more details.
The deepgram-self-hosted Helm chart takes two Secret references. One is a set of distribution credentials that allow the cluster to pull images from Deepgram's container image repository. The other is your self-hosted API key that licenses each Deepgram container that is created.
- Complete the Self Service Licensing & Credentials guide to generate distribution credentials and a self-hosted API key.
- If using an external Secret store provider, configure cluster access to these two Secrets, naming them dg-regcred (distribution credentials) and dg-self-hosted-api-key.
- If not using an external Secret store provider, create the Secrets manually in your cluster.
  - Using the distribution credentials username and password generated in the Deepgram Console, create a Kubernetes Secret named dg-regcred.

    ```bash
    kubectl create secret docker-registry dg-regcred \
      --docker-server=quay.io \
      --docker-username='QUAY_DG_USER' \
      --docker-password='QUAY_DG_PASSWORD'
    ```

    Replace the placeholders QUAY_DG_USER and QUAY_DG_PASSWORD with the distribution credentials you generated in the Self Service Licensing & Credentials guide.
  - Create a Kubernetes Secret named dg-self-hosted-api-key to store your self-hosted API key.

    ```bash
    kubectl create secret generic dg-self-hosted-api-key \
      --from-literal=DEEPGRAM_API_KEY='YOUR_API_KEY_HERE'
    ```

    Replace the placeholder YOUR_API_KEY_HERE with the Deepgram API key you generated in the Self Service Licensing & Credentials guide.
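If you created the Secrets manually, you can confirm both exist in the Deepgram namespace before installing the chart:

```bash
# Both Secrets should be listed without errors
kubectl get secrets dg-regcred dg-self-hosted-api-key --namespace dg-self-hosted
```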
Deploy Deepgram
Deepgram maintains the official deepgram-self-hosted Helm Chart. You can reference the source and Artifact Hub listing for more details. We'll use this Chart to facilitate deploying Deepgram services in your self-hosted environment.
- Add the Deepgram Helm repository and pull the latest chart information.

  ```bash
  helm repo add deepgram https://deepgram.github.io/self-hosted-resources
  helm repo update
  ```
- Download a values.yaml template from Deepgram's self-hosted resources. For example, here is a template for a basic setup on AWS.
- In your values.yaml file, modify the scaling.replicas.{api,engine} values to set the initial number of replicas created in your cluster. These capacities were defined previously with desiredCapacity in your cluster-config.yaml file.

  If you want to enable pod autoscaling in your cluster, reach out to your Deepgram Account Representative to discuss whether soft or hard limits make sense for your use case, and what values to use for scaling your cluster based on traffic demands.
- In your values.yaml file, insert your Amazon EFS ID into the engine.modelManager.volumes.aws.efs.fileSystemId value. You can get the ID from the shell variable you created previously.

  ```bash
  echo $FS_ID
  ```

  ```yaml
  engine:
    modelManager:
      volumes:
        aws:
          efs:
            enabled: true
            fileSystemId: fs-xxxxxxxxxxxxxxxx # Replace with your EFS ID
  ```
- Your Deepgram Account Representative will have provided you with a list of links to models for inference (file extension .dg). In your values.yaml file, insert each of these model links in the engine.modelManager.models.links list.

  ```yaml
  engine:
    modelManager:
      models:
        links:
          - https://link-to-model-1.dg # Replace these links with those provided to you
          - https://link-to-model-2.dg # by your Deepgram Account Representative.
          - https://link-to-model-3.dg
          # ...
  ```
- In your values.yaml file, insert the AWS Role ARN to be used by the Cluster Autoscaler. If needed, adjust the cluster name and region as well.

  ```bash
  echo $CAS_SVC_ACCT_NAME
  echo $CAS_SVC_ACCT_ROLE_ARN
  echo $CLUSTER_NAME
  echo $CLUSTER_REGION
  ```

  ```yaml
  cluster-autoscaler:
    enabled: true
    rbac:
      serviceAccount:
        # Name of the CAS IAM Service Account
        name: "cluster-autoscaler-sa"
        annotations:
          # Replace with the AWS Role ARN configured for the Cluster Autoscaler
          eks.amazonaws.com/role-arn: "arn:aws:iam::000000000000:role/MyRoleName"
    autoDiscovery:
      # Replace if needed
      clusterName: "deepgram-self-hosted-cluster"
    # Replace if needed
    awsRegion: "us-west-2"
  ```
- Install the Helm Chart with your values.yaml file.

  ```bash
  helm install deepgram deepgram/deepgram-self-hosted \
    -f my-values.yaml \
    --namespace dg-self-hosted \
    --atomic \
    --timeout 1h

  # Monitor the installation in a separate shell
  watch kubectl get all
  ```
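Once the installation finishes, you can confirm the release deployed successfully and that all Pods are running:

```bash
# Show the status of the Helm release
helm status deepgram --namespace dg-self-hosted

# All Deepgram Pods should eventually report a Running status
kubectl get pods --namespace dg-self-hosted
```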
Pod Scheduling Failures

Resource limits, taints, and other constraints may limit Pod scheduling. If a Pod cannot be scheduled, you can see its status and a list of associated events with kubectl describe pod <pod-name>.
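For example, to find any Pods that are stuck waiting to be scheduled and review recent cluster events, you might run:

```bash
# List Pods that have not yet been scheduled
kubectl get pods --field-selector=status.phase=Pending

# Show recent events, which often explain scheduling failures
kubectl get events --sort-by=.lastTimestamp
```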
Test Your Deepgram Setup with a Sample Request
Test your environment and container setup with a local file.
- Get the name of one of the Deepgram API Pods.
  ```bash
  API_POD_NAME=$(
    kubectl get pods \
      --selector app=deepgram-api \
      --output jsonpath='{.items[0].metadata.name}' \
      --no-headers
  )
  ```
- Launch an ephemeral container to send your test request from.
  ```bash
  kubectl debug $API_POD_NAME \
    -it \
    --image=curlimages/curl \
    -- /bin/sh
  ```
- Inside the ephemeral container, download a sample file from Deepgram (or supply your own file).
  ```bash
  wget https://dpgr.am/bueller.wav
  ```
- Send your audio file to your local Deepgram setup for transcription.
  ```bash
  curl \
    -X POST \
    --data-binary @bueller.wav \
    "http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/v1/listen?model=nova-2&smart_format=true"
  ```
  If needed, adjust pieces of the above command:
  - the query parameters, to match the directions from your Deepgram Account Representative
  - the service name (deepgram-api-external)
  - the namespace (dg-self-hosted)
You should receive a JSON response with the transcript and associated metadata. Congratulations - your self-hosted setup is working!
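If you run the same request from a shell that has jq installed, you can extract just the transcript text. This sketch assumes the standard Deepgram JSON response shape:

```bash
# Pull the transcript out of the JSON response
curl -s \
  -X POST \
  --data-binary @bueller.wav \
  "http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/v1/listen?model=nova-2&smart_format=true" \
  | jq -r '.results.channels[0].alternatives[0].transcript'
```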
Next Steps
Your Deepgram services are accessible within your cluster via the deepgram-api-external Service that was created by the Helm Chart.
You may consider configuring additional ingress with an AWS Application Load Balancer to access your services. Note that your installation will automatically load balance any received requests within the cluster to distribute load evenly. The load balancer would primarily serve as the ingress endpoint into the cluster.
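For ad-hoc testing from your local machine before any ingress is configured, a temporary port-forward is often enough. This is a sketch for development use only, not a substitute for a proper ingress:

```bash
# Forward local port 8080 to the in-cluster Deepgram API Service
kubectl port-forward svc/deepgram-api-external 8080:8080 --namespace dg-self-hosted

# In another shell, requests can then target http://localhost:8080/v1/listen
```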
Now that you have a basic Deepgram setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.