Amazon Web Services

With Kubernetes

Deploying Deepgram on Amazon Web Services (AWS) requires some preparation. In this section, you will learn how to provision a managed Kubernetes Cluster where you will deploy Deepgram products. You will need to perform some of these steps in the AWS Management Console and some in your local terminal.

Prerequisites

Make sure you have completed the requirements in the Self-Hosted Introduction.

kubectl

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs.

Install locally using the official Kubernetes guides.

AWS CLI

The AWS CLI provides programmatic access to manage your AWS services. Certain steps in this guide are enabled by this tool, although many of the same actions can be performed manually in the AWS Console.

  1. Follow the installation guide to install the CLI locally.

  2. Once installed, follow the setup guide to configure the CLI with access to your AWS account. When configuring, set the default region to us-west-2.

    📘

    Choosing a Region

    The templates and steps in this guide provision resources in the AWS us-west-2 region.

    If you would like to deploy to a different region, make sure to specify your desired region when running aws configure, and adjust templates and steps in this guide accordingly.
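
    After configuring the CLI, you can confirm it is able to authenticate to your AWS account by printing your caller identity (a quick optional check):

    aws sts get-caller-identity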

Cluster Management with eksctl

eksctl is the official CLI for Amazon EKS. It simplifies cluster creation and management by provisioning subnets, managed node groups, service accounts, and other resources that integrate with your cluster.

Certain steps in this guide are enabled by this tool, although many of the same actions can be performed manually in the AWS Console. See the installation guide for details on how to install the latest version locally.

🚧

Make sure to install the latest version of eksctl. Do not use the version available through your package manager (e.g. apt, dnf), which may be an older release that is missing features used in this guide.

Version >=0.192.0 is required to create EKS clusters with nodes using EKS accelerated AMIs.
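
A quick way to confirm which release you have installed:

    eksctl version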

Kubernetes Packages with helm

Helm is the package manager for Kubernetes. A package in Kubernetes is defined by a Helm Chart, which helps you define, install, and upgrade even the most complex Kubernetes application.

We use Helm to install several components in this guide. See the installation guide for details on how to install locally.

Creating a Cluster

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed Kubernetes service to run Kubernetes in the AWS cloud. In the cloud, Amazon EKS automatically manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and other key tasks.

  1. Download a ClusterConfig template from Deepgram's self-hosted resources. For example, here is a template for a basic setup on AWS.

    1. Modify the cluster name and region according to your needs.
    2. Modify each managed node group's desiredCapacity according to your needs. You may wish to consult your Deepgram Account Representative in planning your cluster's capacity.
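
    For reference, the fields mentioned above appear in the ClusterConfig roughly as follows. This is a sketch using the standard eksctl schema; the node group name and capacity shown are placeholders, not values taken from Deepgram's template.

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: deepgram-self-hosted-cluster # Cluster name used throughout this guide
      region: us-west-2                  # Deployment region
    managedNodeGroups:
      - name: engine-pool                # Placeholder node group name
        desiredCapacity: 1               # Initial number of nodes in this group
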
  2. Create a new Kubernetes cluster in Amazon EKS using the ClusterConfig manifest. This eksctl command will create several AWS CloudFormation Stacks, which manage the inter-connected creation of a cluster, dedicated VPC, dedicated IAM, node groups, and other necessary resources.

    eksctl create cluster -f PATH_TO_CLUSTER_CONFIG_YAML
    

    💻

    Make sure to replace the PATH_TO_CLUSTER_CONFIG_YAML placeholder with the path to the template file you downloaded on your local machine.

  3. Record metadata from your new cluster in shell variables for use in future steps.

    CLUSTER_NAME="deepgram-self-hosted-cluster" # Replace name if modified in your ClusterConfig under `metadata.name`
    CLUSTER_REGION="us-west-2" # Replace region if modified in your ClusterConfig under `metadata.region`
    EFS_CSI_DRIVER_ROLE_NAME="efs-csi-driver-role" # Replace name if modified in your ClusterConfig under `iam.serviceAccounts`
    CAS_SVC_ACCT_NAME="cluster-autoscaler-sa" # Replace name if modified in your ClusterConfig under `iam.serviceAccounts`
    CAS_ROLE_NAME="cluster-autoscaler-role" # Replace name if modified in your ClusterConfig under `iam.serviceAccounts`
    CLUSTER_VPC_ID=$(
       aws eks describe-cluster \
          --name $CLUSTER_NAME \
          --query 'cluster.resourcesVpcConfig.vpcId' \
          --output text
    )
    
  4. Create an Amazon Elastic File System (EFS) to store Deepgram model files and share them across multiple Deepgram Engine pods.

    FS_ID=$(
        aws efs create-file-system \
            --encrypted \
            --creation-token DeepgramEKSSetup \
            --tags Key=Name,Value=deepgram-cluster-resources \
                   Key=associated-cluster-name,Value=$CLUSTER_NAME \
            --region $CLUSTER_REGION \
            --query 'FileSystemId' \
            --output text
    )
    
  5. Install the Amazon EFS CSI driver to allow nodes within your cluster to access the EFS you created. Use the service account role we created via our ClusterConfig file, and wait until installation is complete.

    csi_svc_acct_role_arn=$(
       aws iam get-role \
          --role-name $EFS_CSI_DRIVER_ROLE_NAME \
          --query 'Role.Arn' \
          --output text
    )
    eksctl create addon \
        --cluster $CLUSTER_NAME \
        --name aws-efs-csi-driver \
        --version latest \
        --service-account-role-arn $csi_svc_acct_role_arn \
        --force
    aws eks wait addon-active \
        --cluster-name $CLUSTER_NAME \
        --addon-name aws-efs-csi-driver
    
  6. eksctl automatically creates several security groups when it provisions your cluster. One of these security groups facilitates communication between AWS-managed nodes and other AWS resources. Find this security group and record its ID for the next step.

    ng_name=$(
        aws eks list-nodegroups \
        --cluster-name $CLUSTER_NAME \
        --query 'nodegroups[0]' \
        --output text
    )
    lt_id=$(
        aws eks describe-nodegroup \
            --cluster-name $CLUSTER_NAME \
            --nodegroup-name $ng_name \
            --query 'nodegroup.launchTemplate.id' \
            --output text
    )
    sg_ids=$(
        aws ec2 describe-launch-template-versions \
            --launch-template-id $lt_id \
            --query 'LaunchTemplateVersions[0].LaunchTemplateData.SecurityGroupIds[*]' \
            --output text
    )
    while read -r sg_id; do
        sg_ingress_rule=$(
            aws ec2 describe-security-groups \
                --group-ids $sg_id \
                --query 'SecurityGroups[0].IpPermissions[?contains(UserIdGroupPairs[*].GroupId, `'$sg_id'`)].IpProtocol' \
                --output text
        )
        if [[ $sg_ingress_rule == "-1" ]]; then
            MODEL_ACCESS_SG_ID=$sg_id
            break
        fi
    done <<< "$sg_ids"
    
  7. Create mount targets on the EFS with the proper security group. This gives all Deepgram Engine pods shared access to the EFS so they can read the model files that will be stored there.

    subnet_ids=$(
        aws eks describe-cluster \
            --name $CLUSTER_NAME \
            --query "cluster.resourcesVpcConfig.subnetIds" | \
        jq -r '.[]'
    )
    
    while read -r subnet_id; do
        aws efs create-mount-target \
            --file-system-id $FS_ID \
            --subnet-id $subnet_id \
            --security-groups $MODEL_ACCESS_SG_ID \
            --no-cli-pager
    done <<< "$subnet_ids"
    
  8. Record the Role ARN that will be used later to install the Kubernetes Cluster Autoscaler, a component that automatically adjusts the size of a Kubernetes cluster so that all pods have a place to run and there are no unneeded nodes.

    CAS_SVC_ACCT_ROLE_ARN=$(
        aws iam get-role \
            --role-name $CAS_ROLE_NAME \
            --query 'Role.Arn' \
            --output text
    )
    
  9. Create a dedicated namespace for Deepgram resources.

    kubectl create namespace dg-self-hosted
    kubectl config set-context --current --namespace=dg-self-hosted
    

Configure Kubernetes Secrets

Deepgram strongly recommends following best practices for configuring Kubernetes Secrets. Please refer to Securing Your Cluster for more details.

The deepgram-self-hosted Helm chart takes two Secret references. One is a set of distribution credentials that allow the cluster to pull images from Deepgram's container image repository. The other is your self-hosted API key that licenses each Deepgram container that is created.

  1. Complete the Self Service Licensing & Credentials guide to generate distribution credentials and a self-hosted API key.

  2. If using an external Secret store provider, configure cluster access to these two Secrets, naming them dg-regcred (distribution credentials) and dg-self-hosted-api-key.

  3. If not using an external Secret store provider, create the Secrets manually in your cluster.

    1. Using the distribution credentials username and password generated in the Deepgram Console, create a Kubernetes Secret named dg-regcred.

      kubectl create secret docker-registry dg-regcred \
          --docker-server=quay.io \
          --docker-username='QUAY_DG_USER' \
          --docker-password='QUAY_DG_PASSWORD'
      

      💻

      Replace the placeholders QUAY_DG_USER and QUAY_DG_PASSWORD with the distribution credentials you generated in the Self Service Licensing & Credentials guide.

    2. Create a Kubernetes Secret named dg-self-hosted-api-key to store your self-hosted API key.

      kubectl create secret generic dg-self-hosted-api-key \
          --from-literal=DEEPGRAM_API_KEY='YOUR_API_KEY_HERE'
      

      💻

      Replace the placeholder YOUR_API_KEY_HERE with the Deepgram API key you generated in the Self Service Licensing & Credentials guide.
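
Regardless of which approach you used, you can confirm both Secrets exist in your cluster before proceeding (a quick optional check):

    kubectl get secret dg-regcred dg-self-hosted-api-key --namespace dg-self-hosted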

Deploy Deepgram

Deepgram maintains the official deepgram-self-hosted Helm Chart. You can reference the source and Artifact Hub listing for more details. We'll use this Chart to facilitate deploying Deepgram services in your self-hosted environment.

  1. Fetch the repository info.

    helm repo add deepgram https://deepgram.github.io/self-hosted-resources
    helm repo update
    
  2. Download a values.yaml template from Deepgram's self-hosted resources. For example, here is a template for a basic setup on AWS.

  3. In your values.yaml file, modify the scaling.replicas.{api,engine} values to set the initial number of API and Engine replicas created in your cluster. These should correspond to the capacities you defined previously with desiredCapacity in your cluster-config.yaml file. A sketch of the relevant values appears after the note below.

    📘

    If you want to enable pod autoscaling in your cluster, reach out to your Deepgram Account Representative to discuss whether soft or hard limits make sense for your use case, and what values to use for scaling your cluster based on traffic demands.
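
    For example, the relevant section of values.yaml looks roughly like this (the counts shown are placeholders; use the capacities you planned for your node groups):

    scaling:
      replicas:
        api: 1    # Replace with your desired number of API replicas
        engine: 1 # Replace with your desired number of Engine replicas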

  4. In your values.yaml file, insert your Amazon EFS ID into the engine.modelManager.volumes.aws.efs.fileSystemId value. You can get the ID from the shell variable you created previously.

    echo $FS_ID
    
    engine:
      modelManager:
        volumes:
          aws:
            efs:
              enabled: true
              fileSystemId: fs-xxxxxxxxxxxxxxxx # Replace with your EFS ID
    
  5. Your Deepgram Account Representative will have provided you with a list of links to models for inference (file extension .dg). In your values.yaml file, insert each of these model links in the engine.modelManager.models.links list.

    engine:
      modelManager:
        models:
          links:
            - https://link-to-model-1.dg # Replace these links with those provided to you
            - https://link-to-model-2.dg #   by your Deepgram Account Representative.
            - https://link-to-model-3.dg
            - ...
    
  6. In your values.yaml file, insert the AWS Role ARN to be used by the Cluster Autoscaler. If needed, adjust the cluster name and region as well.

    echo $CAS_SVC_ACCT_NAME
    echo $CAS_SVC_ACCT_ROLE_ARN
    echo $CLUSTER_NAME
    echo $CLUSTER_REGION
    
    cluster-autoscaler:
      enabled: true
      rbac:
        serviceAccount:
          # Name of the CAS IAM Service Account
          name: "cluster-autoscaler-sa"
          annotations:
            # Replace with the AWS Role ARN configured for the Cluster Autoscaler
            eks.amazonaws.com/role-arn: "arn:aws:iam::000000000000:role/MyRoleName"
      autoDiscovery:
        # Replace if needed
        clusterName: "deepgram-self-hosted-cluster"  
      # Replace if needed
      awsRegion: "us-west-2"
    
  7. Install the Helm Chart with your values.yaml file.

    helm install deepgram deepgram/deepgram-self-hosted \
        -f my-values.yaml \
        --namespace dg-self-hosted \
        --atomic \
        --timeout 1h 
    # Monitor the installation in a separate shell
    watch kubectl get all
    

    🚧

    Pod Scheduling Failures

    Resource limits, taints, and other constraints may prevent a Pod from being scheduled. If a Pod cannot be scheduled, you can see its status and a list of associated events with kubectl describe pod <pod-name>.
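
    Listing recent events in the namespace can also help surface scheduling or image-pull problems:

    kubectl get events --namespace dg-self-hosted --sort-by=.lastTimestamp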

Test Your Deepgram Setup with a Sample Request

Test your environment and container setup with a local file.

  1. Get the name of one of the Deepgram API Pods.

    API_POD_NAME=$(
        kubectl get pods \
            --selector app=deepgram-api \
            --output jsonpath='{.items[0].metadata.name}' \
            --no-headers
    )
    
  2. Launch an ephemeral container to send your test request from.

    kubectl debug $API_POD_NAME \
        -it \
        --image=curlimages/curl \
        -- /bin/sh
    
  3. Inside the ephemeral container, download a sample file from Deepgram (or supply your own file).

    wget https://dpgr.am/bueller.wav
    
  4. Send your audio file to your local Deepgram setup for transcription.

    curl \
        -X POST \
        --data-binary @bueller.wav \
        "http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/v1/listen?model=nova-2&smart_format=true"
    

    📘

    If needed, adjust pieces of the above command:

    • the query parameters to match the directions from your Deepgram Account Representative
    • the service name (deepgram-api-external)
    • the namespace (dg-self-hosted)

You should receive a JSON response with the transcript and associated metadata. Congratulations - your self-hosted setup is working!

Next Steps

Your Deepgram services are accessible within your cluster via the deepgram-api-external Service that was created by the Helm Chart.

You may consider configuring additional ingress with an AWS Application Load Balancer to access your services. Note that your installation will automatically load balance any received requests within the cluster to distribute load evenly. The load balancer would primarily serve as the ingress endpoint into the cluster.
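
If you go that route, an Ingress for the AWS Load Balancer Controller might look roughly like the sketch below. This assumes you have separately installed the AWS Load Balancer Controller in your cluster (it is not installed by this guide); the resource name is hypothetical, and the annotation values are illustrative rather than recommendations.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: deepgram-api-ingress # Hypothetical name for illustration
      namespace: dg-self-hosted
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing # Or "internal" for a private load balancer
        alb.ingress.kubernetes.io/target-type: ip
    spec:
      ingressClassName: alb
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: deepgram-api-external # Service created by the Helm Chart
                    port:
                      number: 8080 # Port used in the sample request above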


What’s Next

Now that you have a basic Deepgram setup working, take some time to learn about building up to a production-level environment, as well as helpful Deepgram add-on services.