Troubleshooting

If you encounter any challenges while deploying or maintaining your Deepgram self-hosted services on Kubernetes, please consult this guide.

Kubernetes Troubleshooting

Use this checklist to validate the health of your Kubernetes cluster running Deepgram self-hosted services.

Deepgram Helm Chart

  • Check the version of the Helm chart for Deepgram that you’re currently running on your Kubernetes cluster with helm list.
  • If the Deepgram engine Pod is not starting up successfully, you may need to increase the startupProbe values (see the example after this list)
    • Modify the periodSeconds and failureThreshold values based on your tolerances
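
For reference, here is a rough sketch of both checks. The release name (deepgram), chart name (deepgram/deepgram-self-hosted), and the engine.startupProbe.* value keys are illustrative assumptions; confirm them against your own installation and the chart’s values.yaml before running anything.

    # Show the installed Deepgram release and its chart version
    helm list --namespace dg-self-hosted

    # Loosen the engine startup probe so the Pod has more time to load models
    # (key paths below are assumed -- verify them in the chart's values.yaml)
    helm upgrade deepgram deepgram/deepgram-self-hosted \
      --namespace dg-self-hosted \
      --reuse-values \
      --set engine.startupProbe.periodSeconds=30 \
      --set engine.startupProbe.failureThreshold=60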

Storage

  • Ensure you don’t have too many files in your cloud filesystem for Deepgram model file storage (.dg files)
    • ⚠️ Having too many model files can slow down the engine Pod startup significantly! We recommend testing your self-hosted setup with a handful of model files and progressively adding more as needed, to ensure that the engine is able to start up in a timely manner (see the sketch after this list).
  • Check the metrics on your cloud storage volume (e.g., Amazon EFS) to see if there is heavy utilization
    • This is not necessarily an issue in and of itself, but rather a symptom indicating that something else needs attention
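
If you want a quick count of how many model files the engine has to load at startup, you can list the .dg files on the models volume from inside the engine Pod. This is a sketch only: the Pod name and the /models mount path are placeholders, and it assumes the engine image includes a shell, so substitute the values from your own deployment.

    # Count .dg model files on the volume mounted into the engine Pod
    # (replace <engine-pod-name> and /models with your actual Pod name and mount path)
    kubectl exec --namespace dg-self-hosted <engine-pod-name> -- \
      sh -c 'ls /models/*.dg | wc -l'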

Infrastructure

  • If the engine Pod is not starting up successfully, check the Pod logs for any errors.
    • Command: kubectl logs <pod-name>
  • Look at the Pod events and see if Kubernetes (kubelet) is killing the Pod before it has a chance to startup fully
    • Command: kubectl describe pod <pod-name>
  • Ensure that the engine Pod is scheduled on a compatible GPU node (see the sketch after this list).
    • Command: kubectl describe pod <pod-name>
  • Run a test API call against the Deepgram API server
    • Command: kubectl run --namespace=dg-self-hosted --rm --stdin --tty --image=mcr.microsoft.com/dotnet/sdk --command test-client -- pwsh -Command Invoke-RestMethod -Uri 'http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/models'
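
To double-check the GPU scheduling item above, you can see which node the engine Pod landed on and whether that node advertises NVIDIA GPU capacity. This assumes the standard nvidia.com/gpu resource name exposed by the NVIDIA device plugin.

    # Show which node the engine Pod was scheduled on
    kubectl get pod <pod-name> --namespace dg-self-hosted -o wide

    # Confirm that node actually advertises NVIDIA GPU capacity
    kubectl describe node <node-name> | grep -i 'nvidia.com/gpu'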

AI Troubleshooting

To simplify the process of identifying root causes and resolutions for Deepgram services running on Kubernetes, you can use an AI client with Model Context Protocol (MCP) support.

Prerequisites

General Prompt Guidelines

  • Keep in mind that each AI client application has its own built-in prompt templates, and may behave differently.
  • We generally advise against enabling auto-approval for MCP tools that apply changes.
  • Enabling auto-approval for MCP tools that perform read-only operations is generally safe.
  • Use safeguard statements in your prompts, such as “do not make any changes”, in case your AI application is prone to attempting to “fix” things.

Example Troubleshooting Prompts

Once you’ve finished setting up your AI MCP client, you can use the following prompts to help identify issues in your Kubernetes cluster. Feel free to adapt any of these prompts to your specific environment, such as updating the Kubernetes namespace you’ve deployed to or providing additional, relevant context.

Check if you’re running the latest version of the Deepgram Helm chart:

Don’t make any changes to the kubernetes cluster. Check if I am running the latest version of the Deepgram self-hosted Helm chart. Use the currently selected context.

Make sure all the essential Deepgram pods exist:

Does my dg-self-hosted namespace have at least one API server, one engine, and one license proxy pod running?

Make sure essential Deepgram pods are running:

Are there any pods in the dg-self-hosted k8s namespace that are not running properly?

Check to see if the “engine” pod is being killed by the kubelet startup probe:

Get the pod details for the engine pod in the dg-self-hosted k8s namespace. Check to see if its startup probe is failing repeatedly.

Read logs from the Deepgram engine pod:

Check the logs for the Deepgram engine pod in the dg-self-hosted k8s namespace and see if there are any notable warnings or errors.

Ensure the Deepgram engine pod is scheduled on a system with at least a single NVIDIA GPU:

Make sure that the Deepgram engine pod on the kubernetes cluster is not scheduled on a node that has a fractional GPU. Do not make any changes to the cluster. Use the currently selected k8s context.