Troubleshooting
If you encounter any challenges while deploying or maintaining your Deepgram self-hosted services on Kubernetes, please consult this guide.
Use this checklist to validate the health of your Kubernetes cluster running Deepgram self-hosted services.
Check which Deepgram Helm release and chart version are installed by running helm list.
If the engine Pod is not starting up successfully, you may need to increase the startupProbe values. Adjust the periodSeconds and failureThreshold values based on your tolerances.
Loading many model files (.dg files) can slow down engine Pod startup significantly. We recommend testing your self-hosted setup with a handful of model files and progressively adding more if needed, to ensure that the engine is able to start up in a timely manner.
If the engine Pod is not starting up successfully, check the Pod logs for any errors.
kubectl logs <pod-name>
kubectl describe pod <pod-name>
Verify that the engine Pod is scheduled on a compatible GPU node.
kubectl describe pod <pod-name>
To check that the API responds from inside the cluster, start a temporary client Pod that requests the /models endpoint:
kubectl run --namespace=dg-self-hosted --rm --stdin --tty --image=mcr.microsoft.com/dotnet/sdk --command test-client -- pwsh -Command Invoke-RestMethod -Uri 'http://deepgram-api-external.dg-self-hosted.svc.cluster.local:8080/models'
To simplify the process of identifying root causes and resolutions for Deepgram services running on Kubernetes, you can use an AI client with Model Context Protocol (MCP) support.
You will need:
The kubectl CLI installed locally
A kubeconfig.yml file for your cluster
The correct context selected: kubectl config get-contexts; kubectl config use-context <NAME>
Once you’ve finished setting up your AI MCP client, you can use the following prompts to help identify issues in your Kubernetes cluster. Feel free to adapt any of these prompts to your specific environment, for example by updating the Kubernetes namespace you’ve deployed to or by providing additional relevant context.
Check if you’re running the latest version of the Deepgram Helm chart:
Don’t make any changes to the kubernetes cluster. Check if I am running the latest version of the Deepgram self-hosted Helm chart. Use the currently selected context.
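For comparison, the same read-only check can be sketched by hand with the Helm CLI. This is a minimal sketch, assuming the Deepgram chart repository has already been added locally and that its charts match the keyword deepgram; the function name deepgram_chart_versions is hypothetical.

```shell
# Print installed releases and the newest chart versions known to local repos.
# Assumes the namespace dg-self-hosted and a locally added repo whose charts
# match the keyword "deepgram" (both are assumptions; adjust as needed).
deepgram_chart_versions() {
  namespace="${1:-dg-self-hosted}"
  echo "Installed releases in ${namespace}:"
  helm list --namespace "$namespace"
  echo "Latest chart versions known to local repos:"
  helm repo update >/dev/null   # refreshes the local repo cache only
  helm search repo deepgram
}

# Usage (requires helm and cluster access):
#   deepgram_chart_versions dg-self-hosted
```

Compare the CHART column from helm list with the CHART VERSION column from helm search repo; neither command modifies the cluster.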
Make sure all the essential Deepgram pods exist:
Does my dg-self-hosted namespace have at least one API server, one engine, and one license proxy pod running?
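A rough manual equivalent is to list the running pods and look for each component by name. The sketch below assumes that pod names contain the fragments api, engine, and license-proxy, which may not match your release naming; the helper missing_components is hypothetical.

```shell
# Reads pod names on stdin and prints each component that has no matching pod.
# The name fragments below are assumptions about how the pods are named.
missing_components() {
  pods="$(cat)"
  for component in api engine license-proxy; do
    printf '%s\n' "$pods" | grep -q -- "$component" || echo "$component"
  done
}

# Usage (requires kubectl and cluster access):
#   kubectl get pods --namespace dg-self-hosted \
#     --field-selector=status.phase=Running -o name | missing_components
```

Empty output means at least one running pod was found for every component.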
Make sure essential Deepgram pods are running:
Are there any pods in the dg-self-hosted k8s namespace that are not running properly?
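The same question can be approximated directly with a kubectl field selector. This is a minimal sketch; the wrapper name pods_not_running is hypothetical and the namespace defaults to dg-self-hosted.

```shell
# List pods whose phase is anything other than Running.
pods_not_running() {
  kubectl get pods --namespace "${1:-dg-self-hosted}" \
    --field-selector='status.phase!=Running'
}

# Usage (requires kubectl and cluster access):
#   pods_not_running dg-self-hosted
```

A crash-looping pod can still report the Running phase, so it is worth also checking the RESTARTS column of a plain kubectl get pods.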
Check whether the engine pod is being killed by the kubelet startup probe:
Get the pod details for the engine pod in the dg-self-hosted k8s namespace. Check to see if its startup probe is failing repeatedly.
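To reason about this by hand, it helps to know how long the probe allows. A startup probe gives the container roughly initialDelaySeconds + failureThreshold × periodSeconds before the kubelet restarts it. The sketch below computes that budget and filters the pod description for probe failures; the function names and the example values 10 and 60 are illustrative assumptions, not chart defaults.

```shell
# Approximate time budget (seconds) before a never-succeeding startup probe
# causes a restart: initialDelaySeconds + failureThreshold * periodSeconds.
startup_budget_seconds() {
  period_seconds="$1"; failure_threshold="$2"; initial_delay="${3:-0}"
  echo $(( initial_delay + period_seconds * failure_threshold ))
}

# Show startup probe failure events for a pod (pod name is a placeholder).
startup_probe_events() {
  kubectl describe pod "$1" --namespace "${2:-dg-self-hosted}" \
    | grep -i 'startup probe'
}

startup_budget_seconds 10 60   # prints 600

# Usage (requires kubectl and cluster access):
#   startup_probe_events <pod-name>
```

If the engine needs longer than this budget to load its models, raise periodSeconds or failureThreshold accordingly.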
Read logs from the Deepgram engine pod:
Check the logs for the Deepgram engine pod in the dg-self-hosted k8s namespace and see if there are any notable warnings or errors.
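A manual version of this check is to tail the pod logs and filter for likely problem lines. This is a sketch under the assumption that warnings and errors contain the words warn or error; the engine's actual log format may differ, and engine_log_problems is a hypothetical helper.

```shell
# Print recent log lines that mention warnings or errors.
# The grep pattern is an assumption about the log format.
engine_log_problems() {
  pod="$1"; namespace="${2:-dg-self-hosted}"
  kubectl logs "$pod" --namespace "$namespace" --tail=500 \
    | grep -iE 'warn|error'
}

# Usage (requires kubectl and cluster access):
#   engine_log_problems <pod-name>
```

If the container has already been restarted, adding --previous to kubectl logs shows the output of the terminated instance instead.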
Ensure the Deepgram engine pod is scheduled on a system with at least a single NVIDIA GPU:
Make sure that the Deepgram engine pod on the kubernetes cluster is not scheduled on a node that has a fractional GPU. Do not make any changes to the cluster. Use the currently selected k8s context.
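The node placement can also be inspected read-only with kubectl. The sketch below looks up the node the pod runs on and the number of GPUs that node advertises under the nvidia.com/gpu resource; it assumes the NVIDIA device plugin is installed, and engine_node_gpus is a hypothetical helper.

```shell
# Report the node a pod is scheduled on and that node's allocatable GPU count.
# Assumes GPUs are advertised as the nvidia.com/gpu resource.
engine_node_gpus() {
  pod="$1"; namespace="${2:-dg-self-hosted}"
  node="$(kubectl get pod "$pod" --namespace "$namespace" \
    -o jsonpath='{.spec.nodeName}')"
  gpus="$(kubectl get node "$node" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')"
  echo "node=${node} allocatable_gpus=${gpus:-0}"
}

# Usage (requires kubectl and cluster access):
#   engine_node_gpus <pod-name>
```

This only shows the advertised GPU count. Whether a node shares or partitions its GPUs depends on how the device plugin is configured, so treat the result as a starting point rather than proof of a whole GPU.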