June 11, 2026

Deepgram Self-Hosted June 2026 Release (260611)

Container Images (release 260611)

quay.io/deepgram/self-hosted-api:release-260611
- Equivalent image to:
  - quay.io/deepgram/self-hosted-api:1.191.0-1
quay.io/deepgram/self-hosted-engine:release-260611
- Equivalent image to:
  - quay.io/deepgram/self-hosted-engine:3.118.0-1
- Minimum required NVIDIA driver version: 570.172.08
quay.io/deepgram/self-hosted-license-proxy:release-260611
- Equivalent image to:
  - quay.io/deepgram/self-hosted-license-proxy:1.10.1-1
quay.io/deepgram/self-hosted-billing:release-260611
- Equivalent image to:
  - quay.io/deepgram/self-hosted-billing:1.13.0
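
The minimum NVIDIA driver version listed for the Engine image can be checked before upgrading. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the installed driver version is a dotted numeric string such as the one `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints.

```python
# Minimum driver version stated in these release notes for the Engine image.
MINIMUM_DRIVER = "570.172.08"


def parse_driver_version(version):
    """Turn a dotted driver version such as '570.172.08' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(installed, minimum=MINIMUM_DRIVER):
    """Return True when the installed driver is at or above the minimum.

    Components are compared numerically, so '570.86.15' is correctly
    treated as older than '570.172.08'.
    """
    return parse_driver_version(installed) >= parse_driver_version(minimum)
```

On a host with several GPUs, `nvidia-smi` prints one version per line, so each line would need to be checked separately.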

Action Required: Engine Container GPU Environment Variables

The Engine container change previewed in the May 28, 2026 release takes effect in this release. The Engine container now requires two environment variables to access the GPU:

NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility

If they are not set, the Engine container will fail to start after you upgrade to release-260611. Follow the step for your deployment method before pulling the release-260611 Engine image:

Official Helm chart (deepgram-self-hosted): Upgrade to chart version 0.37.0 or later. The chart sets both variables on the Engine pod automatically whenever a GPU is requested, so no manual change is needed. If you pin an older chart version, bump it as part of adopting this release.
Deepgram-provided Docker or Podman Compose files: Pull the latest files from deepgram/self-hosted-resources. They already set both variables on the Engine service.

Your own deployment manifests: Add both variables to the Engine container’s environment. For example, in a Docker or Podman Compose file:

services:
  engine:
    image: quay.io/deepgram/self-hosted-engine:release-260611
    runtime: nvidia
    environment:
      NVIDIA_VISIBLE_DEVICES: "all"
      NVIDIA_DRIVER_CAPABILITIES: "compute,utility"

Or, for a Kubernetes Engine container spec:

env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: "all"
  - name: NVIDIA_DRIVER_CAPABILITIES
    value: "compute,utility"
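
If you maintain your own Compose files, a small pre-upgrade check can catch a missing variable before the Engine fails to start. The sketch below is illustrative only. It assumes you have already loaded the Engine service's `environment` section into Python (for example with a YAML parser of your choice), and the helper names are hypothetical. It accepts both Compose forms, a mapping or a list of `KEY=value` strings, and it only checks that each variable is set, not which value it holds.

```python
# The two variables these release notes say the Engine container needs.
REQUIRED_GPU_ENV = ("NVIDIA_VISIBLE_DEVICES", "NVIDIA_DRIVER_CAPABILITIES")


def normalize_environment(environment):
    """Convert a Compose `environment` section into a plain dict of strings."""
    if isinstance(environment, dict):
        # A key with no value is treated as unset for this check.
        return {str(k): "" if v is None else str(v) for k, v in environment.items()}
    env = {}
    for item in environment or []:
        name, _, value = str(item).partition("=")
        env[name] = value
    return env


def missing_gpu_env(environment):
    """Return the required GPU variables that are absent or empty."""
    env = normalize_environment(environment)
    return [name for name in REQUIRED_GPU_ENV if not env.get(name)]
```

An empty list means both variables are present. Any names returned should be added to the Engine service before pulling the new image.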

After setting the variables, upgrade your image tags to release-260611 and restart the Engine. Confirm it reaches a healthy state (for example, GET /v1/status returns 200) before routing production traffic.
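
The health confirmation described above can be automated with a short polling loop. This is a minimal sketch using only the Python standard library. The base URL `http://localhost:8080` is an assumption about where your deployment exposes `/v1/status`, and the function names, attempt count, and delay are hypothetical defaults to adjust for your environment.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_healthy(url, attempts=30, delay=2.0, fetch=fetch_status, sleep=time.sleep):
    """Poll `url` until it returns 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed address; replace with the host and port of your deployment.
    ok = wait_until_healthy("http://localhost:8080/v1/status")
    print("healthy" if ok else "not healthy")
```

Running this as a gate in a deployment script lets you hold production traffic until the check returns `True`.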

This Release Contains The Following Changes

Persian Profanity Filtering — profanity_filter=true now masks recognized profanity in Persian (fa) transcripts. See Profanity Filtering for the supported language list and usage.
English Redaction on Flux Streaming — redact now applies to English transcripts on the Flux streaming endpoint (/v2/listen).
Streaming Diarization Model Selection — a new diarize_model parameter selects the diarization model on streaming requests; accepted values are v1 and latest, and setting it enables diarization (no separate diarize=true required). See Diarization for details.
Number Formatting Improvements — number formatting now covers Simplified Mandarin (zh), Cantonese (zh-HK), and Bulgarian (bg). Across languages, ordinals written as numerals format more consistently, and indefinite articles (“a”/“an”) format as digits in quantity contexts.
Text-to-Speech Output Transcoding — /v1/speak now supports optional output transcoding to additional audio formats.
Aura-2 Numeric Pronunciation Fix — corrects a pronunciation issue on all-numeric inputs.
Voice Agent Third-Party Provider Reliability — improves reliability of ElevenLabs streaming and Cartesia cancellation handling for self-hosted Voice Agent.
General Improvements — Keeps our software up-to-date.
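
As an illustration of the new request parameters, the sketch below builds query strings for two of the changes listed above. It is a hedged example, not a client library: the helper name is hypothetical, and combining `language=fa` with `profanity_filter=true` is an assumption about how a Persian request is expressed. Only the parameter names and the accepted `diarize_model` values (`v1` and `latest`) come from this release's notes.

```python
from urllib.parse import urlencode

# Accepted values for diarize_model, as listed in this release.
DIARIZE_MODELS = {"v1", "latest"}


def listen_query(**params):
    """Build a query string, rejecting unsupported diarize_model values."""
    model = params.get("diarize_model")
    if model is not None and model not in DIARIZE_MODELS:
        raise ValueError(f"diarize_model must be one of {sorted(DIARIZE_MODELS)}")
    # Sort keys so the output is stable and easy to compare in logs.
    return urlencode(sorted(params.items()))


# Persian transcript with profanity masking (language parameter assumed).
persian = listen_query(language="fa", profanity_filter="true")

# Streaming diarization: setting diarize_model alone enables diarization.
diarized = listen_query(diarize_model="latest")
```

The resulting strings can be appended to the request URL of your self-hosted API endpoint.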