April 30, 2026
Deepgram Self-Hosted April 2026 Release (260430)
Container Images (release 260430)
- `quay.io/deepgram/self-hosted-api:release-260430`
  - Equivalent image to: `quay.io/deepgram/self-hosted-api:1.185.0-2`
- `quay.io/deepgram/self-hosted-engine:release-260430`
  - Equivalent image to: `quay.io/deepgram/self-hosted-engine:3.116.0-1`
  - Minimum required NVIDIA driver version: `>=570.172.08`
- `quay.io/deepgram/self-hosted-license-proxy:release-260430`
  - Equivalent image to: `quay.io/deepgram/self-hosted-license-proxy:1.10.1`
- `quay.io/deepgram/self-hosted-billing:release-260430`
  - Equivalent image to: `quay.io/deepgram/self-hosted-billing:1.13.0`
Aura-2 Speed and Pronunciation Controls require an updated voice-pack
The new Aura-2 Speed and Pronunciation Control features in this release are powered by an updated Aura-2 English voice-pack model. If your deployment is using an Aura-2 English voice-pack from before the April 2026 release (e.g., the 2025-04-15.0 version of the voice-pack), requests including the speed or pronounce parameters will return 400 Bad Request.
To enable these features, contact your Deepgram representative to obtain the latest Aura-2 English voice-pack (2025-04-15.4 or later) and replace the existing voice-pack file in your models directory. The official Deepgram Helm chart and sample values files in deepgram/self-hosted-resources (chart 0.34.0 and later) already point to the correct UUID; you only need to use the latest Deepgram configuration files and update the model file on disk.
This Release Contains The Following Changes
- Nova-3 Gujarati — Nova-3 now supports Gujarati (`gu`) for both batch and streaming.
- Aura-2 Speed and Pronunciation Controls — Aura-2 TTS voices now support runtime speed and pronunciation control. See Voice Controls for details.
- Improved Aura-2 Pronunciation — Better pronunciation for Spanish dates and the term “Jan” (as a name versus a month) with Aura-2 voices.
- Nova-3 Multilingual Numeral Formatting — Numeral formatting is now applied when using Nova-3 multilingual models and `smart_format` or `numerals` is enabled.
- Numeral Formatting for Hebrew and Romanian — Numeral formatting is now applied for Nova-3 Hebrew (`he`) and Romanian (`ro`) when `smart_format` or `numerals` is enabled.
- Voice Agent: Cartesia Speed Control — The Cartesia speak provider now supports speed control in Voice Agent sessions.
- Voice Agent: Improved Agent Message Injection — Improved support for injecting agent messages into a live session. See Inject Agent for details.
- Voice Agent: Multilingual Flux Language Hints — Multilingual Flux now accepts language hints when used as the STT provider in a Voice Agent session.
- Improved Multilingual Streaming Language Tags — Improves the accuracy of language tag results on `/v1/listen` streaming requests using multilingual models.
- Improved Numeral Redaction Accuracy — Improved redaction accuracy when using `redact=numbers` or `redact=aggressive_numbers`.
- General Improvements — Keeps our software up-to-date.
LLM Model Updates & Cartesia Speed Control
GPT-5.5 LLM Model Support
OpenAI’s GPT-5.5 model is now available as a managed LLM in the Voice Agent API. GPT-5.5 is an Advanced tier model.
Set the model in your agent configuration:
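A minimal Settings payload sketch, assuming the managed OpenAI provider type is `open_ai` (check the Voice Agent API Reference for the full schema; unrelated agent fields are omitted):

```json
{
  "type": "Settings",
  "agent": {
    "think": {
      "provider": {
        "type": "open_ai",
        "model": "gpt-5.5"
      }
    }
  }
}
```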
For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
Llama Nemotron Super 49B Removed
The `llama-nemotron-super-49B` (Llama Nemotron Super 49B) model has been removed from the NVIDIA provider due to poor performance. The `nemotron-3-nano-30B-A3B` model remains available. See the NVIDIA models table for current options.
Cartesia TTS Speed Control
The `agent.speak.provider.speed` parameter is now supported with Cartesia TTS in addition to Deepgram TTS. For Cartesia, the parameter accepts the following preset values:
- `slowest`
- `slow`
- `normal`
- `fast`
- `fastest`
You can also pass a numerical value for more granular control. See the Cartesia speed documentation for details.
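For example, a Cartesia speak provider using a preset value might be configured like this (a minimal sketch; required Cartesia voice and model fields are omitted for brevity):

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "speed": "fast"
      }
    }
  }
}
```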
For more details, see the TTS Models documentation.
Nova-3 Model Update
🌏 Nova-3 now supports Gujarati with the following language codes:
- Gujarati: `gu`, `gu-IN`
Access this model by setting model="nova-3" and the relevant language code in your request.
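For instance, a pre-recorded request for Gujarati audio might look like this (a sketch using the standard `/v1/listen` endpoint; substitute your own API key and audio URL):

```bash
curl --request POST \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&language=gu' \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url": "https://example.com/gujarati-audio.wav"}'
```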
Learn more about Nova-3 and supported languages on the Models and Language Overview page.
April 16, 2026
Deepgram Self-Hosted April 2026 Release (260416)
Container Images (release 260416)
- `quay.io/deepgram/self-hosted-api:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-api:1.181.7`
- `quay.io/deepgram/self-hosted-engine:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-engine:3.115.1`
  - Minimum required NVIDIA driver version: `>=570.172.08`
- `quay.io/deepgram/self-hosted-license-proxy:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-license-proxy:1.10.1`
- `quay.io/deepgram/self-hosted-billing:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-billing:1.13.0`
This Release Contains The Following Changes
- Flux Multilingual — Real-time multilingual conversational STT is now available for self-hosted deployments. Supports 10 languages (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch) with code-switching support. Deploying Flux Multilingual requires setting `model_name = "flux-general-multi"` in the `[flux]` section of `engine.toml`. See Flux Multilingual for details.
- General Improvements — Keeps our software up-to-date.
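Based on the note above, the relevant `engine.toml` fragment would look something like this (other `[flux]` settings, if any, are left at their defaults):

```toml
# engine.toml: enable the multilingual Flux model for this Engine instance
[flux]
model_name = "flux-general-multi"
```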
Deepgram CLI Is Now Available
The Deepgram CLI brings transcription, speech synthesis, text analysis, account management, and MCP tooling to your terminal through a single `dg` command.
What you can do
Use the `dg` CLI to work with Deepgram from your terminal:
- Transcribe files, URLs, microphone input, and piped audio
- Generate speech with Deepgram Aura voices
- Run text analysis workflows such as summarization, sentiment, and topic detection
- Manage projects, API keys, members, and usage
- Start an MCP server for AI coding tools
Launch docs
For the launch site and quick reference, visit cli.deepgram.com.
Flux Multilingual: Conversational STT, Now in 10 Languages
The same model that solved turn detection for English voice agents now works everywhere your customers speak — no language routing, no model-per-language infrastructure, no accuracy tradeoff.
Deepgram is proud to announce the general availability of Flux Multilingual (flux-general-multi), a single model supporting 10 languages with the same turn-aware, interruption-aware conversational intelligence as flux-general-en.
Key Features:
- 10 languages, one model — English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. No language routing or model-per-language infrastructure required.
- Language prompting — The optional `language_hint` parameter biases the model toward specific languages, delivering accuracy on par with dedicated monolingual models. Without hints, the model auto-detects the spoken language.
- Native code-switching — Handles mid-sentence language switches without configuration changes or reconnections.
- Language detection on every turn — All `TurnInfo` events include a `languages` field reporting detected languages sorted by word count, and a `languages_hinted` field reflecting the active hints.
- Mid-stream reconfiguration — Update language hints during a stream using the Configure control message without disconnecting. Supports patterns like detect-then-lock for optimal accuracy.
- Same Flux architecture — All turn detection, eager end-of-turn, and configuration parameters from `flux-general-en` work identically.
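As an illustration, a mid-stream hint update via the Configure control message might look like the following (a hypothetical sketch: the message type comes from the notes above, but the exact hint field name and value shape should be confirmed against the Language Prompting guide):

```json
{
  "type": "Configure",
  "language_hint": ["es", "pt"]
}
```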
Use Cases:
Designed for non-English monolingual voice agents, multilingual voice agents, global contact centers, bilingual support lines, and any real-time conversational application where callers may speak different languages. For English-only workloads, continue using flux-general-en.
Getting Started:
Connect to Flux Multilingual by setting `model=flux-general-multi` on the `/v2/listen` endpoint — no new credentials or endpoints required. Pricing is the same as `flux-general-en`.
Learn more in the Language Prompting guide, Flux Quickstart, and API Reference.
Availability
Flux Multilingual is now available through our API. To access:
- Connect to `wss://api.deepgram.com/v2/listen` using `model=flux-general-multi`
- EU endpoint available: `wss://api.eu.deepgram.com/v2/listen?model=flux-general-multi`
- Real-time streaming only
- SDK and self-hosted support coming soon
Reusable agent configurations
You can now store and manage agent configurations and template variables through the Deepgram API. Instead of sending a full agent configuration with every WebSocket session, define it once and reference it by UUID.
Key use cases include:
- Per-customer configurations — Give each customer a distinct voice, persona, or model without maintaining separate codebases.
- Regional and regulatory compliance — Maintain separate configurations for different markets to enforce data-handling, language, or disclosure requirements.
- A/B testing voices or prompts — Run two configurations in parallel and measure conversion, CSAT, or containment rate without a code deploy.
- Multi-agent architectures — Store and manage all agents used in your multi-agent architecture from a single project.
Template variables let you define reusable values (such as system prompts or model names) that are automatically interpolated at runtime. Variables follow the `DG_<VARIABLE_NAME>` naming format.
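For illustration, a stored configuration might reference a template variable in its prompt (hypothetical variable and field values; the exact interpolation mechanics are covered in the Reusable Agent Configurations docs):

```json
{
  "agent": {
    "think": {
      "prompt": "You are a support agent for DG_COMPANY_NAME. Greet the caller by company name."
    }
  }
}
```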
For more details, see Reusable Agent Configurations.
NVIDIA LLM provider now available
NVIDIA is now a supported LLM provider for the Voice Agent API. The following model is available in the Standard pricing tier:
- `nemotron-3-nano-30B-A3B` — Nemotron 3 Nano 30B A3B provides cost efficiency with high accuracy for targeted agentic tasks.
Set the provider type to nvidia in your agent configuration:
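A minimal sketch of that configuration (the optional `endpoint` field is omitted since NVIDIA is a managed provider; other agent fields elided):

```json
{
  "agent": {
    "think": {
      "provider": {
        "type": "nvidia",
        "model": "nemotron-3-nano-30B-A3B"
      }
    }
  }
}
```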
NVIDIA is a managed provider, so the endpoint field is optional. For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
April 2, 2026
Deepgram Self-Hosted April 2026 Release (260402)
Container Images (release 260402)
- `quay.io/deepgram/self-hosted-api:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-api:1.181.3`
- `quay.io/deepgram/self-hosted-engine:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-engine:3.114.5`
  - Minimum required NVIDIA driver version: `>=570.172.08`
- `quay.io/deepgram/self-hosted-license-proxy:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-license-proxy:1.10.1`
- `quay.io/deepgram/self-hosted-billing:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-billing:1.13.0`
This Release Contains The Following Changes
- Certificate Endpoint Fix — Engine now responds to `/v1/certificates` in addition to `/certificates`, consistent with the other container images. See Certificate Status for details.
- Model Name Consistency — The `/v1/models` endpoint now returns a `canonical_name` field matching the model name used in `/v1/listen` requests.
- General Improvements — Keeps our software up-to-date.
New thought_signature field for Gemini function calling
The Voice Agent API now includes an optional thought_signature field in function call messages. Some Gemini models (3.0 and 3.1 families) require this as an additional function call identifier.
This field appears in two places:
- Settings message — in `agent.context.messages[].function_calls[]` when providing function call history
- FunctionCallRequest — in `functions[]` when the server requests a function call
Example
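A FunctionCallRequest carrying the new field might look like this (a minimal sketch; the surrounding fields follow the existing Function Call Request schema, and the signature value is an opaque placeholder returned by Gemini):

```json
{
  "type": "FunctionCallRequest",
  "functions": [
    {
      "id": "fc_0123",
      "name": "get_order_status",
      "arguments": "{\"order_id\": \"A-1001\"}",
      "thought_signature": "opaque-base64-signature"
    }
  ]
}
```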
The thought_signature field is optional and only relevant when using Google Gemini models. This change addresses the degraded function calling performance that some users experienced with the Gemini 3.0 and 3.1 model families.
For more details, see the Function Call Request documentation, the Voice Agent API Reference, or Gemini’s Thought Signatures Documentation.
New volume parameter for Cartesia TTS
The Voice Agent API now supports an optional agent.speak.provider.volume parameter when using Cartesia as the TTS provider. Valid values range from 0.5 to 2.0.
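For example (a minimal sketch; required Cartesia voice and model fields are omitted for brevity):

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "volume": 1.5
      }
    }
  }
}
```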
For more details, see Configure the Voice Agent or the Cartesia volume, speed, and emotion documentation.