Nova-3 Model Update
🌏 Nova-3 now supports Gujarati with the following language codes:
- Gujarati: `gu`, `gu-IN`
Access this model by setting model="nova-3" and the relevant language code in your request.
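As a sketch, the request parameters can be assembled like this (the `/v1/listen` pre-recorded endpoint and the `model`/`language` query parameters follow Deepgram's standard API; no request is sent here):

```python
from urllib.parse import urlencode

# Build the query string for a Nova-3 Gujarati transcription request.
# Only the URL is constructed; add your API key header when sending.
params = {"model": "nova-3", "language": "gu"}
url = "https://api.deepgram.com/v1/listen?" + urlencode(params)
print(url)
```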
Learn more about Nova-3 and supported languages on the Models and Language Overview page.
April 16, 2026
Deepgram Self-Hosted April 2026 Release (260416)
Container Images (release 260416)
- `quay.io/deepgram/self-hosted-api:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-api:1.181.7`
- `quay.io/deepgram/self-hosted-engine:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-engine:3.115.1`
  - Minimum required NVIDIA driver version: `>=570.172.08`
- `quay.io/deepgram/self-hosted-license-proxy:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-license-proxy:1.10.1`
- `quay.io/deepgram/self-hosted-billing:release-260416`
  - Equivalent image to: `quay.io/deepgram/self-hosted-billing:1.13.0`
This Release Contains The Following Changes
- Flux Multilingual — Real-time multilingual conversational STT is now available for self-hosted deployments. Supports 10 languages (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch) with code-switching support. Deploying Flux Multilingual requires setting `model_name = "flux-general-multi"` in the `[flux]` section of `engine.toml`. See Flux Multilingual for details.
- General Improvements — Keeps our software up-to-date.
Deepgram CLI Is Now Available
The Deepgram CLI brings transcription, speech synthesis, text analysis, account management, and MCP tooling to your terminal through a single dg command.
What you can do
Use the dg CLI to work with Deepgram from your terminal:
- Transcribe files, URLs, microphone input, and piped audio
- Generate speech with Deepgram Aura voices
- Run text analysis workflows such as summarization, sentiment, and topic detection
- Manage projects, API keys, members, and usage
- Start an MCP server for AI coding tools
Launch docs
For the launch site and quick reference, visit cli.deepgram.com.
Flux Multilingual: Conversational STT, Now in 10 Languages
The same model that solved turn detection for English voice agents now works everywhere your customers speak — no language routing, no model-per-language infrastructure, no accuracy tradeoff.
Deepgram is proud to announce the general availability of Flux Multilingual (flux-general-multi), a single model supporting 10 languages with the same turn-aware, interruption-aware conversational intelligence as flux-general-en.
Key Features:
- 10 languages, one model — English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. No language routing or model-per-language infrastructure required.
- Language prompting — The optional `language_hint` parameter biases the model toward specific languages, delivering accuracy on par with dedicated monolingual models. Without hints, the model auto-detects the spoken language.
- Native code-switching — Handles mid-sentence language switches without configuration changes or reconnections.
- Language detection on every turn — All `TurnInfo` events include a `languages` field reporting detected languages sorted by word count, and a `languages_hinted` field reflecting the active hints.
- Mid-stream reconfiguration — Update language hints during a stream using the Configure control message without disconnecting. Supports patterns like detect-then-lock for optimal accuracy.
- Same Flux architecture — All turn detection, eager end-of-turn, and configuration parameters from `flux-general-en` work identically.
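As an illustration of the detect-then-lock pattern, an application could let the first turn auto-detect, then send a Configure control message locking the hint to the detected language. The message shape below is a hedged sketch (the `type` and `language_hint` field placement are assumptions based on the feature description; consult the API Reference for the exact schema):

```json
{
  "type": "Configure",
  "language_hint": ["es"]
}
```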
Use Cases:
Designed for non-English monolingual voice agents, multilingual voice agents, global contact centers, bilingual support lines, and any real-time conversational application where callers may speak different languages. For English-only workloads, continue using flux-general-en.
Getting Started:
Connect to Flux Multilingual by setting `model=flux-general-multi` on the `/v2/listen` endpoint — no new credentials or endpoints required. Pricing is the same as `flux-general-en`.
Learn more in the Language Prompting guide, Flux Quickstart, and API Reference.
Availability
Flux Multilingual is now available through our API. To access:
- Connect to `wss://api.deepgram.com/v2/listen` using `model=flux-general-multi`
- EU endpoint available: `wss://api.eu.deepgram.com/v2/listen?model=flux-general-multi`
- Real-time streaming only
- SDK and self-hosted support coming soon
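A minimal connection-URL sketch (the endpoint and `model` value come from the notes above; the optional `language_hint` parameter is shown as described in Key Features, though its exact serialization for multiple hints may differ):

```python
from urllib.parse import urlencode

# Flux Multilingual runs on the /v2/listen WebSocket endpoint.
# Single-value language_hint shown; no connection is opened here.
params = {"model": "flux-general-multi", "language_hint": "es"}
url = "wss://api.deepgram.com/v2/listen?" + urlencode(params)
print(url)
```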
Reusable agent configurations
You can now store and manage agent configurations and template variables through the Deepgram API. Instead of sending a full agent configuration with every WebSocket session, define it once and reference it by UUID.
Key use cases include:
- Per-customer configurations — Give each customer a distinct voice, persona, or model without maintaining separate codebases.
- Regional and regulatory compliance — Maintain separate configurations for different markets to enforce data-handling, language, or disclosure requirements.
- A/B testing voices or prompts — Run two configurations in parallel and measure conversion, CSAT, or containment rate without a code deploy.
- Multi-agent architectures — Store and manage all agents used in your multi-agent architecture from a single project.
Template variables let you define reusable values (such as system prompts or model names) that are automatically interpolated at runtime. Variables follow the DG_<VARIABLE_NAME> naming format.
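The substitution behavior can be pictured with a small sketch. This is not Deepgram's implementation — interpolation happens server-side at runtime — just an illustration of the `DG_<VARIABLE_NAME>` naming pattern:

```python
import re

def interpolate(template: str, variables: dict) -> str:
    """Replace DG_<NAME> placeholders with runtime values.

    Illustrative only: the real interpolation is performed by the
    Deepgram runtime, and the exact delimiter rules may differ.
    """
    return re.sub(
        r"DG_([A-Z_]+)",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

prompt = interpolate(
    "You are DG_PERSONA for DG_COMPANY.",
    {"PERSONA": "a billing assistant", "COMPANY": "Acme"},
)
print(prompt)  # You are a billing assistant for Acme.
```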
For more details, see Reusable Agent Configurations.
NVIDIA LLM provider now available
NVIDIA is now a supported LLM provider for the Voice Agent API. Two models are available in the Standard pricing tier:
- `llama-nemotron-super-49B` — Llama Nemotron Super 49B delivers high accuracy for multi-agentic reasoning.
- `nemotron-3-nano-30B-A3B` — Nemotron 3 Nano 30B A3B provides cost efficiency with high accuracy for targeted agentic tasks.
Set the provider type to nvidia in your agent configuration:
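A hedged sketch of the relevant `think` provider settings (the nesting follows the Voice Agent `Settings` message; treat exact field placement as an assumption). Because NVIDIA is a managed provider, the `endpoint` field is omitted here:

```json
{
  "agent": {
    "think": {
      "provider": {
        "type": "nvidia",
        "model": "llama-nemotron-super-49B"
      }
    }
  }
}
```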
NVIDIA is a managed provider, so the endpoint field is optional. For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
April 2, 2026
Deepgram Self-Hosted April 2026 Release (260402)
Container Images (release 260402)
- `quay.io/deepgram/self-hosted-api:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-api:1.181.3`
- `quay.io/deepgram/self-hosted-engine:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-engine:3.114.5`
  - Minimum required NVIDIA driver version: `>=570.172.08`
- `quay.io/deepgram/self-hosted-license-proxy:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-license-proxy:1.10.1`
- `quay.io/deepgram/self-hosted-billing:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-billing:1.13.0`
This Release Contains The Following Changes
- Certificate Endpoint Fix — Engine now responds to `/v1/certificates` in addition to `/certificates`, consistent with the other container images. See Certificate Status for details.
- Model Name Consistency — The `/v1/models` endpoint now returns a `canonical_name` field matching the model name used in `/v1/listen` requests.
- General Improvements — Keeps our software up-to-date.
New thought_signature field for Gemini function calling
The Voice Agent API now includes an optional thought_signature field in function call messages. Some Gemini models (3.0 and 3.1 families) require this as an additional function call identifier.
This field appears in two places:
- Settings message — in `agent.context.messages[].function_calls[]` when providing function call history
- FunctionCallRequest — in `functions[]` when the server requests a function call
Example
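A hedged sketch of a `FunctionCallRequest` carrying the new field (the sibling fields are assumptions based on the Function Call Request message; the signature value is an opaque placeholder returned by Gemini, not a real token):

```json
{
  "type": "FunctionCallRequest",
  "functions": [
    {
      "id": "fc_123",
      "name": "get_weather",
      "arguments": "{\"location\": \"Paris\"}",
      "thought_signature": "<opaque-signature-from-gemini>"
    }
  ]
}
```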
The thought_signature field is optional and only relevant when using Google Gemini models. This change addresses the degraded function calling performance that some users experienced with the Gemini 3.0 and 3.1 model families.
For more details, see the Function Call Request documentation, the Voice Agent API Reference, or Gemini’s Thought Signatures Documentation.
New volume parameter for Cartesia TTS
The Voice Agent API now supports an optional agent.speak.provider.volume parameter when using Cartesia as the TTS provider. Valid values range from 0.5 to 2.0.
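A minimal sketch of the speak settings (only `volume` is the new parameter described here; the provider `type` value and surrounding nesting are assumptions based on the Voice Agent settings structure):

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "volume": 1.5
      }
    }
  }
}
```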
For more details, see Configure the Voice Agent or the Cartesia volume, speed, and emotion documentation.
Nova-3 Model Update
🌏 Nova-3 now supports the following new languages and language codes:
- Chinese (Mandarin, Simplified): `zh`, `zh-CN`, `zh-Hans`
- Chinese (Mandarin, Traditional): `zh-TW`, `zh-Hant`
Access these models by setting model="nova-3" and the relevant language code in your request.
Learn more about Nova-3 and supported languages on the Models and Language Overview page.
TTS speed controls & updated LLM models
TTS speak speed (Early Access)
You can now control the speaking rate of Deepgram TTS in the Voice Agent API using the agent.speak.provider.speed parameter. This parameter accepts a float value between 0.7 and 1.5, with 1.0 as the default.
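A minimal sketch of the relevant settings fragment (the `speed` path is as documented above; the provider `type` and `model` values are illustrative assumptions):

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-thalia-en",
        "speed": 1.2
      }
    }
  }
}
```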
This feature is in Early Access and is only available for Deepgram TTS. For more details, see TTS voice controls. To request access, contact your Account Executive or reach out to sales@deepgram.com.
Updated LLM models
New OpenAI models — Two new models are now available in the Standard pricing tier:
- `gpt-5.4-nano`
- `gpt-5.4-mini`
Gemini 2.0 Flash deprecated — The gemini-2.0-flash model is now deprecated. We recommend migrating to gemini-2.5-flash or a newer Gemini model. See the Google models table for alternatives.
For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.