April 23, 2026

Nova-3 Model Update

🌏 Nova-3 now supports Gujarati with the following language codes:

  • Gujarati: gu, gu-IN

Access this model by setting model="nova-3" and the relevant language code in your request.
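As a minimal sketch, the request URL for a pre-recorded Gujarati transcription can be composed like this (the endpoint and query parameters follow Deepgram's `/v1/listen` API; authentication and the actual HTTP call are omitted):

```python
# Sketch: composing a Nova-3 Gujarati transcription request URL.
# Only the URL construction is shown; sending the request with your
# API key is left out.
from urllib.parse import urlencode

BASE_URL = "https://api.deepgram.com/v1/listen"

def build_listen_url(model: str, language: str) -> str:
    """Return the query URL for a pre-recorded transcription request."""
    return f"{BASE_URL}?{urlencode({'model': model, 'language': language})}"

# Nova-3 with the Gujarati language code from this release:
print(build_listen_url("nova-3", "gu"))
# https://api.deepgram.com/v1/listen?model=nova-3&language=gu
```

The regional variant works the same way with `language="gu-IN"`.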

Learn more about Nova-3 and supported languages on the Models and Language Overview page.


April 16, 2026


Deepgram Self-Hosted April 2026 Release (260416)

Container Images (release 260416)

  • quay.io/deepgram/self-hosted-api:release-260416
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-api:1.181.7
  • quay.io/deepgram/self-hosted-engine:release-260416
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-engine:3.115.1
    • Minimum required NVIDIA driver version: >=570.172.08
  • quay.io/deepgram/self-hosted-license-proxy:release-260416
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-license-proxy:1.10.1
  • quay.io/deepgram/self-hosted-billing:release-260416
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-billing:1.13.0

This Release Contains The Following Changes

  • Flux Multilingual — Real-time multilingual conversational STT is now available for self-hosted deployments. Supports 10 languages (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch) with code-switching support. Deploying Flux Multilingual requires setting model_name = "flux-general-multi" in the [flux] section of engine.toml. See Flux Multilingual for details.
  • General Improvements — Keeps our software up-to-date.
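The Flux Multilingual item above translates to an engine.toml fragment along these lines (any other keys you already have in the `[flux]` section are omitted here):

```toml
# engine.toml — enable Flux Multilingual in a self-hosted deployment
[flux]
model_name = "flux-general-multi"
```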

April 15, 2026

Deepgram CLI Is Now Available

The Deepgram CLI brings transcription, speech synthesis, text analysis, account management, and MCP tooling to your terminal through a single dg command.

What you can do

Use the dg CLI to work with Deepgram from your terminal:

  • Transcribe files, URLs, microphone input, and piped audio
  • Generate speech with Deepgram Aura voices
  • Run text analysis workflows such as summarization, sentiment, and topic detection
  • Manage projects, API keys, members, and usage
  • Start an MCP server for AI coding tools

Launch docs

For the launch site and quick reference, visit cli.deepgram.com.


April 15, 2026

Flux Multilingual: Conversational STT, Now in 10 Languages

The same model that solved turn detection for English voice agents now works everywhere your customers speak — no language routing, no model-per-language infrastructure, no accuracy tradeoff.

Deepgram is proud to announce the general availability of Flux Multilingual (flux-general-multi), a single model supporting 10 languages with the same turn-aware, interruption-aware conversational intelligence as flux-general-en.

Key Features:

  • 10 languages, one model — English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. No language routing or model-per-language infrastructure required.
  • Language prompting — The optional language_hint parameter biases the model toward specific languages, delivering accuracy on par with dedicated monolingual models. Without hints, the model auto-detects the spoken language.
  • Native code-switching — Handles mid-sentence language switches without configuration changes or reconnections.
  • Language detection on every turn — All TurnInfo events include a languages field reporting detected languages sorted by word count, and a languages_hinted field reflecting the active hints.
  • Mid-stream reconfiguration — Update language hints during a stream using the Configure control message without disconnecting. Supports patterns like detect-then-lock for optimal accuracy.
  • Same Flux architecture — All turn detection, eager end-of-turn, and configuration parameters from flux-general-en work identically.
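As an illustration of the mid-stream reconfiguration pattern, a Configure control message updating the active hints might look like the following (the exact field names and nesting should be checked against the API Reference; the hint values are arbitrary):

```json
{
  "type": "Configure",
  "language_hint": ["es", "en"]
}
```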

Use Cases:

Designed for non-English monolingual voice agents, multilingual voice agents, global contact centers, bilingual support lines, and any real-time conversational application where callers may speak different languages. For English-only workloads, continue using flux-general-en.

Getting Started:

Connect to Flux Multilingual by setting model=flux-general-multi on the /v2/listen endpoint — no new credentials or endpoints required. Pricing is the same as flux-general-en.

wss://api.deepgram.com/v2/listen?model=flux-general-multi&language_hint=en&encoding=linear16&sample_rate=16000

Learn more in the Language Prompting guide, Flux Quickstart, and API Reference.

Availability

Flux Multilingual is now available through our API. To access:

  • Connect to wss://api.deepgram.com/v2/listen using model=flux-general-multi
  • EU endpoint available: wss://api.eu.deepgram.com/v2/listen?model=flux-general-multi
  • Real-time streaming only
  • SDK and self-hosted support coming soon

April 8, 2026

Reusable agent configurations

You can now store and manage agent configurations and template variables through the Deepgram API. Instead of sending a full agent configuration with every WebSocket session, define it once and reference it by UUID.

Key use cases include:

  • Per-customer configurations — Give each customer a distinct voice, persona, or model without maintaining separate codebases.
  • Regional and regulatory compliance — Maintain separate configurations for different markets to enforce data-handling, language, or disclosure requirements.
  • A/B testing voices or prompts — Run two configurations in parallel and measure conversion, CSAT, or containment rate without a code deploy.
  • Multi-agent architectures — Store and manage all agents used in your multi-agent architecture from a single project.

Template variables let you define reusable values (such as system prompts or model names) that are automatically interpolated at runtime. Variables follow the DG_<VARIABLE_NAME> naming format.
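For illustration only, a stored configuration might reference a template variable roughly like this (the placement and interpolation format here are assumptions; only the `DG_<VARIABLE_NAME>` naming is from this announcement — see Reusable Agent Configurations for the actual syntax):

```json
{
  "agent": {
    "think": {
      "prompt": "DG_SYSTEM_PROMPT"
    }
  }
}
```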

For more details, see Reusable Agent Configurations.


April 3, 2026

NVIDIA LLM provider now available

NVIDIA is now a supported LLM provider for the Voice Agent API. Two models are available in the Standard pricing tier.

Set the provider type to nvidia in your agent configuration:

{
  "agent": {
    "think": {
      "provider": {
        "type": "nvidia",
        "model": "llama-nemotron-super-49B",
        "temperature": 0.7
      }
    }
  }
}

NVIDIA is a managed provider, so the endpoint field is optional. For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.


April 2, 2026


Deepgram Self-Hosted April 2026 Release (260402)

Container Images (release 260402)

  • quay.io/deepgram/self-hosted-api:release-260402
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-api:1.181.3
  • quay.io/deepgram/self-hosted-engine:release-260402
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-engine:3.114.5
    • Minimum required NVIDIA driver version: >=570.172.08
  • quay.io/deepgram/self-hosted-license-proxy:release-260402
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-license-proxy:1.10.1
  • quay.io/deepgram/self-hosted-billing:release-260402
    • Equivalent image to:
      • quay.io/deepgram/self-hosted-billing:1.13.0

This Release Contains The Following Changes

  • Certificate Endpoint Fix — Engine now responds to /v1/certificates in addition to /certificates, consistent with the other container images. See Certificate Status for details.
  • Model Name Consistency — The /v1/models endpoint now returns a canonical_name field matching the model name used in /v1/listen requests.
  • General Improvements — Keeps our software up-to-date.
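A trimmed `/v1/models` response illustrating the new field might look like this (the surrounding fields and values are illustrative, not the complete schema; only the `canonical_name` field itself is from this release note):

```json
{
  "models": [
    {
      "name": "general",
      "canonical_name": "nova-3-general"
    }
  ]
}
```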

April 1, 2026

New thought_signature field for Gemini function calling

The Voice Agent API now includes an optional thought_signature field in function call messages. Some Gemini models (3.0 and 3.1 families) require this as an additional function call identifier.

This field appears in two places:

  • Settings message — in agent.context.messages[].function_calls[] when providing function call history
  • FunctionCallRequest — in functions[] when the server requests a function call

Example

{
  "type": "FunctionCallRequest",
  "functions": [
    {
      "id": "fc_12345678-90ab-cdef-1234-567890abcdef",
      "name": "get_weather",
      "arguments": "{\"location\": \"Fremont, CA 94539\"}",
      "client_side": true,
      "thought_signature": "abc123"
    }
  ]
}

The thought_signature field is optional and only relevant when using Google Gemini models. This change addresses the degraded function calling performance that some users experienced with the Gemini 3.0 and 3.1 model families.

For more details, see the Function Call Request documentation, the Voice Agent API Reference, or Gemini’s Thought Signatures Documentation.

New volume parameter for Cartesia TTS

The Voice Agent API now supports an optional agent.speak.provider.volume parameter when using Cartesia as the TTS provider. Valid values range from 0.5 to 2.0.
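Following the Settings shape used elsewhere in this changelog, the new parameter would slot into the speak provider block roughly like this (the `"cartesia"` provider type is inferred from the announcement, and other required Cartesia provider fields are omitted — check Configure the Voice Agent for the full schema):

```json
{
  "type": "Settings",
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "volume": 1.5
      }
    }
  }
}
```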

For more details, see Configure the Voice Agent or the Cartesia volume, speed, and emotion documentation.


March 31, 2026

Nova-3 Model Update

🌏 Nova-3 now supports the following new languages and language codes:

  • Chinese (Mandarin, Simplified): zh, zh-CN, zh-Hans
  • Chinese (Mandarin, Traditional): zh-TW, zh-Hant

Access these models by setting model="nova-3" and the relevant language code in your request.

Learn more about Nova-3 and supported languages on the Models and Language Overview page.


March 26, 2026

TTS speed controls & updated LLM models

TTS speak speed (Early Access)

You can now control the speaking rate of Deepgram TTS in the Voice Agent API using the agent.speak.provider.speed parameter. This parameter accepts a float value between 0.7 and 1.5, with 1.0 as the default.

{
  "type": "Settings",
  "agent": {
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-thalia-en",
        "speed": 0.9
      }
    }
  }
}

This feature is in Early Access and is only available for Deepgram TTS. For more details, see TTS voice controls. To request access, contact your Account Executive or reach out to sales@deepgram.com.

Updated LLM models

New OpenAI models — Two new models are now available in the Standard pricing tier:

  • gpt-5.4-nano
  • gpt-5.4-mini

Gemini 2.0 Flash deprecated — The gemini-2.0-flash model is now deprecated. We recommend migrating to gemini-2.5-flash or a newer Gemini model. See the Google models table for alternatives.

For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.