May 15, 2026

Numerals Support Now Available for 3 New Languages: Russian, Romanian, and Hebrew (Monolingual Models)

Supported languages and language codes:

  • Russian (ru)
  • Romanian (ro)
  • Hebrew (he)

You can now use Deepgram’s Numerals feature with monolingual models for Russian, Romanian, and Hebrew. Numerals converts spoken numbers into digits (for example, “three hundred” → “300”) in your transcript, producing results that are more accurate and easier to process.

How to use Numerals:
To enable numerals, add the numerals=true parameter to your Deepgram API request.
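
As a sketch of what such a request URL might look like (the endpoint and the numerals parameter come from this note; the model name and language choice here are illustrative), using Python’s standard library to build the query string:

```python
from urllib.parse import urlencode

# Build a pre-recorded transcription request URL with Numerals enabled
# for one of the newly supported monolingual languages (Russian here).
base = "https://api.deepgram.com/v1/listen"
params = {"model": "nova-3", "language": "ru", "numerals": "true"}
url = f"{base}?{urlencode(params)}"
print(url)
# The request itself would be sent with an Authorization: Token <API_KEY>
# header and the audio (or an audio URL) in the request body.
```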

Learn more about using Numerals and see the full list of supported languages on the Numerals documentation page.


May 14, 2026

Deepgram Self-Hosted May 2026 Release (260514)

Container Images (release 260514)

  • quay.io/deepgram/self-hosted-api:release-260514

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-api:1.187.0
  • quay.io/deepgram/self-hosted-engine:release-260514

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-engine:3.117.0
    • Minimum required NVIDIA driver version: >=570.172.08

  • quay.io/deepgram/self-hosted-license-proxy:release-260514

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-license-proxy:1.10.1
  • quay.io/deepgram/self-hosted-billing:release-260514

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-billing:1.13.0

Batch Diarization v2 model delivery for new self-hosted deployments

Release 260514 ships Deepgram’s new batch diarization model (v2) to self-hosted. New deployments provisioned through your Deepgram representative will receive only the v2 batch diarizer model on disk by default. To produce diarized output on a fresh deployment, batch requests must specify diarize_model=v2 or diarize_model=latest. diarize=true on its own is pinned to v1; on a 260514 deployment that does not have the v1 model on disk, /v1/listen?diarize=true returns a successful response with no speaker labels — consistent with Deepgram’s longstanding behavior when a requested diarizer model is not present.

Existing deployments retain their v1 batch diarizer and continue to work without changes. To add v2 to an existing deployment, contact your Deepgram representative.
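
The routing rules above can be sketched as a small helper (a hypothetical function, not Deepgram code; the behavior it encodes is exactly what this note describes):

```python
def resolve_batch_diarizer(params):
    """Illustrative sketch of how a batch /v1/listen request resolves
    to a diarizer version under release 260514, per this release note.
    Returns the resolved version string, None when diarization is off,
    and raises ValueError for an unrecognized diarize_model value."""
    known = {"latest": "v2", "v2": "v2", "v1": "v1"}
    model = params.get("diarize_model")
    if model is not None:
        if model not in known:
            raise ValueError("400 Bad Request: unrecognized diarize_model")
        return known[model]
    if params.get("diarize") == "true":
        return "v1"  # diarize=true on its own stays pinned to v1
    return None

resolve_batch_diarizer({"diarize_model": "latest"})  # "v2"
resolve_batch_diarizer({"diarize": "true"})          # "v1"
```

On a fresh 260514 deployment the v1 model is not on disk, so the second call’s resolved v1 diarizer would produce a successful response with no speaker labels, as described above.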

This Release Contains The Following Changes

  • Batch Diarization v2 — A new batch diarization model with significantly improved speaker labeling, preferred 3.3× over v1 in side-by-side human evaluation. Strongest gains on contact-center audio (~80% reduction in median Confusion Error Rate vs. v1, ~60% at p95). Compatible with Nova-1, Nova-2, Nova-3, plus enhanced and base batch models; monolingual and multilingual. Not compatible with Whisper. The API response format is unchanged from v1. Batch-only; streaming diarization is unchanged. See Speaker Diarization for details.
  • New diarize_model Parameter — Opt into v2 by passing diarize_model=v2 (pin to v2) or diarize_model=latest (recommended; auto-upgrades to future diarizer iterations) on pre-recorded /v1/listen requests. Unrecognized values return 400 Bad Request. Streaming requests reject diarize_model and return 400; use diarize=true for streaming diarization. diarize=true on batch continues to route to v1 to preserve behavior for existing integrations.
  • General Improvements — Keeps our software up-to-date.

May 14, 2026

Profanity Filtering Now Available in 50+ Languages

We’re excited to announce the release of profanity filtering support for over 50 monolingual languages. Deepgram’s profanity filter automatically detects and redacts offensive language in transcripts, helping you produce cleaner and safer content across a wide range of languages.

How to Use Profanity Filtering

To enable profanity filtering, add the profanity_filter=true parameter to your Deepgram API request:
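
A minimal sketch of such a request URL (the endpoint and profanity_filter parameter are from this note; the model and language values are placeholders):

```python
from urllib.parse import urlencode

# Illustrative pre-recorded request with profanity filtering enabled.
params = {"model": "nova-3", "language": "es", "profanity_filter": "true"}
url = "https://api.deepgram.com/v1/listen?" + urlencode(params)
print(url)
```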

For more details, supported languages, and additional options, visit the Profanity Filter page.


May 13, 2026

Diarization v2: Improved Batch Speaker Diarization

A new batch diarization model is available today via the diarize_model API parameter.

Deepgram is rolling out v2 of our batch speaker diarization model. v2 is a new architecture available today on an opt-in basis through the new diarize_model parameter. In side-by-side human evaluation, v2 was preferred 3.3× over our current production diarizer (v1), with the largest gains on contact-center audio — median CER reduced roughly 80% compared to the prior version of the diarization model. Customers using diarize=true are unaffected.

Key Features:

  • New diarize_model parameter — A single parameter that both enables diarization and selects the version. Most customers should choose latest; v2 or v1 are also accepted.
  • diarize_model=latest auto-upgrades — Resolves to the newest GA diarizer. Today that’s v2.
  • No breaking changes — diarize=true continues to route to v1.
  • Compatible with the rest of the platform — Works with Nova-1, Nova-2, Nova-3, enhanced, and base batch models (async and sync), monolingual and multilingual, alongside existing batch features.

New diarize_model parameter:

The new diarize_model parameter enables diarization and selects the model version in a single parameter — no need to also set diarize=true:

https://api.deepgram.com/v1/listen?model=nova-3&diarize_model=latest

Accepted values:

  • latest — Resolves to the newest GA diarizer
  • v2 — New improved batch diarizer
  • v1 — Original production diarizer

Migration guidance:

  • New integrations: For new projects we recommend diarize_model=latest. To pin a specific version, use diarize_model=v2 or diarize_model=v1.
  • Existing diarize=true users: No breaking changes — your existing requests continue to work with v1. To pick up v2’s improvements, update your requests to diarize_model=latest (always newest) or diarize_model=v2. We recommend testing on a representative sample of your audio before flipping production traffic.

No pricing changes. Diarization continues to be included at current rates.

Availability

  • Available now on the /v1/listen endpoint, on both US-hosted and EU-hosted endpoints
  • Supported on Nova-1, Nova-2, Nova-3, enhanced, and base batch models (async and sync), monolingual and multilingual
  • Streaming: diarize_model is not accepted on streaming requests and returns 400. Use diarize=true for streaming diarization. Streaming improvements ship separately.
  • Self-hosted support coming soon

Learn more in the Speaker Diarization documentation.

Nova-3 Portuguese Model Update

Improved Nova-3 Portuguese Model

We’ve enhanced the Nova-3 Portuguese model with improved transcription accuracy across Portuguese language variants, including Brazilian Portuguese (pt-BR) and European Portuguese (pt-PT).

To use the updated model, set model="nova-3" and use one of the supported Portuguese language codes:

  • language="pt"
  • language="pt-BR"
  • language="pt-PT"

Learn more about Nova-3 and supported languages on the Models and Language Overview page.


May 12, 2026

SDK releases

A new round of SDK updates is now available across JavaScript, Rust, Python, and Java. This release brings Flux multilingual support to Rust, restores the Agent interface in JavaScript, ships a Python bugfix for WebSocket query parameters, and delivers a breaking Java release with reconnect improvements.

JavaScript SDK v5.2.0

Deepgram JavaScript SDK v5.2.0 is now available. This release restores the Agent interface and adds AgentReference for string-ID flows, aliases AgentV1SettingsAgentListenProvider to AgentContextListenProvider, and preserves AgentV1Settings.Agent sub-types so existing agent code continues to compile.

For release details, see deepgram-js-sdk v5.2.0.

Rust SDK 0.10.0

Deepgram Rust SDK 0.10.0 is now available. This release adds Flux multilingual support with Model::FluxGeneralMulti, OptionsBuilder::language_hint for BCP-47 language hints, and new TurnInfo fields (languages and languages_hinted). It also introduces mid-session reconfiguration via FluxHandle::configure(ConfigureRequest) for adjusting thresholds, keyterms, and language hints without restarting the WebSocket.

This release includes a breaking change: FluxResponse::TurnInfo is now #[non_exhaustive].

For release details, see deepgram-rust-sdk 0.10.0.

Python SDK v7.1.1

Deepgram Python SDK v7.1.1 is now available. This patch release fixes boolean query parameters on WebSocket connect, which are now lowercased to match what the API expects.
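
The underlying issue is that Python renders booleans as True/False while the API expects lowercase true/false. A minimal sketch of the kind of normalization the fix applies (illustrative only, not the SDK’s actual code):

```python
def normalize_query_params(params):
    """Lowercase boolean values so they serialize the way the
    Deepgram API expects ("true"/"false", not "True"/"False")."""
    return {
        k: str(v).lower() if isinstance(v, bool) else v
        for k, v in params.items()
    }

normalize_query_params({"interim_results": True, "model": "nova-3"})
# {'interim_results': 'true', 'model': 'nova-3'}
```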

For release details, see deepgram-python-sdk v7.1.1.

Java SDK v0.4.0

Deepgram Java SDK v0.4.0 is now available. This release ships reconnect and listener bug fixes, adds a transport factory policy hook for customizing transport behavior (timeouts, proxies, TLS) without subclassing the client, and incorporates the latest API surface updates.

This release includes breaking changes. For the full release notes, see deepgram-java-sdk v0.4.0.


May 12, 2026

Nova-3 Multilingual Model Update

Numerals Support Expanded for Nova-3 Multilingual

Numeral formatting is now supported for all Nova-3 multilingual languages — except Hindi and Japanese. This enhancement means Nova-3 multilingual can now convert spoken numbers to digits (e.g., “three hundred” → “300”) for English, Spanish, French, German, Russian, Portuguese, Italian, and Dutch.

To use this feature, set model="nova-3" and language="multi". Then include the numerals=true parameter in your request.

Learn more about how Numerals works and see supported languages on the Numerals page.


May 11, 2026

Browser Agent SDK

The Browser Agent SDK is now available — four composable packages that connect any web app to the Voice Agent API:

  • @deepgram/agents-widget — drop-in widget with six layouts (sidebar, floating, inline, button, embedded, or orb). No framework required.
  • @deepgram/ui — pre-built React components (conversation view, animated orb, mic/speaker controls, waveform visualizer) styled through CSS custom properties.
  • @deepgram/react — provider + hooks for state, conversation history, microphone control, audio playback, and client-side function calling.
  • @deepgram/agents — the framework-agnostic core: WebSocket client, microphone capture, and player.

Each layer builds on the one below it, so installing the higher layer pulls in everything beneath. All layers share the same reconnection logic, playback-aware mode tracking, audio buffering, optional Silero VAD, KeepAlive pings, and typed event emitter.

Install the widget and ship in minutes:

$ npm install @deepgram/agents-widget

import { init } from "@deepgram/agents-widget";

init({
  tokenFactory: () => fetch("/api/deepgram-token").then((r) => r.text()),
  agent: "YOUR_AGENT_ID",
  layout: "floating",
});

For the full architecture, package-by-package guides, and live in-page demos, see the Browser Agent SDK overview.

Voice Agent docs restructure

The Voice Agent section has been reorganized into five sections — Get Started, Build, Integrate, Reference, and Tips & Migration — to make it easier to find content based on where you are in your build. As part of the same pass, a few closely related reference pages have been merged (for example, prompt-updated, speak-updated, and think-updated are now consolidated into Acknowledgements, and the errors and warning pages are now Errors & Warnings). Redirects are in place, so existing links continue to work.


May 4, 2026

Aura-2 Voice Controls — Speed and Pronunciation

Aura-2 TTS voices now support runtime speed and pronunciation controls in English and Spanish, available on both batch and streaming WebSocket endpoints. Together, these controls give developers finer-grained tools to improve naturalness — tuning pacing and correcting pronunciation to match their needs.

Speed control adjusts the speaking rate of generated audio while maintaining natural prosody, with a supported range of 0.7x–1.5x. For Spanish voices, the recommended range is 0.9x–1.5x; values below 0.9x may introduce disfluencies.
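
The documented ranges can be checked client-side before a request is sent; a small sketch (the helper is hypothetical, the ranges come from this note):

```python
def validate_speed(speed, language="en"):
    """Validate an Aura-2 speed value against the documented ranges:
    0.7x-1.5x overall, with 0.9x-1.5x recommended for Spanish."""
    if not 0.7 <= speed <= 1.5:
        raise ValueError("speed must be between 0.7 and 1.5")
    if language == "es" and speed < 0.9:
        # Allowed, but may introduce disfluencies per the release note.
        print("warning: speeds below 0.9x may sound disfluent in Spanish")
    return speed
```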

Pronunciation control overrides the default pronunciation of specific words using inline IPA notation, e.g. The patient was prescribed {"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"}.

See Voice Controls for details.

Voice Agent

These controls are also supported when using Aura-2 inside the Voice Agent API: set speed once at the session level, and have the LLM emit pronunciation overrides inline. See Voice Agent TTS Controls for the recommended setup.

Self-Hosted

These controls are powered by an updated Aura-2 voice-pack model. Self-hosted deployments using a voice-pack from before the April 2026 release will return 400 Bad Request on requests including speed or pronounce parameters. See the April 2026 self-hosted release notes for voice-pack version requirements and upgrade instructions.


April 30, 2026

Deepgram Self-Hosted April 2026 Release (260430)

Container Images (release 260430)

  • quay.io/deepgram/self-hosted-api:release-260430

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-api:1.185.0-2
  • quay.io/deepgram/self-hosted-engine:release-260430

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-engine:3.116.0-1
    • Minimum required NVIDIA driver version: >=570.172.08

  • quay.io/deepgram/self-hosted-license-proxy:release-260430

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-license-proxy:1.10.1
  • quay.io/deepgram/self-hosted-billing:release-260430

    • Equivalent image to:
      • quay.io/deepgram/self-hosted-billing:1.13.0

Aura-2 Speed and Pronunciation Controls require an updated voice-pack

The new Aura-2 Speed and Pronunciation Control features in this release are powered by an updated Aura-2 English voice-pack model. If your deployment is using an Aura-2 English voice-pack from before the April 2026 release (e.g., the 2025-04-15.0 version of the voice-pack), requests including the speed or pronounce parameters will return 400 Bad Request.

To enable these features, contact your Deepgram representative to obtain the latest Aura-2 English voice-pack (2025-04-15.4 or later) and replace the existing voice-pack file in your models directory. The official Deepgram Helm chart and sample values files in deepgram/self-hosted-resources (chart 0.34.0 and later) already point to the correct UUID; you only need to use the latest Deepgram configuration files and update the model file on disk.

This Release Contains The Following Changes

  • Nova-3 Gujarati — Nova-3 now supports Gujarati (gu) for both batch and streaming.
  • Aura-2 Speed and Pronunciation Controls — Aura-2 TTS voices now support runtime speed and pronunciation control. See Voice Controls for details.
  • Improved Aura-2 Pronunciation — Better pronunciation for Spanish dates and the term “Jan” (as a name versus a month) with Aura-2 voices.
  • Nova-3 Multilingual Numeral Formatting — Numeral formatting is now applied when using Nova-3 multilingual models and smart_format or numerals is enabled.
  • Numeral Formatting for Hebrew and Romanian — Numeral formatting is now applied for Nova-3 Hebrew (he) and Romanian (ro) when smart_format or numerals is enabled.
  • Voice Agent: Cartesia Speed Control — The Cartesia speak provider now supports speed control in Voice Agent sessions.
  • Voice Agent: Improved Agent Message Injection — Improved support for injecting agent messages into a live session. See Inject Agent for details.
  • Voice Agent: Multilingual Flux Language Hints — Multilingual Flux now accepts language hints when used as the STT provider in a Voice Agent session.
  • Improved Multilingual Streaming Language Tags — Improves the accuracy of language tag results on /v1/listen streaming requests using multilingual models.
  • Improved Numeral Redaction Accuracy — Improved redaction accuracy when using redact=numbers or redact=aggressive_numbers.
  • General Improvements — Keeps our software up-to-date.

April 29, 2026

LLM Model Updates & Cartesia Speed Control

GPT-5.5 LLM Model Support

OpenAI’s GPT-5.5 model is now available as a managed LLM in the Voice Agent API. GPT-5.5 is an Advanced tier model.

Set the model in your agent configuration:

{
  "agent": {
    "think": {
      "provider": {
        "type": "open_ai",
        "model": "gpt-5.5",
        "temperature": 0.7
      }
    }
  }
}

For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.

Llama Nemotron Super 49B Removed

The llama-nemotron-super-49B (Llama Nemotron Super 49B) model has been removed from the NVIDIA provider due to poor performance. The nemotron-3-nano-30B-A3B model remains available. See the NVIDIA models table for current options.

Cartesia TTS Speed Control

The agent.speak.provider.speed parameter is now supported for Cartesia TTS in addition to Deepgram TTS. For Cartesia, the parameter accepts the following preset values:

  • slowest
  • slow
  • normal
  • fast
  • fastest

You can also pass a numerical value for more granular control. See the Cartesia speed documentation for details.

{
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "model_id": "sonic-2",
        "voice": {
          "mode": "id",
          "id": "a167e0f3-df7e-4d52-a9c3-f949145efdab"
        },
        "speed": "fast"
      }
    }
  }
}

For more details, see the TTS Models documentation.