We’ve released an upgraded Nova-3 Medical batch model with improved medical term recognition.
Key Improvements:
To use the updated model, set model=nova-3-medical in your batch transcription requests.
Learn more about our models and supported languages on the Models & Languages Overview page.
quay.io/deepgram/self-hosted-api:release-260528
Equivalent image to:
quay.io/deepgram/self-hosted-api:1.188.1
quay.io/deepgram/self-hosted-engine:release-260528
Equivalent image to:
quay.io/deepgram/self-hosted-engine:3.117.0
Minimum required NVIDIA driver version: >=570.172.08
quay.io/deepgram/self-hosted-license-proxy:release-260528
Equivalent image to:
quay.io/deepgram/self-hosted-license-proxy:1.10.1
quay.io/deepgram/self-hosted-billing:release-260528
Equivalent image to:
quay.io/deepgram/self-hosted-billing:1.13.0
The official Helm chart (0.37.0 and later) and the Docker and Podman compose files in deepgram/self-hosted-resources now set NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=compute,utility on the Engine container. These env vars are no-ops with the release-260528 Engine image but are required for an upcoming Engine container refactor; deployments that adopt them now will not need a configuration change when that refactor ships. If you maintain your own deployment manifests, adding these env vars to the Engine container is safe to do at any time.
profanity_filter=true now masks recognized profanity in STT multilingual transcripts (language=multi). See Profanity Filtering for the supported language list and usage.
Korean transcripts (ko, ko-KR) were sometimes missing spaces between words. Transcripts now better reflect proper Korean spacing.
gemini-3.5-flash is now available as a managed Google LLM in the Voice Agent API. This Standard tier model brings improved performance and efficiency to your voice agents.
Set the model in your agent configuration:
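A minimal sketch of what that configuration might look like, built as a Python dictionary and serialized to JSON. Only the model name comes from this entry; the Settings message layout (the agent.think.provider nesting and the "google" provider type) is an assumption for illustration and should be checked against the Voice Agent documentation.

```python
import json

# Assumed Settings layout; only the model name is taken from this changelog entry.
settings = {
    "type": "Settings",
    "agent": {
        "think": {
            "provider": {
                "type": "google",             # assumed provider identifier
                "model": "gemini-3.5-flash",  # model announced in this entry
            },
        },
    },
}

# Serialize the configuration so it can be sent as a JSON message.
settings_json = json.dumps(settings, indent=2)
print(settings_json)
```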
The Gemini 2.5 Flash family of models will be deprecated in October. Start testing newer models now to ensure a smooth migration.
For more details on Gemini model deprecations, see Google’s Gemini deprecations page.
For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
Deepgram’s Profanity Filtering feature is now available for all multilingual models: Nova-2 multilingual, Nova-3 multilingual, and Flux multilingual (language=multi). You can enable profanity filtering in your API requests by setting the profanity_filter=true parameter. When enabled, inappropriate language is automatically replaced with asterisks (****) in the transcript.
This extends profanity filtering beyond single-language models, making it easier to process and moderate content in multilingual scenarios.
Learn more about using Profanity Filtering and see the full list of supported languages on the Profanity Filtering documentation page.
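As a rough illustration, the query string for a multilingual request with profanity filtering could be assembled as below. The base URL and the choice of nova-3 are assumptions made for the example; language=multi and profanity_filter=true are the parameters described above.

```python
from urllib.parse import urlencode

# Assumed hosted endpoint for pre-recorded transcription.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(**params: str) -> str:
    """Append the given query parameters to the listen endpoint."""
    return f"{BASE_URL}?{urlencode(params)}"


url = build_listen_url(model="nova-3", language="multi", profanity_filter="true")
print(url)
```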
We fixed an issue affecting Korean transcripts (ko, ko-KR) where word spacing was sometimes missing. Transcripts should now better reflect proper Korean spacing, improving readability for users working with Korean audio.
See the full list of supported languages on the Models & Languages Overview page.
gemini-3.1-flash-lite is now available as a managed Google LLM in the Voice Agent API. This Standard tier model replaces the preview version.
Set the model in your agent configuration:
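The sketch below shows one way to update an existing configuration that still names the preview model. The settings layout (agent.think.provider) and the "google" provider type are assumptions for illustration, and use_ga_model is a hypothetical helper, not part of any Deepgram SDK.

```python
import copy

DEPRECATED_MODEL = "gemini-3.1-flash-lite-preview"
CURRENT_MODEL = "gemini-3.1-flash-lite"


def use_ga_model(settings: dict) -> dict:
    """Return a copy of settings with the preview model name replaced."""
    updated = copy.deepcopy(settings)
    provider = updated.get("agent", {}).get("think", {}).get("provider")
    if provider and provider.get("model") == DEPRECATED_MODEL:
        provider["model"] = CURRENT_MODEL
    return updated


# Assumed configuration shape, still pointing at the preview model.
old_settings = {
    "type": "Settings",
    "agent": {"think": {"provider": {"type": "google", "model": DEPRECATED_MODEL}}},
}
new_settings = use_ga_model(old_settings)
print(new_settings["agent"]["think"]["provider"]["model"])
```

Working on a deep copy keeps the original configuration intact, which makes it easy to compare behavior before and after the switch.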
gemini-3.1-flash-lite-preview is deprecated and will be removed on May 26, 2025. Migrate to gemini-3.1-flash-lite.
For more details on Gemini model deprecations, see Google’s Gemini deprecations page.
For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
Russian (ru)
Romanian (ro)
Hebrew (he)
You can now use Deepgram’s Numerals feature with monolingual models for Russian, Romanian, and Hebrew. Numerals converts spoken numbers into digits (for example, “three hundred” → “300”) in your transcript, helping you create more accurate and easily processed results.
How to use Numerals:
To enable numerals, add the numerals=true parameter to your Deepgram API request.
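A hedged sketch of such a request, prepared with Python's standard library. The endpoint URL, the Token authorization scheme, and the JSON body with a url field are assumptions about the pre-recorded API; the API key and audio URL are placeholders. The request is only constructed, not sent, so the snippet runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_KEY = "YOUR_DEEPGRAM_API_KEY"            # placeholder, not a real key
AUDIO_URL = "https://example.com/audio.wav"  # hypothetical audio location

# Russian is used here; "ro" or "he" would work the same way.
query = urlencode({"language": "ru", "numerals": "true"})
request = Request(
    f"https://api.deepgram.com/v1/listen?{query}",
    data=json.dumps({"url": AUDIO_URL}).encode("utf-8"),
    headers={
        "Authorization": f"Token {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(request) would actually send it.
print(request.get_method(), request.full_url)
```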
Learn more about using Numerals and see the full list of supported languages on the Numerals documentation page.
quay.io/deepgram/self-hosted-api:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-api:1.187.0
quay.io/deepgram/self-hosted-engine:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-engine:3.117.0
Minimum required NVIDIA driver version: >=570.172.08
quay.io/deepgram/self-hosted-license-proxy:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-license-proxy:1.10.1
quay.io/deepgram/self-hosted-billing:release-260514
Equivalent image to:
quay.io/deepgram/self-hosted-billing:1.13.0
Release 260514 ships Deepgram’s new batch diarization model (v2) to self-hosted. New deployments provisioned through your Deepgram representative will receive only the v2 batch diarizer model on disk by default. To produce diarized output on a fresh deployment, batch requests must specify diarize_model=v2 or diarize_model=latest. diarize=true on its own is pinned to v1; on a 260514 deployment that does not have the v1 model on disk, /v1/listen?diarize=true returns a successful response with no speaker labels, consistent with Deepgram’s longstanding behavior when a requested diarizer model is not present.
Existing deployments retain their v1 batch diarizer and continue to work without changes. To add v2 to an existing deployment, contact your Deepgram representative.
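Because a missing diarizer model produces a successful response without speaker labels rather than an error, it can be useful to check for labels explicitly. The sketch below assumes the response nests words under results.channels[].alternatives[].words[] with an optional speaker field; has_speaker_labels is a hypothetical helper, and the sample payloads are made up for illustration.

```python
def has_speaker_labels(response: dict) -> bool:
    """Return True if any word in any channel carries a "speaker" field."""
    channels = response.get("results", {}).get("channels", [])
    for channel in channels:
        for alternative in channel.get("alternatives", []):
            if any("speaker" in word for word in alternative.get("words", [])):
                return True
    return False


# Made-up payloads in the assumed response shape.
labeled = {
    "results": {"channels": [{"alternatives": [{"words": [{"word": "hello", "speaker": 0}]}]}]}
}
unlabeled = {
    "results": {"channels": [{"alternatives": [{"words": [{"word": "hello"}]}]}]}
}

print(has_speaker_labels(labeled), has_speaker_labels(unlabeled))
```

A False result on a request that asked for diarization is a hint that the requested diarizer model is not on disk.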
diarize_model Parameter: Opt into v2 by passing diarize_model=v2 (pin to v2) or diarize_model=latest (recommended; auto-upgrades to future diarizer iterations) on pre-recorded /v1/listen requests. Unrecognized values return 400 Bad Request. Streaming requests reject diarize_model and return 400; use diarize=true for streaming diarization. diarize=true on batch continues to route to v1 to preserve behavior for existing integrations.
We’re excited to announce the release of profanity filtering support for over 50 monolingual languages. Deepgram’s profanity filter automatically detects and redacts offensive language in transcripts, helping you produce cleaner and safer content across a wide range of languages.
To enable profanity filtering, add the profanity_filter=true parameter to your Deepgram API request:
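One possible way to add the parameter to a request URL you already have, sketched with the standard library. The base URL and the language=es example are assumptions, and with_query_param is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Return url with key=value set in its query string.

    Repeated keys in the original query are collapsed to their last value.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params[key] = value
    return urlunsplit(parts._replace(query=urlencode(params)))


existing = "https://api.deepgram.com/v1/listen?language=es"
filtered = with_query_param(existing, "profanity_filter", "true")
print(filtered)
```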
For more details, supported languages, and additional options, visit the Profanity Filter page.
diarize_model API parameter.
Deepgram is rolling out v2 of our batch speaker diarization model. v2 is a new architecture available today on an opt-in basis through the new diarize_model parameter. In side-by-side human evaluation, v2 was preferred 3.3× over our current production diarizer (v1), with the largest gains on contact-center audio, where median CER fell by roughly 80% compared to the prior version of the diarization model. Customers using diarize=true are unaffected.
Key Features:
diarize_model parameter: A single parameter that both enables diarization and selects the version. Most customers should choose latest; v2 or v1 are also accepted.
diarize_model=latest auto-upgrades: Resolves to the newest GA diarizer. Today that’s v2.
diarize=true continues to route to v1.
New diarize_model parameter:
The new diarize_model parameter enables diarization and selects the model version in a single parameter — no need to also set diarize=true:
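A small sketch of building such a request URL. The base URL is an assumption, and diarized_listen_url is a hypothetical helper; the client-side check simply mirrors the documented behavior that unrecognized values are rejected with 400.

```python
from urllib.parse import urlencode

# Values this changelog lists as accepted for diarize_model.
ACCEPTED_DIARIZE_MODELS = {"latest", "v2", "v1"}


def diarized_listen_url(diarize_model: str = "latest") -> str:
    """Build a pre-recorded request URL that enables diarization."""
    if diarize_model not in ACCEPTED_DIARIZE_MODELS:
        raise ValueError(f"unsupported diarize_model: {diarize_model!r}")
    # diarize=true is deliberately not added; diarize_model enables diarization itself.
    query = urlencode({"diarize_model": diarize_model})
    return f"https://api.deepgram.com/v1/listen?{query}"


print(diarized_listen_url())      # resolves to the newest GA diarizer
print(diarized_listen_url("v2"))  # pins to v2
```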
Migration guidance:
Use diarize_model=latest. To pin a specific version, use diarize_model=v2 or diarize_model=v1.
diarize=true users: No breaking changes: your existing requests continue to work with v1. To pick up v2’s improvements, update your requests to diarize_model=latest (always newest) or diarize_model=v2. We recommend testing on a representative sample of your audio before flipping production traffic.
No pricing changes. Diarization continues to be included at current rates.
Available on the pre-recorded /v1/listen endpoint, on both US-hosted and EU-hosted endpoints.
diarize_model is not accepted on streaming requests and returns 400. Use diarize=true for streaming diarization. Streaming improvements ship separately.
Learn more in the Speaker Diarization documentation.
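The migration guidance above can be sketched as a small parameter rewrite. migrate_diarization is a hypothetical helper that works on a plain dictionary of query parameters: batch requests using diarize=true are moved to diarize_model=latest, while streaming requests are left alone because diarize_model is rejected there.

```python
def migrate_diarization(params: dict, streaming: bool = False) -> dict:
    """Return a copy of params updated to opt into the newest batch diarizer."""
    migrated = dict(params)
    if streaming:
        # Streaming keeps diarize=true; diarize_model would return 400.
        return migrated
    if migrated.get("diarize") == "true":
        del migrated["diarize"]
        migrated["diarize_model"] = "latest"
    return migrated


batch = migrate_diarization({"model": "nova-3", "diarize": "true"})
stream = migrate_diarization({"model": "nova-3", "diarize": "true"}, streaming=True)
print(batch)
print(stream)
```

Keeping the rewrite in one function makes it straightforward to apply it to a sample of traffic first, in line with the recommendation to test before switching production requests.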
We’ve enhanced the Nova-3 Portuguese model with improved transcription accuracy across Portuguese language variants, including Brazilian Portuguese (pt-BR) and European Portuguese (pt-PT).
To use the updated model, set model="nova-3" and use one of the supported Portuguese language codes:
language="pt"
language="pt-BR"
language="pt-PT"
Learn more about Nova-3 and supported languages on the Models and Language Overview page.
A new round of SDK updates is now available across JavaScript, Rust, Python, and Java. This release brings Flux multilingual support to Rust, restores the Agent interface in JavaScript, ships a Python bugfix for WebSocket query parameters, and delivers a breaking Java release with reconnect improvements.
Deepgram JavaScript SDK v5.2.0 is now available. This release restores the Agent interface and adds AgentReference for string-ID flows, aliases AgentV1SettingsAgentListenProvider to AgentContextListenProvider, and preserves AgentV1Settings.Agent sub-types so existing agent code continues to compile.
For release details, see deepgram-js-sdk v5.2.0.
Deepgram Rust SDK 0.10.0 is now available. This release adds Flux multilingual support with Model::FluxGeneralMulti, OptionsBuilder::language_hint for BCP-47 language hints, and new TurnInfo fields (languages and languages_hinted). It also introduces mid-session reconfiguration via FluxHandle::configure(ConfigureRequest) for adjusting thresholds, keyterms, and language hints without restarting the WebSocket.
This release includes a breaking change: FluxResponse::TurnInfo is now #[non_exhaustive].
For release details, see deepgram-rust-sdk 0.10.0.
Deepgram Python SDK v7.1.1 is now available. This patch release fixes boolean query parameters on WebSocket connect, which are now lowercased to match what the API expects.
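To illustrate the class of bug this addresses (this is not the SDK's actual implementation): Python renders booleans as "True" and "False", so a naive query-string encoder emits capitalized values. The encode_query helper below is hypothetical, and interim_results is used only as an example boolean parameter.

```python
from urllib.parse import urlencode


def encode_query(params: dict) -> str:
    """Encode query parameters, rendering booleans as lowercase true/false."""
    normalized = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
    }
    return urlencode(normalized)


naive = urlencode({"interim_results": True})
fixed = encode_query({"interim_results": True, "model": "nova-3"})
print(naive)  # interim_results=True
print(fixed)  # interim_results=true&model=nova-3
```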
For release details, see deepgram-python-sdk v7.1.1.
Deepgram Java SDK v0.4.0 is now available. This release ships reconnect and listener bug fixes, adds a transport factory policy hook for customizing transport behavior (timeouts, proxies, TLS) without subclassing the client, and incorporates the latest API surface updates.
This release includes breaking changes. For the full release notes, see deepgram-java-sdk v0.4.0.