Speech-to-Text

November 7, 2025

Expanded Language Detection: Now Supports 35 Languages for Pre-Recorded Audio

Language Detection for pre-recorded (batch) audio now supports 35 languages (previously 16), allowing you to automatically detect and transcribe a much broader array of languages with one request. Enable broader coverage by setting detect_language=true in your API call.

Supported languages:

Bulgarian (bg)
Catalan (ca)
Czech (cs)
Danish (da)
German (de)
German (Switzerland) (de-CH)
Greek (el)
English (en)
Spanish (es)
Estonian (et)
Finnish (fi)
French (fr)
Hindi (hi)
Hungarian (hu)
Indonesian (id)
Italian (it)
Japanese (ja)
Korean (ko)
Lithuanian (lt)
Latvian (lv)
Malay (ms)
Dutch (nl)
Flemish (nl-BE)
Norwegian (no)
Polish (pl)
Portuguese (pt)
Romanian (ro)
Russian (ru)
Slovak (sk)
Swedish (sv)
Thai (th)
Turkish (tr)
Ukrainian (uk)
Vietnamese (vi)
Chinese (zh)

For the full list and up-to-date documentation, see the Language Detection documentation.
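As a rough sketch of the request described in this entry, the snippet below builds (but does not send) a pre-recorded transcription request with detect_language=true. The endpoint path https://api.deepgram.com/v1/listen, the Token authorization scheme, and the JSON body with a url field are assumptions that this entry does not state, and build_detect_language_request and YOUR_API_KEY are hypothetical names; confirm the details against the Language Detection documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the pre-recorded endpoint path; this changelog entry does not state it.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_detect_language_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a batch request that enables language detection."""
    query = urllib.parse.urlencode({"detect_language": "true"})
    # Assumption: the audio is referenced by URL in a JSON body.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{LISTEN_URL}?{query}",
        data=body,
        method="POST",
        headers={
            # Assumption: token-style authorization header.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_detect_language_request("https://example.com/audio.wav", "YOUR_API_KEY")
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the sketch stops short of that so it can be inspected without network access or a real key.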

November 5, 2025

Legacy Intelligence Features Deprecated

We have deprecated several legacy Intelligence feature parameters. As of today, requests using these legacy parameters will return HTTP 400 errors. Please update your implementation to use the current parameters.

What’s Changed

Sentiment Analysis

Legacy: analyze_sentiment=true
Current: sentiment=true
Action Required: Update your API requests to use sentiment=true

Topic Detection

Legacy: detect_topics=true
Current: topics=true
Action Required: Update your API requests to use topics=true

Summarization

Legacy: summarize=v1 and V1 response structure
Current: summarize=true or summarize=v2 (both return V2 structure)
Action Required:
- Remove any usage of summarize=v1
- Update response parsing for summarize=true to use V2 structure
- V2 returns a single summary object across all channels with result and short properties
- V1 returned a summaries array with summary, start_word, and end_word per channel
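The migration above can be sketched as a small helper that rewrites legacy query parameters before a request is sent, plus a reader for the V2 summary. The function names migrate_params and read_v2_summary are hypothetical, and the location of the summary object at results.summary is an assumption; this entry only names its result and short properties.

```python
# Legacy -> current parameter names, taken from the changelog entry above.
RENAMED = {"analyze_sentiment": "sentiment", "detect_topics": "topics"}


def migrate_params(params: dict) -> dict:
    """Return a copy of params with legacy Intelligence parameters replaced."""
    migrated = {}
    for name, value in params.items():
        if name in RENAMED:
            migrated[RENAMED[name]] = value
        elif name == "summarize" and value == "v1":
            # summarize=v1 is no longer accepted; v2 (or true) returns the V2 structure.
            migrated[name] = "v2"
        else:
            migrated[name] = value
    return migrated


def read_v2_summary(response: dict) -> str:
    """Read the short summary text from a V2-style response.

    Assumption: the single summary object sits at response["results"]["summary"].
    """
    summary = response["results"]["summary"]
    return summary["short"]


legacy = {"analyze_sentiment": "true", "detect_topics": "true", "summarize": "v1"}
print(migrate_params(legacy))
```

Renaming summarize=v1 to summarize=v2 also changes the response shape, so any code that iterated over the V1 summaries array needs to switch to the single summary object at the same time.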

Documentation Resources

Sentiment Analysis
Topic Detection
Summarization

For questions or support, please reach out through our developer support channels.

November 4, 2025

Nova-3 Model Update

🎯 Nova-3 supports 11 new languages

We’ve added support for 11 new languages with non-English monolingual Nova-3 models. This continues our effort to significantly expand Nova-3 language support beyond English. The newly supported languages and their corresponding language codes are:

Newly Supported:

Bulgarian (bg)
Czech (cs)
Finnish (fi)
Hindi (hi)
Hungarian (hu)
Japanese (ja)
Korean (ko, ko-KR)
Polish (pl)
Russian (ru)
Ukrainian (uk)
Vietnamese (vi)

Learn more about Nova-3 on the Models and Language Overview page.
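For illustration, the codes listed in this entry can be kept in a small lookup and used to build a query string. This is only a sketch: the model value nova-3 and the parameter names model and language are assumptions not spelled out in this entry, and nova3_query is a hypothetical helper.

```python
import urllib.parse

# Language codes listed in this changelog entry (November 4, 2025).
NOVA3_NEW_LANGUAGES = {
    "bg", "cs", "fi", "hi", "hu", "ja", "ko", "ko-KR", "pl", "ru", "uk", "vi",
}


def nova3_query(language: str) -> str:
    """Build a query string selecting Nova-3 and one of the newly listed languages."""
    if language not in NOVA3_NEW_LANGUAGES:
        raise ValueError(f"{language!r} is not in this entry's list of new languages")
    # Assumption: `model` and `language` are the query parameter names.
    return urllib.parse.urlencode({"model": "nova-3", "language": language})


print(nova3_query("ja"))
```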

October 16, 2025

Flux: Expanded Audio Format Support

Flux now supports a wider range of audio formats beyond linear16, giving you more flexibility in how you send audio to our conversational speech recognition model. This update adds support for additional raw audio encodings and containerized audio formats.

Raw Audio Format Support

Flux now accepts the following raw (non-containerized) audio encodings:

linear16 - 16-bit signed little-endian PCM (existing)
linear32 - 32-bit signed little-endian PCM (new)
mulaw - Mu-law encoding (new)
alaw - A-law encoding (new)
opus - Opus codec (new)
ogg-opus - Opus in Ogg container (new)

When using raw audio formats, you must specify both the encoding and sample_rate parameters in your API request.

Containerized Audio Support

Flux now also accepts containerized audio formats, eliminating the need to manually specify encoding and sample rate parameters:

WAV containers with linear16 encoding
Ogg containers with opus encoding

When sending containerized audio, omit the encoding and sample_rate parameters—Flux will automatically detect these from the container metadata.

Supported Sample Rates

For raw audio, Flux supports the following sample rates:

8000 Hz
16000 Hz (recommended)
24000 Hz
44100 Hz
48000 Hz

Implementation

For raw audio:

wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000

For containerized audio:

wss://api.deepgram.com/v2/listen?model=flux-general-en
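The rules above (raw audio needs both encoding and sample_rate, containerized audio needs neither) can be sketched as a URL builder. The encodings and sample rates are copied from this entry; the helper name flux_url and the choice to reject a half-specified pair on the client side are illustrative assumptions, not part of the API.

```python
import urllib.parse
from typing import Optional

FLUX_URL = "wss://api.deepgram.com/v2/listen"
# Values listed under "Raw Audio Format Support" and "Supported Sample Rates".
RAW_ENCODINGS = {"linear16", "linear32", "mulaw", "alaw", "opus", "ogg-opus"}
RAW_SAMPLE_RATES = {8000, 16000, 24000, 44100, 48000}


def flux_url(encoding: Optional[str] = None, sample_rate: Optional[int] = None) -> str:
    """Build a Flux connection URL for raw or containerized audio."""
    params = {"model": "flux-general-en"}
    if encoding is None and sample_rate is None:
        # Containerized audio: omit both parameters and let the container metadata decide.
        return f"{FLUX_URL}?{urllib.parse.urlencode(params)}"
    if encoding is None or sample_rate is None:
        raise ValueError("raw audio requires both encoding and sample_rate")
    if encoding not in RAW_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate not in RAW_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")
    params["encoding"] = encoding
    params["sample_rate"] = str(sample_rate)
    return f"{FLUX_URL}?{urllib.parse.urlencode(params)}"


print(flux_url("linear16", 16000))
print(flux_url())
```

Validating locally keeps a mismatched pair from reaching the connection attempt, where the failure would be harder to diagnose.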

For detailed information about Flux audio requirements, see our Flux documentation.

October 2, 2025

Introducing Flux: Conversational Speech Recognition to Solve the Biggest Problem in Voice Agents – Interruptions

Flux is here: the first real-time Conversational Speech Recognition model built for voice agents. Solve interruptions, cut latency, and ship natural conversations faster than ever.

Deepgram is proud to announce the release of Flux, the first speech recognition model that knows when someone is actually done talking. Built for conversation, not transcription.

Key Features:

Model-integrated turn detection that understands context and conversation flow, not just silence
Ultra-low latency when it matters most - transcripts that are ready as soon as turns end
Nova-3 level transcription quality - high-quality speech-to-text (STT) that’s every bit as accurate as the best models, including keyterm prompting
Radically simpler development - one API replaces complex STT+VAD+endpointing pipelines, and conversation-native events designed to make voice agent development a breeze
Configurability for your use case - the right amount of complexity, allowing developers to achieve the desired behavior for their agent

Use Cases:

Designed for real-time conversational applications: voice agents, AI assistants, IVR systems, and any application requiring natural dialogue flow. For pure transcription (meetings, captions, recordings), continue using Nova-3.

Getting Started:

Flux is free throughout October 2025 for our OktoberFLUX promotion (up to 50 concurrent connections). Use model=flux-general-en via the new /v2/listen endpoint.

Learn more in our Announcement Blog, Developer Documentation, API Reference, and try the Interactive Demo.

Availability

Flux English is now available through our API. To access:

Connect to wss://api.deepgram.com/v2/listen using model=flux-general-en (reference additional required params here)
Available for hosted use
Real-time streaming only
Self-hosted support coming soon
English-only for now

September 16, 2025

Nova-3 Model Update

🎯 Nova-3 supports 4 new languages

We’ve added support for 4 new languages with non-English monolingual Nova-3 models. This continues our effort to significantly expand Nova-3 language support beyond English. The newly supported languages and their corresponding language codes are:

Newly Supported:

Italian (it)
Turkish (tr)
Norwegian (no)
Indonesian (id)

Learn more about Nova-3 on the Models and Language Overview page.

September 2, 2025

Nova-3 Model Update

🎯 Nova-3 supports 3 new languages

We’ve added support for 3 new languages with non-English monolingual Nova-3 models. This continues our effort to significantly expand Nova-3 language support beyond English. The newly supported languages and their corresponding language codes are:

Newly Supported:

Spanish (es, es-419)
French (fr, fr-CA)
Portuguese (pt, pt-BR, pt-PT)

Learn more about Nova-3 on the Models and Language Overview page.

August 15, 2025

Nova-3 Model Update

🎯 Nova-3 supports 4 new languages

We’ve added support for 4 new languages with non-English monolingual Nova-3 models. This is the beginning of an effort to significantly expand Nova-3 language support beyond English. The newly supported languages and their corresponding language codes are:

Newly Supported:

German (de)
Dutch (nl)
Swedish (sv, sv-SE)
Danish (da, da-DK)

Learn more about Nova-3 on the Models and Language Overview page.

June 11, 2025

Profanity Filtering Expanded Language Support

Profanity Filtering Gets Expanded Language Support

Profanity filtering now supports 6 additional languages beyond English, giving you content moderation capabilities across your global user base. Available on monolingual models for:

Newly Supported:

German (de)
Swiss German (de-CH)
Polish (pl)
Portuguese (pt, pt-BR, pt-PT)
Spanish (es, es-419)
Swedish (sv, sv-SE)

Existing English Support:

en, en-US, en-AU, en-GB, en-NZ, en-IN

This expansion lets you deploy consistent content policies across international markets without building custom filtering logic.

Smart Formatting Improvements

We’ve resolved several high-impact formatting edge cases that were causing transcription accuracy issues in production environments:

Improved Entity Formatting via Smart Format

Email Transcription Improvements

Fixed: 'o' characters in email addresses now transcribe correctly instead of converting to '0'
Fixed: edge case email mentions that were being dropped entirely in specific batch processing scenarios

Certain formerly numeric-only sequences have been updated to correctly preserve all alphanumeric characters:

Before (some entities): "my account number is a b c d zero nine" → "my account number is 09"
After (some entities): "my account number is a b c d zero nine" → "my account number is ABCD09"

Quantity modifiers (‘single’, ‘double’, ‘triple’ + standalone character or number) are better handled via Smart Format:

Before (some entities): "double 2" → "2"
After (some entities): "double 2" → "22"
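To make the quantity-modifier behavior concrete, here is a toy sketch of the expansion described above. It is not Deepgram's implementation; expand_quantity_modifiers is a hypothetical function that only handles a modifier followed by a single letter or digit.

```python
# How many times each modifier repeats the character that follows it.
REPEAT = {"single": 1, "double": 2, "triple": 3}


def expand_quantity_modifiers(text: str) -> str:
    """Expand e.g. 'double 2' into '22' when a modifier precedes one character."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        if word in REPEAT and len(following) == 1 and following.isalnum():
            # Consume the modifier and the character, emit the repeated character.
            out.append(following * REPEAT[word])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(expand_quantity_modifiers("double 2"))
```

Modifiers followed by a longer word (for example "double room") are left untouched, which mirrors the "standalone character or number" condition in the entry.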

Special cases of ‘hundred’ or ‘a hundred’ now supported via Smart Format:

Before (some entities): "hundred percent" → "%"
After (some entities): "hundred percent" → "100%"

This update is live for all hosted streaming transcription and will be included in our next self-hosted release later this month.

May 21, 2025

Nova-3 Medical Streaming Significant Upgrade

We’ve just released a significant upgrade to Nova-3 Medical Streaming, bringing substantial improvements in accuracy for real-time medical transcription use cases. This update focuses specifically on our streaming model, delivering better performance across key transcription metrics.

Performance Improvements

**11% relative reduction in Overall WER** compared to Nova-3 general streaming model
**30% relative reduction in Overall WER** compared to Nova-2 Medical streaming model
**2.7x improvement in Keyword Recall Rate (KRR)** compared to Nova-3 general streaming model
Maintains industry-leading inference speed with ultra-low latency for real-time healthcare applications
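A relative reduction scales the baseline error rate rather than subtracting percentage points from it. The worked example below shows the arithmetic; the baseline WER of 10% is hypothetical, since this entry does not publish absolute WER figures, and wer_after_relative_reduction is an illustrative name.

```python
def wer_after_relative_reduction(baseline_wer: float, relative_reduction: float) -> float:
    """Apply a relative reduction (e.g. 0.11 for 11%) to a baseline word error rate."""
    return baseline_wer * (1.0 - relative_reduction)


baseline = 0.10  # hypothetical baseline WER, not a published figure
updated = wer_after_relative_reduction(baseline, 0.11)
print(f"{updated:.3f}")
```

With that hypothetical baseline, an 11% relative reduction gives a WER of about 8.9%, not a drop of 11 percentage points.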

Availability

The updated Nova-3 Medical streaming model is now available through our API. To access:

Use model=nova-3-medical in your streaming API calls
Available for hosted use
Self-hosted deployments will be available in subsequent releases
English only

For details on the original Nova-3 Medical release (including batch capabilities), check out the original changelog: Introducing Nova-3 Medical. For detailed information about Nova-3 Medical, please refer to our Developer Documentation.
