Flux: Expanded Audio Format Support

Flux now supports a wider range of audio formats beyond linear16, giving you more flexibility in how you send audio to our conversational speech recognition model. This update adds support for additional raw audio encodings and containerized audio formats.

Raw Audio Format Support

Flux now accepts the following raw (non-containerized) audio encodings:

  • linear16 - 16-bit signed little-endian PCM (existing)
  • linear32 - 32-bit signed little-endian PCM (new)
  • mulaw - Mu-law encoding (new)
  • alaw - A-law encoding (new)
  • opus - Opus codec (new)
  • ogg-opus - Opus in Ogg container (new)

When using raw audio formats, you must specify both the encoding and sample_rate parameters in your API request.

Containerized Audio Support

Flux now also accepts containerized audio formats, eliminating the need to manually specify encoding and sample rate parameters:

  • WAV containers with linear16 encoding
  • Ogg containers with opus encoding

When sending containerized audio, omit the encoding and sample_rate parameters—Flux will automatically detect these from the container metadata.
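As a rough illustration of the raw versus containerized distinction, the sketch below inspects the first bytes of an audio buffer to decide whether encoding and sample_rate need to be sent. It relies only on the standard magic bytes of the two container types ("RIFF" followed by "WAVE" for WAV, "OggS" for Ogg). The function names are hypothetical, and the check does not confirm that the payload inside the container is linear16 or Opus.

```python
from typing import Optional


def detect_container(first_bytes: bytes) -> Optional[str]:
    """Return 'wav', 'ogg', or None based on the leading magic bytes."""
    # WAV files are RIFF containers: b"RIFF", a 4-byte size, then b"WAVE".
    if (
        len(first_bytes) >= 12
        and first_bytes[:4] == b"RIFF"
        and first_bytes[8:12] == b"WAVE"
    ):
        return "wav"
    # Every Ogg page begins with the capture pattern b"OggS".
    if first_bytes[:4] == b"OggS":
        return "ogg"
    return None


def needs_explicit_params(first_bytes: bytes) -> bool:
    """True when the audio looks raw, so encoding and sample_rate must be sent."""
    return detect_container(first_bytes) is None
```

A client could call needs_explicit_params on the first chunk it reads and only add encoding and sample_rate to the request when it returns True.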

Supported Sample Rates

For raw audio, Flux supports the following sample rates:

  • 8000 Hz
  • 16000 Hz (recommended)
  • 24000 Hz
  • 44100 Hz
  • 48000 Hz

Implementation

For raw audio:

wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000

For containerized audio:

wss://api.deepgram.com/v2/listen?model=flux-general-en
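A minimal Python sketch of assembling these URLs, assuming the encodings and sample rates listed in this entry. The helper name build_flux_url is hypothetical, and the validation is a client-side convenience only; Flux itself remains the authority on what it accepts.

```python
from typing import Optional
from urllib.parse import urlencode

FLUX_URL = "wss://api.deepgram.com/v2/listen"
# Values copied from the lists in this changelog entry.
RAW_ENCODINGS = {"linear16", "linear32", "mulaw", "alaw", "opus", "ogg-opus"}
SAMPLE_RATES = {8000, 16000, 24000, 44100, 48000}


def build_flux_url(
    encoding: Optional[str] = None, sample_rate: Optional[int] = None
) -> str:
    """Build a Flux URL; pass no arguments for containerized audio."""
    params = {"model": "flux-general-en"}
    if encoding is None and sample_rate is None:
        # Containerized audio: both parameters are omitted.
        return f"{FLUX_URL}?{urlencode(params)}"
    if encoding is None or sample_rate is None:
        raise ValueError("raw audio requires both encoding and sample_rate")
    if encoding not in RAW_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    params["encoding"] = encoding
    params["sample_rate"] = str(sample_rate)
    return f"{FLUX_URL}?{urlencode(params)}"
```

Calling build_flux_url("linear16", 16000) reproduces the raw-audio URL above, and build_flux_url() reproduces the containerized one.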

For detailed information about Flux audio requirements, see our Flux documentation.


Introducing Flux: Conversational Speech Recognition to Solve the Biggest Problem in Voice Agents – Interruptions

Flux is here: the first real-time Conversational Speech Recognition model built for voice agents. Solve interruptions, cut latency, and ship natural conversations faster than ever.

Deepgram is proud to announce the release of Flux, the first speech recognition model that knows when someone is actually done talking. Built for conversation, not transcription.

Key Features:

  • Model-integrated turn detection that understands context and conversation flow, not just silence
  • Ultra-low latency when it matters most - transcripts that are ready as soon as turns end
  • Nova-3 level transcription quality - speech-to-text (STT) that is every bit as accurate as the best models, with support for keyterm prompting
  • Radically simpler development - one API replaces complex STT+VAD+endpointing pipelines, with conversation-native events designed to make voice agent development a breeze
  • Configurability for your use case - the right amount of complexity, allowing developers to achieve the desired behavior for their agent

Use Cases:

Designed for real-time conversational applications: voice agents, AI assistants, IVR systems, and any application requiring natural dialogue flow. For pure transcription (meetings, captions, recordings), continue using Nova-3.

Getting Started:

Flux is free throughout October 2025 for our OktoberFLUX promotion (up to 50 concurrent connections). Use model=flux-general-en via the new /v2/listen endpoint.

Learn more in our Announcement Blog, Developer Documentation, API Reference, and try the Interactive Demo.

Availability

Flux English is now available through our API. To access:

  • Connect to wss://api.deepgram.com/v2/listen using model=flux-general-en (reference additional required params here)
  • Available for hosted use
  • Real-time streaming only
  • Self-hosted support coming soon
  • English-only for now

Nova-3 Model Update

🎯 Nova-3 supports 4 new languages

We’ve added support for 4 new languages with non-English monolingual Nova-3 models. This continues our effort to significantly expand Nova-3 language support beyond English. The newly supported languages and their corresponding language codes are:

Newly Supported:

  • Italian (it)
  • Turkish (tr)
  • Norwegian (no)
  • Indonesian (id)

Learn more about Nova-3 on the Models and Language Overview page.


Nova-3 Model Update

🎯 Nova-3 supports 3 new languages

We’ve added support for 3 new languages with non-English monolingual Nova-3 models. This continues our effort to significantly expand Nova-3 language support beyond English. The newly supported languages and their corresponding language codes are:

Newly Supported:

  • Spanish (es, es-419)
  • French (fr, fr-CA)
  • Portuguese (pt, pt-BR, pt-PT)

Learn more about Nova-3 on the Models and Language Overview page.


Nova-3 Model Update

🎯 Nova-3 supports 4 new languages

We’ve added support for 4 new languages with non-English monolingual Nova-3 models. This is the beginning of an effort to significantly expand Nova-3 language support beyond English. The newly supported languages and their corresponding language codes are:

Newly Supported:

  • German (de)
  • Dutch (nl)
  • Swedish (sv, sv-SE)
  • Danish (da, da-DK)

Learn more about Nova-3 on the Models and Language Overview page.


Profanity Filtering Expanded Language Support

Profanity filtering now supports 6 additional languages beyond English, giving you content moderation capabilities across your global user base. Available on monolingual models for:

Newly Supported:

  • German (de)
  • Swiss German (de-CH)
  • Polish (pl)
  • Portuguese (pt, pt-BR, pt-PT)
  • Spanish (es, es-419)
  • Swedish (sv, sv-SE)

Existing English Support:

  • en, en-US, en-AU, en-GB, en-NZ, en-IN

This expansion lets you deploy consistent content policies across international markets without building custom filtering logic.

Smart Formatting Improvements

We’ve resolved several high-impact formatting edge cases that were causing transcription accuracy issues in production environments:

Improved Entity Formatting via Smart Format

Email Transcription Improvements

  • Fixed: 'o' characters in email addresses now transcribe correctly instead of converting to '0'
  • Fixed: edge case email mentions that were being dropped entirely in specific batch processing scenarios

Certain formerly numeric-only sequences have been updated to correctly preserve all alphanumeric characters:

  • Before (some entities): "my account number is a b c d zero nine" → "my account number is 09"
  • After (some entities): "my account number is a b c d zero nine" → "my account number is ABCD09"

Quantity modifiers (‘single’, ‘double’, ‘triple’ + standalone character or number) are better handled via Smart Format:

  • Before (some entities): "double 2" → "2"
  • After (some entities): "double 2" → "22"

Special cases of ‘hundred’ or ‘a hundred’ now supported via Smart Format:

  • Before (some entities): "hundred percent" → "%"
  • After (some entities): "hundred percent" → "100%"

This update has gone out to all hosted streaming transcription, and will be applied to our next self-hosted release later this month.


Nova-3 Medical Streaming Significant Upgrade

We’ve just released a significant upgrade to Nova-3 Medical Streaming, bringing substantial improvements in accuracy for real-time medical transcription use cases. This update focuses specifically on our streaming model, delivering better performance across key transcription metrics.

Performance Improvements

  • 11% relative reduction in Overall WER compared to Nova-3 general streaming model
  • 30% relative reduction in Overall WER compared to Nova-2 Medical streaming model
  • 2.7x improvement in Keyword Recall Rate (KRR) compared to Nova-3 general streaming model
  • Maintains industry-leading inference speed with ultra-low latency for real-time healthcare applications
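
To make "relative reduction" concrete: it is the drop in WER divided by the baseline WER. The short Python sketch below shows the arithmetic. The function name is hypothetical, and the WER values in the usage note are round numbers chosen for illustration, not Deepgram's measurements.

```python
def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

For example, relative_reduction(10.0, 8.9) is about 0.11: a model that moves from a 10.0% WER to 8.9% shows an 11% relative reduction, even though the absolute change is only 1.1 percentage points.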

Availability

The updated Nova-3 Medical streaming model is now available through our API. To access:

  • Use model=nova-3-medical in your streaming API calls
  • Available for hosted use
  • Self-hosted deployments will be available in subsequent releases
  • English only

For details on the original Nova-3 Medical release (including batch capabilities), check out the original changelog: Introducing Nova-3 Medical. For detailed information about Nova-3 Medical, please refer to our Developer Documentation.


Smart Formatting Improvements

We’ve made some improvements to Smart Formatting, enhancing both streaming finalization behavior and entity recognition performance.

Streaming Finalization Improvements

Previously, when formatting entities like dates or credit card numbers, our models would sometimes wait for additional words before finalizing the transcript—particularly if the entity seemed incomplete. For example, when someone said “nineteen seventy…” Deepgram might pause, expecting a possible follow-up like “nine” or other additional speech before finalizing the complete year.

Now, instead of potentially waiting indefinitely for additional words, our system will finalize the transcript after 3 seconds of silence, and attempt to format the entity based on the available audio. This helps ensure transcripts are returned faster and more reliably—without sacrificing too much formatting precision.

Want more control over the finalization behavior? You have two options:

  • Implement logic to send a Finalize message earlier than the 3-second threshold. Reference our Finalize documentation here.
  • Set no_delay=true to override formatting and force immediate finalization. NOTE: This will result in skipping formatting altogether in many cases.
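
The first option can be sketched as a small timer that decides when to send Finalize ahead of the 3-second threshold. This is a sketch under stated assumptions, not a definitive implementation: the class name EarlyFinalizer is hypothetical, the caller is assumed to report when speech was last observed, and the JSON payload shape should be confirmed against the Finalize documentation.

```python
import json
import time
from typing import Callable, Optional


class EarlyFinalizer:
    """Decides when to send a Finalize message before the 3-second default."""

    def __init__(
        self,
        silence_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not 0 < silence_seconds < 3.0:
            raise ValueError("silence_seconds must be between 0 and 3")
        self.silence_seconds = silence_seconds
        self.clock = clock
        self.last_speech: Optional[float] = None
        self.pending = False  # True while there is speech not yet finalized

    def on_speech(self) -> None:
        """Call whenever speech is observed, e.g. on a non-empty interim result."""
        self.last_speech = self.clock()
        self.pending = True

    def poll(self) -> Optional[str]:
        """Return the message to send once the silence window has elapsed."""
        if not self.pending or self.last_speech is None:
            return None
        if self.clock() - self.last_speech >= self.silence_seconds:
            self.pending = False
            # Assumed payload shape; check the Finalize documentation.
            return json.dumps({"type": "Finalize"})
        return None
```

A streaming client would call on_speech() as results arrive, call poll() on a short interval, and write any returned string to the open WebSocket. The clock is injectable so the timing logic can be tested without sleeping.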

Enhanced Entity Formatting

In addition to the timeout improvements, we’ve refined our Named Entity Recognition model for Smart Formatting to better identify and format:

  • Date variations
  • Alphanumerics (order numbers, membership IDs, prescription numbers, etc.)
  • Currencies
  • Payment and card information
  • SSNs
  • Time zones

This update is automatically applied to all streaming transcription using Smart Formatting, and is included in our Self-Hosted March 2025 Release (250331). For more details, check out our Smart Formatting documentation.


Nova-3 Multilingual General Availability - Real-Time Code-Switching

Deepgram is proud to announce the general availability of Nova-3 Multilingual, the first model of its kind able to code-switch in real time across 10 different languages. Processing multilingual conversations instantly with a single model is an industry first, and it unlocks new possibilities for global operations.

Multilingual Support

  • Real-time multilingual speech recognition with a truly unified speech recognition system
  • Supports code-switching between 10 languages:
    • English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch
  • Seamlessly handles natural language transitions without relying on explicit routing or language-specific mechanisms
  • Maintains high transcription accuracy across languages while adapting to natural language transitions
  • Developed through a multi-stage training process combining synthetic code-switched data at massive scale with carefully curated real-world datasets

Use Cases

Nova-3 Multilingual represents a significant breakthrough for applications in:

  • Global customer support
  • Emergency response (e.g., 911 calls)
  • Multilingual meetings
  • Retail interactions
  • Healthcare settings

In high-stakes scenarios like emergency response, Nova-3 can fluidly process interactions where callers switch between languages (e.g., Spanish and English) in real time, ensuring dispatchers receive accurate, immediate transcriptions without missing critical details.

Availability

  • Now available through our API
  • Use model=nova-3&language=multi in your API calls
  • Supports both pre-recorded and real-time streaming transcription
  • Available for hosted and self-hosted use

For detailed information about Nova-3 Multilingual, please refer to our Developer Documentation.


Expanded Numerals Language Support

Deepgram is excited to announce expanded language support for numerals. The numerals and smart_format parameters now convert numbers written out as words into digits in 11 additional languages.

Expanded Language Support

  • New languages added to Numerals support:
    • Danish: da, da-DK
    • Dutch: nl
    • French: fr, fr-CA
    • German: de
    • German (Switzerland): de-CH
    • Italian: it
    • Norwegian: no
    • Polish: pl
    • Portuguese: pt, pt-BR, pt-PT
    • Spanish: es, es-419
    • Swedish: sv, sv-SE

Previously supported language:

  • English: en, en-US, en-AU, en-GB, en-NZ, en-IN

Feature Integration

  • All newly supported languages are fully integrated with our Smart Format feature
  • When using smart_format=true, numerals will automatically be applied for all supported languages
  • Individual control remains available through the dedicated numerals=true parameter
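
The interaction between the two parameters can be sketched in Python as follows. The language set is copied from the lists in this entry, the function names are hypothetical, and the language query parameter is assumed to be passed alongside the formatting flags; treat this as an illustration rather than an authoritative client.

```python
from urllib.parse import urlencode

# Language codes taken from the Numerals support lists in this entry.
NUMERALS_LANGUAGES = {
    "da", "da-DK", "nl", "fr", "fr-CA", "de", "de-CH", "it", "no", "pl",
    "pt", "pt-BR", "pt-PT", "es", "es-419", "sv", "sv-SE",
    "en", "en-US", "en-AU", "en-GB", "en-NZ", "en-IN",
}


def numerals_active(
    language: str, smart_format: bool = False, numerals: bool = False
) -> bool:
    """True if numerals would apply: supported language and either flag set."""
    return language in NUMERALS_LANGUAGES and (smart_format or numerals)


def formatting_query(
    language: str, smart_format: bool = False, numerals: bool = False
) -> str:
    """Build the query string for the chosen formatting options."""
    params = {"language": language}
    if smart_format:
        params["smart_format"] = "true"
    if numerals:
        params["numerals"] = "true"
    return urlencode(params)
```

With these helpers, numerals_active("de", smart_format=True) is True because Smart Format applies numerals automatically, while numerals_active("de") is False because neither flag is set.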

Availability

The expanded Numerals support is now available through our API for use with all Deepgram speech-to-text models.

  • Available for hosted and self-hosted usage
  • Compatible with both pre-recorded and real-time streaming transcription

For detailed information about our expanded Numerals or Smart Formatting support, please refer to our Developer Documentation.