Introducing Flux: Conversational Speech Recognition to Solve the Biggest Problem in Voice Agents – Interruptions

Flux is here: the first real-time Conversational Speech Recognition model built for voice agents. Solve interruptions, cut latency, and ship natural conversations faster than ever.

Deepgram is proud to announce the release of Flux, the first speech recognition model that knows when someone is actually done talking. Built for conversation, not transcription.

Key Features:

  • Model-integrated turn detection that understands context and conversation flow, not just silence
  • Ultra-low latency when it matters most - transcripts that are ready as soon as turns end
  • Nova-3-level transcription quality - high-quality speech-to-text (STT) that’s every bit as accurate as the best models, with support for keyterm prompting
  • Radically simpler development - one API replaces complex STT + VAD + endpointing pipelines, with conversation-native events designed to make voice agent development a breeze (see the sketch after this list)
  • Configurability for your use case - enough control to tune turn-taking to the behavior your agent needs, without unnecessary complexity
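
To make "conversation-native events" concrete, here is a minimal sketch of the kind of event loop this enables. The event types and payload fields used below (TurnEnd, Interruption, transcript) are illustrative assumptions rather than the documented Flux schema; see the Developer Documentation for the actual message formats.

```python
import json

def handle_flux_message(raw: str, agent) -> None:
    """Route one incoming Flux message to a voice agent.

    The "type" values and payload fields below are hypothetical
    placeholders used to illustrate the pattern, not the real schema.
    """
    event = json.loads(raw)
    event_type = event.get("type")

    if event_type == "TurnEnd":
        # Hypothetical turn-end event: the caller has finished speaking and
        # the transcript is final, so the agent can respond immediately.
        agent.respond(event.get("transcript", ""))
    elif event_type == "Interruption":
        # Hypothetical interruption event: the caller started talking while
        # the agent was speaking, so stop playback and listen.
        agent.stop_speaking()
    # Interim transcripts, keepalives, and other events can be logged or ignored.
```

Compared with stitching together separate STT, VAD, and endpointing services, the agent code only has to react to conversation-level events like these.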

Use Cases:

Designed for real-time conversational applications: voice agents, AI assistants, IVR systems, and any application requiring natural dialogue flow. For pure transcription (meetings, captions, recordings), continue using Nova-3.

Getting Started:

Flux is free throughout October 2025 for our OktoberFLUX promotion (up to 50 concurrent connections). Use model=flux-general-en via the new /v2/listen endpoint.

Learn more in our Announcement Blog, Developer Documentation, and API Reference, or try the Interactive Demo.

Availability:

Flux English is now available through our API. To access:

  • Connect to wss://api.deepgram.com/v2/listen using model=flux-general-en (see the API Reference for additional required parameters; a minimal connection sketch follows this list)
  • Available for hosted use
  • Real-time streaming only
  • Self-hosted support coming soon
  • English-only for now
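
For orientation, here is a minimal connection sketch in Python using the third-party websockets package. The endpoint and model parameter come from this announcement; the Authorization header format, the encoding and sample_rate values, and the send/receive pattern are assumptions for illustration, so check the API Reference for the actual required parameters and message shapes.

```python
import asyncio
import os

import websockets  # pip install websockets

# Endpoint and model from this announcement; the encoding and sample_rate
# query parameters are assumptions (see the API Reference for the
# required parameters).
FLUX_URL = (
    "wss://api.deepgram.com/v2/listen"
    "?model=flux-general-en"
    "&encoding=linear16"
    "&sample_rate=16000"
)

async def stream_to_flux(audio_chunks):
    """Send raw audio chunks to Flux and print the events it returns."""
    # Assumed auth scheme: Deepgram API key passed in an Authorization header.
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}

    # Note: websockets releases before 14 call this argument "extra_headers".
    async with websockets.connect(FLUX_URL, additional_headers=headers) as ws:

        async def sender():
            for chunk in audio_chunks:  # raw PCM bytes from your mic or telephony stack
                await ws.send(chunk)

        async def receiver():
            async for message in ws:  # JSON events: transcripts, turn boundaries, etc.
                print(message)

        await asyncio.gather(sender(), receiver())

# Example entry point: asyncio.run(stream_to_flux(my_audio_source()))
```

In a real agent, audio_chunks would come from your telephony or browser capture pipeline, and each received event would be dispatched to your turn-handling logic as sketched under Key Features above.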