Getting Started with Flux
Flux is the first conversational speech recognition model built specifically for voice agents. Unlike traditional STT that just transcribes words, Flux understands conversational flow and automatically handles turn-taking.
Flux tackles the most critical challenges for voice agents today: knowing when to listen, when to think, and when to speak. The model features first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines, all with Nova-3 level accuracy.
Flux is Perfect for: turn-based voice agents, customer service bots, phone assistants, and real-time conversation tools.
Key Benefits:
- Smart turn detection – Knows when speakers finish talking
- Ultra-low latency – ~260ms end-of-turn detection
- Early LLM responses – `EagerEndOfTurn` events for faster replies
- Turn-based transcripts – Clean conversation structure
- Natural interruptions – Built-in barge-in handling
- Nova-3 accuracy – Best-in-class transcription quality
For more information on how Flux manages turns, see the Flux State Machine Guide.
Important: Flux Connection Requirements
Flux requires the `/v2/listen` endpoint – using `/v1/listen` will not work with Flux.
When connecting to Flux, you must use:
- Endpoint: `/v2/listen` (not `/v1/listen`)
- Model: `flux-general-en`
- Audio Format: See the Audio Format Requirements table below
- Chunk Size: 80ms audio chunks strongly recommended for optimal model performance and latency
Audio Format Requirements
WebSocket URL Format:
When using the Deepgram SDK, use `client.listen.v2.connect()` to access the v2 endpoint. For direct WebSocket connections, ensure you're using `/v2/listen` in your URL.
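For example, here is a minimal connection sketch using the Python SDK. The `connect()` keyword names mirror the query parameters above and are assumptions about the SDK signature, which may vary by version:

```python
import asyncio

from deepgram import AsyncDeepgramClient

async def main():
    # The client reads your API key from the environment
    # (DEEPGRAM_API_KEY is the assumed variable name).
    client = AsyncDeepgramClient()

    # client.listen.v2.connect() targets the /v2/listen endpoint.
    async with client.listen.v2.connect(
        model="flux-general-en",
        encoding="linear16",
        sample_rate=16000,
    ) as connection:
        ...  # register handlers and stream audio; see the walkthrough below

asyncio.run(main())
```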
Configurable Parameters
Flux provides three key parameters to control end-of-turn detection behavior and optimize your voice agent's conversational flow:
End-of-Turn Detection Parameters
When to Configure These Parameters
For most use cases, the default `eot_threshold=0.7` works well. You only need to configure these parameters if:
- You want faster responses: Set `eager_eot_threshold` to enable `EagerEndOfTurn` events and start LLM processing before the user fully finishes speaking
- Your users speak with long pauses: Increase `eot_timeout_ms` to avoid cutting off turns prematurely
- You need more reliable turn detection: Increase `eot_threshold` to reduce false positives (at the cost of slightly higher latency)
- You want more aggressive turn detection: Lower `eot_threshold` to trigger turns earlier
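As a sketch, here is how these parameters might be combined on a direct WebSocket URL. The values are illustrative, not recommendations, and passing the same names through the SDK is an assumption:

```python
# Direct WebSocket URL with end-of-turn tuning; values are examples only.
FLUX_URL = (
    "wss://api.deepgram.com/v2/listen"
    "?model=flux-general-en"
    "&encoding=linear16"
    "&sample_rate=16000"
    "&eot_threshold=0.8"        # more reliable turn detection, slightly more latency
    "&eager_eot_threshold=0.5"  # enables EagerEndOfTurn / TurnResumed events
    "&eot_timeout_ms=3000"      # tolerate longer pauses before closing a turn
)
```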
Important: Setting `eager_eot_threshold` enables `EagerEndOfTurn` and `TurnResumed` events. These events allow you to start preparing LLM responses early, reducing end-to-end latency by hundreds of milliseconds. See the Eager End-of-Turn Optimization Guide for implementation strategies.
Cost Consideration: Using `EagerEndOfTurn` can increase LLM API calls by 50-70% due to speculative response generation. The `TurnResumed` event signals when to cancel a draft response because the user continued speaking.
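Below is a minimal sketch of that speculative-response pattern. The `EagerEndOfTurn` and `TurnResumed` names come from this guide; the `EndOfTurn` event name, the `message.type` field, and the `generate_llm_reply()`/`speak()` helpers are hypothetical placeholders for your own pipeline:

```python
import asyncio

draft_task: asyncio.Task | None = None  # in-flight speculative LLM request

async def generate_llm_reply(message) -> str:
    """Hypothetical placeholder for your LLM call."""
    await asyncio.sleep(0.5)
    return "(drafted reply)"

async def speak(reply: str) -> None:
    """Hypothetical placeholder for your TTS playback."""
    print(reply)

async def on_turn_event(message) -> None:
    """React to Flux turn events; `message.type` is an assumed field name."""
    global draft_task
    if message.type == "EagerEndOfTurn":
        # Flux thinks the turn is probably over: start drafting a reply early.
        draft_task = asyncio.create_task(generate_llm_reply(message))
    elif message.type == "TurnResumed":
        # The user kept talking: cancel the speculative draft.
        if draft_task is not None:
            draft_task.cancel()
            draft_task = None
    elif message.type == "EndOfTurn":
        # Turn confirmed: reuse the draft if one is running, else start fresh.
        task = draft_task or asyncio.create_task(generate_llm_reply(message))
        draft_task = None
        await speak(await task)
```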
For comprehensive parameter documentation and tuning guidance, see the End-of-Turn Configuration guide.
Using Flux: SDK vs Direct WebSocket
Common Mistakes to Avoid:
- ❌ Using `/v1/listen` instead of `/v2/listen`
- ❌ Using `model=flux` instead of `model=flux-general-en`
- ❌ Using the `language=en` parameter (use `model=flux-general-en` instead)
- ❌ Specifying `encoding` or `sample_rate` when sending containerized audio (omit these for containerized formats)
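To make the contrast concrete, here is a correct direct-WebSocket URL next to a wrong one (the host shown is Deepgram's public API endpoint):

```python
# ✅ Correct: v2 endpoint, full model name, explicit format for raw PCM audio
good_url = ("wss://api.deepgram.com/v2/listen"
            "?model=flux-general-en&encoding=linear16&sample_rate=16000")

# ❌ Wrong: v1 endpoint, short model name, and a language= parameter
bad_url = ("wss://api.deepgram.com/v1/listen"
           "?model=flux&language=en")
```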
Let's Build!
This guide walks you through building a basic streaming transcription application powered by Deepgram Flux and the Deepgram SDK.
By the end of this guide, you'll have:
- A real-time streaming transcription application with sub-second response times, using the BBC Real Time Live Stream as your audio source
- Natural conversation flow with Flux's advanced turn detection model
- Voice Activity Detection based interruption handling for responsive interactions
- A working demo you can build on!
Audio Stream
To handle the audio stream, we'll use the following conversion approach:
1. Install the Deepgram SDK
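Assuming the Python SDK, which the rest of this guide uses:

```bash
pip install deepgram-sdk
```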
2. Add Dependencies
Install the additional dependencies:
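`asyncio` and `subprocess` ship with Python, so the only extra package this demo needs is the dotenv loader:

```bash
pip install python-dotenv
```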
3. Install FFmpeg on your machine
You will need the actual FFmpeg binary installed to run this demo:
- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt install ffmpeg`
- Windows: Download from https://ffmpeg.org/
4. Create a `.env` file
Create a `.env` file in your project root with your Deepgram API key:
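A minimal `.env` (the `DEEPGRAM_API_KEY` variable name is the convention the SDK reads; adjust if your setup differs):

```bash
DEEPGRAM_API_KEY=your_deepgram_api_key
```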
Replace `your_deepgram_api_key` with your actual Deepgram API key.
5. Set Imports and Audio Stream Colors
Core Dependencies:
- `asyncio` – Handles concurrent audio streaming and Deepgram connection
- `subprocess` – Manages FFmpeg process for audio conversion
- `dotenv` – Loads Deepgram API key from `.env` file
Deepgram SDK:
- `AsyncDeepgramClient` – Main client for Flux API connection
- `EventType` – WebSocket event constants (OPEN, MESSAGE, CLOSE, ERROR)
- `ListenV2SocketClientResponse` – Type hints for incoming transcription messages
Configuration:
- `STREAM_URL` – BBC World Service streaming audio endpoint
Visual Feedback System:
- `Colors` class – ANSI terminal color codes for confidence visualization
- `get_confidence_color()` – Maps confidence scores to colors:
  - Green (0.90-1.00): High confidence
  - Yellow (0.80-0.90): Good confidence
  - Orange (0.70-0.80): Lower confidence
  - Red (≤0.69): Low confidence
Purpose: Sets up the foundation for real-time streaming transcription with visual quality indicators, making it easy to spot transcription accuracy at a glance.
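A sketch of this setup follows. The SDK class names come from this guide, but the exact import paths, the `STREAM_URL` value, and the ANSI codes are assumptions you may need to adapt:

```python
import asyncio
import subprocess

from dotenv import load_dotenv

# Import paths are assumptions and may differ between SDK versions.
from deepgram import AsyncDeepgramClient, EventType, ListenV2SocketClientResponse

load_dotenv()  # loads DEEPGRAM_API_KEY (assumed name) from the .env file

# Example BBC World Service endpoint; substitute any streaming audio URL.
STREAM_URL = "http://stream.live.vc.bbcmedia.co.uk/bbc_world_service"

class Colors:
    """ANSI terminal color codes for confidence visualization."""
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    ORANGE = "\033[38;5;208m"
    RED = "\033[91m"
    RESET = "\033[0m"

def get_confidence_color(confidence: float) -> str:
    """Map a confidence score to a color using the bands listed above."""
    if confidence >= 0.90:
        return Colors.GREEN
    if confidence >= 0.80:
        return Colors.YELLOW
    if confidence >= 0.70:
        return Colors.ORANGE
    return Colors.RED
```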
6. Connect to Flux and Process Audio
The main function orchestrates real-time transcription of streaming audio URLs:
- Initialize: Creates `AsyncDeepgramClient` and connects to Flux with the required linear16 format
- Event Handling: Sets up a message handler that displays transcriptions with color-coded confidence scores
- Audio Pipeline: Launches an FFmpeg subprocess to convert the compressed stream URL to `linear16` PCM format
- Streaming Loop: Reads converted audio chunks and pipes them to the Deepgram Flux connection
- Concurrent Tasks: Runs the Deepgram listener and audio conversion simultaneously using asyncio
- Error Handling: Manages FFmpeg errors and connection timeouts (60s default)

The function handles both the audio conversion requirement (Flux only accepts `linear16`) and real-time streaming coordination between multiple async processes.
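Here is a hedged sketch of that orchestration, continuing from the names defined in the previous step. The `connect()` kwargs, `send_media()`, `start_listening()`, and the response fields are assumptions about the SDK surface:

```python
async def main() -> None:
    client = AsyncDeepgramClient()  # reads the API key from the environment

    # Connect to Flux on /v2/listen with the required linear16 PCM format.
    async with client.listen.v2.connect(
        model="flux-general-en",
        encoding="linear16",
        sample_rate=16000,
    ) as connection:

        async def on_message(message: ListenV2SocketClientResponse) -> None:
            # Field names on the response object are assumptions.
            transcript = getattr(message, "transcript", "") or ""
            confidence = getattr(message, "confidence", 0.0) or 0.0
            if transcript:
                color = get_confidence_color(confidence)
                print(f"{color}{transcript} ({confidence:.2f}){Colors.RESET}")

        connection.on(EventType.MESSAGE, on_message)

        # FFmpeg converts the compressed live stream to 16 kHz mono linear16.
        ffmpeg = subprocess.Popen(
            ["ffmpeg", "-i", STREAM_URL, "-f", "s16le", "-ar", "16000",
             "-ac", "1", "-loglevel", "error", "pipe:1"],
            stdout=subprocess.PIPE,
        )

        async def pump_audio() -> None:
            # 80 ms of 16 kHz, 16-bit mono audio = 2560 bytes per chunk.
            while chunk := await asyncio.to_thread(ffmpeg.stdout.read, 2560):
                await connection.send_media(chunk)

        try:
            # Run the listener and the audio pump concurrently; connection
            # timeouts (60 s default) are left to the SDK.
            await asyncio.gather(connection.start_listening(), pump_audio())
        finally:
            ffmpeg.terminate()

if __name__ == "__main__":
    asyncio.run(main())
```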
7. Complete Code Example
Here's the complete working example that combines all the steps. You can also find this code on GitHub.
Additional Flux Demos
For additional demos showcasing Flux, check out the following repositories:
Building a Voice Agent with Flux
Are you ready to build a voice agent with Flux? See our Build a Flux-enabled Voice Agent Guide to get started.