Flux Voice Agent
Flux tackles the most critical challenges for voice agents today: knowing when to listen, when to think, and when to speak. The model features first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines, all with Nova-3 level accuracy.
If you'd prefer to skip building, managing, and scaling a voice agent yourself, explore our Voice Agent API.
Let's Build!
This guide walks you through building a basic voice agent powered by Deepgram Flux, OpenAI, and Deepgram TTS, using streaming speech-to-text with advanced turn detection to create natural, real-time conversations with users.
By the end of this guide, you'll have:
- A real-time voice agent with sub-second response times
- A voice agent that uses a static audio file for mocking out a conversation
- Natural conversation flow with Flux's advanced turn detection model
- Voice Activity Detection based interruption handling for responsive interactions
- A complete setup ready for a demo deployment
Choosing an LLM
Flux works with any LLM, so you can choose the best one for your use case. For this demo we'll be using OpenAI.
Voice Agent Patterns
For this demo we'll opt to use `EndOfTurn` only, for simplicity.
Flux enables two voice agent patterns. You can decide which one to use based on your latency vs. complexity/cost tradeoffs.
EndOfTurn Only
Considerations:
We recommend starting with a purely `EndOfTurn`-driven implementation to get up and running. This means:
- `Update` / `EagerEndOfTurn` / `TurnResumed`: Use only for transcript reference
- `EndOfTurn`: Send transcript to LLM and trigger agent response
- `StartOfTurn`: Interrupt agent if speaking, otherwise wait
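The event handling above can be sketched as a small dispatcher. This is an illustrative sketch, not SDK code: the event names come from Flux, but the function and action names are hypothetical.

```python
def dispatch(event_type: str, agent_speaking: bool) -> str:
    """Map a Flux event to an action under the EndOfTurn-only pattern.

    Update / EagerEndOfTurn / TurnResumed are kept only for transcript
    reference; EndOfTurn triggers the LLM; StartOfTurn interrupts the
    agent only if it is currently speaking.
    """
    if event_type in ("Update", "EagerEndOfTurn", "TurnResumed"):
        return "store_transcript"
    if event_type == "EndOfTurn":
        return "send_to_llm"
    if event_type == "StartOfTurn":
        return "interrupt_agent" if agent_speaking else "wait"
    return "ignore"
```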
EagerEndOfTurn + EndOfTurn
For more information on `EagerEndOfTurn`, see our guide Optimize Voice Agent Latency with Eager End of Turn.
Considerations:
Once comfortable with End of Turn, you can decide whether to optimize latency using `EagerEndOfTurn`. Eager end-of-turn processing sends medium-confidence transcripts to your LLM before final `EndOfTurn` certainty, reducing response time. Consider, though, the LLM trade-offs you may need to make.
- `EagerEndOfTurn`: Start preparing agent reply (moderate confidence the user finished speaking)
- `TurnResumed`: Cancel agent reply preparation (user still speaking)
- `EndOfTurn`: Proceed with prepared response (user definitely finished)
- `StartOfTurn`: Interrupt agent if speaking, otherwise wait
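The eager pattern adds two actions to the dispatcher: start preparing a reply early, and cancel that preparation if the user keeps talking. As before, this is a sketch with hypothetical function and action names; only the event names come from Flux.

```python
def dispatch_eager(event_type: str, agent_speaking: bool) -> str:
    """Map a Flux event to an action under the EagerEndOfTurn pattern."""
    if event_type == "EagerEndOfTurn":
        return "start_preparing_reply"   # moderate confidence user finished
    if event_type == "TurnResumed":
        return "cancel_preparation"      # user resumed talking
    if event_type == "EndOfTurn":
        return "speak_prepared_reply"    # user definitely finished
    if event_type == "StartOfTurn":
        return "interrupt_agent" if agent_speaking else "wait"
    return "ignore"
```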
Voice Agent vs Flux Agent Pipeline
Using the Voice Agent API your pipeline will look like this:
If you want to use Flux with the Voice Agent API, set your `listen.provider.model` to `flux-general-en`.
In comparison, Flux handles only the STT processing; everything else becomes modular and under your control.
You'll now be responsible for:
- Managing audio playback interruptions (barge-in)
- Sending STT output to your LLM
- Cancelling LLM responses if user resumes talking
- Converting LLM output to speech via your chosen TTS provider
EndOfTurn Only Voice Agent Example
Here's a sample voice agent implementation using Flux with the `EndOfTurn`-only pattern:
1. Install the Deepgram SDK
2. Add Dependencies
Install the additional dependencies:
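Assuming a Python project, the installs for steps 1 and 2 might look like the following. The package list is an assumption matching the sketches later in this guide; adjust it to your stack.

```shell
# Step 1: Deepgram SDK
pip install deepgram-sdk
# Step 2: websocket client, OpenAI client
pip install websockets openai
```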
3. Create a `.env` File
Create a `.env` file in your project root with your Deepgram API key and OpenAI API key. Replace `your_deepgram_api_key` with your actual Deepgram API key, and `your_open_ai_api_key` with your actual OpenAI API key.
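A minimal `.env` might look like this (the variable names are a convention used in this guide's sketches, not required by the APIs):

```
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_open_ai_api_key
```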
4. Set Imports & Audio File
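A minimal version of this step might look like the following. The filename is hypothetical; the environment variable names match the `.env` step above, and a loader such as python-dotenv can populate them.

```python
import os

# API keys read from the environment; a .env loader such as
# python-dotenv can populate these before this module runs
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

# Static audio file used to mock the user's side of the conversation
AUDIO_FILE = "user_audio.wav"  # hypothetical filename
```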
5. Transcribe with Flux
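A sketch of this step using the third-party `websockets` library is shown below. The endpoint path, query parameters, and message field names are assumptions; check the Flux API reference for the exact schema.

```python
import asyncio
import json
import urllib.parse

FLUX_URL = "wss://api.deepgram.com/v2/listen"  # assumed Flux endpoint


def build_flux_url(model: str = "flux-general-en",
                   encoding: str = "linear16",
                   sample_rate: int = 16000) -> str:
    """Build the Flux websocket URL; parameter names are assumptions."""
    params = {"model": model,
              "encoding": encoding,
              "sample_rate": str(sample_rate)}
    return FLUX_URL + "?" + urllib.parse.urlencode(params)


async def transcribe(api_key: str, audio_path: str) -> str:
    """Stream the mock audio file to Flux; return the EndOfTurn transcript."""
    import websockets  # third-party; older versions use extra_headers=

    async with websockets.connect(
        build_flux_url(),
        additional_headers={"Authorization": f"Token {api_key}"},
    ) as ws:
        with open(audio_path, "rb") as f:
            await ws.send(f.read())
        async for message in ws:
            event = json.loads(message)
            # Field names here are assumptions -- verify against the
            # Flux API reference for the actual message schema
            if "EndOfTurn" in (event.get("type"), event.get("event")):
                return event.get("transcript", "")
    return ""
```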
6. Generate OpenAI Response
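This step can be sketched with the OpenAI Python client. The model name and system prompt are placeholders; swap in whatever fits your agent.

```python
def generate_reply(transcript: str, api_key: str) -> str:
    """Send the user's transcript to OpenAI and return the agent's reply."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[
            {"role": "system",
             "content": "You are a concise, friendly voice agent."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```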
7. Generate TTS Response
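A sketch of this step against Deepgram's text-to-speech REST endpoint, using only the standard library. The Aura voice model in the URL is one choice among several.

```python
import json
import urllib.request

DEEPGRAM_TTS_URL = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"


def synthesize(text: str, api_key: str) -> bytes:
    """Convert the agent's reply to speech with Deepgram TTS."""
    req = urllib.request.Request(
        DEEPGRAM_TTS_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Authorization": f"Token {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # raw audio bytes
```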
8. Save TTS Audio
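The final step just writes the TTS bytes to disk. The default filename is illustrative.

```python
def save_audio(audio: bytes, path: str = "agent_reply.mp3") -> str:
    """Write TTS audio bytes to disk and return the saved path."""
    with open(path, "wb") as f:
        f.write(audio)
    return path
```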
9. Complete Code Example
Here's the complete working example that combines all the steps. You can also find this code on GitHub.
Additional Flux Demos
For additional demos showcasing Flux, check out the following repositories: