Build a Flux-enabled Voice Agent
Build a cascaded voice agent using Flux conversational speech to text, an OpenAI LLM, and Deepgram Aura-2 text to speech.
Flux tackles the most critical challenges for voice agents today: knowing when to listen, when to think, and when to speak. The model features first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines, all with Nova-3 level accuracy.
If you’d prefer to skip building, managing, and scaling a voice agent yourself, explore our Voice Agent API.
This guide walks you through building a basic voice agent that combines Deepgram Flux (streaming speech to text with advanced turn detection), an OpenAI LLM, and Deepgram TTS to hold natural, real-time conversations with users.
This walkthrough uses flux-general-en and an English Aura voice to keep the example focused. To make the
agent multilingual, switch the STT model to flux-general-multi and apply language_hint values as shown in
Flux Multilingual & Language Prompting. The rest of the pipeline stays the
same.
By the end of this guide, you’ll have:
Flux works with any LLM, so you can choose the one that best fits your use case. This demo uses OpenAI.
For simplicity, this demo uses the EndOfTurn only pattern.
Flux enables two voice agent patterns. You can decide which one to use based on your latency vs complexity/cost tradeoffs.
EndOfTurn Only
Considerations:
We recommend starting with a purely EndOfTurn driven implementation to get up and running. This means:
Update/EagerEndOfTurn/TurnResumed: Use only for transcript reference
EndOfTurn: Send transcript to LLM and trigger agent response
StartOfTurn: Interrupt agent if speaking, otherwise wait
If you’re experiencing echo (the agent responding to itself) or false barge-ins from background noise, see Audio Preprocessing & Barge-In for recommendations on echo cancellation, noise suppression, and using Flux’s StartOfTurn for reliable barge-in detection.
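The EndOfTurn only routing described above can be sketched as a small dispatcher. This is a minimal illustration, not the guide's reference implementation: it assumes each Flux message has already been parsed into a dict with an `event` field and a `transcript` field, and the `respond` and `stop_speaking` callbacks are hypothetical stand-ins for your LLM/TTS code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EndOfTurnOnlyAgent:
    """Routes turn events for the EndOfTurn only pattern."""

    # Hypothetical callbacks supplied by the application.
    respond: Callable[[str], None]      # send transcript to the LLM and speak the reply
    stop_speaking: Callable[[], None]   # halt any TTS playback in progress
    agent_speaking: bool = False
    latest_transcript: str = ""

    def handle(self, message: dict) -> None:
        # Assumed message shape: {"event": "...", "transcript": "..."}.
        event = message.get("event")
        transcript = message.get("transcript", "")

        if event in ("Update", "EagerEndOfTurn", "TurnResumed"):
            # Transcript reference only; no action is taken.
            self.latest_transcript = transcript
        elif event == "StartOfTurn":
            # Barge-in: interrupt the agent if it is speaking, otherwise wait.
            if self.agent_speaking:
                self.stop_speaking()
                self.agent_speaking = False
        elif event == "EndOfTurn":
            # The user has finished: hand the transcript to the LLM.
            self.latest_transcript = transcript
            self.agent_speaking = True
            self.respond(transcript)
```

In a real agent, `agent_speaking` would also be reset when TTS playback finishes; that bookkeeping is omitted here to keep the routing visible.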
For more information on EagerEndOfTurn, see our guide Optimize Voice Agent Latency with Eager End of Turn.
Considerations:
Once you are comfortable with EndOfTurn, you can decide whether you need to reduce latency further with EagerEndOfTurn. Eager end-of-turn processing sends medium-confidence transcripts to your LLM before EndOfTurn is certain, which reduces response time. The trade-off is extra LLM work: a reply prepared on EagerEndOfTurn is discarded whenever the user resumes speaking.
EagerEndOfTurn: Start preparing agent reply (moderate confidence user finished speaking)
TurnResumed: Cancel agent reply preparation (user still speaking)
EndOfTurn: Proceed with prepared response (user definitely finished)
StartOfTurn: Interrupt agent if speaking, otherwise wait
Tuning Turn Detection: You can fine-tune the behavior of these events using the eot_threshold, eager_eot_threshold, and eot_timeout_ms parameters. See the End-of-Turn Configuration for detailed tuning guidance and use-case specific recommendations.
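The eager pattern's event handling can be sketched as a small coordinator that tracks whether a draft reply is in flight. This is an illustration under assumptions: messages are taken to be dicts with `event` and `transcript` fields, the four callbacks are hypothetical hooks into your LLM and TTS code, and re-drafting when the final transcript differs from the eager one is a design choice made for this sketch rather than documented Flux behavior.

```python
from typing import Callable, Optional


class EagerTurnCoordinator:
    """Tracks a speculative reply across EagerEndOfTurn, TurnResumed, and EndOfTurn."""

    def __init__(
        self,
        start_draft: Callable[[str], None],  # hypothetical: begin LLM generation
        cancel_draft: Callable[[], None],    # hypothetical: abort LLM generation
        speak: Callable[[], None],           # hypothetical: play the prepared reply
        stop_speaking: Callable[[], None],   # hypothetical: halt TTS playback
    ) -> None:
        self._start_draft = start_draft
        self._cancel_draft = cancel_draft
        self._speak = speak
        self._stop_speaking = stop_speaking
        self.draft_for: Optional[str] = None  # transcript the current draft is based on
        self.agent_speaking = False

    def handle(self, message: dict) -> None:
        event = message.get("event")
        transcript = message.get("transcript", "")

        if event == "EagerEndOfTurn":
            # Moderate confidence the user finished: start preparing a reply.
            self.draft_for = transcript
            self._start_draft(transcript)
        elif event == "TurnResumed":
            # The user kept talking: throw the draft away.
            if self.draft_for is not None:
                self._cancel_draft()
                self.draft_for = None
        elif event == "EndOfTurn":
            # Sketch-level choice: reuse the draft only if the transcript matches.
            if self.draft_for != transcript:
                if self.draft_for is not None:
                    self._cancel_draft()
                self._start_draft(transcript)
            self.draft_for = None
            self.agent_speaking = True
            self._speak()
        elif event == "StartOfTurn":
            # Barge-in: interrupt the agent if it is speaking, otherwise wait.
            if self.agent_speaking:
                self._stop_speaking()
                self.agent_speaking = False
```

Each TurnResumed costs one discarded draft, which is where the additional LLM usage of this pattern comes from.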
Dynamic Tuning: In production voice agents powered by Flux, you can use the Configure control message to adjust these thresholds, or keyterms, mid-stream as desired behavior changes throughout a conversation.
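Initial values for eot_threshold, eager_eot_threshold, and eot_timeout_ms also need to be set somewhere before any mid-stream change. The sketch below assumes they are passed as query-string parameters on the streaming connection URL; the endpoint shown and the numeric values in the usage example are assumptions for illustration, not recommendations, so confirm both against the End-of-Turn Configuration guide.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; confirm against the Flux API reference.
FLUX_URL = "wss://api.deepgram.com/v2/listen"


def build_flux_url(
    model: str = "flux-general-en",
    eot_threshold: Optional[float] = None,
    eager_eot_threshold: Optional[float] = None,
    eot_timeout_ms: Optional[int] = None,
    base_url: str = FLUX_URL,
) -> str:
    """Build a connection URL, including only the tuning parameters that were set."""
    params = {"model": model}
    optional = {
        "eot_threshold": eot_threshold,
        "eager_eot_threshold": eager_eot_threshold,
        "eot_timeout_ms": eot_timeout_ms,
    }
    for name, value in optional.items():
        if value is not None:
            params[name] = value
    return f"{base_url}?{urlencode(params)}"


# Illustrative values only.
print(build_flux_url(eot_threshold=0.8, eot_timeout_ms=3000))
```

Leaving eager_eot_threshold unset in this helper simply omits it from the URL, which matches the EndOfTurn only pattern used in this guide.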
Using the Voice Agent API, your pipeline will look like this:
If you want to use Flux with the Voice Agent API, set your listen.provider.model to flux-general-en, or to flux-general-multi for multilingual agents (with optional language_hint values). See Multilingual Voice Agents for setup details.
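As a rough illustration of the listen.provider.model setting, the helper below builds only that nested fragment. Where the fragment sits inside the full Voice Agent API settings payload, and how language_hint values are attached, are not shown here; the function name is hypothetical.

```python
import json


def flux_listen_fragment(multilingual: bool = False) -> dict:
    """Return the listen.provider.model fragment for a Flux-backed agent."""
    model = "flux-general-multi" if multilingual else "flux-general-en"
    # Only the dotted path named in this guide is modeled; other settings are omitted.
    return {"listen": {"provider": {"model": model}}}


print(json.dumps(flux_listen_fragment(), indent=2))
```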
If you opt to build your own voice agent from scratch, you can use Flux to handle the speech to text and rely on its turn-taking cues to coordinate the rest of your pipeline.
You’ll now be responsible for:
EndOfTurn Only Voice Agent Example
Here’s a sample voice agent implementation using Flux with the EndOfTurn only pattern:
Install the additional dependencies:
.env file
Create a .env file in your project root with your Deepgram API key and OpenAI API key.
Replace your_deepgram_api_key with your actual Deepgram API key.
Replace your_open_ai_api_key with your actual OpenAI API key.
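A quick check that both placeholders were actually replaced can save a confusing authentication error later. The sketch below parses .env-style text with the standard library only; the variable names DEEPGRAM_API_KEY and OPENAI_API_KEY are assumptions, so match them to whatever names your .env file uses.

```python
from typing import Dict, List

# Assumed variable names; adjust to match your .env file.
REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY")
PLACEHOLDERS = {"your_deepgram_api_key", "your_open_ai_api_key"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_keys(values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means both keys look usable."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still holds the placeholder value")
    return problems
```

For example, `check_keys(parse_env(open(".env").read()))` returns an empty list once both keys are filled in.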
Here’s the complete working example that combines all the steps. You can also find this code on GitHub.
For additional demos showcasing Flux, check out the following repositories: