Understanding the Flux State Machine
Traditional STT+VAD requires you to build complex interruption logic. Flux handles this natively.
Emitted events adhere to the following state machine for managing turns:
- Update messages are sent for approximately every 0.25 seconds of transcribed audio, regardless of transcript updates, unless a state change has occurred.
- An EagerEndOfTurn message always contains a nonempty transcript.
- A TurnResumed message can only follow a preceding EagerEndOfTurn message.
- The EndOfTurn transcript will always match the immediately preceding EagerEndOfTurn transcript. If the transcript changes after an EagerEndOfTurn, a TurnResumed event will occur first.
- The turn_index increments immediately following an EndOfTurn message.
- When using flux-general-multi, all TurnInfo events include languages (detected languages sorted by word count) and languages_hinted (active language hints). See Language Prompting.
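The turn rules above can be expressed as a small client-side checker, which may be useful in tests or when debugging an integration. This is a minimal sketch, not part of any SDK: the TurnEvent class and its field names are a hypothetical, simplified view of a turn message, and the check treats turn_index as constant within a turn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnEvent:
    # Hypothetical, simplified view of a turn message; real messages carry more fields.
    event: str        # "StartOfTurn", "Update", "EagerEndOfTurn", "TurnResumed", "EndOfTurn"
    turn_index: int
    transcript: str


class TurnInvariantChecker:
    """Raises ValueError when an event stream breaks one of the listed rules."""

    def __init__(self) -> None:
        self.eager_transcript: Optional[str] = None      # set while an EagerEndOfTurn is pending
        self.expected_turn_index: Optional[int] = None   # unknown until the first event

    def check(self, ev: TurnEvent) -> None:
        # turn_index stays fixed within a turn and increments right after EndOfTurn.
        if self.expected_turn_index is not None and ev.turn_index != self.expected_turn_index:
            raise ValueError(f"unexpected turn_index {ev.turn_index}")
        self.expected_turn_index = ev.turn_index

        if ev.event == "EagerEndOfTurn":
            # An EagerEndOfTurn always carries a nonempty transcript.
            if not ev.transcript:
                raise ValueError("EagerEndOfTurn with empty transcript")
            self.eager_transcript = ev.transcript
        elif ev.event == "TurnResumed":
            # TurnResumed is only valid while an EagerEndOfTurn is pending.
            if self.eager_transcript is None:
                raise ValueError("TurnResumed without a preceding EagerEndOfTurn")
            self.eager_transcript = None
        elif ev.event == "EndOfTurn":
            # If an EagerEndOfTurn is pending, the final transcript must match it.
            if self.eager_transcript is not None and ev.transcript != self.eager_transcript:
                raise ValueError("EndOfTurn transcript differs from EagerEndOfTurn")
            self.eager_transcript = None
            self.expected_turn_index = ev.turn_index + 1
```

Feeding each received event through check() in arrival order surfaces ordering bugs in message handling early, for example a TurnResumed that was processed before its EagerEndOfTurn.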
Configuring Event Behavior: The EagerEndOfTurn and TurnResumed events are only triggered when you set the eager_eot_threshold parameter. The EndOfTurn event behavior is controlled by the eot_threshold and eot_timeout_ms parameters. See End-of-Turn Configuration for details on tuning these thresholds for your use case.
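As an illustration of how these parameters might be passed, the sketch below builds a connection URL with them as query parameters. The base URL, the helper name build_flux_url, and the numeric values are assumptions chosen for the example, not recommended settings; check the End-of-Turn Configuration page for the actual endpoint, ranges, and defaults.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint for illustration only; confirm against the API reference.
ASSUMED_BASE_URL = "wss://api.deepgram.com/v2/listen"


def build_flux_url(
    model: str,
    eot_threshold: Optional[float] = None,
    eot_timeout_ms: Optional[int] = None,
    eager_eot_threshold: Optional[float] = None,
    base_url: str = ASSUMED_BASE_URL,
) -> str:
    """Hypothetical helper: include only the parameters the caller sets."""
    params = {"model": model}
    if eot_threshold is not None:
        params["eot_threshold"] = eot_threshold
    if eot_timeout_ms is not None:
        params["eot_timeout_ms"] = eot_timeout_ms
    # Omitting eager_eot_threshold means no EagerEndOfTurn or TurnResumed events.
    if eager_eot_threshold is not None:
        params["eager_eot_threshold"] = eager_eot_threshold
    return f"{base_url}?{urlencode(params)}"


# Illustrative values only.
url = build_flux_url(
    "flux-general-multi",
    eot_threshold=0.7,
    eot_timeout_ms=5000,
    eager_eot_threshold=0.5,
)
```

Leaving eager_eot_threshold unset keeps the event stream simpler, which may be preferable if your agent does not prepare responses ahead of EndOfTurn.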
Barge-in and audio quality: Flux’s StartOfTurn event is the recommended way to trigger barge-in. It is more reliable than an external VAD because every StartOfTurn is guaranteed to contain a nonempty transcript. For guidance on echo cancellation, noise suppression, and other audio preprocessing that affects turn detection, see Audio Preprocessing & Barge-In.
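A barge-in handler built on StartOfTurn can be quite small. The sketch below assumes each message has been parsed into a dict with an "event" key; that key name and the AgentPlayback class are hypothetical stand-ins for your own message parsing and TTS playback code.

```python
class AgentPlayback:
    """Hypothetical stand-in for whatever plays the agent's TTS audio."""

    def __init__(self) -> None:
        self.speaking = False
        self.interruptions = 0

    def start(self) -> None:
        self.speaking = True

    def stop(self) -> None:
        # Stop playback and record that the user interrupted the agent.
        if self.speaking:
            self.speaking = False
            self.interruptions += 1


def handle_message(msg: dict, playback: AgentPlayback) -> None:
    # Assumed message shape: {"event": "StartOfTurn", "transcript": "...", ...}
    # Only StartOfTurn triggers barge-in; other events leave playback alone.
    if msg.get("event") == "StartOfTurn" and playback.speaking:
        playback.stop()
```

Because the trigger is a transcribed StartOfTurn rather than raw voice activity, the handler needs no energy threshold or debounce logic of its own.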
Turn Lifecycle Example
Here’s how Flux processes a customer calling support saying “Hi I need to cancel my subscription please.”
Notice how confidence builds up and how the EagerEndOfTurn event fires before the final EndOfTurn. With EagerEndOfTurn, your voice agent can begin preparing a response before the user has fully finished speaking. This allows you to send a synchronous request with early context, creating the effect of a faster, more natural reply.
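One way to act on EagerEndOfTurn is to start drafting a reply immediately, discard the draft if TurnResumed arrives, and reuse it at EndOfTurn. The following is a minimal asyncio sketch under those assumptions; draft_reply is a hypothetical placeholder for your LLM call, and EagerResponder is not part of any SDK.

```python
import asyncio
from typing import List, Optional


async def draft_reply(transcript: str) -> str:
    # Hypothetical placeholder for an LLM request.
    await asyncio.sleep(0.01)
    return f"reply to: {transcript}"


class EagerResponder:
    def __init__(self) -> None:
        self.pending: Optional[asyncio.Task] = None  # draft started on EagerEndOfTurn
        self.replies: List[str] = []

    def _cancel_pending(self) -> None:
        if self.pending is not None:
            self.pending.cancel()
            self.pending = None

    async def on_event(self, event: str, transcript: str) -> None:
        if event == "EagerEndOfTurn":
            # Start preparing a reply with the early transcript.
            self._cancel_pending()
            self.pending = asyncio.create_task(draft_reply(transcript))
        elif event == "TurnResumed":
            # The user kept talking, so the draft is stale.
            self._cancel_pending()
        elif event == "EndOfTurn":
            # The EndOfTurn transcript matches the pending EagerEndOfTurn,
            # so the draft can be reused; otherwise start from scratch.
            task = self.pending or asyncio.create_task(draft_reply(transcript))
            self.pending = None
            self.replies.append(await task)


async def demo() -> List[str]:
    responder = EagerResponder()
    full = "Hi I need to cancel my subscription please"
    for event, text in [
        ("EagerEndOfTurn", "Hi I need to cancel"),
        ("TurnResumed", ""),
        ("EagerEndOfTurn", full),
        ("EndOfTurn", full),
    ]:
        await responder.on_event(event, text)
    return responder.replies
```

Running asyncio.run(demo()) produces a single reply for the full utterance; the draft started on the first EagerEndOfTurn is cancelled when TurnResumed arrives. The trade-off is extra LLM calls for drafts that get discarded, in exchange for lower latency when the eager prediction holds.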