Understanding the Flux State Machine
Traditional STT+VAD requires you to build complex interruption logic. Flux handles this natively.
Emitted events follow the state machine below for managing turns:
- Update messages are sent for approximately every 0.25 seconds of transcribed audio, regardless of transcript updates, unless a state change has occurred.
- An EagerEndOfTurn message always contains a nonempty transcript.
- A TurnResumed message can only follow a preceding EagerEndOfTurn message.
- The EndOfTurn transcript may not always match the preceding EagerEndOfTurn transcript.
  - This occurs ~1% of the time outside of purely punctuation changes. A robust implementation should check for significant transcript changes and retrigger a new LLM reply.
- The turn_index increments immediately following an EndOfTurn message.
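The check for significant transcript changes could be sketched as follows. This is a minimal illustration, not part of any Flux SDK: the helper names normalize and transcript_changed are hypothetical, and "significant" is assumed here to mean any difference that remains after ignoring case, ASCII punctuation, and whitespace.

```python
import string

# Hypothetical helpers for illustration; not part of any Flux SDK.
# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(transcript: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse runs of whitespace."""
    return " ".join(transcript.lower().translate(_PUNCT_TABLE).split())


def transcript_changed(eager: str, final: str) -> bool:
    """Return True when the EndOfTurn transcript differs from the
    EagerEndOfTurn transcript beyond punctuation, case, or spacing
    (an assumed definition of "significant")."""
    return normalize(eager) != normalize(final)
```

If transcript_changed returns True, the reply prepared from the EagerEndOfTurn transcript would be discarded and a new LLM request issued with the EndOfTurn transcript. A stricter or looser threshold (for example, a word-level edit distance) may suit some applications better.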
Turn Lifecycle Example
Here’s how Flux processes a customer calling support saying “Hi I need to cancel my subscription please.”
Notice how confidence builds up and how the EagerEndOfTurn event fires before the final EndOfTurn. With EagerEndOfTurn, your voice agent can begin preparing a response before the user has fully finished speaking. This allows you to send a synchronous request with early context, creating the effect of a faster, more natural reply.
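One way to act on these events is sketched below. It is an illustration under stated assumptions rather than a Flux client: the TurnHandler class and its on_event method are hypothetical, events are passed in as plain strings with their transcript, and LLM calls are replaced by strings appended to an actions list so the control flow can be inspected.

```python
import operator
from typing import Callable, List, Optional


class TurnHandler:
    """Hypothetical sketch: prepares a draft reply on EagerEndOfTurn and
    decides what to do with it when the turn resumes or ends."""

    def __init__(self, same: Callable[[str, str], bool] = operator.eq) -> None:
        # `same` decides whether two transcripts are equivalent. It defaults
        # to exact equality; a punctuation-insensitive comparison could be
        # passed in instead.
        self.same = same
        self.draft_for: Optional[str] = None  # transcript the draft was built from
        self.actions: List[str] = []          # stand-in for real LLM/TTS calls

    def on_event(self, event: str, transcript: str = "") -> None:
        if event == "EagerEndOfTurn":
            # Begin preparing a reply from the early transcript.
            self.draft_for = transcript
            self.actions.append(f"start_draft:{transcript}")
        elif event == "TurnResumed":
            # The user kept speaking, so the draft is stale.
            self.draft_for = None
            self.actions.append("cancel_draft")
        elif event == "EndOfTurn":
            if self.draft_for is not None and self.same(self.draft_for, transcript):
                # The draft was built from an equivalent transcript: use it.
                self.actions.append("send_draft")
            else:
                # No draft, or the transcript changed: generate a fresh reply.
                self.actions.append(f"generate_reply:{transcript}")
            self.draft_for = None
        # Update events carry interim transcripts and need no action here.
```

For the example call, a sequence of EagerEndOfTurn, TurnResumed, EagerEndOfTurn, EndOfTurn would start a draft, cancel it when the caller keeps talking, start a second draft, and then send that draft if the final transcript is equivalent.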