Understanding the Flux State Machine

A traditional STT + VAD pipeline requires you to build complex interruption logic yourself. Flux handles this natively.

Emitted events follow the state machine below for managing turns:

  1. An Update message is sent for roughly every 0.25 seconds of transcribed audio, even if the transcript has not changed, unless a state change has occurred.
  2. An EagerEndOfTurn message always contains a nonempty transcript.
  3. A TurnResumed message can only follow a preceding EagerEndOfTurn message.
  4. The EndOfTurn transcript may not match the preceding EagerEndOfTurn transcript.
    • Ignoring punctuation-only differences, this happens about 1% of the time. A robust implementation should check for significant transcript changes and, when it finds one, retrigger a new LLM reply.
  5. The turn_index increments immediately following an EndOfTurn message.
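The rules above can be enforced on the client side. Below is a minimal Python sketch, assuming each message arrives as an already-parsed dict with the `event`, `turn_index`, and `transcript` fields shown in the example further down; `TurnTracker` and `FluxProtocolError` are hypothetical names, not part of any Deepgram SDK. It reads rule 3 loosely: a TurnResumed is accepted if an EagerEndOfTurn arrived earlier in the current turn and has not yet been resolved, since other messages may sit between the two.

```python
from dataclasses import dataclass, field


class FluxProtocolError(Exception):
    """Raised when a message sequence breaks one of the turn rules above."""


@dataclass
class TurnTracker:
    """Hypothetical client-side checker for the Flux turn rules (not an SDK class)."""

    turn_index: int = 0          # index expected on the next message
    eager_pending: bool = False  # an EagerEndOfTurn awaits TurnResumed or EndOfTurn
    history: list = field(default_factory=list)

    def handle(self, msg: dict) -> None:
        event = msg["event"]
        if msg["turn_index"] != self.turn_index:
            # Rule 5: the index only moves forward, by one, after EndOfTurn.
            raise FluxProtocolError(
                f"expected turn_index {self.turn_index}, got {msg['turn_index']}"
            )
        if event == "EagerEndOfTurn":
            # Rule 2: EagerEndOfTurn always carries a nonempty transcript.
            if not msg["transcript"].strip():
                raise FluxProtocolError("EagerEndOfTurn with an empty transcript")
            self.eager_pending = True
        elif event == "TurnResumed":
            # Rule 3: TurnResumed must follow an unresolved EagerEndOfTurn.
            if not self.eager_pending:
                raise FluxProtocolError("TurnResumed without a preceding EagerEndOfTurn")
            self.eager_pending = False
        elif event == "EndOfTurn":
            self.eager_pending = False
            self.turn_index += 1  # Rule 5: subsequent messages use the next index.
        self.history.append(event)


# Usage: feed every incoming message through the tracker.
tracker = TurnTracker()
for msg in [
    {"event": "StartOfTurn", "turn_index": 0, "transcript": "Hi I"},
    {"event": "EagerEndOfTurn", "turn_index": 0, "transcript": "Hi I need help."},
    {"event": "TurnResumed", "turn_index": 0, "transcript": "Hi I need help with"},
    {"event": "EndOfTurn", "turn_index": 0, "transcript": "Hi I need help with billing."},
]:
    tracker.handle(msg)
print(tracker.turn_index)  # 1
```

Raising turns a silent ordering bug into an immediate failure, which is handy in tests; in production you may prefer to log the violation and keep going.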

Turn Lifecycle Example

Here’s how Flux processes a support call in which the customer says “Hi I need to cancel my subscription please.”

Notice how end_of_turn_confidence builds up and how EagerEndOfTurn fires before the final EndOfTurn. With EagerEndOfTurn, your voice agent can begin preparing a response before the end of the turn is confirmed. This lets you send a speculative request with the early transcript as context, so the reply feels faster and more natural.
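A minimal Python sketch of this pattern, including the transcript check from rule 4. `SpeculativeResponder`, `normalize`, and the `generate` callback are hypothetical stand-ins for your own LLM call; a real agent would run the request asynchronously and cancel it on TurnResumed rather than calling it inline. Here any difference that remains after lowercasing and stripping punctuation counts as significant; a fuzzier threshold (for example, a word-level edit distance) is a design choice.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


def normalize(transcript: str) -> str:
    """Lowercase and strip punctuation so punctuation-only edits compare equal."""
    return " ".join(re.sub(r"[^\w\s]", "", transcript).lower().split())


@dataclass
class SpeculativeResponder:
    """Hypothetical helper: drafts a reply on EagerEndOfTurn, finalizes on EndOfTurn."""

    generate: Callable[[str], str]   # stand-in for your LLM call: transcript -> reply
    draft_for: Optional[str] = None  # transcript the current draft was built from
    draft: Optional[str] = None

    def handle(self, msg: dict) -> Optional[str]:
        """Return the reply to speak when the turn ends, otherwise None."""
        event = msg["event"]
        if event == "EagerEndOfTurn":
            # Start early. A real agent would launch this asynchronously.
            self.draft_for = msg["transcript"]
            self.draft = self.generate(self.draft_for)
        elif event == "TurnResumed":
            # The user kept talking, so the draft is stale: discard it.
            self.draft_for = self.draft = None
        elif event == "EndOfTurn":
            final = msg["transcript"]
            if self.draft is None or normalize(final) != normalize(self.draft_for):
                # No usable draft, or the words changed (rule 4): regenerate.
                self.draft = self.generate(final)
            reply = self.draft
            self.draft_for = self.draft = None
            return reply
        return None


responder = SpeculativeResponder(generate=lambda text: f"(reply to: {text})")
responder.handle({"event": "EagerEndOfTurn", "transcript": "Cancel my plan."})
print(responder.handle({"event": "EndOfTurn", "transcript": "Cancel my plan!"}))
# (reply to: Cancel my plan.)  <- draft reused, only punctuation differed
```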

```json
{
  "event": "Update",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 0.2,
  "transcript": "",
  "words": [],
  "end_of_turn_confidence": 0.1
}

{
  "event": "Update",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 0.5,
  "transcript": "",
  "words": [],
  "end_of_turn_confidence": 0.1
}

{
  "event": "StartOfTurn",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 0.6,
  "transcript": "Hi I",
  "words": [
    {
      "word": "Hi",
      "confidence": 0.95
    },
    {
      "word": "I",
      "confidence": 0.92
    }
  ],
  "end_of_turn_confidence": 0.1
}

{
  "event": "Update",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 0.8,
  "transcript": "Hi I need to",
  "words": [...],
  "end_of_turn_confidence": 0.1
}

{
  "event": "Update",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 1.0,
  "transcript": "Hi I need to cancel my subscription.",
  "words": [...],
  "end_of_turn_confidence": 0.3
}

{
  "event": "EagerEndOfTurn",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 1.1,
  "transcript": "Hi I need to cancel my subscription.",
  "words": [...],
  "end_of_turn_confidence": 0.3
}

{
  "event": "TurnResumed",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 1.2,
  "transcript": "Hi I need to cancel my subscription please",
  "words": [...],
  "end_of_turn_confidence": 0.1
}

{
  "event": "Update",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 1.4,
  "transcript": "Hi I need to cancel my subscription please.",
  "words": [...],
  "end_of_turn_confidence": 0.3
}

{
  "event": "EagerEndOfTurn",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 1.5,
  "transcript": "Hi I need to cancel my subscription please.",
  "words": [...],
  "end_of_turn_confidence": 0.3
}

{
  "event": "Update",
  "turn_index": 0,
  "audio_window_start": 0.1,
  "audio_window_end": 1.6,
  "transcript": "Hi I need to cancel my subscription please.",
  "words": [...],
  "end_of_turn_confidence": 0.5
}

{
  "event": "EndOfTurn",
  "turn_index": 0,
  "audio_window_start": 0.0,
  "audio_window_end": 1.7,
  "transcript": "Hi I need to cancel my subscription please.",
  "words": [...],
  "end_of_turn_confidence": 0.7
}
```
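To quantify the head start in this example, the sketch below replays the event names, `audio_window_end` values, and transcripts copied from the messages above (all other fields omitted). The `summarize` helper is hypothetical. The gap it reports is measured in audio seconds; the wall-clock saving in a real agent also depends on network and model latency.

```python
# (event, audio_window_end, transcript) copied from the lifecycle example above.
EVENTS = [
    ("Update", 0.2, ""),
    ("Update", 0.5, ""),
    ("StartOfTurn", 0.6, "Hi I"),
    ("Update", 0.8, "Hi I need to"),
    ("Update", 1.0, "Hi I need to cancel my subscription."),
    ("EagerEndOfTurn", 1.1, "Hi I need to cancel my subscription."),
    ("TurnResumed", 1.2, "Hi I need to cancel my subscription please"),
    ("Update", 1.4, "Hi I need to cancel my subscription please."),
    ("EagerEndOfTurn", 1.5, "Hi I need to cancel my subscription please."),
    ("Update", 1.6, "Hi I need to cancel my subscription please."),
    ("EndOfTurn", 1.7, "Hi I need to cancel my subscription please."),
]


def summarize(events):
    """Report how far the last EagerEndOfTurn preceded EndOfTurn (audio seconds),
    whether its transcript survived unchanged, and how many drafts were abandoned."""
    abandoned = 0
    last_eager = None  # (audio_window_end, transcript) of the live eager event
    for event, end, transcript in events:
        if event == "EagerEndOfTurn":
            last_eager = (end, transcript)
        elif event == "TurnResumed":
            abandoned += 1  # the draft started on the previous eager event is stale
            last_eager = None
        elif event == "EndOfTurn":
            if last_eager is None:
                return {"head_start_s": 0.0, "draft_reusable": False, "abandoned": abandoned}
            return {
                "head_start_s": round(end - last_eager[0], 3),
                "draft_reusable": last_eager[1] == transcript,
                "abandoned": abandoned,
            }
    raise ValueError("no EndOfTurn in events")


print(summarize(EVENTS))
# {'head_start_s': 0.2, 'draft_reusable': True, 'abandoned': 1}
```

In this call the first speculative draft is thrown away when the customer adds “please,” and the second one can be spoken as-is because the EndOfTurn transcript is identical.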