Optimize Voice Agent Latency with Eager End of Turn
Reduce end-to-end latency by preparing responses early with Eager End of Turn events.
Eager end of turn processing is the practice of starting LLM processing on medium-confidence transcripts (EagerEndOfTurn events) rather than waiting for a high-confidence EndOfTurn. By overlapping LLM generation with user speech, you can cut hundreds of milliseconds from your agent’s response time.
EagerEndOfTurn and TurnResumed events are ONLY triggered if you have configured the eager_eot_threshold in your connection string.
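As a rough illustration of the note above, the sketch below builds a connection URL that includes eager_eot_threshold as a query parameter. The endpoint, the model name, and the threshold values are assumptions made for this example, not values confirmed by this page; check the Flux API reference for the real ones. The helper name build_flux_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: endpoint and model name are placeholders for illustration only.
BASE_URL = "wss://api.deepgram.com/v2/listen"


def build_flux_url(eager_eot_threshold=None, eot_threshold=None,
                   model="flux-general-en"):
    """Build a connection URL; eager events are only requested when
    eager_eot_threshold is supplied."""
    params = {"model": model}
    if eager_eot_threshold is not None:
        params["eager_eot_threshold"] = eager_eot_threshold
    if eot_threshold is not None:
        params["eot_threshold"] = eot_threshold
    return f"{BASE_URL}?{urlencode(params)}"


# Illustrative threshold values, not recommendations.
url = build_flux_url(eager_eot_threshold=0.5, eot_threshold=0.7)
print(url)
```

Leaving eager_eot_threshold out of the URL keeps the connection on plain EndOfTurn behavior, which can be a convenient way to compare the two modes.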
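1. Receive EagerEndOfTurn: start preparing a response.
2. If TurnResumed occurs: cancel the draft response and keep listening for the next EagerEndOfTurn or EndOfTurn.
3. If EndOfTurn occurs: finalize and deliver the response.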
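The three steps above can be read as a small state machine. The following is a minimal sketch under that reading; the class name, state names, and action strings are all hypothetical and are not part of any SDK. It also covers the case where EndOfTurn arrives with no draft in progress, which is an assumption about how an agent might want to behave.

```python
from enum import Enum


class TurnState(Enum):
    LISTENING = "listening"   # no draft in progress
    DRAFTING = "drafting"     # a speculative response is being prepared


class TurnLifecycle:
    """Maps turn events to the action an agent should take (hypothetical)."""

    def __init__(self):
        self.state = TurnState.LISTENING
        self.draft_transcript = None

    def on_event(self, event, transcript=""):
        if event == "EagerEndOfTurn":
            # Step 1: start preparing a response on the eager transcript.
            self.state = TurnState.DRAFTING
            self.draft_transcript = transcript
            return "start_draft"
        if event == "TurnResumed":
            # Step 2: the user kept talking, so drop the draft.
            self.state = TurnState.LISTENING
            self.draft_transcript = None
            return "cancel_draft"
        if event == "EndOfTurn":
            # Step 3: finalize; reuse the draft if one exists.
            had_draft = self.state is TurnState.DRAFTING
            self.state = TurnState.LISTENING
            self.draft_transcript = None
            return "deliver_draft" if had_draft else "generate_and_deliver"
        return "ignore"


lifecycle = TurnLifecycle()
actions = [
    lifecycle.on_event("EagerEndOfTurn", "what's the weather"),
    lifecycle.on_event("TurnResumed"),
    lifecycle.on_event("EagerEndOfTurn", "what's the weather in Paris"),
    lifecycle.on_event("EndOfTurn", "what's the weather in Paris"),
]
print(actions)
```

Keeping the decision logic free of I/O like this makes it easy to unit test separately from the WebSocket and LLM code.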
- The EndOfTurn transcript will exactly match the EagerEndOfTurn transcript, ensuring consistent transcription throughout the turn lifecycle.
- … EndOfTurn events.
- … EagerEndOfTurn and EndOfTurn events.

- eager_eot_threshold: Lower values → earlier triggers, but more false starts.
- eot_threshold: Higher values → more reliable EndOfTurn, but may increase latency.

Handle TurnResumed Gracefully
- Treat TurnResumed as a cancellation signal.
- … EndOfTurn.

- Use EagerEndOfTurn outputs to draft, not finalize.
- Cancel the draft response on TurnResumed events when the user continues speaking.
- … EndOfTurn.
- … EndOfTurn (transcript guaranteed to match).

- … EagerEndOfTurn → TurnResumed vs. EagerEndOfTurn → EndOfTurn.

… StartOfTurn events, which disrupt the turn lifecycle. For guidance on echo cancellation, noise suppression, and barge-in strategies, see Audio Preprocessing & Barge-In.

This code demonstrates how to handle Flux message events to implement eager end-of-turn processing. The examples show message parsing and event handling for the three critical events:
- EagerEndOfTurn (start preparing response)
- TurnResumed (cancel draft response)
- EndOfTurn (finalize and deliver response)

… EndOfTurn detection fast enough to support natural conversation, but not all voice AI workflows are the same.

… EagerEndOfTurn to call LLMs speculatively, i.e., in preparation for an upcoming turn end, in order to minimize response latency.
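One possible way to handle the three events named above (EagerEndOfTurn, TurnResumed, EndOfTurn) with speculative LLM calls is sketched here using asyncio tasks. This is not the original sample code from this guide. It assumes each message is JSON with "event" and "transcript" fields, which should be verified against the Flux message reference; fake_llm and EagerAgent are hypothetical names, and fake_llm stands in for a real LLM client. The counters give one simple way to compare EagerEndOfTurn → TurnResumed against EagerEndOfTurn → EndOfTurn outcomes.

```python
import asyncio
import json


async def fake_llm(transcript: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    await asyncio.sleep(0.01)
    return f"reply to: {transcript}"


class EagerAgent:
    def __init__(self, llm=fake_llm):
        self.llm = llm
        self.draft = None      # in-flight speculative task, if any
        self.delivered = []    # finalized responses
        self.stats = {"eager": 0, "resumed": 0, "confirmed": 0}

    def _cancel_draft(self) -> None:
        if self.draft is not None:
            self.draft.cancel()
            self.draft = None

    async def handle(self, raw: str) -> None:
        # Assumed message shape: {"event": ..., "transcript": ...}
        msg = json.loads(raw)
        event = msg.get("event")
        transcript = msg.get("transcript", "")

        if event == "EagerEndOfTurn":
            # Start preparing a response without waiting for EndOfTurn.
            self.stats["eager"] += 1
            self._cancel_draft()
            self.draft = asyncio.create_task(self.llm(transcript))
        elif event == "TurnResumed":
            # The user continued speaking: discard the draft.
            self.stats["resumed"] += 1
            self._cancel_draft()
        elif event == "EndOfTurn":
            if self.draft is not None:
                # Reuse the speculative result started at EagerEndOfTurn.
                self.stats["confirmed"] += 1
                reply = await self.draft
                self.draft = None
            else:
                # No draft available: generate from the final transcript.
                reply = await self.llm(transcript)
            self.delivered.append(reply)


async def demo() -> EagerAgent:
    agent = EagerAgent()
    messages = [
        {"event": "EagerEndOfTurn", "transcript": "book a table"},
        {"event": "TurnResumed"},
        {"event": "EagerEndOfTurn", "transcript": "book a table for two"},
        {"event": "EndOfTurn", "transcript": "book a table for two"},
    ]
    for message in messages:
        await agent.handle(json.dumps(message))
    return agent


agent = asyncio.run(demo())
print(agent.delivered, agent.stats)
```

In a real agent the loop in demo would be replaced by messages read from the WebSocket, and a high share of "resumed" relative to "eager" would suggest that eager_eot_threshold is set too low for the workload.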