This guide shows you how to configure endpointing and interim results to control transcript delivery timing in your streaming application.
Endpointing detects pauses in speech and returns speech_final: true when a pause is detected. Use this to trigger downstream processing when a speaker stops talking.
Set the endpointing parameter to a millisecond value in your WebSocket connection, then check each response for speech_final: true.
Recommended values:
endpointing=false: Disable pause detection entirely
Interim results provide preliminary transcripts as audio streams in, marked with is_final: false. When Deepgram reaches maximum accuracy for a segment, it sends a finalized transcript with is_final: true.
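Both features are switched on through query parameters on the streaming connection. The sketch below shows one way to assemble such a URL. The base URL, the helper name build_stream_url, and the 300 ms value are assumptions for illustration only; substitute the endpoint and values that apply to your account.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; confirm the correct URL for your deployment.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_stream_url(endpointing_ms=300, interim_results=True):
    """Build a WebSocket URL with endpointing and interim results configured.

    endpointing_ms: pause length in milliseconds (300 is an illustrative
    value, not a recommendation), or None to send endpointing=false.
    """
    params = {
        "endpointing": "false" if endpointing_ms is None else str(endpointing_ms),
        "interim_results": "true" if interim_results else "false",
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stream_url())
    print(build_stream_url(endpointing_ms=None))
```

Passing None for endpointing_ms maps to endpointing=false, which disables pause detection as described above.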
Set interim_results=true in your WebSocket connection, then check the is_final flag on each response:
is_final: false - Preliminary transcript, may change
is_final: true - Finalized transcript for this audio segment
When using both features together, concatenate finalized transcripts to build complete utterances.
Append each is_final: true transcript to a buffer.
When speech_final: true arrives, the buffer contains the complete utterance.
Clear the buffer and start collecting the next utterance.
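The three steps above can be sketched as a small buffer class. This is a minimal illustration, not an official client: the class name UtteranceBuffer and the flat message shape (a dict with transcript, is_final, and speech_final keys) are assumptions, so adapt the field access to the actual structure of the responses you receive.

```python
class UtteranceBuffer:
    """Collects finalized transcripts until the end of an utterance."""

    def __init__(self):
        self._parts = []

    def handle(self, message):
        """Process one response; return the full utterance or None.

        `message` is a simplified, hypothetical dict with the keys
        "transcript", "is_final", and "speech_final".
        """
        transcript = message.get("transcript", "")
        # Step 1: keep only finalized, non-empty transcripts.
        if message.get("is_final") and transcript:
            self._parts.append(transcript)
        # Step 2: a detected pause means the buffer holds the whole utterance.
        if message.get("speech_final"):
            utterance = " ".join(self._parts)
            # Step 3: reset for the next utterance.
            self._parts.clear()
            return utterance or None
        return None


if __name__ == "__main__":
    buffer = UtteranceBuffer()
    # Illustrative messages only, not real API output.
    messages = [
        {"transcript": "hello", "is_final": False, "speech_final": False},
        {"transcript": "hello there", "is_final": True, "speech_final": False},
        {"transcript": "how are you", "is_final": True, "speech_final": True},
    ]
    for msg in messages:
        result = buffer.handle(msg)
        if result is not None:
            print(result)
```

Interim messages (is_final: false) are ignored by the buffer because a later finalized transcript supersedes them.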
The following example shows how is_final and speech_final interact when a speaker dictates a credit card number:
On line 5, is_final: true indicates a finalized transcript, but speech_final: false means the speaker hasn’t paused yet. On line 7, both flags are true, signaling the end of an utterance. To get the complete transcript, concatenate lines 5 and 7.
Do not use speech_final: true alone to capture full transcripts. Long utterances may have multiple is_final: true responses before speech_final: true is returned.
For applications requiring complete sentences, add timing-based segmentation on top of endpointing.
Process only is_final: true responses.
Break utterances at punctuation terminators or when the gap between adjacent words exceeds your threshold.
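One possible way to apply both break rules is sketched below. It assumes each finalized response yields a list of word entries with word, start, and end keys (times in seconds) and that the word text carries punctuation; the function name segment_words, the 0.5 second threshold, and the sample data are hypothetical.

```python
TERMINATORS = (".", "?", "!")


def segment_words(words, max_gap=0.5):
    """Split word entries into segments.

    A segment ends after a word that finishes with a terminator, or
    before a word whose start is more than `max_gap` seconds after the
    previous word's end. `words` is a list of dicts with the assumed
    keys "word", "start", and "end".
    """
    segments = []
    current = []

    def flush():
        if current:
            segments.append(" ".join(w["word"] for w in current))
            current.clear()

    for word in words:
        # Timing rule: a long silence before this word closes the segment.
        if current and word["start"] - current[-1]["end"] > max_gap:
            flush()
        current.append(word)
        # Punctuation rule: a terminator closes the segment after this word.
        if word["word"].endswith(TERMINATORS):
            flush()

    # Any trailing words are returned as a final, possibly incomplete segment.
    flush()
    return segments


if __name__ == "__main__":
    # Illustrative word timings only.
    sample = [
        {"word": "Hello.", "start": 0.0, "end": 0.4},
        {"word": "my", "start": 0.5, "end": 0.6},
        {"word": "card", "start": 0.7, "end": 0.9},
        {"word": "number", "start": 2.0, "end": 2.4},
    ]
    print(segment_words(sample))
```

In a live stream you would likely carry the trailing, unterminated words over to the next finalized response rather than emitting them immediately.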
Your configuration is working correctly when:
Responses with speech_final: true arrive after detected pauses
Interim results (is_final: false) update in real-time as audio streams
Final transcripts (is_final: true) contain accurate text for each segment
Complete utterances are built from is_final: true responses until speech_final: true