Configure Endpointing and Interim Results
This guide shows you how to configure endpointing and interim results to control transcript delivery timing in your streaming application.
Configure endpointing for pause detection
Endpointing detects pauses in speech and marks the response at each pause with `speech_final: true`. Use this to trigger downstream processing when a speaker stops talking.
- Set the `endpointing` parameter to a millisecond value in your WebSocket connection.
- Handle responses where `speech_final: true`, as in the sketch below.
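A minimal sketch of both steps, assuming the `websockets` package and an `audio_chunks` iterable that yields raw audio in a format Deepgram accepts (both placeholders for your own transport and audio source), with the API key read from a hypothetical `DEEPGRAM_API_KEY` environment variable:

```python
import asyncio
import json
import os

import websockets  # pip install websockets

# endpointing=300 waits for a 300 ms pause before setting speech_final.
DEEPGRAM_URL = "wss://api.deepgram.com/v1/listen?endpointing=300"

async def stream_with_endpointing(audio_chunks):
    """Stream audio and react each time a pause ends an utterance."""
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # Note: newer releases of websockets name this kwarg additional_headers.
    async with websockets.connect(DEEPGRAM_URL, extra_headers=headers) as ws:

        async def sender():
            for chunk in audio_chunks:
                await ws.send(chunk)
            # Tell Deepgram no more audio is coming.
            await ws.send(json.dumps({"type": "CloseStream"}))

        send_task = asyncio.create_task(sender())
        async for message in ws:
            response = json.loads(message)
            if response.get("type") != "Results":
                continue  # Skip metadata messages.
            transcript = response["channel"]["alternatives"][0]["transcript"]
            if response.get("speech_final"):
                # A pause was detected: trigger downstream processing here.
                print("Pause detected:", transcript)
        await send_task

# asyncio.run(stream_with_endpointing(audio_chunks))
```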
Recommended values:
- 10ms (default): Fast response for chatbots expecting short utterances
- 300-500ms: Better for conversations where speakers pause mid-thought
- `endpointing=false`: Disable pause detection entirely
Enable interim results for real-time feedback
Interim results provide preliminary transcripts as audio streams in, marked with `is_final: false`. When Deepgram reaches maximum accuracy for a segment, it sends a finalized transcript with `is_final: true`.
- Set `interim_results=true` in your WebSocket connection.
- Process responses based on the `is_final` flag, as in the sketch below:
  - `is_final: false`: Preliminary transcript that may change
  - `is_final: true`: Finalized transcript for this audio segment
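A sketch of the receive side, under the same connection assumptions as above but with the URL extended to `wss://api.deepgram.com/v1/listen?interim_results=true`; `handle_message` is a placeholder name for wherever you process each incoming WebSocket message:

```python
import json

def handle_message(message: str) -> None:
    """Route one WebSocket message by its is_final flag."""
    response = json.loads(message)
    if response.get("type") != "Results":
        return  # Skip metadata messages.
    transcript = response["channel"]["alternatives"][0]["transcript"]
    if response.get("is_final"):
        # Finalized: safe to store or act on this segment.
        print(f"[final]   {transcript}")
    else:
        # Preliminary: show it for responsiveness, but expect revisions.
        print(f"[interim] {transcript}", end="\r")
```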
Combine both features for complete utterances
When using both features together, concatenate finalized transcripts to build complete utterances.
- Enable both features in your WebSocket connection.
- Append each `is_final: true` transcript to a buffer (see the sketch below).
- When `speech_final: true` arrives, the buffer contains the complete utterance.
- Clear the buffer and start collecting the next utterance.
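A sketch of the buffering steps, assuming a connection opened with both parameters (for example `wss://api.deepgram.com/v1/listen?endpointing=300&interim_results=true`) and the same per-message handler pattern as above:

```python
import json

utterance_parts: list[str] = []

def handle_message(message: str) -> None:
    """Buffer finalized segments until a pause completes the utterance."""
    response = json.loads(message)
    if response.get("type") != "Results":
        return
    transcript = response["channel"]["alternatives"][0]["transcript"]
    if response.get("is_final") and transcript:
        utterance_parts.append(transcript)  # Append each finalized segment.
    if response.get("speech_final") and utterance_parts:
        # The pause marks the end of the utterance: drain the buffer.
        print("Complete utterance:", " ".join(utterance_parts))
        utterance_parts.clear()
```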
The following example shows how `is_final` and `speech_final` interact when a speaker dictates a credit card number:
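The transcripts below are illustrative (your audio will produce different text), but the flag sequence is representative:

```text
1  is_final: false  speech_final: false  "four five five four"
2  is_final: false  speech_final: false  "four five five four two six"
3  is_final: false  speech_final: false  "four five five four two six two four"
4  is_final: false  speech_final: false  "four five five four two six two four seven eight"
5  is_final: true   speech_final: false  "four five five four two six two four seven eight"
6  is_final: false  speech_final: false  "nine zero"
7  is_final: true   speech_final: true   "nine zero three one"
```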
On line 5, `is_final: true` indicates a finalized transcript, but `speech_final: false` means the speaker hasn't paused yet. On line 7, both flags are true, signaling the end of the utterance. To get the complete transcript, concatenate the transcripts from lines 5 and 7.
Do not use the `speech_final: true` response alone to capture full transcripts. Long utterances may produce multiple `is_final: true` responses before `speech_final: true` is returned.
Implement utterance segmentation
For applications requiring complete sentences, add timing-based segmentation on top of endpointing.
- Enable punctuation (`punctuate=true`) in your WebSocket connection.
- Process only `is_final: true` responses.
- Break utterances at punctuation terminators or when the gap between adjacent words exceeds your threshold, as in the sketch below.
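A sketch of this segmentation, assuming word-level timings in the response's `words` array (with `punctuated_word` populated when punctuation is enabled) and an illustrative gap threshold you should tune for your audio:

```python
import json

GAP_THRESHOLD_S = 1.0  # Illustrative; tune for your speakers and audio.
sentence: list[str] = []

def flush() -> None:
    """Emit the buffered words as one segment."""
    if sentence:
        print("Segment:", " ".join(sentence))
        sentence.clear()

def handle_final_response(message: str) -> None:
    """Break finalized transcripts at punctuation or long word gaps."""
    response = json.loads(message)
    if response.get("type") != "Results" or not response.get("is_final"):
        return  # Segment only finalized transcripts.
    prev_end = None
    for word in response["channel"]["alternatives"][0]["words"]:
        text = word.get("punctuated_word", word["word"])
        if prev_end is not None and word["start"] - prev_end > GAP_THRESHOLD_S:
            flush()  # Long silence between adjacent words: break here.
        sentence.append(text)
        if text and text[-1] in ".?!":
            flush()  # Sentence terminator: break here.
        prev_end = word["end"]
```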
Verify your configuration
Your configuration is working correctly when:
- Responses with `speech_final: true` arrive after detected pauses
- Interim results (`is_final: false`) update in real time as audio streams
- Finalized transcripts (`is_final: true`) contain accurate text for each segment
- Complete utterances can be reconstructed by concatenating `is_final: true` responses until `speech_final: true` arrives
Next steps
- Endpointing reference — Full parameter documentation
- Interim Results reference — Detailed response format
- Understanding End of Speech Detection — Related speech detection features