Configure Endpointing and Interim Results

This guide shows you how to configure endpointing and interim results to control transcript delivery timing in your streaming application.

Configure endpointing for pause detection

Endpointing detects pauses in speech and returns speech_final: true when a pause is detected. Use this to trigger downstream processing when a speaker stops talking.

Set the endpointing parameter to a millisecond value in your WebSocket connection:

Python

1 with client.listen.v1.connect(
2     model="nova-3",
3     language="en-US",
4     endpointing=300  # 300ms of silence triggers speech_final
5 ) as connection:

Handle responses where speech_final: true:

JSON

1 {
2   "is_final": true,
3   "speech_final": true,
4   "channel": {
5     "alternatives": [{
6       "transcript": "another big"
7     }]
8   }
9 }

Recommended values:

10ms (default): Fast response for chatbots expecting short utterances
300-500ms: Better for conversations where speakers pause mid-thought
endpointing=false: Disable pause detection entirely

Enable interim results for real-time feedback

Interim results provide preliminary transcripts as audio streams in, marked with is_final: false. When Deepgram reaches maximum accuracy for a segment, it sends a finalized transcript with is_final: true.

Set interim_results=true in your WebSocket connection:

Python

1 with client.listen.v1.connect(
2     model="nova-3",
3     language="en-US",
4     interim_results=True,
5     endpointing=300
6 ) as connection:

Process responses based on the is_final flag:
- is_final: false — Preliminary transcript, may change
- is_final: true — Finalized transcript for this audio segment

Combine both features for complete utterances

When using both features together, concatenate finalized transcripts to build complete utterances.

Enable both features in your WebSocket connection:

Python

1 with client.listen.v1.connect(
2     model="nova-3",
3     language="en-US",
4     interim_results=True,
5     endpointing=300
6 ) as connection:

Append each is_final: true transcript to a buffer.
When speech_final: true arrives, the buffer contains the complete utterance.
Clear the buffer and start collecting the next utterance.

The following example shows how is_final and speech_final interact when a speaker dictates a credit card number:

JSON

1 1 0.000-1.100 ["is_final": false] ["speech_final": false] yeah so
2 2 0.000-2.200 ["is_final": false] ["speech_final": false] yeah so my credit card number
3 3 0.000-3.200 ["is_final": false] ["speech_final": false] yeah so my credit card number is two two
4 4 0.000-4.300 ["is_final": false] ["speech_final": false] yeah so my credit card number is two two two two three
5 5 0.000-3.260 ["is_final": true ] ["speech_final": false] yeah so my credit card number is two two
6 6 3.260-5.100 ["is_final": false] ["speech_final": false] two two three three three three
7 7 3.260-5.500 ["is_final": true ] ["speech_final": true ] two two three three three three

On line 5, is_final: true indicates a finalized transcript, but speech_final: false means the speaker hasn’t paused yet. On line 7, both flags are true, signaling the end of an utterance. To get the complete transcript, concatenate lines 5 and 7.

Do not use speech_final: true alone to capture full transcripts. Long utterances may have multiple is_final: true responses before speech_final: true is returned.

Implement utterance segmentation

For applications requiring complete sentences, add timing-based segmentation on top of endpointing.

Enable punctuation in your WebSocket connection:

Python

1 with client.listen.v1.connect(
2     model="nova-3",
3     language="en-US",
4     interim_results=True,
5     endpointing=300,
6     punctuate=True
7 ) as connection:

Process only is_final: true responses.
Break utterances at punctuation terminators or when the gap between adjacent words exceeds your threshold.

Verify your configuration

Your configuration is working correctly when:

Responses with speech_final: true arrive after detected pauses
Interim results (is_final: false) update in real-time as audio streams
Finalized transcripts (is_final: true) contain accurate text for each segment
Complete utterances can be reconstructed by concatenating is_final: true responses until speech_final: true

Next steps

Endpointing reference — Full parameter documentation
Interim Results reference — Detailed response format
Understanding End of Speech Detection — Related speech detection features