Interim Results
Interim Results provides preliminary results for streaming audio.
interim_results
boolean. Default: false
Deepgram’s Interim Results monitors streaming audio and provides interim transcripts, which are preliminary results provided during the real-time streaming process which can help with speech detection.
Deepgram will identify a point at which its transcript has reached maximum accuracy and send a definitive, or final, transcript of all audio up to that point. It will then continue to process audio.
When working with real-time streaming audio, streams flow from your capture source (for example, microphone, browser, or telephony system) to Deepgram's servers in irregular pieces. In some cases the collected audio can end abruptly—perhaps even mid-word—which means that Deepgram’s predictions, particularly for words near the tip of the audio stream, are more likely to be wrong.
When Interim Results is enabled Deepgram guesses about the words being spoken and sends these guesses to you as interim transcripts. As more audio enters the server, Deepgram corrects and improves the transcriptions, increasing its accuracy, until it reaches the end of the stream, at which point it sends one last, cumulative transcript.
Interim Results can be used with Deepgram's Endpointing feature. To compare and contrast these features, and to explore best practices for using them together, see Using Endpointing and Interim Results with Live Streaming Audio.
Enable Feature
To enable Interim Results, when you call Deepgram’s API, add an interim_results
parameter set to true
in the query string:
interim_results=true
# see https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/streaming/async_microphone/main.py
# for complete example code
options: LiveOptions = LiveOptions(
model="nova-2",
language="en-US",
# Apply smart formatting to the output
smart_format=True,
# Raw audio format details
encoding="linear16",
channels=1,
sample_rate=16000,
# To get UtteranceEnd, the following must be set:
interim_results=True,
utterance_end_ms="1000",
vad_events=True,
# Time in milliseconds of silence to wait for before finalizing speech
endpointing=300
)
Analyze Interim Transcripts
Let’s look at some interim transcripts and analyze their content.
Our first interim result has the following content:
{
"channel_index": [
0,
1
],
"duration": 1.039875,
"start": 0,
"is_final": false,
"channel": {
"alternatives": [
{
"transcript": "another big",
"confidence": 0.9600255,
"words": [
{
"word": "another",
"start": 0.2971154,
"end": 0.7971154,
"confidence": 0.9588303
},
{
"word": "big",
"start": 0.85173076,
"end": 1.039875,
"confidence": 0.9600255
}
]
}
]
}
}
In this response, we see that:
start
(the number of seconds into the audio stream) is0.0
, indicating that this is the very beginning of the real-time stream.start
+duration
(the entire length of this response) is1.039875
seconds, and the word "big" ends at1.039875
seconds (which matches theduration
value), indicating that the stream cuts off the word.confidence
for the word "big" is approximately 96%, indicating that even though the word is cut off, Deepgram is still pretty certain that its prediction is correct.is_final
isfalse
, indicating that Deepgram will continue waiting to see if more data will improve its predictions.
The next interim response has the following content:
{
"channel_index": [
0,
1
],
"duration": 2.039875,
"start": 0,
"is_final": false,
"channel": {
"alternatives": [
{
"transcript": "another big problem",
"confidence": 0.9939844,
"words": [
{
"word": "another",
"start": 0.29852942,
"end": 0.7985294,
"confidence": 0.9939844
},
{
"word": "big",
"start": 0.8557843,
"end": 1.3557843,
"confidence": 0.98220366
},
{
"word": "problem",
"start": 1.5722549,
"end": 2.039875,
"confidence": 0.9953441
}
]
}
]
}
}
In this response, we see that:
start
(the number of seconds into the audio stream) is 0, indicating that this is the very beginning of the real-time stream.start
+duration
(the entire length of this response) is2.039875
seconds, and the word "problem" ends at2.039875
seconds (which matches theduration
value), indicating that the stream cuts off the word.confidence
for the word "big" has improved to almost 98%.- the
end
timestamp for "big" now indicates that the word has not been cut off. confidence
for the word "problem" is almost 100%, so can likely be trusted.is_final
isfalse
, indicating that Deepgram will continue waiting to see if more data will improve its predictions.
For a more detailed example of using Interim results refer to Using Interim Results Tips & Tricks.
Updated 9 months ago