Interim Results
interim_results boolean. Default: false
Deepgram’s Interim Results feature monitors streaming audio and provides interim transcripts: preliminary results sent during real-time streaming, which can help with speech detection.
Deepgram will identify a point at which its transcript has reached maximum accuracy and send a definitive, or final, transcript of all audio up to that point. It will then continue to process audio.
When working with real-time streaming audio, streams flow from your capture source (for example, microphone, browser, or telephony system) to Deepgram’s servers in irregular pieces. In some cases the collected audio can end abruptly—perhaps even mid-word—which means that Deepgram’s predictions, particularly for words near the tip of the audio stream, are more likely to be wrong.
When Interim Results is enabled, Deepgram guesses about the words being spoken and sends these guesses to you as interim transcripts. As more audio enters the server, Deepgram corrects and improves the transcriptions, increasing its accuracy, until it reaches the end of the stream, at which point it sends one last, cumulative transcript.
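One way a client might handle this flow is to treat each interim transcript as a replacement for the previous guess and to commit text only when a final transcript arrives. The following is a minimal sketch of that idea, not Deepgram's implementation: the `TranscriptBuffer` class and its method names are hypothetical, and the sample strings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: holds finalized text plus the latest interim guess."""

    finalized: list = field(default_factory=list)
    pending: str = ""

    def update(self, text: str, is_final: bool) -> None:
        if is_final:
            # A final transcript supersedes every interim guess for this span.
            if text:
                self.finalized.append(text)
            self.pending = ""
        else:
            # A newer interim transcript replaces the older guess outright.
            self.pending = text

    def display(self) -> str:
        # Finalized text first, then whatever is still provisional.
        parts = self.finalized + ([self.pending] if self.pending else [])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.update("big", is_final=False)          # first guess (illustrative text)
buf.update("big problem", is_final=False)  # corrected and extended guess
buf.update("big problem", is_final=True)   # committed once marked final
```

Replacing rather than appending interim text matters because each interim transcript covers the same audio again with improved predictions.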
Interim Results can be used with Deepgram’s Endpointing feature. To compare and contrast these features, and to explore best practices for using them together, see Using Endpointing and Interim Results with Live Streaming Audio.
Enable Feature
To enable Interim Results, add the interim_results parameter, set to true, to the query string when you call Deepgram’s API:
interim_results=true
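As an illustration, the query string can be assembled in code rather than by hand. This sketch assumes the streaming endpoint is `wss://api.deepgram.com/v1/listen`; that URL and the `build_listen_url` helper are not taken from this page, so check them against your account's API reference.

```python
from urllib.parse import urlencode

# Assumed endpoint; not stated on this page.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(base_url: str = BASE_URL, **params: str) -> str:
    """Hypothetical helper: append query parameters to the base URL."""
    return f"{base_url}?{urlencode(params)}"


# Pass the lowercase string "true"; Python's True would be encoded as "True".
url = build_listen_url(interim_results="true")
print(url)
```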
Analyze Interim Transcripts
Let’s look at some interim transcripts and analyze their content.
Our first interim result has the following content:
In this response, we see that:
- start (the number of seconds into the audio stream) is 0.0, indicating that this is the very beginning of the real-time stream.
- start + duration (the entire length of this response) is 1.039875 seconds, and the word “big” ends at 1.039875 seconds (which matches the duration value), indicating that the stream cuts off the word.
- confidence for the word “big” is approximately 96%, indicating that even though the word is cut off, Deepgram is still pretty certain that its prediction is correct.
- is_final is false, indicating that Deepgram will continue waiting to see if more data will improve its predictions.
The next interim response has the following content:
In this response, we see that:
- start (the number of seconds into the audio stream) is 0, indicating that this is the very beginning of the real-time stream.
- start + duration (the entire length of this response) is 2.039875 seconds, and the word “problem” ends at 2.039875 seconds (which matches the duration value), indicating that the stream cuts off the word.
- confidence for the word “big” has improved to almost 98%.
- the end timestamp for “big” now indicates that the word has not been cut off.
- confidence for the word “problem” is almost 100%, so it can likely be trusted.
- is_final is false, indicating that Deepgram will continue waiting to see if more data will improve its predictions.
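The checks walked through above can also be done programmatically. The sketch below assumes a simplified, flat response layout containing only the fields discussed here (start, duration, is_final, and per-word end and confidence). Real responses may nest these fields differently, and `summarize_interim` is a hypothetical helper. The sample values are the ones quoted for the first interim result, with confidence rounded to 0.96.

```python
def summarize_interim(response: dict, tolerance: float = 1e-6) -> dict:
    """Report where the audio ends and whether the last word touches that edge."""
    audio_end = response["start"] + response["duration"]
    words = response["words"]
    last = words[-1] if words else None
    return {
        "audio_end": audio_end,
        "last_word": last["word"] if last else None,
        # A last word ending exactly at the audio edge may have been cut off.
        "last_word_at_edge": last is not None
        and abs(last["end"] - audio_end) <= tolerance,
        "is_final": response["is_final"],
    }


# Flat layout assumed for illustration only.
first_interim = {
    "start": 0.0,
    "duration": 1.039875,
    "is_final": False,
    "words": [{"word": "big", "end": 1.039875, "confidence": 0.96}],
}

summary = summarize_interim(first_interim)
print(summary)
```

A word flagged as being at the edge is a reasonable candidate to re-check in the next interim transcript before relying on it.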
For a more detailed example of using Interim Results, refer to Using Interim Results Tips & Tricks.