
Using Interim Results

Learn how Interim Results can be useful for streaming audio.

Deepgram’s Interim Results feature monitors streaming audio and returns interim transcripts: preliminary results produced while the audio is still streaming. These can help with speech detection.

Below you will learn more about how to use interim results.

For more information, refer to the Interim Results feature page.

Running The Example

Download our final Python example script and run the example code:

SHELL
python3 show-final.py -k 'YOUR_DEEPGRAM_API_KEY' /PATH/TO/AUDIO.wav

After execution, the script prints out the transcript for each response it receives and shows the is_final status for each message:

JSON
Channels = 2, Sample Rate = 48000 Hz, Sample width = 2 bytes, Size = 18540124 bytes
 1 0.000-1.100 ["is_final": false] another big
 2 0.000-2.100 ["is_final": false] another big problem
 3 0.000-3.100 ["is_final": false] another big problem in the speech analyst
 4 0.000-4.100 ["is_final": false] another big problem in the speech analytics space
 5 0.000-5.100 ["is_final": false] another big problem in the speech analytics space when custom
 6 0.000-6.100 ["is_final": false] another big problem in the speech analytics space when customers first bring the
 7 0.000-7.100 ["is_final": false] another big problem in the speech analytics space when customers first bring the software were on
 8 0.000-8.100 ["is_final": false] another big problem in the speech analytics space when customers first bring the software were on is that they
 9 0.000-9.100 ["is_final": false] another big problem in the speech analytics space when customers first bring the software on is that they they
10 0.000-8.490 ["is_final": true ] another big problem in the speech analytics space when customers first bring the software were on is that they
11 8.490-10.100 ["is_final": false] they are
12 8.490-11.100 ["is_final": false] they are blown away by the
...

In this response, we see that:

  • On lines 1 through 9, the transcripts contain "is_final": false, indicating that they are interim transcripts. As more audio passes to Deepgram, the transcripts get longer.
  • Between lines 3 and 4, Deepgram corrects its prediction of the word “analyst”, turning it into “analytics”. This is an example of interim results in action.
  • Between lines 5 and 6, Deepgram corrects its prediction of the word “custom”, turning it into “customers”. This is another example of interim results in action.
  • On line 10, is_final is set to true, indicating that Deepgram will not return any additional transcripts covering that span of time (from 0.000 to 8.490 seconds) because it believes it has reached optimal accuracy for this section of the transcript.
  • On line 9, the transcript covers a span of time from 0.000 to 9.100 seconds, which is longer than the completed transcript issued on line 10. If you listen to this moment in the example audio, you will hear the speaker repeat the word “they”. After processing the repeated word, Deepgram decided it had reached optimal accuracy for the first section of the transcript, and split the transcript between the repeated words. Notice one “they” stayed with the first section (line 10), but the other “they” moved into the next section (line 11), which starts at 8.490 seconds.

Tips for working with transcripts

When handling real-time streaming results, the most accurate transcripts are available in the final transcripts, but the final transcripts may split the message.

  • If you need the best transcript possible and can tolerate some delay, rely on final transcripts; they are most accurate and aren’t likely to change.

  • If you need the fastest transcript possible, ignore final transcripts; instead, track timings and confidences to determine whether to keep waiting before committing to the current interim transcript. This usually works well because most content does not change between consecutive interim transcripts.
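The two strategies above can be combined in one small helper. This is a minimal sketch, not Deepgram's reference implementation: the class name `TranscriptAssembler` and its methods are hypothetical, and it assumes each server message is a JSON string with an `is_final` flag and the text at `channel.alternatives[0].transcript`.

```python
import json


class TranscriptAssembler:
    """Keeps finalized text plus the latest interim text for one stream."""

    def __init__(self):
        self.final_parts = []  # transcripts from messages where is_final is true
        self.interim = ""      # latest interim transcript for the span still open

    def handle_message(self, raw):
        msg = json.loads(raw)
        # Assumed message layout: channel.alternatives[0].transcript
        alternatives = msg.get("channel", {}).get("alternatives", [])
        if not alternatives:
            return  # not a transcript message; ignore it
        text = alternatives[0].get("transcript", "")
        if msg.get("is_final"):
            if text:
                self.final_parts.append(text)
            self.interim = ""  # the final transcript replaces earlier interims
        else:
            self.interim = text  # each interim supersedes the previous one

    def stable_text(self):
        """Most accurate text: final transcripts only."""
        return " ".join(self.final_parts)

    def display_text(self):
        """Fastest text: final transcripts plus the current interim."""
        tail = [self.interim] if self.interim else []
        return " ".join(self.final_parts + tail)
```

Use `stable_text()` when accuracy matters more than delay, and `display_text()` when you want to show words as early as possible and can accept that the tail may still change.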

Identify Completed Audio Processing

To identify whether the audio stream is completely processed, send an empty binary WebSocket message to the Deepgram server and then continue to process server responses until the server gracefully closes the connection.
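As a rough sketch of that shutdown sequence, the coroutine below sends the empty binary message and then keeps reading until the connection closes. The function name `drain_after_close` is hypothetical, and `ws` is assumed to be a connection object that behaves like those from the third-party `websockets` package: `send` is a coroutine, and `async for` stops once the server closes the connection normally.

```python
async def drain_after_close(ws):
    """Signal end of audio, then collect every message until the server closes."""
    await ws.send(b"")        # empty binary message: no more audio is coming
    remaining = []
    async for message in ws:  # the loop ends when the connection is closed
        remaining.append(message)
    return remaining
```

Only after this coroutine returns can you be sure that every transcript for the stream, including the last final transcript, has arrived.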

Frequently Asked Questions

How do I measure latency with interim results?

In general terms, real-time streaming latency is the time delay between when a transfer of data begins and when a system begins processing it. In mathematical terms, it is the difference between the audio cursor (the number of seconds of audio you have currently submitted; we’ll call this X) and the latest transcript cursor (start + duration; we’ll call this Y). Latency is X-Y.

However, remember that to give you best accuracy, final transcripts may end early (see lines 9 and 10 in the example above), which means you’ve already received more data than what is reflected in the final transcript.

The final transcripts are meant for situations where you need the highest confidence levels, whereas the latest interim transcript has the lowest latency. It’s recommended to always ignore final transcripts when calculating latency.
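The X-Y calculation can be sketched as follows. The helper names are hypothetical, and the sketch assumes raw PCM audio, a `bytes_sent` count that covers audio payload only (no file header), and messages already parsed into dictionaries with `start`, `duration`, and `is_final` fields. The default audio parameters are taken from the example output above (2 channels, 48000 Hz, 2-byte samples).

```python
def audio_cursor(bytes_sent, sample_rate, channels, sample_width):
    """Seconds of audio submitted so far (X), for raw PCM audio."""
    return bytes_sent / (sample_rate * channels * sample_width)


def transcript_cursor(message):
    """End of the span covered by a transcript message (Y = start + duration)."""
    return message["start"] + message["duration"]


def latency(bytes_sent, message, sample_rate=48000, channels=2, sample_width=2):
    """Return X - Y in seconds, or None for final transcripts, which are skipped."""
    if message.get("is_final"):
        return None
    x = audio_cursor(bytes_sent, sample_rate, channels, sample_width)
    return x - transcript_cursor(message)
```

For example, with 1,152,000 bytes submitted (6 seconds of audio at these settings) and an interim transcript covering 0.000 to 5.100 seconds, the latency is about 0.9 seconds.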

To learn more, see Measuring Streaming Latency.

How do I measure word error rates (WER) with interim results?

To calculate WER, concatenate all final transcripts and compare to your base transcript. Because final transcripts are the most accurate, they should be preferred over interim transcripts, which prioritize speed over accuracy. And because a single final transcript does not guarantee that the audio stream is complete, you will need to be certain you have collected all final transcripts before performing your calculation.

Let’s look at an example. Download our WER Python example script, prepare an audio file (or use our sample WAV file), and run the example code:

SHELL
python3 concat-final.py -k 'YOUR_DEEPGRAM_API_KEY' /PATH/TO/audio.wav

When run, the script concatenates the final transcripts returned by Deepgram and prints the result:

JSON
Channels = 2, Sample Rate = 48000 Hz, Sample width = 2 bytes, Size = 18540124 bytes
another big problem in the speech analytics space when customers first bring the software where is that they they are blown away...

You can compare this result with your base transcript to calculate WER.
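One common way to make that comparison is word-level edit distance divided by the number of words in the base transcript. The function below is a minimal sketch of that approach, not part of the example script: the name `word_error_rate` is hypothetical, and the only normalization it applies is lowercasing and whitespace splitting, so punctuation differences count as errors unless you strip them first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)
```

Pass your base transcript as `reference` and the concatenated final transcripts as `hypothesis`. A result of 0.0 means the two match word for word.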