Interim Results

Interim Results provides preliminary results for streaming audio.

interim_results boolean. Default: false

Pre-recorded Streaming:Nova All available languages

Deepgram’s Interim Results feature monitors streaming audio and returns interim transcripts: preliminary results sent while the stream is still in progress, which can also help with speech detection.

Deepgram will identify a point at which its transcript has reached maximum accuracy and send a definitive, or final, transcript of all audio up to that point. It will then continue to process audio.

When working with real-time streaming audio, streams flow from your capture source (for example, a microphone, browser, or telephony system) to Deepgram’s servers in irregular pieces. In some cases the collected audio can end abruptly, perhaps even mid-word, which means that Deepgram’s predictions, particularly for words near the tip of the audio stream, are more likely to be wrong.

When Interim Results is enabled, Deepgram makes a best guess at the words being spoken and sends these guesses to you as interim transcripts. As more audio reaches the server, Deepgram corrects and improves the transcript until it reaches the end of the stream, at which point it sends one last, cumulative transcript.
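The correct-and-improve cycle described above can be mirrored on the client side. The following is a minimal sketch, not Deepgram's own implementation. It assumes that every message covering the same stretch of audio reports the same start value (as the example responses later on this page do), so a newer message can simply overwrite an older one. The helper names apply_result and current_text are hypothetical.

```python
def apply_result(segments, message):
    """Store the newest transcript for the segment that begins at message["start"]."""
    transcript = message["channel"]["alternatives"][0]["transcript"]
    # A later message with the same start replaces the earlier guess.
    segments[message["start"]] = {
        "transcript": transcript,
        "is_final": message["is_final"],
    }
    return segments


def current_text(segments):
    """Join the latest transcript of every segment in time order."""
    ordered = sorted(segments.items())
    return " ".join(seg["transcript"] for _, seg in ordered if seg["transcript"])


# Trimmed-down messages modeled on the examples later on this page.
messages = [
    {"start": 0, "is_final": False,
     "channel": {"alternatives": [{"transcript": "another big"}]}},
    {"start": 0, "is_final": False,
     "channel": {"alternatives": [{"transcript": "another big problem"}]}},
]

segments = {}
for message in messages:
    apply_result(segments, message)
    print(current_text(segments))
```

Keying on start keeps the display logic the same for interim and final messages. The stored is_final flag lets the caller style provisional text differently.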

Interim Results can be used with Deepgram’s Endpointing feature. To compare and contrast these features, and to explore best practices for using them together, see Using Endpointing and Interim Results with Live Streaming Audio.

Enable Feature

To enable Interim Results, add the interim_results parameter, set to true, to the query string when you call Deepgram’s API:

interim_results=true
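If you build the request URL yourself rather than through an SDK, the query string can be assembled as in the sketch below. The base URL wss://api.deepgram.com/v1/listen and the helper name build_listen_url are assumptions made for illustration; check the API reference for the endpoint that applies to your deployment.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint, used here for illustration only.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(**params):
    """Return the streaming URL with the given query parameters appended."""
    # Python booleans are written as lowercase "true"/"false" in the query string.
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return f"{BASE_URL}?{urlencode(normalized)}"


url = build_listen_url(model="nova-3", interim_results=True)
print(url)
```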

# See https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/streaming/async_microphone/main.py
# for complete example code.

# `deepgram` is assumed to be an already-initialized Deepgram client.
# Create websocket connection with interim results enabled
with deepgram.listen.v1.connect(
    model="nova-3",
    language="en-US",
    # Apply smart formatting to the output
    smart_format=True,
    # Raw audio format details
    encoding="linear16",
    channels=1,
    sample_rate=16000,
    # To get interim results, the following must be set:
    interim_results=True,
    utterance_end_ms="1000",
    vad_events=True,
    # Time in milliseconds of silence to wait for before finalizing speech
    endpointing=300
) as connection:
    ...  # Send audio and handle transcripts here (see the linked example)

Analyze Interim Transcripts

Let’s look at some interim transcripts and analyze their content.

Our first interim result has the following content:

JSON
{
  "channel_index": [
    0,
    1
  ],
  "duration": 1.039875,
  "start": 0,
  "is_final": false,
  "channel": {
    "alternatives": [
      {
        "transcript": "another big",
        "confidence": 0.9600255,
        "words": [
          {
            "word": "another",
            "start": 0.2971154,
            "end": 0.7971154,
            "confidence": 0.9588303
          },
          {
            "word": "big",
            "start": 0.85173076,
            "end": 1.039875,
            "confidence": 0.9600255
          }
        ]
      }
    ]
  }
}

In this response, we see that:

  • start (the number of seconds into the audio stream) is 0, indicating that this is the very beginning of the real-time stream.
  • start + duration (the end of the audio covered by this response) is 1.039875 seconds, and the word “big” also ends at 1.039875 seconds, indicating that the stream cuts off the word.
  • confidence for the word “big” is approximately 96%, indicating that even though the word is cut off, Deepgram is still pretty certain that its prediction is correct.
  • is_final is false, indicating that Deepgram will continue waiting to see if more data will improve its predictions.
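The cut-off check in the list above can be automated. This is a minimal sketch under the assumption that a last word whose end timestamp equals start + duration may have been truncated by the edge of the received audio. The function name ends_at_stream_tip and the tolerance value are hypothetical choices, not part of the API.

```python
def ends_at_stream_tip(message, tolerance=1e-6):
    """Return True if the last word ends exactly where the received audio ends."""
    words = message["channel"]["alternatives"][0]["words"]
    if not words:
        return False
    tip = message["start"] + message["duration"]
    return abs(words[-1]["end"] - tip) <= tolerance


# The first interim result from this page, reduced to the fields used here.
first_result = {
    "duration": 1.039875,
    "start": 0,
    "is_final": False,
    "channel": {
        "alternatives": [
            {
                "transcript": "another big",
                "words": [
                    {"word": "another", "start": 0.2971154, "end": 0.7971154},
                    {"word": "big", "start": 0.85173076, "end": 1.039875},
                ],
            }
        ]
    },
}

print(ends_at_stream_tip(first_result))  # True: "big" ends at 1.039875
```

A client might treat such a word as provisional and avoid acting on it until a later result moves its end timestamp away from the tip.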

The next interim response has the following content:

JSON
{
  "channel_index": [
    0,
    1
  ],
  "duration": 2.039875,
  "start": 0,
  "is_final": false,
  "channel": {
    "alternatives": [
      {
        "transcript": "another big problem",
        "confidence": 0.9939844,
        "words": [
          {
            "word": "another",
            "start": 0.29852942,
            "end": 0.7985294,
            "confidence": 0.9939844
          },
          {
            "word": "big",
            "start": 0.8557843,
            "end": 1.3557843,
            "confidence": 0.98220366
          },
          {
            "word": "problem",
            "start": 1.5722549,
            "end": 2.039875,
            "confidence": 0.9953441
          }
        ]
      }
    ]
  }
}

In this response, we see that:

  • start (the number of seconds into the audio stream) is 0, indicating that this is the very beginning of the real-time stream.
  • start + duration (the end of the audio covered by this response) is 2.039875 seconds, and the word “problem” also ends at 2.039875 seconds, indicating that the stream cuts off the word.
  • confidence for the word “big” has improved to just over 98%.
  • the end timestamp for “big” is now 1.3557843 seconds, well before the end of the received audio, indicating that the word is no longer cut off.
  • confidence for the word “problem” is almost 100%, so it can likely be trusted.
  • is_final is false, indicating that Deepgram will continue waiting to see if more data will improve its predictions.
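To see how predictions firm up between two interim responses, the per-word confidence values can be compared directly. The sketch below pairs words by position, which is an assumption that holds for the two responses on this page but may not hold if an earlier word is revised. The function name confidence_changes is hypothetical.

```python
def confidence_changes(previous_words, current_words):
    """Pair words by position and report how each confidence moved."""
    changes = []
    for old, new in zip(previous_words, current_words):
        # Only compare entries that still refer to the same word.
        if old["word"] == new["word"]:
            changes.append((new["word"], old["confidence"], new["confidence"]))
    return changes


# Word lists from the two interim responses on this page (timestamps omitted).
first_words = [
    {"word": "another", "confidence": 0.9588303},
    {"word": "big", "confidence": 0.9600255},
]
second_words = [
    {"word": "another", "confidence": 0.9939844},
    {"word": "big", "confidence": 0.98220366},
    {"word": "problem", "confidence": 0.9953441},
]

for word, before, after in confidence_changes(first_words, second_words):
    print(f"{word}: {before:.2%} -> {after:.2%}")
```

Words that appear only in the newer response, such as “problem”, have no earlier value to compare against and are skipped.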

For a more detailed example of using Interim Results, refer to Using Interim Results Tips & Tricks.