Endpointing

Endpointing returns transcripts when pauses in speech are detected.

endpointing string.

Pre-recorded Streaming:Nova All available languages

Deepgram’s Endpointing feature can be used for speech detection. It relies on a Voice Activity Detector (VAD), which monitors the incoming streaming audio and triggers when a sufficiently long pause is detected.

Endpointing helps detect sufficiently long pauses that are likely to represent an endpoint in speech. When an endpoint is detected, the model assumes that no additional data will improve its prediction of the endpoint.

The transcript results are then finalized for the processed time range, and the JSON response is returned with a speech_final parameter set to true.

You can customize the length of time used to detect whether a speaker has finished speaking by setting the endpointing parameter to an integer value.

Endpointing can be used with Deepgram’s Interim Results feature. To compare and contrast these features, and to explore best practices for using them together, see Using Endpointing and Interim Results with Live Streaming Audio.

Enable Feature

Endpointing is enabled by default and set to 10 milliseconds, so transcripts are returned after 10 milliseconds of silence is detected.

The period of silence required for endpointing can also be configured. When you call Deepgram’s API, add an endpointing parameter set to an integer representing a millisecond value:

endpointing=500

This will wait until 500 milliseconds of silence has passed to finalize and return transcripts.

Endpointing may be disabled by setting endpointing=false. If endpointing is disabled, transcriptions will be returned at a cadence determined by Deepgram’s chunking algorithms.
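As a rough illustration of the three settings described above (default, a custom millisecond value, and disabled), the sketch below builds a streaming URL with the endpointing query parameter. The helper name build_listen_url is hypothetical and not part of any Deepgram SDK, and the base URL is assumed to be the streaming endpoint you connect to; adjust it for your deployment.

```python
from typing import Optional, Union
from urllib.parse import urlencode

# Assumed streaming endpoint; replace with the URL for your deployment.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(
    endpointing: Optional[Union[int, bool]] = None, **params: str
) -> str:
    """Hypothetical helper: build a streaming URL with an endpointing setting.

    endpointing=None  -> parameter omitted, so the service default applies
    endpointing=500   -> finalize after 500 ms of silence
    endpointing=False -> endpointing disabled
    """
    query = dict(params)
    if endpointing is False:
        query["endpointing"] = "false"
    elif endpointing is not None:
        # True is rejected: only an integer or False is meaningful here.
        if isinstance(endpointing, bool) or endpointing < 0:
            raise ValueError("endpointing must be a non-negative integer or False")
        query["endpointing"] = str(endpointing)
    return f"{BASE_URL}?{urlencode(query)}" if query else BASE_URL


print(build_listen_url(500))
print(build_listen_url(False, model="nova-3"))
```

Leaving the parameter out entirely keeps the default behavior, so the helper only adds endpointing to the query string when a value is given explicitly.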

```python
# For more Python SDK migration guides, visit:
# https://github.com/deepgram/deepgram-python-sdk/tree/main/docs
from deepgram import DeepgramClient

# Assumes the API key is supplied via the DEEPGRAM_API_KEY environment variable.
client = DeepgramClient()

with client.listen.v1.connect(
    model="nova-3",
    language="en-US",
    # Apply smart formatting to the output
    smart_format=True,
    # Raw audio format details
    encoding="linear16",
    channels=1,
    sample_rate=16000,
    # To get UtteranceEnd, the following must be set:
    interim_results=True,
    utterance_end_ms="1000",
    vad_events=True,
    # Time in milliseconds of silence to wait for before finalizing speech
    endpointing=300,
) as connection:
    # Register event handlers and send audio on the connection here.
    ...
```

Results

When endpointing is enabled, each streaming response includes a key called speech_final, which is set to true when an endpoint has been detected.

```json
{
  "channel_index": [
    0,
    1
  ],
  "duration": 1.039875,
  "start": 0.0,
  "is_final": false,
  "speech_final": true,
  "channel": {
    "alternatives": [
      {
        "transcript": "another big",
        "confidence": 0.9600255,
        "words": [
          {
            "word": "another",
            "start": 0.2971154,
            "end": 0.7971154,
            "confidence": 0.9588303
          },
          {
            "word": "big",
            "start": 0.85173076,
            "end": 1.039875,
            "confidence": 0.9600255
          }
        ]
      }
    ]
  }
}
...
```
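One way to act on this field is to check speech_final on each parsed response and read the transcript from the first alternative. The sketch below assumes the response has already been decoded into a Python dict shaped like the example above; the function name finalized_transcript is hypothetical and not part of any Deepgram SDK.

```python
import json
from typing import Any, Dict, Optional


def finalized_transcript(response: Dict[str, Any]) -> Optional[str]:
    """Return the transcript if the response marks an endpoint, else None."""
    if not response.get("speech_final", False):
        return None
    alternatives = response.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None
    # Use the first alternative, as in the example response.
    return alternatives[0].get("transcript", "")


# Trimmed copy of the example response above.
sample = json.loads("""
{
  "is_final": false,
  "speech_final": true,
  "channel": {
    "alternatives": [
      {"transcript": "another big", "confidence": 0.9600255}
    ]
  }
}
""")

print(finalized_transcript(sample))
```

Responses where speech_final is false return None, so a caller can keep waiting until an endpoint is reported before treating the speaker's turn as finished.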