Live Audio
Handshake
Headers
API key for authentication. Format should be token <DEEPGRAM_API_KEY>
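The header format above can be sketched as follows. This is a minimal illustration, not a full client: `build_handshake_headers` is a hypothetical helper name, and the key value is a placeholder.

```python
# Sketch: building the handshake headers for the live-audio WebSocket.
# The Authorization header uses the format "token <DEEPGRAM_API_KEY>".

def build_handshake_headers(api_key: str) -> dict:
    """Return the HTTP headers for the WebSocket upgrade request."""
    return {"Authorization": f"token {api_key}"}

headers = build_handshake_headers("DEEPGRAM_API_KEY")
print(headers["Authorization"])  # token DEEPGRAM_API_KEY
```

With a real WebSocket client, these headers would be passed on the upgrade request when opening the connection.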
Query parameters
URL to which we’ll make the callback request
HTTP method by which the callback request will be made
The number of channels in the submitted audio
Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0
Version of the diarization feature to use. Only used when the diarization feature is enabled (diarize=true is passed to the API)
Identify and extract key entities from content in submitted audio
Specify the expected encoding of your submitted audio
Indicates how long Deepgram will wait to detect whether a speaker has finished speaking or has paused for a significant period of time. When set to a value (in milliseconds), the streaming endpoint finalizes the transcription for the processed time range once that much silence is detected and returns the transcript with the speech_final parameter set to true. Can also be set to false to disable endpointing
Arbitrary key-value pairs that are attached to the API response for usage in downstream processing
Filler Words can help transcribe interruptions in your audio, like “uh” and “um”
Specifies whether the streaming endpoint should provide ongoing transcription updates as more audio is received. When set to true, the endpoint sends continuous updates, meaning transcription results may evolve over time
Key term prompting can boost or suppress specialized terminology and brands. Only compatible with Nova-3
Keywords can boost or suppress specialized terminology and brands
The BCP-47 language tag that hints at the primary spoken language. Depending on the model you choose, only certain languages are available
AI model to use for the transcription
Transcribe each audio channel independently
Convert numbers from written format to numerical format
Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely
Add punctuation and capitalization to the transcript
Redaction removes sensitive information from your transcripts
Search for terms or phrases in submitted audio and replace them
Sample rate of submitted audio. Required (and only read) when a value is provided for encoding
Search for terms or phrases in submitted audio
Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability
Label your requests for the purpose of identification during usage reporting
Indicates how long Deepgram will wait to send an UtteranceEnd message after a word has been transcribed. Use with interim_results
Indicates that speech has started. When enabled, you'll receive SpeechStarted messages as speech begins
Version of an AI model to use
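The query parameters above are URL-encoded onto the streaming endpoint when opening the connection. A minimal sketch of composing that URL, assuming the standard `wss://api.deepgram.com/v1/listen` endpoint; `build_stream_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.deepgram.com/v1/listen"

def build_stream_url(**params) -> str:
    """URL-encode the chosen query parameters onto the base endpoint."""
    # Booleans are serialized as lowercase strings, e.g. interim_results=true;
    # parameters left as None are omitted entirely.
    encoded = {k: str(v).lower() if isinstance(v, bool) else v
               for k, v in params.items() if v is not None}
    return f"{BASE_URL}?{urlencode(encoded)}"

url = build_stream_url(model="nova-3", language="en", punctuate=True,
                       interim_results=True, encoding="linear16",
                       sample_rate=16000)
```

Note that sample_rate is included here alongside encoding, since it is required (and only read) when an encoding value is provided.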
Send
Raw audio data to be transcribed. Should be sent as a binary WebSocket message without base64 encoding
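In practice, raw audio is usually streamed in fixed-size binary frames rather than as one large message. A sketch of that chunking step; `chunk_audio` is a hypothetical helper, and the chunk size shown is an arbitrary choice:

```python
# Sketch: raw audio goes out as binary WebSocket frames (no base64 encoding).
# This helper splits raw PCM bytes into fixed-size messages for streaming.

def chunk_audio(pcm: bytes, chunk_size: int = 8000) -> list[bytes]:
    """Split raw audio bytes into chunk_size pieces for streaming."""
    return [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)]

# With a connected client (e.g. the `websockets` package) you would send
# each piece as a binary frame:
#   for chunk in chunk_audio(audio_bytes):
#       await ws.send(chunk)   # binary message, not base64 text
```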
Receive
When Deepgram encounters an error during streaming speech to text, a WebSocket Close frame is sent. The frame contains a status code and UTF-8-encoded payload describing the error reason
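The close frame's status code and UTF-8 payload can be surfaced for logging like this. A minimal sketch, assuming your WebSocket client exposes the close code and reason; `describe_close` is a hypothetical helper, and the status-code semantics follow RFC 6455 (1000 = normal closure):

```python
# Sketch: interpreting a WebSocket Close frame from the streaming endpoint.
# The payload is UTF-8 text describing the error reason.

def describe_close(code: int, reason: bytes) -> str:
    """Render a close frame's status code and UTF-8 payload for logging."""
    text = reason.decode("utf-8", errors="replace")
    if code == 1000:
        return f"closed normally: {text or 'no reason given'}"
    return f"closed with error {code}: {text}"

print(describe_close(1011, b"Internal server error"))
```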