Live Audio

Transcribe audio and video using Deepgram’s speech-to-text WebSocket

Handshake

WSS
/v1/listen

Headers

Authorization (string, Required)

Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter.

Example: `Token %DEEPGRAM_API_KEY%` or `Bearer %DEEPGRAM_TOKEN%`
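As a rough sketch, the handshake headers can be built like this (the helper function name is ours; only the `Authorization` header and the `Token`/`Bearer` schemes come from the reference above):

```python
# Minimal sketch: auth headers for the wss://api.deepgram.com/v1/listen handshake.
# auth_headers() is an illustrative helper, not part of any Deepgram SDK.
import os

DEEPGRAM_URL = "wss://api.deepgram.com/v1/listen"

def auth_headers(credential: str, temporary: bool = False) -> dict:
    # API keys use the "Token" scheme; temporary tokens use "Bearer".
    scheme = "Bearer" if temporary else "Token"
    return {"Authorization": f"{scheme} {credential}"}

headers = auth_headers(os.environ.get("DEEPGRAM_API_KEY", "YOUR_KEY"))
```

Pass these headers to whichever WebSocket client you use when opening the connection to `DEEPGRAM_URL`.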

Query parameters

callback (any, Optional)
URL to which we'll make the callback request.
callback_method (enum, Optional, defaults to POST)
HTTP method by which the callback request will be made.
Allowed values: POST, PUT
channels (any, Optional)
The number of channels in the submitted audio.
diarize (enum, Optional, defaults to false)

Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0.

Allowed values: true, false
dictation (enum, Optional, defaults to false)
Convert spoken dictation commands into their corresponding punctuation marks (e.g., "comma" becomes ",")
Allowed values: true, false
encoding (enum, Optional)
Specify the expected encoding of your submitted audio.
endpointing (any, Optional)

Indicates how long Deepgram will wait, in milliseconds of detected silence, to decide that a speaker has finished speaking or paused for a significant period. When the threshold is reached, the streaming endpoint immediately finalizes the transcription for the processed time range and returns it with `speech_final` set to true. Can also be set to false to disable endpointing.

extra (any, Optional)

Arbitrary key-value pairs that are attached to the API response for use in downstream processing.

filler_words (enum, Optional, defaults to false)
Filler Words can help transcribe interruptions in your audio, like "uh" and "um".
Allowed values: true, false
interim_results (enum, Optional, defaults to false)
Specifies whether the streaming endpoint should provide ongoing transcription updates as more audio is received. When set to true, the endpoint sends continuous updates, meaning transcription results may evolve over time.
Allowed values: true, false
keyterm (any, Optional)

Key term prompting can boost specialized terminology and brands. Only compatible with Nova-3.

keywords (any, Optional)
Keywords can boost or suppress specialized terminology and brands.
language (enum, Optional, defaults to en)
The [BCP-47 language tag](https://tools.ietf.org/html/bcp47) that hints at the primary spoken language. Depending on the model you choose, only certain languages are available.
mip_opt_out (any, Optional)
Opts requests out of the Deepgram Model Improvement Program. Refer to our docs at https://dpgr.am/deepgram-mip for pricing impacts before setting this to true.
model (enum, Required)
AI model to use for the transcription.
multichannel (enum, Optional, defaults to false)
Transcribe each audio channel independently.
Allowed values: true, false
numerals (enum, Optional, defaults to false)
Convert numbers from written format to numerical format.
Allowed values: true, false
profanity_filter (enum, Optional, defaults to false)

Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely.

Allowed values: true, false
punctuate (enum, Optional, defaults to false)
Add punctuation and capitalization to the transcript.
Allowed values: true, false
redact (enum, Optional, defaults to false)
Redaction removes sensitive information from your transcripts.
replace (any, Optional)
Search for terms or phrases in submitted audio and replace them.
sample_rate (any, Optional)

Sample rate of submitted audio. Required (and only read) when a value is provided for encoding.

search (any, Optional)
Search for terms or phrases in submitted audio.
smart_format (enum, Optional, defaults to false)
Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability.
Allowed values: true, false
tag (any, Optional)
Label your requests for the purpose of identification during usage reporting.
utterance_end_ms (any, Optional)

Indicates how long Deepgram will wait to send an UtteranceEnd message after a word has been transcribed. Use with interim_results.

vad_events (enum, Optional, defaults to false)
Enables voice activity detection events. When set to true, you'll receive SpeechStarted messages when speech is detected in the stream.
Allowed values: true, false
version (any, Optional)
Version of an AI model to use.
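Query parameters are appended to the handshake URL. A minimal sketch of building that URL (the parameter names come from the reference above; the chosen values are illustrative examples):

```python
# Sketch: composing the /v1/listen WebSocket URL with query parameters.
from urllib.parse import urlencode

BASE = "wss://api.deepgram.com/v1/listen"

params = {
    "model": "nova-3",        # required: AI model to use
    "encoding": "linear16",   # example value: raw 16-bit PCM
    "sample_rate": 16000,     # required (and only read) when encoding is set
    "interim_results": "true",
    "punctuate": "true",
}

url = f"{BASE}?{urlencode(params)}"
```

Open the WebSocket connection against this full URL; boolean parameters are passed as the strings "true"/"false".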

Send

ListenV1Media (any, Required)
Send audio or video data to be transcribed.
OR
ListenV1Finalize (any, Required)
Send a Finalize message to flush the WebSocket stream.
OR
ListenV1CloseStream (any, Required)
Send a CloseStream message to close the WebSocket stream.
OR
ListenV1KeepAlive (any, Required)
Send a KeepAlive message to keep the WebSocket stream alive.
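The messages above split into two kinds of WebSocket frames: audio goes out as binary frames, while the control messages (Finalize, CloseStream, KeepAlive) are JSON text frames with a "type" field. A sketch, with helper names of our own choosing:

```python
# Sketch: constructing client -> server frames for the /v1/listen stream.
# media_frame() and control_frame() are illustrative helpers, not SDK functions.
import json

def media_frame(chunk: bytes) -> bytes:
    # Raw audio/video bytes are sent unchanged in a binary frame.
    return chunk

def control_frame(kind: str) -> str:
    # kind is "Finalize", "CloseStream", or "KeepAlive"; sent as a text frame.
    return json.dumps({"type": kind})
```

For example, sending `control_frame("KeepAlive")` periodically keeps an otherwise idle stream open, and `control_frame("CloseStream")` ends the session cleanly.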

Receive

ListenV1Results (any, Required)
Receive transcription results.
OR
ListenV1Metadata (any, Required)
Receive metadata about the transcription.
OR
ListenV1UtteranceEnd (any, Required)
Receive an utterance end event.
OR
ListenV1SpeechStarted (any, Required)
Receive a speech started event.
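Incoming messages can be dispatched on their "type" field. A sketch of a handler, assuming the documented Results payload shape (`channel.alternatives[0].transcript`, plus the `is_final` and `speech_final` flags discussed under interim_results and endpointing):

```python
# Sketch: handling server -> client messages from the /v1/listen stream.
import json
from typing import Optional

def handle_message(raw: str) -> Optional[str]:
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "Results":
        transcript = msg["channel"]["alternatives"][0]["transcript"]
        if msg.get("speech_final"):
            return transcript  # endpointing fired: the utterance is complete
        if msg.get("is_final"):
            return transcript  # this segment will no longer change
        return None            # interim result: may still evolve
    if kind == "UtteranceEnd":
        return None            # end of utterance, per utterance_end_ms
    return None                # Metadata, SpeechStarted, etc.
```

Interim results for the same audio arrive repeatedly, so only the final/speech-final transcripts are returned here; a real client would also act on UtteranceEnd and SpeechStarted events.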