Live Audio

Deepgram Speech to Text WebSocket

Handshake

GET
wss://api.deepgram.com/v1/listen

Headers

Authorization (string, Required)

API key for authentication. The format should be either "token <DEEPGRAM_API_KEY>" or "Bearer <JWT_TOKEN>"
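For illustration, the handshake header above could be assembled as follows. `auth_headers` is a hypothetical helper, and `DEEPGRAM_API_KEY` is a placeholder, not a real credential:

```python
# Hypothetical helper: builds the Authorization header for the WebSocket
# handshake. Use scheme "token" with an API key, or "Bearer" with a JWT.
def auth_headers(credential: str, scheme: str = "token") -> dict:
    if scheme not in ("token", "Bearer"):
        raise ValueError("scheme must be 'token' or 'Bearer'")
    return {"Authorization": f"{scheme} {credential}"}

# auth_headers("DEEPGRAM_API_KEY")
# -> {"Authorization": "token DEEPGRAM_API_KEY"}
```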

Query parameters

callback (string, Optional)
URL to which we'll make the callback request
callback_method (enum, Optional; defaults to POST)
HTTP method by which the callback request will be made
channels (string, Optional; defaults to 1)
The number of channels in the submitted audio
diarize (boolean, Optional; defaults to false)
Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

dictation (enum, Optional; defaults to false)
Convert spoken dictation commands into their corresponding punctuation marks (e.g., "comma" becomes ",")
encoding (enum, Optional)
Specify the expected encoding of your submitted audio
endpointing (string, Optional; defaults to 10)
How long, in milliseconds, Deepgram waits after a speaker stops talking before deciding they have finished. When the threshold is reached, the streaming endpoint immediately finalizes the transcription for the processed time range and returns the transcript with speech_final set to true. Can also be set to false to disable endpointing

extra (string, Optional)
Arbitrary key-value pairs that are attached to the API response for use in downstream processing
filler_words (enum, Optional; defaults to false)
Filler Words can help transcribe interruptions in your audio, like "uh" and "um"
interim_results (enum, Optional; defaults to false)
Specifies whether the streaming endpoint should provide ongoing transcription updates as more audio is received. When set to true, the endpoint sends continuous updates, meaning transcription results may evolve over time
keyterm (list of strings, Optional)
Key term prompting can boost or suppress specialized terminology and brands. Only compatible with Nova-3
keywords (string, Optional)
Keywords can boost or suppress specialized terminology and brands
language (enum, Optional; defaults to en)
The BCP-47 language tag that hints at the primary spoken language. Depending on the model you choose, only certain languages are available

mip_opt_out (string, Optional; defaults to false)
Opts the request out of the Deepgram Model Improvement Program. Refer to our docs for pricing impacts before setting this to true: https://dpgr.am/deepgram-mip

model (enum, Optional)
AI model to use for the transcription
multichannel (enum, Optional; defaults to false)
Transcribe each audio channel independently
numerals (enum, Optional; defaults to false)
Convert numbers from written format to numerical format (e.g., "one" becomes "1")
profanity_filter (enum, Optional; defaults to false)
Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely
punctuate (enum, Optional; defaults to false)
Add punctuation and capitalization to the transcript
redact (enum, Optional; defaults to false)
Redaction removes sensitive information from your transcripts
replace (string, Optional)
Terms or phrases to search for in the submitted audio and replace
sample_rate (string, Optional)
Sample rate of submitted audio. Required (and only read) when a value is provided for encoding

search (string, Optional)
Search for terms or phrases in submitted audio
smart_format (enum, Optional; defaults to false)
Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability
tag (string, Optional)
Label your requests for the purpose of identification during usage reporting
utterance_end (string, Optional)
Indicates how long Deepgram will wait to send an UtteranceEnd message after a word has been transcribed. Use with interim_results
vad_events (enum, Optional; defaults to false)
Enables voice activity detection events. When set to true, you'll receive SpeechStarted messages when speech begins
version (string, Optional; defaults to latest)
Version of an AI model to use
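As a sketch, the query parameters above can be appended to the endpoint URL with the standard library. Booleans are serialized as lowercase "true"/"false", and list-valued parameters such as keyterm expand into repeated key=value pairs. The helper name `build_listen_url` is an assumption for illustration, not part of the API:

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.deepgram.com/v1/listen"

def build_listen_url(**params) -> str:
    # Drop unset parameters and serialize booleans the way the query
    # string expects ("true"/"false" rather than Python's "True"/"False").
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
        if value is not None
    }
    # doseq=True expands list values (e.g. several keyterm entries)
    # into repeated key=value pairs.
    return f"{BASE_URL}?{urlencode(normalized, doseq=True)}"

url = build_listen_url(model="nova-3", punctuate=True, interim_results=True)
```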

Send

transcriptionRequest (string, Required)
OR
listen_controlMessagesRequest (object, Required)
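Alongside binary audio frames, the connection accepts JSON control messages sent as text frames. A minimal sketch of two commonly used control messages, assuming Deepgram's documented message names (KeepAlive to hold an idle connection open, CloseStream to end the session gracefully):

```python
import json

def keepalive_message() -> str:
    # Sent periodically as a text frame to keep an idle connection alive.
    return json.dumps({"type": "KeepAlive"})

def close_stream_message() -> str:
    # Tells the server no more audio is coming so it can flush final results.
    return json.dumps({"type": "CloseStream"})
```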

Receive

transcriptionResponse (object, Required)
OR
(object, Required)
OR
(object, Required)
OR
(object, Required)
OR
listen_closeFrame (object, Required)
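Transcription responses arrive as JSON text frames. A sketch of pulling the transcript and finality flags out of a Results message, using a hand-written illustrative payload; the field layout follows Deepgram's documented Results shape, but verify against a live response:

```python
import json

def extract_transcript(raw: str):
    """Return (transcript, is_final, speech_final) for a Results frame, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "Results":
        return None  # e.g. a Metadata, UtteranceEnd, or SpeechStarted frame
    alternative = msg["channel"]["alternatives"][0]
    return (
        alternative["transcript"],
        msg.get("is_final", False),
        msg.get("speech_final", False),
    )

# Illustrative payload only, not captured from a live session.
sample = json.dumps({
    "type": "Results",
    "is_final": True,
    "speech_final": True,
    "channel": {"alternatives": [{"transcript": "hello world"}]},
})
```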