Live Audio | Deepgram's Docs

Transcribe audio and video using Deepgram’s speech-to-text WebSocket

Handshake

WSS

/v1/listen

Headers

Authorization (string, required)

Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter.

Example: token %DEEPGRAM_API_KEY% or bearer %DEEPGRAM_TOKEN%
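As a rough illustration of the two header forms in the example above, the sketch below builds the Authorization header. The helper name `authorization_header` and its `scheme` argument are hypothetical and not part of Deepgram's API; only the `token` and `bearer` prefixes come from the example.

```python
def authorization_header(credential: str, scheme: str = "token") -> dict[str, str]:
    """Return the Authorization header for the WebSocket handshake.

    scheme="token" is for an API key, scheme="bearer" for a temporary token,
    mirroring the example values shown above.
    """
    if scheme not in ("token", "bearer"):
        raise ValueError("scheme must be 'token' or 'bearer'")
    if not credential:
        raise ValueError("credential must not be empty")
    return {"Authorization": f"{scheme} {credential}"}
```

The returned dictionary can be passed as extra handshake headers to whichever WebSocket client you use.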

Query parameters

callback (any, optional)

URL to which we'll make the callback request

callback_method (enum, optional, defaults to POST)

HTTP method by which the callback request will be made

channels (any, optional)

The number of channels in the submitted audio

diarize (enum, optional, defaults to false)

Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

dictation (enum, optional, defaults to false)

Convert spoken dictation commands into their corresponding punctuation marks

encoding (enum, optional)

Specify the expected encoding of your submitted audio

endpointing (any, optional)

Indicates how long, in milliseconds, Deepgram will wait before deciding that a speaker has finished speaking or has paused for a significant period of time. When set to a value, the streaming endpoint finalizes the transcription for the processed time range as soon as that condition is met and returns the transcript with the speech_final parameter set to true. Can also be set to false to disable endpointing
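To make the role of speech_final more concrete, here is a minimal sketch that groups finalized transcripts into utterances. It assumes each Results message is a JSON object with `type`, `is_final`, `speech_final`, and `channel.alternatives[0].transcript` fields; apart from speech_final, that shape is not described on this page and should be checked against actual responses. The function name `collect_utterances` is hypothetical.

```python
from typing import Any


def collect_utterances(messages: list[dict[str, Any]]) -> list[str]:
    """Join finalized transcript segments until speech_final marks an endpoint."""
    utterances: list[str] = []
    parts: list[str] = []
    for msg in messages:
        if msg.get("type") != "Results":
            continue  # ignore Metadata, UtteranceEnd, SpeechStarted
        alternatives = msg.get("channel", {}).get("alternatives", [])
        text = alternatives[0].get("transcript", "") if alternatives else ""
        # Interim results may still change, so keep only finalized segments.
        if msg.get("is_final") and text:
            parts.append(text)
        # speech_final signals that endpointing detected the end of speech.
        if msg.get("speech_final") and parts:
            utterances.append(" ".join(parts))
            parts = []
    return utterances
```

Segments that were finalized but never followed by speech_final stay in the buffer and are not returned; a real client would likely flush them when the stream closes.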

extra (any, optional)

Arbitrary key-value pairs that are attached to the API response for usage in downstream processing

interim_results (enum, optional, defaults to false)

Specifies whether the streaming endpoint should provide ongoing transcription updates as more audio is received. When set to true, the endpoint sends continuous updates, meaning transcription results may evolve over time

keyterm (any, optional)

Key term prompting can boost specialized terminology and brands. Only compatible with Nova-3

keywords (any, optional)

Keywords can boost or suppress specialized terminology and brands

language (any, optional)

The [BCP-47 language tag](https://tools.ietf.org/html/bcp47) that hints at the primary spoken language. Depending on the model you choose, only certain languages are available

mip_opt_out (any, optional)

Opts requests out of the Deepgram Model Improvement Program. Refer to the docs at https://dpgr.am/deepgram-mip for pricing impacts before setting this to true

model (enum, required)

AI model to use for the transcription

multichannel (enum, optional, defaults to false)

Transcribe each audio channel independently

numerals (enum, optional, defaults to false)

Convert numbers from written format to numerical format

profanity_filter (enum, optional, defaults to false)

Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely

punctuate (enum, optional, defaults to false)

Add punctuation and capitalization to the transcript

redact (enum, optional, defaults to false)

Redaction removes sensitive information from your transcripts

replace (any, optional)

Search for terms or phrases in submitted audio and replace them

sample_rate (any, optional)

Sample rate of submitted audio. Required (and only read) when a value is provided for encoding

search (any, optional)

Search for terms or phrases in submitted audio

smart_format (enum, optional, defaults to false)

Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability

tag (any, optional)

Label your requests for the purpose of identification during usage reporting

utterance_end_ms (any, optional)

Indicates how long Deepgram will wait to send an UtteranceEnd message after a word has been transcribed. Use with interim_results

vad_events (enum, optional, defaults to false)

When set to true, Deepgram sends a SpeechStarted message when it detects that speech has started

version (any, optional)

Version of an AI model to use
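The query parameters above are appended to the handshake URL. The following sketch shows one way to assemble that URL with the Python standard library. The function name `build_listen_url` is hypothetical, `nova-3` is used only as an example model value, and sending list values as repeated keys (for instance several keyterm entries) is an assumption about how multi-valued parameters are passed.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(model: str, **params) -> str:
    """Build the WebSocket URL for /v1/listen from keyword arguments."""
    query = {"model": model}  # model is the only required query parameter
    for name, value in params.items():
        if value is None:
            continue  # skip unset options
        if isinstance(value, bool):
            value = "true" if value else "false"  # booleans as lowercase strings
        query[name] = value
    # sample_rate is required when a value is provided for encoding.
    if "encoding" in query and "sample_rate" not in query:
        raise ValueError("sample_rate is required when encoding is set")
    # doseq=True expands list values into repeated keys (assumed convention).
    return f"{BASE_URL}?{urlencode(query, doseq=True)}"
```

For example, `build_listen_url("nova-3", punctuate=True, encoding="linear16", sample_rate=16000)` yields a URL ending in `?model=nova-3&punctuate=true&encoding=linear16&sample_rate=16000`.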

Send

ListenV1Media (string, required, format: "binary")

Send audio or video data to be transcribed

ListenV1Finalize (object, required)

Send a Finalize message to flush the WebSocket stream

ListenV1CloseStream (object, required)

Send a CloseStream message to close the WebSocket stream

ListenV1KeepAlive (object, required)

Send a KeepAlive message to keep the WebSocket stream alive
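The three control messages above can be sketched as small JSON payloads. This assumes each one is a JSON object with a single `type` field whose value matches the message name, sent as a text frame, while audio is sent as binary frames. The helper `control_message` is hypothetical.

```python
import json

# Control message names listed in the Send section.
CONTROL_TYPES = {"Finalize", "CloseStream", "KeepAlive"}


def control_message(kind: str) -> str:
    """Serialize a control message as a JSON string for a text frame."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})
```

A client would typically send `control_message("KeepAlive")` periodically during silence and `control_message("CloseStream")` once all audio has been sent.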

Receive

ListenV1Results (object, required)

Receive transcription results

ListenV1Metadata (object, required)

Receive metadata about the transcription

ListenV1UtteranceEnd (object, required)

Receive an utterance end event

ListenV1SpeechStarted (object, required)

Receive a speech started event
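Since four kinds of messages can arrive on the same socket, a client usually needs to route them. The sketch below assumes every incoming text frame is a JSON object with a `type` field set to `Results`, `Metadata`, `UtteranceEnd`, or `SpeechStarted`; the function name `route_message` and the handler mapping are hypothetical.

```python
import json
from typing import Any, Callable, Optional

Handler = Callable[[dict[str, Any]], Any]


def route_message(raw: str, handlers: dict[str, Handler]) -> Optional[Any]:
    """Parse one incoming text frame and pass it to the handler for its type."""
    msg = json.loads(raw)
    handler = handlers.get(msg.get("type"))
    if handler is None:
        return None  # unknown or unhandled message type
    return handler(msg)
```

Keeping the handlers in a dictionary makes it easy to ignore message types you have not enabled, such as SpeechStarted when vad_events is left at its default.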

URL	wss://api.deepgram.com/v1/listen
Method	GET
Status	101 Switching Protocols