Transcribe Live Streaming Audio
Deepgram provides its customers with real-time, streaming transcription via its streaming endpoints. These endpoints are high-performance, full-duplex services running over the tried-and-true WebSocket protocol, which makes integration with customer pipelines simple due to the wide array of client libraries available.
To use this endpoint, connect to `wss://api.deepgram.com/v1/listen`. TLS encryption will protect your connection and data. We support a minimum of TLS 1.2.
All audio data is sent to the streaming endpoint as binary-type WebSocket messages containing payloads that are the raw audio data. Because the protocol is full-duplex, you can stream in real-time and still receive transcription responses while uploading data. Streaming buffer sizes should be between 20 milliseconds and 250 milliseconds of audio.
When you are finished, send a JSON message to the server: `{ "type": "CloseStream" }`. The server will interpret it as a shutdown command: it will finish processing whatever data it still has cached, send the response to the client, send a summary metadata object, and then terminate the WebSocket connection.
To learn more about working with real-time streaming data and results, see Get Started with Streaming Audio.
Deepgram does not store transcriptions. Make sure to save output or return transcriptions to a callback URL for custom processing.
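As a minimal end-to-end sketch, the Python client below streams a raw audio file and prints responses as they arrive. It uses the third-party `websockets` package; the `Authorization` header format, chunk size, and pacing are illustrative assumptions, not requirements of the API.

```python
import asyncio
import json

import websockets  # pip install websockets


async def transcribe(path: str, api_key: str) -> None:
    # Connect over TLS. Authenticating with an Authorization header is assumed
    # here. (Older versions of the websockets package take extra_headers;
    # newer ones use additional_headers.)
    async with websockets.connect(
        "wss://api.deepgram.com/v1/listen",
        extra_headers={"Authorization": f"Token {api_key}"},
    ) as ws:

        async def sender() -> None:
            # Stream binary-type WebSocket messages of raw audio; each chunk
            # should hold roughly 20-250 ms of audio, per the guidance above.
            with open(path, "rb") as audio:
                while chunk := audio.read(8000):
                    await ws.send(chunk)        # binary WebSocket message
                    await asyncio.sleep(0.05)   # pace roughly in real time
            # Signal that no more audio will be sent.
            await ws.send(json.dumps({"type": "CloseStream"}))

        async def receiver() -> None:
            # Full duplex: transcription responses arrive while audio uploads.
            async for message in ws:
                print(json.loads(message))

        await asyncio.gather(sender(), receiver())


asyncio.run(transcribe("example.raw", "YOUR_DEEPGRAM_API_KEY"))
```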
Query Params
| Parameter | Description |
|---|---|
| `model` | AI model used to process submitted audio. Learn More |
| `tier` | Level of model you would like to use in your request. Learn More |
| `version` | Version of the model to use. Learn More |
| `language` | The BCP-47 language tag that hints at the primary spoken language. Learn More |
| `punctuate` | Indicates whether to add punctuation and capitalization to the transcript. Learn More |
| `profanity_filter` | Indicates whether to remove profanity from the transcript. Learn More |
| `redact` | Indicates whether to redact sensitive information, replacing redacted content with asterisks (*). Can send multiple instances in query string (for example, `redact=pci&redact=numbers`). Learn More |
| `diarize` | Indicates whether to recognize speaker changes. When set to true, each word in the transcript will be assigned a speaker number starting at 0. Learn More |
| `diarize_version` | Indicates the version of the diarization feature to use. Only used when the diarization feature is enabled (`diarize=true` is passed to the API). Learn More |
| `smart_format` | Indicates whether to apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability. Learn More |
| `filler_words` | Indicates whether to include filler words like "uh" and "um" in transcript output. When set to true, these words will be included. Defaults to false. Learn More |
| `multichannel` | Indicates whether to transcribe each audio channel independently. Learn More |
| `alternatives` | Maximum number of transcript alternatives to return. |
| `numerals` | Indicates whether to convert numbers from written format (e.g., one) to numerical format (e.g., 1). Learn More |
| `search` | Terms or phrases to search for in the submitted audio. Can send multiple instances in query string (for example, `search=speech&search=Friday`). Learn More |
| `replace` | Terms or phrases to search for in the submitted audio and replace. Can send multiple instances in query string (for example, `replace=this:that&replace=thisalso:thatalso`). Learn More |
| `callback` | Callback URL to provide if you would like your submitted audio to be processed asynchronously. Learn More |
| `keywords` | Uncommon proper nouns or other words to transcribe that are not a part of the model's vocabulary. Can send multiple instances in query string (for example, `keywords=snuffalupagus:10&keywords=systrom:5.5`). Learn More |
| `interim_results` | Indicates whether the streaming endpoint should send you updates to its transcription as more audio becomes available. When set to true, the streaming endpoint returns regular updates, which means transcription results will likely change for a period of time. By default, this flag is set to false. Learn More |
| `endpointing` | Indicates how long Deepgram will wait to detect whether a speaker has finished speaking (or paused for a significant period of time, indicating the completion of an idea). When Deepgram detects an endpoint, it assumes that no additional data will improve its prediction, so it immediately finalizes the result for the processed time range and returns the transcript with a `speech_final` parameter set to true. Endpointing may be disabled by setting `endpointing=false`. Learn More |
| `encoding` | Expected encoding of the submitted streaming audio. If this parameter is set, `sample_rate` must also be specified. Learn More |
| `channels` | Number of independent audio channels contained in submitted streaming audio. Only read when a value is provided for `encoding`. Learn More |
| `sample_rate` | Sample rate of submitted streaming audio. Required (and only read) when a value is provided for `encoding`. Learn More |
| `tag` | Tag to associate with the request. Learn More |
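Parameters are passed in the query string of the connection URL. As a hypothetical example combining several of the options above (the values shown are placeholders, not recommendations):

```
wss://api.deepgram.com/v1/listen?punctuate=true&interim_results=true&encoding=linear16&sample_rate=16000&channels=1
```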
Responses
| Status | Description |
|---|---|
| 200 Success | Audio submitted for transcription. |
Response Schema
{
"metadata": {
"transaction_key": "string",
"request_id": "uuid",
"sha256": "string",
"created": "string",
"duration": 0,
"channels": 0,
"models": [
"string"
]
},
"channel": [
{
"alternatives": [
{
"transcript": "string",
"confidence": 0,
"words": [
{
"word": "string",
"start": 0,
"end": 0,
"confidence": 0
}
]
}
],
"search": [
{
"query": "string",
"hits": [
{
"confidence": 0,
"start": 0,
"end": 0,
"snippet": "string"
}
]
}
]
}
],
"channel_index": [
0,
0
],
"duration": 0.0,
"start": 0.0,
"is_final": boolean,
"speech_final": boolean
}
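As a sketch of consuming this schema, the handler below (a hypothetical helper, following the field layout printed above) pulls the top-ranked transcript out of each result and skips non-transcript messages such as the summary metadata object:

```python
import json


def handle_message(raw: str) -> None:
    """Extract the top transcript from one streaming response message."""
    response = json.loads(raw)
    # Metadata/summary messages carry no "channel" key; skip them.
    if "channel" not in response:
        return
    alternative = response["channel"][0]["alternatives"][0]
    # Finalized results will not change in later messages.
    if response.get("is_final") and alternative["transcript"]:
        print(alternative["transcript"])
```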
Stream KeepAlive
By default, the Deepgram streaming connection will time out with a NET-0001 error code if no audio is sent by the client for 12 seconds. (See Error Handling below for more information.)
To keep the WebSocket open without sending audio data, send the following JSON string:
{ "type": "KeepAlive" }
This will keep the streaming connection open for an additional 12 seconds. If no audio or additional KeepAlive messages are sent within the 12-second window, the streaming connection will close with a NET-0001 error. To avoid this error and keep the connection open, continue sending KeepAlive messages 3-5 seconds before the 12-second timeout window expires, until you are ready to resume sending audio.
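A background task like the following sketch (Python with the `websockets` package, continuing the earlier example) can send KeepAlive messages on a fixed interval; cancel it when you resume sending audio:

```python
import asyncio
import json


async def keep_alive(ws, interval: float = 5.0) -> None:
    # Ping well inside the 12-second timeout window until cancelled.
    while True:
        await ws.send(json.dumps({"type": "KeepAlive"}))
        await asyncio.sleep(interval)
```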
Close Stream
To gracefully close a streaming connection, send the following JSON string:
{ "type": "CloseStream" }
This tells Deepgram that no more audio will be sent. Deepgram will close the connection once all audio has finished processing.
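Continuing the same sketch, a graceful shutdown sends CloseStream and then drains the remaining messages (final transcripts, then the summary metadata) until the server closes the connection:

```python
import json


async def close_stream(ws) -> None:
    # Signal end-of-audio, then read until the server terminates the socket.
    await ws.send(json.dumps({"type": "CloseStream"}))
    async for message in ws:
        print(message)  # remaining results, then summary metadata
```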
Error Handling
If Deepgram encounters an error during real-time streaming, we will return a WebSocket Close frame (see the WebSocket Protocol specification, section 5.5.1).
The body of the Close frame will indicate the reason for closing using one of the specification’s pre-defined status codes followed by a UTF-8-encoded payload that represents the reason for the error. Current codes and payloads in use include:
| Code | Payload | Description |
|---|---|---|
| 1008 | DATA-0000 | The payload cannot be decoded as audio. Either the encoding is incorrectly specified, the payload is not audio data, or the audio is in a format unsupported by Deepgram. |
| 1011 | NET-0000 | The service has not transmitted a Text frame to the client within the timeout window. This may indicate an internal issue in Deepgram's systems or could be due to Deepgram not receiving enough audio data to transcribe a frame. |
| 1011 | NET-0001 | The service has not received a Binary or Text frame from the client within the timeout window. This may indicate an internal issue in Deepgram's systems, the client's systems, or the network connecting them. |
To learn about debugging WebSocket errors, see Troubleshooting WebSocket DATA and NET Errors When Live Streaming Audio.
After sending a Close message, the endpoint considers the WebSocket connection closed and will close the underlying TCP connection.
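With the `websockets` package used in the sketches above, a server-initiated Close frame surfaces as a ConnectionClosed exception whose attributes (as exposed by that package) carry the status code and payload from the table:

```python
import websockets


async def run_safely(stream) -> None:
    try:
        await stream
    except websockets.exceptions.ConnectionClosedError as err:
        # err.code holds the Close frame status code (e.g. 1008 or 1011);
        # err.reason holds the payload (e.g. "DATA-0000" or "NET-0001").
        print(f"Stream closed: code={err.code}, reason={err.reason}")
```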