Live Audio

Handshake

GET

Headers

Authorization string Required

API key for authentication. Format should be token <DEEPGRAM_API_KEY>
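
For illustration only, a minimal sketch of how this header might be supplied from Python, assuming the third-party websocket-client package; the endpoint URL and the DEEPGRAM_API_KEY environment variable are placeholders rather than part of this reference.

import os
import websocket  # third-party websocket-client package (assumed dependency)

# The handshake is a GET request upgraded to a WebSocket; the API key is
# passed in the Authorization header as "token <DEEPGRAM_API_KEY>".
api_key = os.environ["DEEPGRAM_API_KEY"]
ws = websocket.create_connection(
    "wss://api.deepgram.com/v1/listen",   # assumed streaming endpoint URL
    header=[f"Authorization: token {api_key}"],
)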

Query parameters

callback string Optional

URL to which we’ll make the callback request

callback_method enum Optional

HTTP method by which the callback request will be made

Allowed values: POST, GET, PUT, DELETE

channels string Optional

The number of channels in the submitted audio

diarize enum Optional

Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

Allowed values: true, false

diarize_version string Optional

Version of the diarization feature to use. Only used when the diarization feature is enabled (diarize=true is passed to the API)

dictation enum Optional

Convert spoken dictation commands to their corresponding punctuation marks, e.g. “comma” becomes “,”

Allowed values: true, false

encoding enum Optional

Specify the expected encoding of your submitted audio

endpointing string Optional

Indicates how long Deepgram will wait to detect whether a speaker has finished speaking or has paused for a significant period of time. When set to a value, the streaming endpoint immediately finalizes the transcription for the processed time range and returns the transcript with a speech_final parameter set to true. Can also be set to false to disable endpointing

extra string Optional

Arbitrary key-value pairs that are attached to the API response for usage in downstream processing

filler_words enum Optional

Filler Words can help transcribe interruptions in your audio, like “uh” and “um”

Allowed values: true, false

interim_results enum Optional

Specifies whether the streaming endpoint should provide ongoing transcription updates as more audio is received. When set to true, the endpoint sends continuous updates, meaning transcription results may evolve over time

Allowed values: true, false

keyterm string Optional

Key term prompting can boost or suppress specialized terminology and brands. Only compatible with Nova-3

keywords string Optional

Keywords can boost or suppress specialized terminology and brands

language enum Optional

The BCP-47 language tag that hints at the primary spoken language. Depending on the model you choose, only certain languages are available

model enum Optional

AI model to use for the transcription

multichannel enum Optional

Transcribe each audio channel independently

Allowed values: true, false

numerals enum Optional

Convert numbers from written format to numerical format

Allowed values: true, false

profanity_filter enum Optional

Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely

Allowed values: true, false

punctuate enum Optional

Add punctuation and capitalization to the transcript

Allowed values: true, false

redact enum Optional

Redaction removes sensitive information from your transcripts

Allowed values: true, false

replace string Optional

Search for terms or phrases in submitted audio and replace them

sample_rate string Optional

Sample rate of submitted audio. Required (and only read) when a value is provided for encoding

search string Optional

Search for terms or phrases in submitted audio

smart_format enum Optional

Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability

Allowed values: true, false

tag string Optional

Label your requests for the purpose of identification during usage reporting

utterance_end string Optional

Indicates how long Deepgram will wait to send an UtteranceEnd message after a word has been transcribed. Use with interim_results

vad_events enum Optional

Enables voice activity detection events. When set to true, you’ll begin receiving SpeechStarted messages as soon as speech is detected

Allowed values: true, false

version string Optional

Version of an AI model to use
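
As a sketch of how these query parameters combine into the handshake URL, the snippet below builds a query string with Python's standard urllib.parse and opens the connection. The parameter values, the endpoint URL, and the websocket-client dependency are illustrative assumptions; any subset of the parameters above can be supplied.

import os
from urllib.parse import urlencode

import websocket  # third-party websocket-client package (assumed dependency)

# Illustrative parameter choices; pick only the options you need.
params = {
    "model": "nova-3",        # example model name
    "language": "en-US",      # BCP-47 language tag
    "encoding": "linear16",   # raw PCM; sample_rate must accompany encoding
    "sample_rate": 16000,
    "channels": 1,
    "interim_results": "true",
    "punctuate": "true",
}

url = "wss://api.deepgram.com/v1/listen?" + urlencode(params)  # assumed endpoint
ws = websocket.create_connection(
    url,
    header=[f"Authorization: token {os.environ['DEEPGRAM_API_KEY']}"],
)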

Send

string

Raw audio data to be transcribed. Should be sent as a binary WebSocket message without base64 encoding

OR
Listen Control Messages Request object
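
A short sketch of the send side, continuing with the ws connection opened above: audio goes out as binary WebSocket frames, while control messages go out as JSON text frames. The file name, chunk size, and the KeepAlive/CloseStream message types are assumptions drawn from Deepgram's listen control messages rather than definitions made on this page.

import json

# Raw audio is sent as binary WebSocket frames, without base64 encoding.
with open("audio.raw", "rb") as audio:      # hypothetical local audio file
    while chunk := audio.read(8000):
        ws.send_binary(chunk)

# Control messages are JSON sent as text frames, for example a keep-alive
# while idle and a close request once all audio has been sent (message
# types assumed here).
ws.send(json.dumps({"type": "KeepAlive"}))
ws.send(json.dumps({"type": "CloseStream"}))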

Receive

Transcription Response object
OR
Control Message Response object
OR
Listen Close Frame object

When Deepgram encounters an error during streaming speech-to-text, it sends a WebSocket Close frame. The frame contains a status code and a UTF-8-encoded payload describing the error reason
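
A receive loop might look like the sketch below, again continuing with the ws connection opened above. The Results type and the channel.alternatives[0].transcript and is_final fields reflect the usual shape of Deepgram transcription responses and are assumptions here; check them against the response objects above.

import json
import websocket  # third-party websocket-client package (assumed dependency)

try:
    while True:
        frame = ws.recv()                  # JSON text frame
        if not frame:                      # connection closed by the server
            break
        response = json.loads(frame)

        if response.get("type") == "Results":
            alternative = response["channel"]["alternatives"][0]
            # Interim results may still change; final results are stable.
            if response.get("is_final"):
                print(alternative["transcript"])
        else:
            # Control message responses can be told apart by their "type".
            print("control message:", response.get("type"))
except websocket.WebSocketConnectionClosedException:
    # On error the server sends a Close frame with a status code and a
    # UTF-8 payload describing the reason, after which recv() raises.
    pass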
