Pre-Recorded Audio

Transcribe audio using Deepgram’s speech-to-text REST API

Headers

Authorization string Required
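
As a minimal sketch of supplying this header, the helper below builds the header dictionary for a request. The `Token <key>` scheme and the `DEEPGRAM_API_KEY` environment variable name are assumptions that this section does not state; check them against your account's authentication settings.

```python
import os


def auth_headers(api_key):
    """Return the headers carrying the required Authorization value.

    Assumption: the value has the form "Token <api key>".
    """
    if not api_key:
        raise ValueError("an API key is required")
    return {"Authorization": f"Token {api_key}"}


# DEEPGRAM_API_KEY is a hypothetical variable name used for illustration.
headers = auth_headers(os.environ.get("DEEPGRAM_API_KEY", "YOUR_API_KEY"))
print(sorted(headers))
```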

Query parameters

callback string Optional
URL to which we'll make the callback request
callback_method enum Optional Defaults to POST
HTTP method by which the callback request will be made
Allowed values:
custom_topic string or list of strings Optional
Custom topics you want the model to detect within your input audio or text, if present. Submit up to 100.
custom_topic_mode enum Optional Defaults to extended

Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param

Allowed values: strict, extended
custom_intent string or list of strings Optional
Custom intents you want the model to detect within your input audio if present
custom_intent_mode enum Optional Defaults to extended

Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param

Allowed values: strict, extended
detect_entities boolean Optional Defaults to false
Identifies and extracts key entities from content in submitted audio
detect_language boolean or list of strings Optional
Identifies the dominant language spoken in submitted audio
diarize boolean Optional Defaults to false
Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0
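
To illustrate how per-word speaker numbers might be consumed, the sketch below collapses consecutive words from the same speaker into turns. The word dictionaries and their `word` and `speaker` keys are hypothetical stand-ins; this section does not describe the exact response layout.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words with the same speaker number into turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["word"] for w in group)))
    return turns


# Hypothetical sample data; speaker numbers start at 0 as described above.
words = [
    {"word": "hello", "speaker": 0},
    {"word": "there", "speaker": 0},
    {"word": "hi", "speaker": 1},
    {"word": "thanks", "speaker": 0},
]
for speaker, text in speaker_turns(words):
    print(f"Speaker {speaker}: {text}")
```
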
dictation boolean Optional Defaults to false
Converts spoken dictation commands into their corresponding punctuation marks
encoding enum Optional
Specify the expected encoding of your submitted audio
extra string or list of strings Optional

Arbitrary key-value pairs that are attached to the API response for usage in downstream processing

filler_words boolean Optional Defaults to false
Filler Words can help transcribe interruptions in your audio, like "uh" and "um"
intents boolean Optional Defaults to false
Recognizes speaker intent throughout a transcript or text
keyterm list of strings Optional

Key term prompting can boost or suppress specialized terminology and brands. Only compatible with Nova-3

keywords string or list of strings Optional
Keywords can boost or suppress specialized terminology and brands
language enum Optional Defaults to en

The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available

measurements boolean Optional Defaults to false
Spoken measurements will be converted to their corresponding abbreviations
mip_opt_out boolean Optional Defaults to false

Opts out requests from the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip

model enum or string Optional
AI model used to process submitted audio
multichannel boolean Optional Defaults to false
Transcribe each audio channel independently
numerals boolean Optional Defaults to false
Numerals converts numbers from written format to numerical format
paragraphs boolean Optional Defaults to false
Splits audio into paragraphs to improve transcript readability
profanity_filter boolean Optional Defaults to false

Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely

punctuate boolean Optional Defaults to false
Add punctuation and capitalization to the transcript
redact string or list of enums Optional
Redaction removes sensitive information from your transcripts
replace string or list of strings Optional
Searches for terms or phrases in submitted audio and replaces them
search string or list of strings Optional
Search for terms or phrases in submitted audio
sentiment boolean Optional Defaults to false
Recognizes the sentiment throughout a transcript or text
smart_format boolean Optional Defaults to false
Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability
summarize enum or boolean Optional
Summarize content. For Listen API, supports string version option. For Read API, accepts boolean only.
tag string or list of strings Optional
Label your requests for the purpose of identification during usage reporting
topics boolean Optional Defaults to false
Detect topics throughout a transcript or text
utterances boolean Optional Defaults to false
Segments speech into meaningful semantic units
utt_split double Optional Defaults to 0.8
Seconds to wait before detecting a pause between words in submitted audio
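
The threshold idea behind utt_split can be sketched locally: start a new utterance whenever the silence between two words reaches the configured number of seconds. This is only an illustration of that reading, not the service's actual segmentation logic, and the `word`, `start`, and `end` keys are hypothetical.

```python
def split_utterances(words, utt_split=0.8):
    """Start a new utterance when the gap before a word is >= utt_split seconds."""
    groups = []
    current = []
    for word in words:
        if current and word["start"] - current[-1]["end"] >= utt_split:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return [" ".join(w["word"] for w in group) for group in groups]


# Hypothetical word timings in seconds.
words = [
    {"word": "good", "start": 0.0, "end": 0.3},
    {"word": "morning", "start": 0.35, "end": 0.8},
    {"word": "how", "start": 2.0, "end": 2.2},
    {"word": "are", "start": 2.25, "end": 2.4},
    {"word": "you", "start": 2.45, "end": 2.7},
]
print(split_utterances(words))
```

Raising the threshold merges more words into one utterance; lowering it splits on shorter pauses.
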
version enum or string Optional
Version of an AI model to use
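
Several of the parameters above accept either a single value or a list. A minimal sketch of encoding them into a query string follows. The endpoint URL is an assumption (this section does not state it), as is the convention of repeating the key once per list item and sending booleans as lowercase `true`/`false`.

```python
from urllib.parse import urlencode

# Assumed endpoint; not stated in this section.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(params, base_url=BASE_URL):
    """Encode query parameters, repeating the key for each list item."""
    normalized = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Send booleans as lowercase strings rather than Python's "True"/"False".
            normalized[key] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            normalized[key] = [str(item) for item in value]
        else:
            normalized[key] = str(value)
    query = urlencode(normalized, doseq=True)
    return f"{base_url}?{query}" if query else base_url


url = build_listen_url({
    "punctuate": True,
    "language": "en",
    "keywords": ["Deepgram", "Nova"],
    "utt_split": 0.8,
})
print(url)
```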

Request

Transcribe an audio file
object Required
OR
string Required format: "binary"
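
The two body shapes can be sketched with the standard library as follows. The request is only constructed, not sent. The endpoint URL, the `Token` authorization scheme, the `url` key inside the JSON object, and the `audio/wav` default content type are all assumptions made for illustration.

```python
import json
import urllib.request

# Assumed endpoint; not stated in this section.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_request(api_key, *, audio_bytes=None, audio_url=None,
                  content_type="audio/wav"):
    """Build a POST request with either a JSON object body or raw binary audio."""
    if (audio_bytes is None) == (audio_url is None):
        raise ValueError("provide exactly one of audio_bytes or audio_url")
    if audio_url is not None:
        # Object form: the "url" key is an assumed field name.
        body = json.dumps({"url": audio_url}).encode("utf-8")
        content_type = "application/json"
    else:
        # Binary form: the audio bytes are sent as-is.
        body = audio_bytes
    return urllib.request.Request(
        LISTEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed scheme
            "Content-Type": content_type,
        },
    )


req = build_request("YOUR_API_KEY", audio_url="https://example.com/audio.wav")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call; that step is left out here so the sketch runs without network access or credentials.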

Response

Successful transcription
metadata object
results object
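
A minimal sketch of reading the two top-level objects from a response body follows. Only `metadata` and `results` come from this section; the nested names (`request_id`, `channels`, `alternatives`, `transcript`) are assumptions about the inner layout, so the code reads them defensively.

```python
import json


def summarize_response(payload):
    """Pull a request id and the first transcript per channel from a JSON body."""
    data = json.loads(payload)
    metadata = data.get("metadata", {})
    results = data.get("results", {})
    transcripts = [
        alternative.get("transcript", "")
        for channel in results.get("channels", [])
        for alternative in channel.get("alternatives", [])[:1]
    ]
    return {"request_id": metadata.get("request_id"), "transcripts": transcripts}


# Hypothetical sample body, shaped according to the assumptions above.
sample = json.dumps({
    "metadata": {"request_id": "example-id"},
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "hello world"}]},
        ]
    },
})
print(summarize_response(sample))
```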

Errors