Language Detection
Language Detection identifies the dominant language spoken in submitted audio.
detect_language
boolean Default: false
Deepgram’s Language Detection feature identifies the dominant language spoken in submitted audio, transcribes the audio in the identified language, and returns the detected language code in the JSON response.
If you are submitting multichannel audio, Language Detection identifies one language per channel. Language Detection is supported for the following languages:
- Spanish - es
- English - en
- Hindi - hi
- Japanese - ja
- Russian - ru
- Ukrainian - uk
- Swedish - sv
- Chinese - zh
- Portuguese - pt
- Dutch - nl
- Turkish - tr
- French - fr
- German - de
- Indonesian - id
- Korean - ko
- Italian - it
Enable Feature
To enable Language Detection, add a detect_language parameter set to true to the query string when you call Deepgram’s API:
detect_language=true
To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.
curl \
--request POST \
--header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
--header 'Content-Type: audio/wav' \
--data-binary @youraudio.wav \
--url 'https://api.deepgram.com/v1/listen?model=nova-2-general&detect_language=true'
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
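The same request can be issued from a script. The following is a minimal Python sketch using only the standard library; the endpoint, headers, and query parameters are taken from the cURL command above, while the helper names (build_listen_url, build_headers, transcribe_file) are hypothetical and chosen for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the cURL example above.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model: str = "nova-2-general", detect_language: bool = True) -> str:
    """Return the listen URL with detect_language set in the query string."""
    params = {
        "model": model,
        "detect_language": "true" if detect_language else "false",
    }
    return f"{LISTEN_URL}?{urlencode(params)}"


def build_headers(api_key: str, content_type: str = "audio/wav") -> dict:
    """Return the same two headers the cURL example sends."""
    return {"Authorization": f"Token {api_key}", "Content-Type": content_type}


def transcribe_file(path: str, api_key: str) -> dict:
    """POST a local audio file and return the parsed JSON response.

    Hypothetical helper: it performs a real network call when invoked.
    """
    with open(path, "rb") as audio:
        request = urllib.request.Request(
            build_listen_url(),
            data=audio.read(),
            headers=build_headers(api_key),
            method="POST",
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(build_listen_url())
```

Calling transcribe_file("youraudio.wav", "YOUR_DEEPGRAM_API_KEY") would send the request; the print statement only shows the URL that would be used.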
Analyze Response
When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:
{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives": [],
        "detected_language": "fr",
        "language_confidence": 0.0
      }
    ]
  }
}
In this response, we see that each channel contains:
- alternatives object, which contains:
  - transcript: Transcript for the audio being processed.
  - confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  - words: Object containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, a word confidence value, a speaker identifier, and a speaker confidence value.
- detected_language: BCP-47 language tag for the dominant language identified in the channel.
- language_confidence: Floating point value between 0 and 1 that indicates the confidence of the language selection (see below for important details).
language_confidence is not supported for Whisper models and will not be included in the API response for Whisper requests.
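As a rough illustration of reading these fields, the sketch below pulls the detected language and its confidence out of each channel. The sample payload mirrors the structure shown above, but its values (including the 0.93 score) are made up, and the function name detected_languages is hypothetical.

```python
import json

# Illustrative payload shaped like the response above; the values are invented.
SAMPLE_RESPONSE = """
{
  "metadata": {"request_id": "string", "duration": 0, "channels": 1},
  "results": {
    "channels": [
      {
        "alternatives": [],
        "detected_language": "fr",
        "language_confidence": 0.93
      }
    ]
  }
}
"""


def detected_languages(response: dict) -> list:
    """Return one (detected_language, language_confidence) pair per channel."""
    pairs = []
    for channel in response["results"]["channels"]:
        # language_confidence is omitted for Whisper requests, so read it
        # with .get() and allow None instead of raising KeyError.
        pairs.append(
            (channel.get("detected_language"), channel.get("language_confidence"))
        )
    return pairs


for index, (language, confidence) in enumerate(detected_languages(json.loads(SAMPLE_RESPONSE))):
    print(f"channel {index}: language={language} confidence={confidence}")
```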
Advanced Functionality
Model Selection
If you specify both detect_language=true and a model in your query parameters, Deepgram will attempt to use the specified model for the language that is detected. However, if the detected language is not available for that model, Deepgram will automatically select the next-highest model to complete the request.
To use the best Deepgram model available, use model=nova-2-general&detect_language=true. The order of preference will be Nova-2 -> Nova-1 -> Enhanced -> Base.
For example, you may send an ASR request with the parameters detect_language=true&model=nova-2-general. If the detected language is supported by Base and Enhanced models, but not a Nova-2 model, Deepgram will process the request with the Enhanced model since that is the next-highest model available for that language.
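The fallback described above happens on Deepgram's side, but the rule can be sketched in a few lines to make the order of preference concrete. This is an illustration only: the availability table and the placeholder language code "xx" are hypothetical, the function name select_model_tier is invented, and returning None when no tier supports the language is an assumption rather than documented behavior.

```python
from typing import Dict, Optional, Set

# Order of preference quoted from the text above, best first.
MODEL_PREFERENCE = ["Nova-2", "Nova-1", "Enhanced", "Base"]


def select_model_tier(
    requested: str, language: str, availability: Dict[str, Set[str]]
) -> Optional[str]:
    """Walk down the preference order, starting at the requested tier,
    and return the first tier that supports the detected language."""
    start = MODEL_PREFERENCE.index(requested)
    for tier in MODEL_PREFERENCE[start:]:
        if language in availability.get(tier, set()):
            return tier
    return None  # assumption: the source does not say what happens here


# Hypothetical availability table; "xx" stands in for a language that only
# the Enhanced and Base tiers support.
EXAMPLE_AVAILABILITY = {
    "Nova-2": {"en", "es"},
    "Nova-1": {"en", "es"},
    "Enhanced": {"en", "es", "xx"},
    "Base": {"en", "es", "xx"},
}

print(select_model_tier("Nova-2", "xx", EXAMPLE_AVAILABILITY))  # Enhanced
```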
Interaction with language query parameter
If the language query parameter is set and detect_language=true is also present, the language detected by Deepgram will be used for inference instead of the language specified in the language parameter.
Restricting the detectable languages
You can restrict the possible set of detectable languages. This is useful if, for example, you know your audio files only contain English or Spanish audio. To restrict the possible set of detectable languages, use a multivalued query parameter with the language codes as the values. For example, detect_language=en&detect_language=es will choose either English or Spanish as the detected language.
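A multivalued query parameter is simply the same key repeated once per value. One way to build such a query string in Python is to pass a list of key/value pairs to urllib.parse.urlencode, as in this sketch; the function name build_restricted_url is hypothetical, and the endpoint and model are the ones used in the cURL example earlier on this page.

```python
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_restricted_url(languages, model="nova-2-general"):
    """Repeat detect_language once per allowed language code."""
    params = [("model", model)]
    params += [("detect_language", code) for code in languages]
    return f"{LISTEN_URL}?{urlencode(params)}"


print(build_restricted_url(["en", "es"]))
# https://api.deepgram.com/v1/listen?model=nova-2-general&detect_language=en&detect_language=es
```

A list of pairs is used instead of a dict because a dict cannot hold the same key twice.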
How to use language_confidence
language_confidence is not supported when using Whisper models.
Deepgram outputs a language_confidence score that ranges between 0 and 1, with higher values indicating more confidence in the selected language.
The language_confidence score can be used as a metric to determine whether the transcript is accurate. For example, if the language_confidence falls below a certain threshold, you may want to default to another language or reject the transcript.
It is critical to know that the language_confidence score only takes into account the 16 supported languages. If the audio is in a language not supported by language detection, the value of language_confidence should be ignored.
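A threshold check of this kind might look like the sketch below. The 0.8 cutoff is an arbitrary assumption, not a Deepgram recommendation, and the function name and return labels are hypothetical. The check cannot tell whether the audio was in an unsupported language, so that caveat still has to be handled with knowledge of your own data.

```python
def triage_channel(channel: dict, threshold: float = 0.8) -> str:
    """Classify one channel of the response by its language_confidence.

    Returns "accept", "review", or "no-score" (the field is omitted for
    Whisper requests). The threshold default is an arbitrary example value.
    """
    confidence = channel.get("language_confidence")
    if confidence is None:
        return "no-score"
    return "accept" if confidence >= threshold else "review"


# Made-up channel objects shaped like the response shown earlier.
print(triage_channel({"detected_language": "fr", "language_confidence": 0.95}))  # accept
print(triage_channel({"detected_language": "fr", "language_confidence": 0.40}))  # review
print(triage_channel({"detected_language": "fr"}))  # no-score
```

A channel marked "review" could then be re-submitted with an explicit language parameter or rejected, depending on the application.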