Language Detection

Language Detection identifies the dominant language spoken in submitted audio.

detect_language: boolean. Default: false

Deepgram’s Language Detection feature identifies the dominant language spoken in submitted audio, transcribes the audio in the identified language, and returns the detected language code in the JSON response.

If you are submitting multichannel audio, Language Detection identifies one language per channel. Language Detection is supported for the following languages:

  • Spanish - es
  • English - en
  • Hindi - hi
  • Japanese - ja
  • Russian - ru
  • Ukrainian - uk
  • Swedish - sv
  • Chinese - zh
  • Portuguese - pt
  • Dutch - nl
  • Turkish - tr
  • French - fr
  • German - de
  • Indonesian - id
  • Korean - ko
  • Italian - it

Enable Feature

To enable Language Detection, add a detect_language parameter set to true in the query string when you call Deepgram’s API:

detect_language=true

To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=nova-2-general&detect_language=true'

🚧 Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
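
If you prefer to make the request from code, here is a minimal Python sketch equivalent to the cURL command above. It assumes the third-party requests library and a local file named youraudio.wav; adapt both to your environment.

# Minimal sketch of the same request in Python (requests library assumed).
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"  # replace with your Deepgram API Key

with open("youraudio.wav", "rb") as audio:
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        params={"model": "nova-2-general", "detect_language": "true"},
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/wav",
        },
        data=audio,  # raw audio bytes, like --data-binary in the cURL example
    )

print(response.json())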

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives":[],
        "detected_language": "fr",
        "language_confidence": 0.0
      }
    ]
  }

In this response, we see that each channel contains:

  • alternatives: Array of transcript objects, each of which contains:
    • transcript: Transcript for the audio being processed.
    • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
    • words: Array containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, a word confidence value, a speaker identifier, and a speaker confidence value.
  • detected_language: BCP-47 language tag for the dominant language identified in the channel.
  • language_confidence: Floating point value between 0 and 1 that indicates the confidence of the language selection (see below for important details). language_confidence is not supported for Whisper models and will not be included in the API response for Whisper requests.
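
As a quick illustration, the following Python sketch walks the structure above and prints the detected language and confidence for each channel. It assumes response holds the HTTP response from the earlier Python example.

# Read per-channel language detection results from the parsed JSON response.
data = response.json()

for i, channel in enumerate(data["results"]["channels"]):
    transcript = channel["alternatives"][0]["transcript"]
    detected = channel["detected_language"]
    confidence = channel.get("language_confidence")  # absent for Whisper models
    print(f"channel {i}: {detected} (language_confidence: {confidence})")
    print(transcript)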

Advanced Functionality

Model Selection

If you specify both detect_language=true and a model in your query parameters, Deepgram will attempt to use the specified model for the detected language. If the detected language is not available for that model, Deepgram will automatically fall back to the next-highest model that supports the language.

To use the best Deepgram model available, use model=nova-2-general&detect_language=true. The order of preference will be Nova-2 -> Nova-1 -> Enhanced -> Base.

For example, you may send an ASR request with the parameters detect_language=true&model=nova-2-general. If the detected language is supported by Base and Enhanced models, but not a Nova-2 model, Deepgram will process the request with the Enhanced model since that is the next highest model available for that language.

Interaction with language query parameter

If the language query parameter is set and detect_language=true is also present, the language detected by Deepgram will be used for inference instead of the language specified in the language parameter.

Restricting the detectable languages

You can restrict the possible set of detectable languages. This is useful if, for example, you know your audio files only contain English or Spanish audio. To restrict the possible set of detectable languages, use a multivalued query parameter with the language codes as the values. For example, detect_language=en&detect_language=es will choose either English or Spanish as the detected language.
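
Continuing the earlier Python sketch, the requests library encodes a list value as a repeated query parameter, so the restriction above could be expressed like this (language codes shown are illustrative):

# Restrict Language Detection to English and Spanish by repeating detect_language.
params = {
    "model": "nova-2-general",
    "detect_language": ["en", "es"],  # sent as detect_language=en&detect_language=es
}

with open("youraudio.wav", "rb") as audio:
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        params=params,
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/wav",
        },
        data=audio,
    )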

How to use language_confidence

⚠️ language_confidence is not supported when using Whisper models.

Deepgram outputs a language_confidence score that ranges between 0 and 1, with higher values indicating more confidence in the selected language.

The language_confidence score can be used as a metric to determine whether the transcript is accurate. For example, if the language_confidence falls below a certain threshold, you may want to default to another language or reject the transcript.

It is critical to know that the language_confidence score only takes into account the 16 supported languages. If the audio is in a language not supported by language detection, the value of language_confidence should be ignored.
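
As a sketch of the thresholding idea above, the following Python snippet accepts or flags a transcript based on language_confidence. The 0.8 threshold is illustrative only, and response is assumed to hold an earlier API response.

# Flag transcripts whose language selection looks unreliable.
LANGUAGE_CONFIDENCE_THRESHOLD = 0.8  # illustrative value; tune for your data

channel = response.json()["results"]["channels"][0]
confidence = channel.get("language_confidence")  # None for Whisper models

if confidence is not None and confidence < LANGUAGE_CONFIDENCE_THRESHOLD:
    # Fall back: retry with an explicit language parameter, or flag for review.
    print(f"Low language confidence ({confidence:.2f}); transcript rejected")
else:
    print(channel["alternatives"][0]["transcript"])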

