Language Detection

Learn about Deepgram's Language Detection feature, which identifies the dominant language spoken in submitted audio.

Pre-recorded Streaming

Deepgram’s Language Detection feature identifies the dominant language spoken in submitted audio, transcribes the audio in the identified language, and returns the detected language code in the JSON response.

Language Detection supports all generally available languages. If you are submitting multichannel audio, Language Detection identifies one language per channel.

When Language Detection is enabled, the Punctuation feature will be enabled by default.

Use Cases

An example of a use case for Language Detection includes:

Customers with input audio files in different languages who need to automatically detect the dominant language in the audio data and transcribe the output in the detected language.

Enable Feature

To enable Language Detection, when you call Deepgram’s API, add a detect_language parameter set to true in the query string:

⚠️

When Language Detection is enabled, the Punctuation feature will be enabled by default.

detect_language=true&punctuate=true

To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.

ℹ️

Be sure to replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?detect_language=true&punctuate=true'

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives":[],
        "detected_language": "fr"
      }
    ]
  }

In this response, we see that each channel contains:

  • alternatives object, which contains:
    • transcript: Transcript for the audio being processed.
    • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
    • words: Object containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, a word confidence value, a speaker identifier, and a speaker confidence value.
  • detected_language: BCP-47 language tag for the dominant language identified in the channel.