Multilingual Code-Switching

Transcribe audio containing or switching between multiple languages.

language string. Option: multi

The Multilingual Code-Switching feature in Deepgram's API enables the transcription of audio that contains or switches between multiple languages. This guide walks you through enabling the feature, using it with cURL, and interpreting the response.

📘

Multilingual Code-Switching is only available when using the Nova-2 model.

1. Enable Feature

To enable Multilingual Code-Switching, use the following language parameter in the query string when you call Deepgram’s /listen endpoint:

language=multi

Pre-Recorded Audio

To transcribe audio from a file on your computer that contains multiple languages, run the following cURL command in a terminal or your favorite API client.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?language=multi&model=nova-2'

🚧

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
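
If you are calling the API from code rather than a terminal, the cURL command above can be approximated in Python. The following is a minimal sketch using only the standard library; the function name build_listen_request is a hypothetical helper for illustration and is not part of any Deepgram SDK.

```python
import urllib.parse
import urllib.request

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL command.

    `build_listen_request` is an illustrative helper, not a Deepgram API.
    """
    # Same query parameters as the cURL example: language=multi, model=nova-2.
    query = urllib.parse.urlencode({"language": "multi", "model": "nova-2"})
    return urllib.request.Request(
        f"{LISTEN_URL}?{query}",
        data=audio,  # raw WAV bytes, like --data-binary
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )
```

To send the request, read youraudio.wav in binary mode, pass its bytes and your API key to build_listen_request, and hand the result to urllib.request.urlopen; the response body is the JSON document described later in this guide.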

Streaming Audio

To transcribe an audio stream, open a WebSocket connection to the /listen endpoint with language=multi in the query string. For instance:

wss://api.deepgram.com/v1/listen?language=multi&model=nova-2&sample_rate=44100&encoding=linear16&endpointing=100

📘

We recommend an endpointing value of 100 ms for code-switching (endpointing=100).
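
The streaming URL can also be assembled programmatically, so that the sample rate and encoding match your audio source. This is a small sketch; build_stream_url and its default values are assumptions chosen to reproduce the example URL, not a Deepgram-provided function.

```python
from urllib.parse import urlencode

STREAM_URL = "wss://api.deepgram.com/v1/listen"


def build_stream_url(sample_rate: int = 44100,
                     encoding: str = "linear16",
                     endpointing_ms: int = 100) -> str:
    """Return a streaming URL with code-switching enabled.

    Hypothetical helper; defaults mirror the example URL in this guide.
    """
    params = {
        "language": "multi",            # enables Multilingual Code-Switching
        "model": "nova-2",              # the feature requires Nova-2
        "sample_rate": sample_rate,
        "encoding": encoding,
        "endpointing": endpointing_ms,  # 100 ms is the recommended value
    }
    return f"{STREAM_URL}?{urlencode(params)}"
```

The resulting string can then be passed to whichever WebSocket client you use.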

2. Analyze Response

Pre-Recorded Audio

When the file is finished processing, you’ll receive a JSON response that has the following basic structure:


{
    "metadata": {
        "transaction_key": "deprecated",
        "request_id": "2479c8c8-8285-40ac-9ab6-f0874449f793",
        "sha256": "154e291ecfa8be6ab8343560bcc109001fa7853eb537253be8e4defc9b504c33",
        "created": "2024-06-26T19:56:16.180Z",
        "duration": 1.6,
        "channels": 1,
        "models": [
            "dc8a3fe5-a395-4b75-a8b1-71c9a5a87526"
        ],
        "model_info": {
            "dc8a3fe5-a395-4b75-a8b1-71c9a5a87526": {
                "name": "2-general-nova",
                "version": "1999-06-13.21385",
                "arch": "nova-2"
            }
        }
    },
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "No recuerdo mi bank password.",
                        "confidence": 0.99902344,
                        "languages": [
                            "en",
                            "es"
                        ],
                        "words": [
                            {
                                "word": "no",
                                "start": 0.08,
                                "end": 0.32,
                                "confidence": 0.9975586,
                                "language": "es"
                            },
                            {
                                "word": "recuerdo",
                                "start": 0.32,
                                "end": 0.79999995,
                                "confidence": 0.9921875,
                                "language": "es"
                            },
                            {
                                "word": "mi",
                                "start": 0.79999995,
                                "end": 1.04,
                                "confidence": 0.96777344,
                                "language": "es"
                            },
                            {
                                "word": "bank",
                                "start": 1.04,
                                "end": 1.28,
                                "confidence": 1,
                                "language": "en"
                            },
                            {
                                "word": "password",
                                "start": 1.28,
                                "end": 1.5999999,
                                "confidence": 0.9926758,
                                "language": "en"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

In this response, each channel contains:

  • alternatives: Array of alternative transcriptions, each of which contains:
    • transcript: Transcript for the audio being processed.
    • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
    • languages: Array of BCP-47 language tags for all detected languages in the channel, sorted in descending order of number of words per language.
    • words: Array of objects, one per word in the transcript, each with the word's start and end time (in seconds from the beginning of the audio), a word-level transcription confidence value, the language of the word, and the punctuated word if Smart Formatting is enabled.
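
Because every entry in words carries its own language tag, you can locate the points where the speaker switches languages. The sketch below groups consecutive words that share a language; language_runs is a hypothetical helper, and the input is trimmed to the two fields it needs from the sample response above.

```python
from itertools import groupby


def language_runs(words):
    """Group consecutive words that share a language tag.

    Returns a list of (language, text) tuples in spoken order.
    Illustrative helper, not part of the Deepgram API.
    """
    runs = []
    for lang, group in groupby(words, key=lambda w: w["language"]):
        runs.append((lang, " ".join(w["word"] for w in group)))
    return runs


# Words from results.channels[0].alternatives[0].words in the sample response,
# reduced to the fields used here.
words = [
    {"word": "no", "language": "es"},
    {"word": "recuerdo", "language": "es"},
    {"word": "mi", "language": "es"},
    {"word": "bank", "language": "en"},
    {"word": "password", "language": "en"},
]

print(language_runs(words))
# → [('es', 'no recuerdo mi'), ('en', 'bank password')]
```

Each boundary between two runs marks a code-switch, and the start and end times of the neighboring words tell you where it occurs in the audio.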

Streaming Audio

When streaming audio, a Results JSON message has the following structure:

{
    "type": "Results",
    "channel_index": [
        0,
        1
    ],
    "duration": 4.0700073,
    "start": 464.47,
    "is_final": true,
    "speech_final": false,
    "channel": {
        "alternatives": [
            {
                "transcript": "será el inglés muchos",
                "confidence": 0.937473,
                "languages": [
                    "es"
                ],
                "words": [
                    {
                        "word": "será",
                        "start": 465.43,
                        "end": 465.91,
                        "confidence": 0.9494371,
                        "language": "es"
                    },
                    {
                        "word": "el",
                        "start": 465.91,
                        "end": 466.15,
                        "confidence": 0.37035784,
                        "language": "es"
                    },
                    {
                        "word": "inglés",
                        "start": 466.15,
                        "end": 466.65,
                        "confidence": 0.416623,
                        "language": "es"
                    },
                    {
                        "word": "muchos",
                        "start": 467.75,
                        "end": 468.25,
                        "confidence": 0.937473,
                        "language": "es"
                    }
                ]
            }
        ]
    },
    "metadata": {
        "request_id": "84157495-6794-4c45-b12b-95b0aeb8793f",
        "model_info": {
            "name": "2-general-nova",
            "version": "1999-05-16.19331",
            "arch": "nova-2"
        },
        "model_uuid": "cf62fbcf-2ee4-49ff-a064-d92fc81d27f4"
    },
    "from_finalize": false
}
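
A streaming client typically receives many Results messages and keeps only the ones it wants to act on. The following is a minimal sketch that assumes is_final marks a result that will not be revised further; collect_final is a hypothetical helper that stores final transcripts and tallies words per language.

```python
import json
from collections import Counter


def collect_final(raw_message: str, transcripts: list, lang_counts: Counter) -> bool:
    """Record the transcript and per-language word counts of a final result.

    Returns True if the message was a final Results message, else False.
    Illustrative helper; the is_final interpretation is an assumption.
    """
    msg = json.loads(raw_message)
    if msg.get("type") != "Results" or not msg.get("is_final"):
        return False  # skip interim results and other message types
    alt = msg["channel"]["alternatives"][0]
    if alt["transcript"]:
        transcripts.append(alt["transcript"])
    lang_counts.update(w["language"] for w in alt["words"])
    return True
```

Call collect_final on each text frame received from the WebSocket, passing the same list and Counter every time; after the stream ends, the list holds the final transcripts in order and the Counter shows how many words were detected in each language.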