Multilingual Codeswitching

Transcribe conversations where speakers switch between multiple languages.
Parameter: language (string). Option: multi

Available for: Pre-recorded, Streaming (Nova). Specific languages only.

The Multilingual Codeswitching feature in Deepgram’s API allows you to transcribe conversations where speakers switch between multiple languages. This guide walks you through enabling the feature, calling it with cURL, and interpreting the response.

Multilingual Codeswitching is available on Nova-2, Nova-3, and Flux Multilingual (flux-general-multi). See the list of supported languages for each multilingual model.

1. Enable Feature

Nova-2 / Nova-3

To enable Multilingual Codeswitching on Nova-2 or Nova-3, use the following language parameter in the query string when you call Deepgram’s /listen endpoint:

language=multi

Pre-Recorded Audio

To transcribe audio from a file on your computer that contains multiple languages, run the following cURL command in a terminal or your favorite API client.

cURL
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?language=multi&model=nova-3'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
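The same request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper name build_listen_request is hypothetical, and the function only builds the request so that the URL and headers can be inspected before anything is sent.

```python
import urllib.request
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(
    api_key: str, audio: bytes, model: str = "nova-3"
) -> urllib.request.Request:
    """Build (but do not send) a POST request with language=multi.

    Hypothetical helper: it mirrors the cURL command above.
    """
    query = urlencode({"language": "multi", "model": model})
    return urllib.request.Request(
        url=f"{LISTEN_URL}?{query}",
        data=audio,  # raw bytes of the WAV file
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )
```

To send it, read the file bytes (for example with open("youraudio.wav", "rb")), pass them to build_listen_request, and hand the result to urllib.request.urlopen; the response body is the JSON document described later in this guide.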

Streaming Audio

To transcribe an audio stream, initiate a websocket connection that includes the parameter language=multi.

We recommend using an endpointing value of 100 ms for code-switching (endpointing=100). For instance:

wss://api.deepgram.com/v1/listen?language=multi&model=nova-3&sample_rate=44100&encoding=linear16&endpointing=100
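If the streaming URL is assembled in code, building the query string from a dictionary avoids typos in the parameter list. The sketch below assumes a hypothetical helper named build_stream_url; it only constructs the URL and does not open a connection.

```python
from urllib.parse import urlencode

STREAM_BASE = "wss://api.deepgram.com/v1/listen"


def build_stream_url(
    sample_rate: int,
    encoding: str = "linear16",
    model: str = "nova-3",
    endpointing_ms: int = 100,  # the value recommended above for code-switching
) -> str:
    """Return a websocket URL with language=multi (hypothetical helper)."""
    params = {
        "language": "multi",
        "model": model,
        "sample_rate": sample_rate,
        "encoding": encoding,
        "endpointing": endpointing_ms,
    }
    return f"{STREAM_BASE}?{urlencode(params)}"
```

Calling build_stream_url(44100) reproduces the example URL shown above.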

Flux Multilingual

Flux Multilingual handles code-switching natively. Instead of language=multi, use model=flux-general-multi with optional language_hint parameters to bias toward expected languages.

wss://api.deepgram.com/v2/listen?model=flux-general-multi&language_hint=en&language_hint=es&encoding=linear16&sample_rate=16000

Flux returns detected languages in every TurnInfo event via the languages field. See the Language Prompting guide for full details on usage scenarios and response format.
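Because language_hint can appear more than once, the Flux URL is easiest to build from a list of hints. The following sketch assumes a hypothetical helper named build_flux_url and relies on urlencode with doseq=True, which expands a list value into repeated query parameters.

```python
from urllib.parse import urlencode

FLUX_BASE = "wss://api.deepgram.com/v2/listen"


def build_flux_url(
    language_hints: list[str],
    encoding: str = "linear16",
    sample_rate: int = 16000,
) -> str:
    """Return a Flux Multilingual websocket URL (hypothetical helper).

    Each entry in language_hints becomes its own language_hint parameter;
    an empty list produces no language_hint parameters at all.
    """
    params = {
        "model": "flux-general-multi",
        "language_hint": language_hints,
        "encoding": encoding,
        "sample_rate": sample_rate,
    }
    return f"{FLUX_BASE}?{urlencode(params, doseq=True)}"
```

With hints ["en", "es"], the helper yields the same URL as the example above.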

2. Analyze Response

Pre-Recorded Audio

When the file is finished processing, you’ll receive a JSON response that has the following basic structure:

JSON
{
  "metadata": {
    "transaction_key": "deprecated",
    "request_id": "2479c8c8-8285-40ac-9ab6-f0874449f793",
    "sha256": "154e291ecfa8be6ab8343560bcc109001fa7853eb537253be8e4defc9b504c33",
    "created": "2024-06-26T19:56:16.180Z",
    "duration": 1.6,
    "channels": 1,
    "models": [
      "dc8a3fe5-a395-4b75-a8b1-71c9a5a87526"
    ],
    "model_info": {
      "dc8a3fe5-a395-4b75-a8b1-71c9a5a87526": {
        "name": "2-general-nova",
        "version": "1999-06-13.21385",
        "arch": "nova-2"
      }
    }
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "No recuerdo mi bank password.",
            "confidence": 0.99902344,
            "languages": [
              "en",
              "es"
            ],
            "words": [
              {
                "word": "no",
                "start": 0.08,
                "end": 0.32,
                "confidence": 0.9975586,
                "language": "es"
              },
              {
                "word": "recuerdo",
                "start": 0.32,
                "end": 0.79999995,
                "confidence": 0.9921875,
                "language": "es"
              },
              {
                "word": "mi",
                "start": 0.79999995,
                "end": 1.04,
                "confidence": 0.96777344,
                "language": "es"
              },
              {
                "word": "bank",
                "start": 1.04,
                "end": 1.28,
                "confidence": 1,
                "language": "en"
              },
              {
                "word": "password",
                "start": 1.28,
                "end": 1.5999999,
                "confidence": 0.9926758,
                "language": "en"
              }
            ]
          }
        ]
      }
    ]
  }
}

In this response, we see that each channel contains:

  • alternatives: Array of objects, each of which contains:

    • transcript: Transcript for the audio being processed.
    • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
    • languages: Array of BCP-47 language tags for all detected languages in the channel, sorted in descending order of number of words per language.
    • words: Array of objects, one per word in the transcript. Each entry holds the word, its start and end times (in seconds from the beginning of the audio), a word-level transcription confidence value, the language of the word, and the punctuated word if Smart Formatting is enabled.
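The per-word language field makes it possible to find where a speaker switches languages. As a minimal sketch, the hypothetical helpers below group consecutive words that share a language and count words per language; SAMPLE_WORDS is abridged from the words array in the response above.

```python
from collections import Counter
from itertools import groupby

# Abridged from the "words" array in the sample response above.
SAMPLE_WORDS = [
    {"word": "no", "language": "es"},
    {"word": "recuerdo", "language": "es"},
    {"word": "mi", "language": "es"},
    {"word": "bank", "language": "en"},
    {"word": "password", "language": "en"},
]


def language_runs(words: list[dict]) -> list[tuple[str, str]]:
    """Group consecutive same-language words into (language, text) runs."""
    return [
        (lang, " ".join(w["word"] for w in group))
        for lang, group in groupby(words, key=lambda w: w["language"])
    ]


def words_per_language(words: list[dict]) -> Counter:
    """Count how many words were tagged with each language."""
    return Counter(w["language"] for w in words)


print(language_runs(SAMPLE_WORDS))
# [('es', 'no recuerdo mi'), ('en', 'bank password')]
```

Each boundary between two runs marks a point where the transcript changes language, and the start time of the first word in a run gives the timestamp of the switch when the full word objects are used.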

Streaming Audio

When streaming audio, a Results JSON message has the following structure:

JSON
{
  "type": "Results",
  "channel_index": [
    0,
    1
  ],
  "duration": 4.0700073,
  "start": 464.47,
  "is_final": true,
  "speech_final": false,
  "channel": {
    "alternatives": [
      {
        "transcript": "será el inglés muchos",
        "confidence": 0.937473,
        "languages": [
          "es"
        ],
        "words": [
          {
            "word": "será",
            "start": 465.43,
            "end": 465.91,
            "confidence": 0.9494371,
            "language": "es"
          },
          {
            "word": "el",
            "start": 465.91,
            "end": 466.15,
            "confidence": 0.37035784,
            "language": "es"
          },
          {
            "word": "inglés",
            "start": 466.15,
            "end": 466.65,
            "confidence": 0.416623,
            "language": "es"
          },
          {
            "word": "muchos",
            "start": 467.75,
            "end": 468.25,
            "confidence": 0.937473,
            "language": "es"
          }
        ]
      }
    ]
  },
  "metadata": {
    "request_id": "84157495-6794-4c45-b12b-95b0aeb8793f",
    "model_info": {
      "name": "2-general-nova",
      "version": "1999-05-16.19331",
      "arch": "nova-2"
    },
    "model_uuid": "cf62fbcf-2ee4-49ff-a064-d92fc81d27f4"
  },
  "from_finalize": false
}
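A streaming client typically receives many such messages and keeps only the finalized ones. The sketch below shows one way to do that, assuming a hypothetical helper named collect_final that receives each raw websocket text frame; treating the first entry of alternatives as the one to keep is an assumption made for illustration.

```python
import json


def collect_final(raw_message: str, finals: list[dict]) -> None:
    """Append the transcript and languages of a finalized Results message.

    Hypothetical helper: ignores other message types, interim results
    (is_final is false), and empty transcripts.
    """
    message = json.loads(raw_message)
    if message.get("type") != "Results" or not message.get("is_final"):
        return
    # Assumption: use the first alternative in the list.
    first = message["channel"]["alternatives"][0]
    if first["transcript"]:
        finals.append(
            {
                "transcript": first["transcript"],
                "languages": first.get("languages", []),
                "start": message["start"],
            }
        )
```

Calling collect_final on every incoming frame with a shared list builds up a running record of finalized segments together with the languages detected in each.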