
Models & Languages Overview

An overview of Deepgram’s speech-to-text models and supported languages.


Models

General model | Description & use
Flux | Our latest-generation streaming model, unifying best-in-class ASR with model-native turn detection. Recommended for real-time agents, customer support bots, and interactive, turn-based experiences.
nova-3 | Our highest-performing general-purpose ASR (no turn detection). Recommended for meetings, event captioning, and multi-speaker, multilingual, noisy, or far-field audio, in batch or streaming.
nova-2 | Recommended for languages not yet supported by nova-3 and for filler word identification.

All models default to language=en unless otherwise specified via the language parameter.

Example

To request a specific Deepgram model, replace MODEL_OPTION with the model option you want to use.

cURL
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=MODEL_OPTION'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API key.
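The request above can be wrapped in a small shell helper so the model option and the optional language parameter are set in one place. This is a minimal sketch, not part of any Deepgram SDK: the function names dg_listen_url and dg_transcribe are hypothetical, and it assumes your key is exported as DEEPGRAM_API_KEY and the audio is WAV. Leaving out the language argument keeps Deepgram's default of language=en.

```shell
# Hypothetical helpers (not a Deepgram SDK) around the cURL request above.

# Print the /v1/listen URL for a model option and an optional language code.
# With no language argument, no language parameter is sent, so Deepgram's
# default (language=en) applies.
dg_listen_url() {
  local model="$1" language="${2:-}"
  local url="https://api.deepgram.com/v1/listen?model=${model}"
  if [ -n "$language" ]; then
    url="${url}&language=${language}"
  fi
  printf '%s\n' "$url"
}

# Send a local WAV file for pre-recorded transcription.
# Assumption: the API key is exported as DEEPGRAM_API_KEY.
dg_transcribe() {
  local file="$1" model="$2" language="${3:-}"
  curl --request POST \
    --header "Authorization: Token ${DEEPGRAM_API_KEY:?set DEEPGRAM_API_KEY}" \
    --header 'Content-Type: audio/wav' \
    --data-binary "@${file}" \
    --url "$(dg_listen_url "$model" "$language")"
}
```

For example, `dg_transcribe youraudio.wav nova-3 en-GB` sends the same request as the cURL example with model=nova-3&language=en-GB appended.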

Flux

Flux is the first conversational speech recognition model built specifically for voice agents. Unlike traditional STT, which passively transcribes what is said, Flux understands conversational flow and handles turn-taking automatically.

Flux tackles the most critical challenges for voice agents today: knowing when to listen, when to think, and when to speak. The model features first-of-its-kind, model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines, all with Nova-3-level accuracy.

Model option | Language
flux-general-en | English (all accents): en
flux-general-multi | Multilingual (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch): en, es, fr, de, hi, ru, pt, ja, it, nl
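Choosing between the two Flux options comes down to which languages a conversation may contain. The sketch below encodes the Flux table above as a lookup; flux_model_for is a hypothetical helper name, not a Deepgram API, and it only checks the base language codes listed in the table.

```shell
# Hypothetical helper (not a Deepgram API): choose a Flux model option for the
# language codes you expect in a conversation, using the Flux table above.
flux_model_for() {
  if [ "$#" -eq 0 ]; then
    echo "usage: flux_model_for LANG..." >&2
    return 2
  fi
  local lang english_only=yes
  for lang in "$@"; do
    case "$lang" in
      en) ;;                                    # listed for both Flux options
      es|fr|de|hi|ru|pt|ja|it|nl) english_only=no ;;
      *) echo "Flux does not list language '$lang'" >&2; return 1 ;;
    esac
  done
  if [ "$english_only" = yes ]; then
    echo "flux-general-en"
  else
    echo "flux-general-multi"
  fi
}
```

English-only traffic maps to flux-general-en; as soon as another listed language may appear (for example `flux_model_for en es`), the helper returns flux-general-multi, the only option that covers both.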

Nova-3

Nova-3 represents a significant leap forward in speech AI technology, featuring substantial improvements in accuracy and real-world application capabilities. The model delivers industry-leading performance with a 54.2% reduction in word error rate (WER) for streaming and 47.4% for batch processing compared to competitors.

Nova-3 introduces groundbreaking features, including real-time multilingual conversation transcription, enhanced comprehension of domain-specific terminology, and optional redaction of personal information. Notably, it’s the first voice AI model to offer self-serve customization, enabling instant vocabulary adaptation without model retraining. In multilingual testing, Nova-3 performed better in all seven languages tested, with preference ratios of up to 8:1 in some languages.

Model option | Language
nova-3 or nova-3-general | Multilingual (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch): multi,
Arabic: ar, ar-AE, ar-SA, ar-QA, ar-KW, ar-SY, ar-LB, ar-PS, ar-JO, ar-EG, ar-SD, ar-TD, ar-MA, ar-DZ, ar-TN, ar-IQ, ar-IR,
Belarusian: be,
Bengali: bn,
Bosnian: bs,
Bulgarian: bg,
Catalan: ca,
Chinese (Cantonese, Traditional): zh-HK,
Chinese (Mandarin, Simplified): zh, zh-CN, zh-Hans,
Chinese (Mandarin, Traditional): zh-TW, zh-Hant,
Croatian: hr,
Czech: cs,
Danish: da, da-DK,
Dutch: nl,
English: en, en-US, en-AU, en-GB, en-IN, en-NZ,
Estonian: et,
Finnish: fi,
Flemish: nl-BE,
French: fr, fr-CA,
German: de,
German (Switzerland): de-CH,
Greek: el,
Gujarati: gu, gu-IN,
Hebrew: he,
Hindi: hi,
Hungarian: hu,
Indonesian: id,
Italian: it,
Japanese: ja,
Kannada: kn,
Korean: ko, ko-KR,
Latvian: lv,
Lithuanian: lt,
Macedonian: mk,
Malay: ms,
Marathi: mr,
Norwegian: no,
Persian: fa,
Polish: pl,
Portuguese: pt, pt-BR, pt-PT,
Romanian: ro,
Russian: ru,
Serbian: sr,
Slovak: sk,
Slovenian: sl,
Spanish: es, es-419,
Swedish: sv, sv-SE,
Tagalog: tl,
Tamil: ta,
Telugu: te,
Thai: th, th-TH,
Turkish: tr,
Ukrainian: uk,
Urdu: ur,
Vietnamese: vi
nova-3-medical | English: en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ
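With nova-3, the language parameter takes either a single code from the table above or multi for recordings that switch between the ten multilingual languages. The sketch below shows that choice; nova3_language is a hypothetical helper, a single code is passed through without validation, and only base codes (en, es, ...) are accepted for multi.

```shell
# Hypothetical helper (not a Deepgram API): pick the nova-3 `language` value
# for the languages expected in a recording, following the Nova-3 table above.
#   one code           -> that code, passed through unchecked (en-GB, ko, ar-EG, ...)
#   several base codes -> multi, if every code is in the multilingual set
nova3_language() {
  if [ "$#" -eq 0 ]; then
    echo "usage: nova3_language LANG..." >&2
    return 2
  fi
  if [ "$#" -eq 1 ]; then
    echo "$1"
    return 0
  fi
  local lang
  for lang in "$@"; do
    case "$lang" in
      en|es|fr|de|hi|ru|pt|ja|it|nl) ;;
      *) echo "'$lang' is not in the nova-3 multilingual (multi) set" >&2; return 1 ;;
    esac
  done
  echo "multi"
}
```

The result goes straight into the query string, for example `--url "https://api.deepgram.com/v1/listen?model=nova-3&language=$(nova3_language en es)"`, which requests model=nova-3&language=multi.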

Nova-2

Recommended for languages not yet supported by nova-3 and for filler word identification.

Model option | Language
nova-2 or nova-2-general | Multilingual (Spanish + English): multi,
Bulgarian: bg,
Catalan: ca,
Chinese (Mandarin, Simplified): zh, zh-CN, zh-Hans,
Chinese (Mandarin, Traditional): zh-TW, zh-Hant,
Chinese (Cantonese, Traditional): zh-HK,
Czech: cs,
Danish: da, da-DK,
Dutch: nl,
English: en, en-US, en-AU, en-GB, en-NZ, en-IN,
Estonian: et,
Finnish: fi,
Flemish: nl-BE,
French: fr, fr-CA,
German: de,
German (Switzerland): de-CH,
Greek: el,
Hindi: hi,
Hungarian: hu,
Indonesian: id,
Italian: it,
Japanese: ja,
Korean: ko, ko-KR,
Latvian: lv,
Lithuanian: lt,
Malay: ms,
Norwegian: no,
Polish: pl,
Portuguese: pt, pt-BR, pt-PT,
Romanian: ro,
Russian: ru,
Slovak: sk,
Spanish: es, es-419,
Swedish: sv, sv-SE,
Thai: th, th-TH,
Turkish: tr,
Ukrainian: uk,
Vietnamese: vi
nova-2-meeting | English: en, en-US
nova-2-phonecall | English: en, en-US
nova-2-finance | English: en, en-US
nova-2-conversationalai | English: en, en-US
nova-2-voicemail | English: en, en-US
nova-2-video | English: en, en-US
nova-2-medical | English: en, en-US
nova-2-drivethru | English: en, en-US
nova-2-automotive | English: en, en-US
nova-2-atc | English: en, en-US
nova-2-<CUSTOM> | All available
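The Nova-2 use-case variants in the table above list only en and en-US, so a request for another language needs nova-2 or nova-2-general instead. The sketch below makes that rule explicit; nova2_model is a hypothetical helper name, and custom nova-2-<CUSTOM> models are deliberately not handled.

```shell
# Hypothetical helper (not a Deepgram API): build a Nova-2 model option for a
# use case. Per the Nova-2 table above, the use-case variants (meeting,
# phonecall, ...) list only en and en-US; other languages use nova-2-general.
nova2_model() {
  local use_case="$1" language="${2:-}"
  case "$use_case" in
    general) echo "nova-2-general"; return 0 ;;
    meeting|phonecall|finance|conversationalai|voicemail|video|medical|drivethru|automotive|atc) ;;
    *) echo "unknown Nova-2 use case '$use_case'" >&2; return 1 ;;
  esac
  case "$language" in
    en|en-US) echo "nova-2-${use_case}" ;;
    *) echo "nova-2-${use_case} lists only en and en-US; got '$language'" >&2; return 1 ;;
  esac
}
```

Failing loudly on, say, `nova2_model phonecall fr` is a design choice: silently falling back to nova-2-general would change the model without the caller noticing.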

Legacy Models

Nova

Nova-1 is the predecessor to Nova-2.

Model option | Language
nova or nova-general | English: en, en-US, en-AU, en-GB, en-NZ, en-IN; Spanish: es, es-419; Hindi: hi-Latn
nova-phonecall | English: en, en-US
nova-medical | English: en, en-US
nova-<CUSTOM> | All available

Enhanced

Recommended when you need lower word error rates than Base, high-accuracy timestamps, or keyword boosting.

Model option | Language
enhanced or enhanced-general | Danish: da; Dutch: nl; English: en, en-US; Flemish: nl; French: fr; German: de; Hindi: hi; Italian: it; Japanese: ja; Korean: ko; Norwegian: no; Polish: pl; Portuguese: pt, pt-BR, pt-PT; Spanish: es, es-419, es-LATAM; Swedish: sv; Tamasheq: taq; Tamil: ta
enhanced-meeting | English: en, en-US
enhanced-phonecall | English: en, en-US
enhanced-finance | English: en, en-US
enhanced-<CUSTOM> | All available

Base

Recommended for large transcription volumes and high-accuracy timestamps.

Model option | Language
base or base-general | Chinese: zh, zh-CN, zh-TW; Danish: da; Dutch: nl; English: en, en-US; Flemish: nl; French: fr, fr-CA; German: de; Hindi: hi, hi-Latn; Indonesian: id; Italian: it; Japanese: ja; Korean: ko; Norwegian: no; Polish: pl; Portuguese: pt, pt-BR, pt-PT; Russian: ru; Spanish: es, es-419, es-LATAM; Swedish: sv; Tamasheq: taq; Turkish: tr; Ukrainian: uk
base-meeting | English: en, en-US
base-phonecall | English: en, en-US
base-finance | English: en, en-US
base-conversationalai | English: en, en-US
base-voicemail | English: en, en-US
base-video | English: en, en-US
base-<CUSTOM> | All available

Deepgram Whisper Cloud

Whisper models are less scalable than all other Deepgram models due to their inherent model architecture. All non-Whisper models will return results faster and scale to higher load.

Deepgram Whisper Cloud is a fully managed API that gives you access to Deepgram’s version of OpenAI’s Whisper model. For a deeper dive into this offering, read our Deepgram Whisper Cloud guide.

  • Additional rate limits apply to Whisper because of its limited scalability.
  • Requests to Whisper are limited to 15 concurrent requests with a paid plan and 5 concurrent requests with the pay-as-you-go plan.
  • Long audio files are supported up to a maximum of 20 minutes of processing time (the maximum length of the audio depends on the size of the Whisper model).
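Because Whisper requests are capped at 5 concurrent requests on pay-as-you-go (15 on a paid plan), a batch job needs its own throttle. The sketch below uses `xargs -P` to keep at most MAX_CONCURRENT curl processes running. MAX_CONCURRENT, whisper_batch, and the CURL override are hypothetical names; the sketch assumes DEEPGRAM_API_KEY is exported and an xargs that supports -0 and -P (GNU and BSD xargs do; POSIX xargs does not). Each response is written next to its input as FILE.json.

```shell
# Sketch: send many WAV files to Deepgram Whisper Cloud while keeping at most
# $MAX_CONCURRENT requests in flight. Assumes DEEPGRAM_API_KEY is *exported*
# and that xargs supports -0 and -P.

# 5 matches the pay-as-you-go limit above; raise to 15 on a paid plan.
MAX_CONCURRENT="${MAX_CONCURRENT:-5}"

whisper_batch() {
  [ "$#" -gt 0 ] || return 0   # no files, nothing to do
  # NUL-separate the paths so names with spaces survive; xargs then starts one
  # `sh -c` per file and never runs more than $MAX_CONCURRENT at once.
  # Inside sh -c, $0 is the HTTP client (curl, unless the hypothetical $CURL
  # overrides it, e.g. CURL=echo for a dry run) and $1 is the audio file.
  printf '%s\0' "$@" |
    xargs -0 -n 1 -P "$MAX_CONCURRENT" sh -c '
      "$0" --silent --show-error --request POST \
        --header "Authorization: Token ${DEEPGRAM_API_KEY:?DEEPGRAM_API_KEY must be exported}" \
        --header "Content-Type: audio/wav" \
        --data-binary "@$1" \
        --output "$1.json" \
        --url "https://api.deepgram.com/v1/listen?model=whisper"
    ' "${CURL:-curl}"
}
```

For example, `whisper_batch recordings/*.wav`. xargs is used instead of background jobs plus `wait` because it holds a fixed number of requests in flight without any extra bookkeeping.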

Deepgram’s Whisper Cloud models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=whisper
https://api.deepgram.com/v1/listen?model=whisper-SIZE
Model option | Language
whisper-tiny | See available
whisper-base | See available
whisper-small | See available
whisper-medium or whisper | See available
whisper-large | See available
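The whisper-SIZE syntax above can be checked before a request is sent, so a typo in the size fails locally and not at the API. This is a sketch; whisper_url is a hypothetical helper, and its size list is taken from the table above.

```shell
# Hypothetical helper (not a Deepgram API): build a Whisper Cloud URL from the
# syntax above. With no size it uses model=whisper, which the table lists as
# an alias of whisper-medium.
whisper_url() {
  case "${1:-}" in
    "") echo "https://api.deepgram.com/v1/listen?model=whisper" ;;
    tiny|base|small|medium|large)
      echo "https://api.deepgram.com/v1/listen?model=whisper-$1" ;;
    *)
      echo "unknown Whisper size '$1' (expected tiny, base, small, medium, or large)" >&2
      return 1 ;;
  esac
}
```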