Models & Languages Overview

An overview of Deepgram's speech-to-text models and supported languages.

Nova-3 (Early Access)

Our most accurate model. Recommended for best performance on most use cases.

Nova-3 represents a significant leap forward in speech AI technology, featuring substantial improvements in accuracy and real-world application capabilities. The model delivers industry-leading performance with a 53.4% reduction in word error rate (WER) for streaming and 47.4% for batch processing compared to competitors. Nova-3 introduces groundbreaking features including real-time multilingual conversation transcription, enhanced comprehension of domain-specific terminology, and optional personal information redaction. Notably, it's the first voice AI model to offer self-serve customization, enabling instant vocabulary adaptation without model retraining. In multilingual testing, Nova-3 demonstrated superior performance across all seven tested languages, with particularly strong results showing up to 8:1 preference ratios in certain languages.

Examples

The Nova-3 models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=nova-3

Model                         Language
nova-3 or nova-3-general      English: en, en-US
                              Multilingual coming soon
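
As a minimal sketch of how the model name and a language code from the table might be combined into a request URL: the helper name `build_listen_url` is hypothetical, and passing the code through a `language` query parameter is an assumption, since this page only shows the `model` parameter.

```python
from urllib.parse import urlencode

# Endpoint taken from the syntax example above.
LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"


def build_listen_url(model, language=None):
    """Return the listen URL for a model, optionally adding a language code.

    The `language` query parameter is an assumption; this page only
    documents `model`.
    """
    params = {"model": model}
    if language is not None:
        params["language"] = language
    return f"{LISTEN_ENDPOINT}?{urlencode(params)}"


print(build_listen_url("nova-3"))
print(build_listen_url("nova-3", language="en-US"))
```

Building the query string with `urlencode` rather than string concatenation keeps the URL valid if a parameter value ever contains characters that need escaping.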

Nova-2

Recommended for use with non-English transcription.

Nova-2 expands on Nova-1's advancements with speech-specific optimizations to the underlying Transformer architecture, advanced data curation techniques, and a multi-stage training methodology. These changes yield a lower word error rate (WER) and improvements to entity recognition (such as proper nouns and alphanumerics), punctuation, and capitalization. See the benchmarks.

Examples

The Nova-2 models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=nova-2

You can also call Nova-2 use case models with the following syntax:

https://api.deepgram.com/v1/listen?model=nova-2-phonecall

Model                         Language
nova-2 or nova-2-general      Multilingual (Spanish + English): multi
                              Bulgarian: bg
                              Catalan: ca
                              Chinese (Mandarin, Simplified): zh, zh-CN, zh-Hans
                              Chinese (Mandarin, Traditional): zh-TW, zh-Hant
                              Chinese (Cantonese, Traditional): zh-HK
                              Czech: cs
                              Danish: da, da-DK
                              Dutch: nl
                              English: en, en-US, en-AU, en-GB, en-NZ, en-IN
                              Estonian: et
                              Finnish: fi
                              Flemish: nl-BE
                              French: fr, fr-CA
                              German: de
                              German (Switzerland): de-CH
                              Greek: el
                              Hindi: hi
                              Hungarian: hu
                              Indonesian: id
                              Italian: it
                              Japanese: ja
                              Korean: ko, ko-KR
                              Latvian: lv
                              Lithuanian: lt
                              Malay: ms
                              Norwegian: no
                              Polish: pl
                              Portuguese: pt, pt-BR, pt-PT
                              Romanian: ro
                              Russian: ru
                              Slovak: sk
                              Spanish: es, es-419
                              Swedish: sv, sv-SE
                              Thai: th, th-TH
                              Turkish: tr
                              Ukrainian: uk
                              Vietnamese: vi
nova-2-meeting                English: en, en-US
nova-2-phonecall              English: en, en-US
nova-2-finance                English: en, en-US
nova-2-conversationalai       English: en, en-US
nova-2-voicemail              English: en, en-US
nova-2-video                  English: en, en-US
nova-2-medical                English: en, en-US
nova-2-drivethru              English: en, en-US
nova-2-automotive             English: en, en-US
nova-2-atc                    English: en, en-US
nova-2-<CUSTOM>               All available
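
The table shows that the Nova-2 use case models cover only English (en, en-US), while the general model covers every listed language. The following sketch encodes that rule; the function name `nova2_model_name` is hypothetical, and falling back to the general model for non-English audio is an assumed design choice, not documented behavior.

```python
# Use case suffixes and language codes copied from the Nova-2 table.
NOVA2_USE_CASES = {
    "meeting", "phonecall", "finance", "conversationalai", "voicemail",
    "video", "medical", "drivethru", "automotive", "atc",
}
USE_CASE_LANGUAGES = {"en", "en-US"}


def nova2_model_name(use_case=None, language="en"):
    """Return the Nova-2 model name for a use case and language code."""
    if use_case is None:
        return "nova-2"
    if use_case not in NOVA2_USE_CASES:
        raise ValueError(f"unknown Nova-2 use case: {use_case!r}")
    if language not in USE_CASE_LANGUAGES:
        # Assumption: use the general model instead of failing, because
        # the use case models are listed for English only.
        return "nova-2"
    return f"nova-2-{use_case}"


print(nova2_model_name("phonecall"))        # nova-2-phonecall
print(nova2_model_name("phonecall", "fr"))  # nova-2
```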

Nova

Legacy model.

Nova is the predecessor to Nova-2. Its training spans more than 100 domains and 47 billion tokens, making it the deepest-trained automatic speech-to-text model to date. Nova doesn't just excel in one specific domain; it is ideal for a wide array of voice applications that require high accuracy in diverse contexts. See the benchmarks.

Examples

The Nova models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=nova

You can also call Nova use case models with the following syntax:

https://api.deepgram.com/v1/listen?model=nova-phonecall

Model                         Language
nova or nova-general          English: en, en-US, en-AU, en-GB, en-NZ, en-IN
                              Spanish: es, es-419
                              Hindi: hi-Latn
nova-phonecall                English: en, en-US
nova-medical                  English: en, en-US
nova-<CUSTOM>                 All available
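
The URLs above only select a model; an actual call also needs credentials and audio. The sketch below prepares, but does not send, such a request with the Python standard library. The `Authorization: Token ...` header and the JSON body with a `url` field are assumptions not stated on this page, so check them against the API reference. The function name, the placeholder key, and the example audio URL are hypothetical.

```python
import json
import urllib.request


def build_transcription_request(api_key, audio_url, model="nova-phonecall"):
    """Prepare a POST request that asks the given model to transcribe audio_url."""
    endpoint = f"https://api.deepgram.com/v1/listen?model={model}"
    # Assumption: hosted audio is referenced through a JSON body with a "url" field.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            # Assumption: token-style authorization header.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_transcription_request("YOUR_API_KEY", "https://example.com/call.wav")
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would send it; that step is omitted here.
```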

Enhanced

Legacy model. Recommended for lower word error rates than Base, high-accuracy timestamps, and use cases that require keyword boosting.

Examples

The Enhanced models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=enhanced

You can also call Enhanced use case models with the following syntax:

https://api.deepgram.com/v1/listen?model=enhanced-phonecall

Model                         Language
enhanced or enhanced-general  Danish: da
                              Dutch: nl
                              English: en, en-US
                              Flemish: nl
                              French: fr
                              German: de
                              Hindi: hi
                              Italian: it
                              Japanese: ja
                              Korean: ko
                              Norwegian: no
                              Polish: pl
                              Portuguese: pt, pt-BR, pt-PT
                              Spanish: es, es-419, es-LATAM
                              Swedish: sv
                              Tamasheq: taq
                              Tamil: ta
enhanced-meeting              English: en, en-US
enhanced-phonecall            English: en, en-US
enhanced-finance              English: en, en-US
enhanced-<CUSTOM>             All available

Base

Recommended for large transcription volumes and high-accuracy timestamps.

Examples

The Base models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=base

You can also call Base use case models with the following syntax:

https://api.deepgram.com/v1/listen?model=base-phonecall

Model                         Language
base or base-general          Chinese: zh, zh-CN, zh-TW
                              Danish: da
                              Dutch: nl
                              English: en, en-US
                              Flemish: nl
                              French: fr, fr-CA
                              German: de
                              Hindi: hi, hi-Latn
                              Indonesian: id
                              Italian: it
                              Japanese: ja
                              Korean: ko
                              Norwegian: no
                              Polish: pl
                              Portuguese: pt, pt-BR, pt-PT
                              Russian: ru
                              Spanish: es, es-419, es-LATAM
                              Swedish: sv
                              Tamasheq: taq
                              Turkish: tr
                              Ukrainian: uk
base-meeting                  English: en, en-US
base-phonecall                English: en, en-US
base-finance                  English: en, en-US
base-conversationalai         English: en, en-US
base-voicemail                English: en, en-US
base-video                    English: en, en-US
base-<CUSTOM>                 All available

Deepgram Whisper Cloud

Whisper models are less scalable than all other Deepgram models due to their inherent model architecture. All non-Whisper models will return results faster and scale to higher load.

Deepgram Whisper Cloud is a fully managed API that gives you access to Deepgram’s version of OpenAI’s Whisper model. For a deeper dive into this offering, read our Deepgram Whisper Cloud guide.

  • Additional rate limits apply to Whisper due to poor scalability.
  • Whisper is limited to 15 concurrent requests on a paid plan and 5 concurrent requests on the pay-as-you-go plan.
  • Long audio files are supported up to a maximum of 20 minutes of processing time (the maximum length of the audio depends on the size of the Whisper model).
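
One way a client might stay within the concurrency limits listed above is to cap its worker pool at the plan's limit. This is a rough sketch under stated assumptions: `run_with_limit` and the plan labels are hypothetical names, and `fake_request` is a stand-in that sleeps instead of calling the API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Concurrent request limits from the list above, keyed by hypothetical plan labels.
WHISPER_CONCURRENCY = {"paid": 15, "pay-as-you-go": 5}


def run_with_limit(jobs, plan):
    """Run zero-argument callables with at most the plan's limit in flight."""
    limit = WHISPER_CONCURRENCY[plan]
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # The pool never runs more than `limit` jobs at the same time.
        return list(pool.map(lambda job: job(), jobs))


# Stand-in job that records how many calls overlap; it makes no network request.
state = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_request():
    with lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.01)
    with lock:
        state["active"] -= 1
    return "done"


results = run_with_limit([fake_request] * 20, plan="pay-as-you-go")
print(len(results), "jobs finished; peak concurrency:", state["peak"])
```

Capping the pool size only limits requests from a single process; several processes sharing one account would need a shared limiter.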

Deepgram's Whisper Cloud models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=whisper
https://api.deepgram.com/v1/listen?model=whisper-SIZE

Model                         Language
whisper-tiny                  See available
whisper-base                  See available
whisper-small                 See available
whisper-medium OR whisper     See available
whisper-large                 See available
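
The `whisper-SIZE` pattern and the table above can be sketched as a small helper. The name `whisper_model` is hypothetical; the sizes are the five listed in the table, and returning the bare `whisper` name when no size is given follows the row that lists it alongside `whisper-medium`.

```python
# Sizes listed in the Whisper model table.
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def whisper_model(size=None):
    """Return the model name for a Whisper size, or bare "whisper" if none is given."""
    if size is None:
        # The table lists "whisper" in the same row as "whisper-medium".
        return "whisper"
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size: {size!r}")
    return f"whisper-{size}"


for size in WHISPER_SIZES:
    print(f"https://api.deepgram.com/v1/listen?model={whisper_model(size)}")
```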
