Models & Languages Overview

Comparison of our speech-to-text models. For more details on any of these models, please refer to the corresponding documentation.

Deepgram model evolution over time. Benchmarked in 2023.

Nova-2

Recommended for readability and Deepgram's lowest word error rates. Recommended for most use cases.

Nova-2 expands on Nova-1's advancements with speech-specific optimizations to the underlying Transformer architecture, advanced data curation techniques, and a multi-stage training methodology. These changes yield a reduced word error rate (WER) and improvements to entity recognition (e.g., proper nouns and alphanumerics), punctuation, and capitalization.

See the benchmarks.

The Nova-2 models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=nova-2

If you prefer to specify both model and tier parameters, you can use Nova-2 models with the following syntax:

https://api.deepgram.com/v1/listen?tier=nova&model=2-general

Model                      Language
nova-2 or nova-2-general   English: en, en-US, en-AU, en-GB, en-NZ, en-IN
                           French: fr, fr-CA
                           German: de
                           Hindi: hi, hi-Latn
                           Portuguese: pt, pt-BR
                           Spanish: es, es-419
nova-2-meeting             English: en, en-US
nova-2-phonecall           English: en, en-US
nova-2-finance             English: en, en-US
nova-2-conversationalai    English: en, en-US
nova-2-voicemail           English: en, en-US
nova-2-video               English: en, en-US
nova-2-medical             English: en, en-US
nova-2-drivethru           English: en, en-US
nova-2-automotive          English: en, en-US
nova-2-<CUSTOM>            All available
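
The two request syntaxes shown in this section (model only, or tier plus model) can be assembled programmatically. The following is a minimal sketch, not an official client: the helper name `build_listen_url` is hypothetical, and passing the language code as a `language` query parameter is an assumption that this page does not itself state.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint copied from the example URLs in this section.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(
    model: str,
    tier: Optional[str] = None,
    language: Optional[str] = None,
) -> str:
    """Build a /v1/listen URL (hypothetical helper, for illustration only)."""
    params = {}
    if tier is not None:
        # Tier-plus-model form, e.g. tier=nova&model=2-general.
        params["tier"] = tier
    params["model"] = model
    if language is not None:
        # Assumption: the language code from the table is sent as `language`.
        params["language"] = language
    return f"{LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url("nova-2"))
    print(build_listen_url("2-general", tier="nova"))
```

Keeping the parameters in a dictionary and encoding them once avoids hand-concatenating `?` and `&` separators when more options are added later.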

Nova-1

Recommended for readability and low word error rates.

Nova is the predecessor to Nova-2. Training for this model spans more than 100 domains and 47 billion tokens, making it the deepest-trained automatic speech recognition (ASR) model to date. Nova doesn't just excel in one specific domain; it is ideal for a wide array of voice applications that require high accuracy in diverse contexts. See the benchmarks.

The Nova-1 models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=nova

Model                  Language
nova or nova-general   English: en, en-US, en-AU, en-GB, en-NZ, en-IN
                       Spanish: es, es-419
nova-phonecall         English: en, en-US
nova-medical           English: en, en-US
nova-<CUSTOM>          All available

Enhanced

Recommended for lower word error rates than Base, high accuracy timestamps, and use cases that require keyword boosting.

The Enhanced models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=enhanced

Model                          Language
enhanced or enhanced-general   Danish: da
                               Dutch: nl
                               English: en, en-US
                               Flemish: nl
                               French: fr, fr-CA
                               German: de
                               Hindi: hi, hi-Latn
                               Italian: it
                               Japanese: ja
                               Korean: ko
                               Norwegian: no
                               Polish: pl
                               Portuguese: pt, pt-BR
                               Spanish: es, es-419, es-LATAM
                               Swedish: sv
                               Tamil: ta
                               Tamasheq: taq
enhanced-meeting               English: en, en-US
enhanced-phonecall             English: en, en-US
enhanced-finance               English: en, en-US
enhanced-<CUSTOM>              All available

Base

Recommended for large transcription volumes and high accuracy timestamps.

The Base models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=general

Model              Language
base or general    Chinese: zh, zh-CN, zh-TW
                   Danish: da
                   Dutch: nl
                   English: en, en-US
                   Flemish: nl
                   French: fr, fr-CA
                   German: de
                   Hindi: hi, hi-Latn
                   Indonesian: id
                   Italian: it
                   Japanese: ja
                   Korean: ko
                   Norwegian: no
                   Polish: pl
                   Portuguese: pt, pt-BR
                   Russian: ru
                   Spanish: es, es-419, es-LATAM
                   Swedish: sv
                   Tamil: ta
                   Tamasheq: taq
                   Turkish: tr
                   Ukrainian: uk
meeting            English: en, en-US
phonecall          English: en, en-US
finance            English: en, en-US
conversationalai   English: en, en-US
voicemail          English: en, en-US
video              English: en, en-US
base-<CUSTOM>      All available
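
Because language coverage differs between the Nova-2, Nova-1, Enhanced, and Base general models, it can help to pick the model from the language code. The sketch below copies the language codes from the tables above and tries the families in the order this page recommends them for accuracy (Nova-2, then Nova-1, then Enhanced, then Base). The function name `pick_general_model` and the strict "first match wins" policy are illustrative assumptions, not part of the API.

```python
from typing import Optional

# Language codes for each family's general model, copied from the tables above.
# Dict order encodes the preference: lowest word error rate first.
GENERAL_MODEL_LANGUAGES = {
    "nova-2": {
        "en", "en-US", "en-AU", "en-GB", "en-NZ", "en-IN",
        "fr", "fr-CA", "de", "hi", "hi-Latn", "pt", "pt-BR", "es", "es-419",
    },
    "nova": {
        "en", "en-US", "en-AU", "en-GB", "en-NZ", "en-IN", "es", "es-419",
    },
    "enhanced": {
        "da", "nl", "en", "en-US", "fr", "fr-CA", "de", "hi", "hi-Latn",
        "it", "ja", "ko", "no", "pl", "pt", "pt-BR",
        "es", "es-419", "es-LATAM", "sv", "ta", "taq",
    },
    "general": {
        "zh", "zh-CN", "zh-TW", "da", "nl", "en", "en-US", "fr", "fr-CA",
        "de", "hi", "hi-Latn", "id", "it", "ja", "ko", "no", "pl",
        "pt", "pt-BR", "ru", "es", "es-419", "es-LATAM", "sv", "ta", "taq",
        "tr", "uk",
    },
}


def pick_general_model(language: str) -> Optional[str]:
    """Return the first (most preferred) general model listing `language`."""
    for model, languages in GENERAL_MODEL_LANGUAGES.items():
        if language in languages:
            return model
    return None  # Not covered by any of the four general models.


if __name__ == "__main__":
    for code in ("en-GB", "ja", "uk", "xx"):
        print(code, "->", pick_general_model(code))
```

A hard-coded table like this goes stale as language support changes, so in practice it would need to be kept in sync with the current documentation.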

Deepgram Whisper Cloud

Whisper models are less scalable than all other Deepgram models due to their inherent model architecture. All non-Whisper models will return results faster and scale to higher load.

Deepgram Whisper Cloud is a fully managed API that gives you access to Deepgram's version of OpenAI's Whisper model. Read our guide Deepgram Whisper Cloud for a deeper dive into this offering.

  • Additional rate limits apply to Whisper due to poor scalability.
  • Requests to Whisper are limited to 15 concurrent requests with a paid plan and 5 concurrent requests with the pay-as-you-go plan.
  • Long audio files are supported up to a maximum of 20 minutes of processing time (the maximum length of the audio depends on the size of the Whisper model).
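
One way to stay within the concurrency limits listed above is to cap in-flight requests on the client side. The following is a minimal sketch using a semaphore. The class `WhisperLimiter`, the helper `run_all`, and the plan labels used as dictionary keys are hypothetical names chosen for illustration, and `send_request` stands in for whatever function actually performs the HTTP call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Concurrent-request limits quoted in the list above.
WHISPER_CONCURRENCY = {"paid": 15, "pay-as-you-go": 5}


class WhisperLimiter:
    """Caps the number of simultaneous Whisper requests from this process."""

    def __init__(self, plan: str):
        self._slots = threading.BoundedSemaphore(WHISPER_CONCURRENCY[plan])

    def run(self, send_request, *args, **kwargs):
        # Block until a slot is free, then call the caller-supplied function.
        with self._slots:
            return send_request(*args, **kwargs)


def run_all(limiter, send_request, items, workers=20):
    """Submit every item from a thread pool; the limiter bounds concurrency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: limiter.run(send_request, item), items))
```

This only bounds requests issued by a single process; several workers sharing one account would need a shared limiter or server-side backoff handling.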

Deepgram's Whisper Cloud models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=whisper
https://api.deepgram.com/v1/listen?model=whisper-SIZE

Model                       Language
whisper-tiny                See available
whisper-base                See available
whisper-small               See available
whisper-medium or whisper   See available
whisper-large               See available
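
The `whisper-SIZE` pattern and the table above can be expressed as a small helper that validates the size before building the URL. This is a sketch under the assumption that the five sizes listed here are the complete set; the function names are hypothetical.

```python
from typing import Optional

# Sizes taken from the table above.
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def whisper_model(size: Optional[str] = None) -> str:
    """Return the model name; with no size, use plain "whisper"."""
    if size is None:
        # Per the table, "whisper" is the same model as "whisper-medium".
        return "whisper"
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size: {size!r}")
    return f"whisper-{size}"


def whisper_url(size: Optional[str] = None) -> str:
    """Build the request URL for the chosen Whisper model."""
    return f"https://api.deepgram.com/v1/listen?model={whisper_model(size)}"


if __name__ == "__main__":
    print(whisper_url())
    print(whisper_url("large"))
```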