Model

Learn about Deepgram's Model feature, which allows you to supply a model to use when processing submitted audio.

Deepgram’s Model feature allows you to supply a model to use when processing submitted audio. Each model belongs to a model tier (identified as tier in the Deepgram API).

Some model options are trained for specific use cases.

Model Tiers & Options

Below is a list of model tiers (tier), each of which has its own list of model options (model).

ℹ️

All model tiers have been trained with our patented AutoML™ training. To learn more about tiers, see Tier.

For self-serve customers, Deepgram provides Nova, Enhanced, Base, and Whisper model tiers along with each tier's model options. To learn more about pricing, see Deepgram Pricing & Plans.

Nova

Nova is our newest and most powerful speech-to-text model on the market today. Deepgram's Nova models have the following options:

  • general: Optimized for everyday audio processing. Likely to be more accurate than any region-specific Base model for the language for which it is enabled. If you aren't sure which model to select, start here.

  • phonecall: Optimized for low-bandwidth audio phone calls.

The Nova models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=nova
https://api.deepgram.com/v1/listen?tier=nova&model=OPTION
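As a minimal sketch of a pre-recorded request against the Nova general model, you can also point the API at a hosted audio file by sending a JSON body; the audio URL and API key below are placeholders you would replace with your own values:

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url": "https://example.com/youraudio.wav"}' \
  --url 'https://api.deepgram.com/v1/listen?model=nova'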

Read the Nova Quickstart to learn more about getting started with Nova.

Enhanced

Enhanced models are still some of our most powerful ASR models; they generally have higher accuracy and better word recognition than our Base models, and they handle uncommon words significantly better. Deepgram's Enhanced models have the following options:

  • general: Optimized for everyday audio processing. Likely to be more accurate than any region-specific Base model for the language for which it is enabled. If you aren't sure which model to select, start here.

  • meeting (beta): Optimized for conference room settings, which include multiple speakers with a single microphone.

  • phonecall: Optimized for low-bandwidth audio phone calls.

  • finance (beta): Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance oriented.

The Enhanced models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=enhanced
https://api.deepgram.com/v1/listen?tier=enhanced&model=OPTION
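For example, to transcribe a recorded meeting with the beta meeting option, you would substitute meeting for OPTION:

https://api.deepgram.com/v1/listen?tier=enhanced&model=meeting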

Base

Base models are built on our signature end-to-end deep learning speech model architecture. They offer a solid combination of accuracy and cost effectiveness in some cases. Deepgram's Base models have the following options:

  • general: (Default) Optimized for everyday audio processing.
  • meeting: Optimized for conference room settings, which include multiple speakers with a single microphone.
  • phonecall: Optimized for low-bandwidth audio phone calls.
  • voicemail: Optimized for low-bandwidth audio clips with a single speaker. Derived from the phonecall model.
  • finance: Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance oriented.
  • conversationalai: Optimized to allow artificial intelligence technologies, such as chatbots, to interact with people in a human-like way.
  • video: Optimized for audio sourced from videos.

The Base models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=base
https://api.deepgram.com/v1/listen?tier=base&model=OPTION
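For example, to transcribe a voicemail recording with the Base tier's voicemail option:

https://api.deepgram.com/v1/listen?tier=base&model=voicemail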

Custom

You may also use a custom-trained model associated with your account by including its custom_id.
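As a sketch, assuming the custom model's identifier is supplied as the model value (the exact identifier appears with your model's details in the Deepgram Console), a request might look like this, where YOUR_CUSTOM_ID is a placeholder:

https://api.deepgram.com/v1/listen?model=YOUR_CUSTOM_ID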

Whisper

Deepgram's Whisper Cloud is a fully managed API that gives you access to Deepgram's version of OpenAI’s Whisper model. Read our guide Deepgram Whisper Cloud for a deeper dive into this offering.

Deepgram's Whisper models have the following size options:

  • tiny: Contains 39M parameters. The smallest model available.
  • base: Contains 74M parameters.
  • small: Contains 244M parameters.
  • medium: Contains 769M parameters. The default model if you don't specify a size.
  • large: Contains 1550M parameters. The largest model available. Defaults to OpenAI’s Whisper large-v2.

Deepgram's Whisper Cloud models can be called with the following syntax:

https://api.deepgram.com/v1/listen?model=whisper
https://api.deepgram.com/v1/listen?model=whisper-SIZE
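For example, to explicitly request the medium size rather than relying on the default:

https://api.deepgram.com/v1/listen?model=whisper-medium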

⚠️

Deepgram's Whisper Cloud does not expect a tier parameter. Using tier will not work.

Try it out

To transcribe audio from a file on your computer using a particular model, run the following curl command in a terminal or your favorite API client.

ℹ️

Be sure to replace the placeholder OPTION with your chosen model and YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=OPTION'
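If you want to pin both a tier and a model option, add the tier parameter shown earlier to the same request; TIER and OPTION below are placeholders for your chosen tier and model option (remember to omit tier for Whisper requests):

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?tier=TIER&model=OPTION'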