Model
Learn about Deepgram's Model feature, which allows you to supply a model to use to process submitted audio.
Deepgram's Model feature allows you to supply a model to use when processing submitted audio. Each model belongs to a model tier (identified as `tier` in the Deepgram API).
Some model options are trained for specific use cases.
Model Tiers & Options
Below is a list of model tiers (`tier`), each of which has its own list of model options (`model`).
All model tiers have been trained with our patented AutoML™ training. To learn more about tiers, see Tier.
For self-serve customers, Deepgram provides Nova, Enhanced, Base, and Whisper model tiers along with each model's options. To learn more about pricing, see Deepgram Pricing & Plans.
Nova
Nova model tiers are our newest and most powerful speech-to-text models on the market today. Deepgram's Nova models have the following options:
- `general`: Optimized for everyday audio processing. Likely to be more accurate than any region-specific Base model for the language for which it is enabled. If you aren't sure which model to select, start here.
- `phonecall`: Optimized for low-bandwidth audio phone calls.
The Nova models can be called with the following syntax:
```
https://api.deepgram.com/v1/listen?model=nova
https://api.deepgram.com/v1/listen?tier=nova&model=OPTION
```
Read the Nova Quickstart to learn more about getting started with Nova.
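As an illustration, the two calling forms above can be assembled programmatically. The following sketch uses only the Python standard library and assumes the endpoint and query-parameter names shown above; it only builds the URL and does not contact the API:

```python
from urllib.parse import urlencode

# Base endpoint as documented above.
BASE_URL = "https://api.deepgram.com/v1/listen"

def listen_url(tier=None, model=None):
    """Build a /v1/listen URL from optional tier and model parameters."""
    params = {}
    if tier:
        params["tier"] = tier
    if model:
        params["model"] = model
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL

# Shorthand form: model only.
print(listen_url(model="nova"))
# → https://api.deepgram.com/v1/listen?model=nova

# Explicit form: tier plus a model option.
print(listen_url(tier="nova", model="phonecall"))
# → https://api.deepgram.com/v1/listen?tier=nova&model=phonecall
```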
Enhanced
Enhanced model tiers are still some of our most powerful ASR models; they generally have higher accuracy and better word recognition than our base models, and they handle uncommon words significantly better. Deepgram's Enhanced models have the following options:
- `general`: Optimized for everyday audio processing. Likely to be more accurate than any region-specific Base model for the language for which it is enabled. If you aren't sure which model to select, start here.
- `meeting` (beta): Optimized for conference room settings, which include multiple speakers with a single microphone.
- `phonecall`: Optimized for low-bandwidth audio phone calls.
- `finance` (beta): Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance oriented.
The Enhanced models can be called with the following syntax:
```
https://api.deepgram.com/v1/listen?model=enhanced
https://api.deepgram.com/v1/listen?tier=enhanced&model=OPTION
```
Base
Base model tiers are built on our signature end-to-end deep learning speech model architecture. They offer a solid combination of accuracy and cost effectiveness. Deepgram's Base models have the following options:
- `general`: (Default) Optimized for everyday audio processing.
- `meeting`: Optimized for conference room settings, which include multiple speakers with a single microphone.
- `phonecall`: Optimized for low-bandwidth audio phone calls.
- `voicemail`: Optimized for low-bandwidth audio clips with a single speaker. Derived from the phonecall model.
- `finance`: Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance oriented.
- `conversationalai`: Optimized to allow artificial intelligence technologies, such as chatbots, to interact with people in a human-like way.
- `video`: Optimized for audio sourced from videos.
The Base models can be called with the following syntax:
```
https://api.deepgram.com/v1/listen?model=base
https://api.deepgram.com/v1/listen?tier=base&model=OPTION
```
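Because each tier accepts a different set of model options, it can help to validate a tier/model pair before sending a request. The lookup below is a sketch, not part of the Deepgram API; the option names are taken from the lists in this guide:

```python
# Valid model options per tier, as listed in this guide.
TIER_OPTIONS = {
    "nova": {"general", "phonecall"},
    "enhanced": {"general", "meeting", "phonecall", "finance"},
    "base": {"general", "meeting", "phonecall", "voicemail",
             "finance", "conversationalai", "video"},
}

def is_valid_option(tier, model):
    """Return True if `model` is a documented option for `tier`."""
    return model in TIER_OPTIONS.get(tier, set())

print(is_valid_option("base", "voicemail"))  # → True
print(is_valid_option("nova", "meeting"))    # → False: Nova has no meeting option
```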
Custom
You may also use a custom, trained model associated with your account by including its `custom_id`.
Whisper
Deepgram's Whisper Cloud is a fully managed API that gives you access to Deepgram's version of OpenAI’s Whisper model. Read our guide Deepgram Whisper Cloud for a deeper dive into this offering.
Deepgram's Whisper models have the following size options:
- `tiny`: Contains 39M parameters. The smallest model available.
- `base`: Contains 74M parameters.
- `small`: Contains 244M parameters.
- `medium`: Contains 769M parameters. The default model if you don't specify a size.
- `large`: Contains 1550M parameters. The largest model available. Defaults to OpenAI's Whisper large-v2.
Deepgram's Whisper Cloud models can be called with the following syntax:
```
https://api.deepgram.com/v1/listen?model=whisper
https://api.deepgram.com/v1/listen?model=whisper-SIZE
```
Deepgram's Whisper Cloud does not expect a `tier` parameter. Using `tier` will not work.
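Since Whisper selection happens through the `model` parameter alone (no `tier`), choosing a size is just string formatting. A small sketch, assuming the `whisper-SIZE` convention shown above:

```python
# Whisper size options as listed in this guide.
WHISPER_SIZES = {"tiny", "base", "small", "medium", "large"}

def whisper_model(size=None):
    """Return the model parameter value for a Whisper Cloud request.

    With no size, plain "whisper" is sent and the API applies its
    documented default (medium). No tier parameter is used with Whisper.
    """
    if size is None:
        return "whisper"
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size: {size}")
    return f"whisper-{size}"

print(whisper_model())         # → whisper
print(whisper_model("large"))  # → whisper-large
```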
Try it out
To transcribe audio from a file on your computer using a particular model, run the following curl command in a terminal or your favorite API client.
Be sure to replace the placeholder `OPTION` with your chosen model and `YOUR_DEEPGRAM_API_KEY` with your Deepgram API Key. You can create an API Key in the Deepgram Console.
```shell
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=OPTION'
```