Model
Deepgram’s Model feature allows you to supply a model to use when processing submitted audio.
Each model belongs to a tier. For self-serve customers, Deepgram provides Enhanced and Base model tiers. Enhanced models are our newest, most powerful ASR models; they generally offer higher accuracy and better word recognition than our Base models, and they handle uncommon words significantly better. Base models are built on our signature end-to-end deep learning speech model architecture and offer a solid combination of accuracy and cost effectiveness. Both model tiers are trained with our patented AutoML™ training. To learn more about tiers, see Tier.
Once you have chosen your tier and model, you can select an available language and a version. To learn more about languages, see Language. To learn more about versions, see Version.
By default, Deepgram applies its Base tier general AI model, which is a good, general-purpose model for everyday situations.
Use Cases
Some examples of use cases for Model include:
- Customers with audio data whose traits match a specific Deepgram-provided use-case model.
- Customers with specialized audio data who want to apply a custom-trained model that has been optimized to provide the best results for their particular data.
Enable Feature
To enable Model, when you call Deepgram’s API, add a model parameter in the query string and set it to the model you would like to use:
model=OPTION
By default, Deepgram applies its Base tier. To use a different tier with a hosted deployment, also add a tier parameter in the query string:
tier=enhanced&model=OPTION
For an on-premises deployment, instead append -enhanced to the name of the model in the model parameter:
model=OPTION-enhanced
To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client. Be sure to replace the placeholder OPTION with your chosen model and YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.
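A request along these lines should work (a sketch based on Deepgram’s hosted /v1/listen endpoint; the file name audio.wav and the audio/wav content type are placeholders for your own audio file and its format):

```shell
# Send a local audio file to Deepgram for transcription,
# selecting a model via the `model` query parameter.
curl --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @audio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=OPTION'
```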
Model Options
For self-serve customers, Deepgram provides Enhanced and Base model tiers. To learn more about pricing, see Deepgram Pricing & Plans.
Enhanced
Enhanced models are our newest, most powerful ASR models; they generally have higher accuracy and better word recognition than our Base models, and they handle uncommon words significantly better.
general
: Optimized for everyday audio processing. Generally more accurate than any region-specific Base model for the language for which it is enabled. If you aren’t sure which model to select, start here.

meeting beta
: Optimized for conference room settings, which include multiple speakers with a single microphone.

phonecall beta
: Optimized for low-bandwidth audio phone calls.

finance beta
: Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance oriented.
Base
Base models are built on our signature end-to-end deep learning speech model architecture. They offer a solid combination of accuracy and cost effectiveness.
general
: (Default) Optimized for everyday audio processing.

meeting
: Optimized for conference room settings, which include multiple speakers with a single microphone.

phonecall
: Optimized for low-bandwidth audio phone calls.

voicemail
: Optimized for low-bandwidth audio clips with a single speaker. Derived from the phonecall model.

finance
: Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance oriented.

conversationalai
: Optimized to allow artificial intelligence technologies, such as chatbots, to interact with people in a human-like way.

video
: Optimized for audio sourced from videos.
Not all models are supported for all languages. For a list of languages and their supported models, see Language.
Custom
You may also use a custom-trained model associated with your account by including its custom_id.
Hosted Open Source
Open-sourced models and inference code serve as a foundation for building useful applications and for further research on robust speech processing. Deepgram hosts these models for testing and exploration.
whisper
: OpenAI’s Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Try Whisper with Deepgram’s API, or learn more about Whisper in OpenAI’s blog post, Introducing Whisper.
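To try it, a request might look like the following (a sketch that assumes the hosted /v1/listen endpoint accepts whisper as a model value; audio.wav is a placeholder for your own file):

```shell
# Transcribe a local file with the Deepgram-hosted Whisper model.
curl --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @audio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=whisper'
```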