
Getting Started with Deepgram Whisper Cloud

Deepgram Whisper Cloud is a fully managed API that gives you access to Deepgram’s version of OpenAI’s Whisper model.

Using Deepgram’s fully hosted Whisper Cloud instead of running your own version provides several benefits, including:

  • Pairing the Whisper model with Deepgram features that you can’t get using the OpenAI speech-to-text API, such as diarization and word timings.
  • Support for all Whisper model sizes: tiny, base, small, medium, and large.
  • Support for up to 5 concurrent requests for the Pay As You Go and Growth plans.
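
The concurrency limit in the last bullet can be respected on the client side with a bounded worker pool. The following is a minimal sketch, not Deepgram code: `run_with_limit` and `MAX_CONCURRENT_REQUESTS` are hypothetical names, and each callable stands in for whatever function actually sends a transcription request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

# Assumption: 5 mirrors the Pay As You Go / Growth limit quoted above;
# adjust it if your plan differs.
MAX_CONCURRENT_REQUESTS = 5


def run_with_limit(tasks: Iterable[Callable[[], T]],
                   limit: int = MAX_CONCURRENT_REQUESTS) -> List[T]:
    """Run zero-argument callables with at most `limit` running at once.

    Results are returned in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

Each task would typically wrap one request, for example `lambda: send(audio_url)`, where `send` is your own (hypothetical) request function.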

Deepgram hosts and maintains these Whisper models; they aren’t hosted or run by OpenAI. Therefore, data sent through API requests for our Whisper models is not sent to OpenAI.

Live streaming is not available with Deepgram Whisper Cloud. If you would like to transcribe live streamed audio, we recommend using our Nova-3 model. This guide can help you get started.

Getting Started

In this guide, you’ll learn how to transcribe pre-recorded audio using Deepgram’s hosted Whisper API.

Create a Deepgram Account

Before you can use Deepgram, you’ll need to create a Deepgram account. Signup is free and includes $200 in free credit and access to all of Deepgram’s features!

Create a Deepgram API Key

Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.

Transcribe a Remote File

Transcribe a remote file using Deepgram’s Whisper API with the following request.

cURL
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url":"https://static.deepgram.com/examples/Bueller-Life-moves-pretty-fast.wav"}' \
  --url 'https://api.deepgram.com/v1/listen?model=whisper'

If you would like to use a Deepgram SDK to make the request, follow the steps in the Pre-Recorded speech-to-text guide, but change the model to whisper.

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
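
For readers who prefer not to shell out to cURL, the same request can be assembled with Python's standard library. This is a sketch rather than the Deepgram SDK: `build_whisper_request` and `DEEPGRAM_LISTEN_URL` are names made up for illustration, and the network call is left commented out because it needs a real API key.

```python
import json
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_whisper_request(api_key: str, audio_url: str,
                          model: str = "whisper") -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{DEEPGRAM_LISTEN_URL}?model={model}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_whisper_request(
        "YOUR_DEEPGRAM_API_KEY",
        "https://static.deepgram.com/examples/Bueller-Life-moves-pretty-fast.wav",
    )
    # Sending requires a valid key; uncomment to make the call.
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the URL, headers, and body easy to inspect before any credit is spent.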

Analyze Response

JSON
{
  "metadata": {
    "transaction_key": "deprecated",
    "request_id": "6ba2879c...",
    "sha256": "6a7d98...",
    "created": "2023-04-12T20:33:53.620Z",
    "duration": 96.56319,
    "channels": 1,
    "models": [
      "e04910..."
    ],
    "model_info": {
      "e04910...": {
        "name": "medium-en-whisper",
        "version": "2022-09-21.4",
        "arch": "whisper"
      }
    }
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "another big problem in the speech analytics space when customers first bring the software on is that they are blown away by the fact that an engine can monitor hundreds of kpis ...",
            "confidence": 0.98273027,
            "words": [
              {
                "word": "another",
                "start": 0.06,
                "end": 0.56,
                "confidence": 0.34510013
              },
              {
                "word": "big",
                "start": 0.84,
                "end": 1.3399999,
                "confidence": 0.9840386
              },
              {
                "word": "problem",
                "start": 1.54,
                "end": 2.04,
                "confidence": 0.9970716
              },
              ...
            ]
          }
        ]
      }
    ]
  }
}
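
The response nests the transcript under results, channels, and alternatives, with per-word timings and confidence scores alongside it. The sketch below shows one way to walk that structure. The helper names and the 0.5 threshold are illustrative assumptions, and `sample` is a trimmed copy of the response above with the elided parts left out.

```python
from typing import Any, Dict, List


def first_alternative(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the top alternative of the first channel."""
    return response["results"]["channels"][0]["alternatives"][0]


def low_confidence_words(response: Dict[str, Any],
                         threshold: float = 0.5) -> List[str]:
    """List words whose confidence falls below `threshold`."""
    words = first_alternative(response).get("words", [])
    return [w["word"] for w in words if w["confidence"] < threshold]


# Trimmed copy of the example response; only the fields used here are kept.
sample = {
    "results": {
        "channels": [{
            "alternatives": [{
                "transcript": "another big problem",
                "confidence": 0.98273027,
                "words": [
                    {"word": "another", "start": 0.06, "end": 0.56,
                     "confidence": 0.34510013},
                    {"word": "big", "start": 0.84, "end": 1.3399999,
                     "confidence": 0.9840386},
                    {"word": "problem", "start": 1.54, "end": 2.04,
                     "confidence": 0.9970716},
                ],
            }]
        }]
    }
}

print(first_alternative(sample)["transcript"])  # → another big problem
print(low_confidence_words(sample))             # → ['another']
```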

Enable Whisper Model and Sizes

To enable Deepgram’s Whisper API, add the model parameter to the query string and set it to whisper.

Bash
https://api.deepgram.com/v1/listen?model=whisper

To enable a specific size of the Whisper model, set the model parameter to whisper-SIZE, where SIZE is one of the sizes listed below.

Bash
https://api.deepgram.com/v1/listen?model=whisper-SIZE

If model=whisper is supplied without a size, the model defaults to whisper-medium.

These are the Deepgram Whisper Cloud models available:

  • model=whisper (defaults to whisper-medium)
  • model=whisper-tiny
  • model=whisper-base
  • model=whisper-small
  • model=whisper-medium
  • model=whisper-large (defaults to large-v2)
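
The model list above maps directly onto a small validation helper. This is a sketch under the assumption that the five sizes listed are the only valid ones; `whisper_model` and `listen_url` are hypothetical names, not part of any Deepgram SDK.

```python
from typing import Optional

# Sizes taken from the list above.
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def whisper_model(size: Optional[str] = None) -> str:
    """Return the model parameter value for a Whisper size.

    With no size, return plain "whisper", which the service resolves
    to whisper-medium according to the text above.
    """
    if size is None:
        return "whisper"
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size: {size!r}")
    return f"whisper-{size}"


def listen_url(size: Optional[str] = None) -> str:
    """Build the listen endpoint URL for the chosen Whisper size."""
    return f"https://api.deepgram.com/v1/listen?model={whisper_model(size)}"
```

Validating the size locally catches a typo such as `whisper-huge` before a request is sent.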

Other Features

Language Detection

Deepgram Whisper Cloud supports language detection: set detect_language=true and your audio will be transcribed in the detected language.

Officially supported languages include Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh. (Source: “Whisper API FAQ”)

Supported languages

Languages supported by Whisper include: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su.

If you would like to transcribe audio in a specific language, you can do so by setting the language parameter in the query string. You can pass in any language code supported by Whisper through our language parameter. To learn more about languages, see Language.

https://api.deepgram.com/v1/listen?model=whisper&language=en
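
The language and detect_language parameters described in this section can be assembled into a query string programmatically. The sketch below uses a hypothetical helper, `whisper_listen_url`. Treating the two options as mutually exclusive is a design choice of this sketch, not documented API behavior.

```python
from typing import Optional
from urllib.parse import urlencode


def whisper_listen_url(language: Optional[str] = None,
                       detect_language: bool = False,
                       model: str = "whisper") -> str:
    """Build a listen URL with either a fixed language or detection."""
    # Assumption of this sketch: pick one approach per request.
    if language and detect_language:
        raise ValueError("choose either a fixed language or detection")
    params = {"model": model}
    if language:
        params["language"] = language
    if detect_language:
        params["detect_language"] = "true"
    return "https://api.deepgram.com/v1/listen?" + urlencode(params)
```

For example, `whisper_listen_url(language="en")` reproduces the URL shown above.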

Deepgram Features

This is a list of Deepgram Features and their current status for use with Deepgram Whisper Cloud:

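| Feature | Status |
| --- | --- |
| Alternatives | ✅ |
| Callbacks | ✅ |
| Speaker Diarization | ✅ |
| Entity Detection | ❌ |
| Find and Replace | ✅ |
| Keywords | ❌ |
| Language Detection | ✅ |
| Multichannel | ✅ |
| Numerals | ✅ |
| Paragraphs | ✅ |
| Profanity Filter | ❌ |
| Redaction | ✅ |
| Search | ❌ |
| Smart Format | ✅ |
| Summarization | ✅ |
| Topic Detection | ✅ |
| Utterances | ✅ |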
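
The feature list above can be encoded as a lookup so that unsupported options are caught before a request is sent. This is a sketch only: the dictionary copies the statuses listed on this page, which may change, and `unsupported_features` is a hypothetical helper keyed by the feature names as written here rather than by API parameter names.

```python
from typing import Iterable, List

# Statuses copied from the list above (True = supported with Whisper Cloud).
WHISPER_FEATURE_SUPPORT = {
    "Alternatives": True,
    "Callbacks": True,
    "Speaker Diarization": True,
    "Entity Detection": False,
    "Find and Replace": True,
    "Keywords": False,
    "Language Detection": True,
    "Multichannel": True,
    "Numerals": True,
    "Paragraphs": True,
    "Profanity Filter": False,
    "Redaction": True,
    "Search": False,
    "Smart Format": True,
    "Summarization": True,
    "Topic Detection": True,
    "Utterances": True,
}


def unsupported_features(requested: Iterable[str]) -> List[str]:
    """Return the requested features that Whisper Cloud does not support."""
    requested = list(requested)
    unknown = [name for name in requested
               if name not in WHISPER_FEATURE_SUPPORT]
    if unknown:
        raise KeyError(f"not in the feature list: {unknown}")
    return [name for name in requested
            if not WHISPER_FEATURE_SUPPORT[name]]
```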

Caveats

  • Whisper models are less scalable than all other Deepgram models due to their inherent model architecture. Deepgram’s non-Whisper models return results faster and scale to a higher load, so we recommend using a Deepgram model such as Nova if it can meet your needs.
  • There is a 10-minute timeout for all Deepgram models. Transcription requests that run longer than 10 minutes will return a 504 error.
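
Client code can single out the 504 timeout described in the last caveat and report it differently from other failures. The sketch below assumes the request is sent with Python's `urllib`, which raises `urllib.error.HTTPError` for error responses. `is_processing_timeout` and `explain` are hypothetical helpers, and the message text is illustrative.

```python
import urllib.error

GATEWAY_TIMEOUT = 504  # status returned when a request exceeds 10 minutes


def is_processing_timeout(error: urllib.error.HTTPError) -> bool:
    """True when the error is the 504 described in the caveat above."""
    return error.code == GATEWAY_TIMEOUT


def explain(error: urllib.error.HTTPError) -> str:
    """Turn an HTTP error into a short, user-facing message."""
    if is_processing_timeout(error):
        return ("Transcription exceeded the 10-minute limit; "
                "consider a non-Whisper model such as Nova.")
    return f"Request failed with HTTP {error.code}."
```

A caller would wrap `urllib.request.urlopen(...)` in `try`/`except urllib.error.HTTPError as error` and pass `error` to `explain`.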