Migrating from OpenAI Whisper to Deepgram

Learn how to migrate from OpenAI Whisper to Deepgram. This guide is for developers and practitioners who use OpenAI Whisper for transcription and are considering, or actively moving to, Deepgram.

Changing audio transcription services can be a challenging task, even for experienced teams. This guide will give you an overview of the process of migrating your transcription services from OpenAI to Deepgram to help you make the transition as quickly and efficiently as possible.

📘

If you are currently running your own version of OpenAI's Whisper model, you can use Deepgram's Whisper Cloud service as a drop-in replacement. See our Whisper Cloud Quickstart to learn more.

Create a Deepgram Account

Before you can use Deepgram, you'll need to create a Deepgram account. Signup is free and includes:

📘

To access Deepgram’s API, you'll need an API Key. See our Guide on Creating API Keys to learn more.
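As a minimal sketch of how the key is typically used, the helper below builds the HTTP headers for a Deepgram request. It assumes the key is stored in an environment variable named `DEEPGRAM_API_KEY`; that variable name and the function name are illustrative choices, not something Deepgram requires.

```python
import os


def deepgram_headers(content_type: str = "audio/wav") -> dict:
    """Build HTTP headers for a Deepgram API request.

    Assumption: the API key is stored in the DEEPGRAM_API_KEY environment
    variable. The variable name is illustrative only.
    """
    api_key = os.environ.get("DEEPGRAM_API_KEY")
    if not api_key:
        raise RuntimeError("Set DEEPGRAM_API_KEY before calling Deepgram")
    return {
        # Deepgram expects the key with the "Token" scheme.
        "Authorization": f"Token {api_key}",
        "Content-Type": content_type,
    }
```

Keeping the key in the environment rather than in source code makes it easier to rotate keys during the migration.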

Migration Process

During the migration process, you will need to perform the following tasks:

Before Migration
- Identify any upstream dependencies on your transcriptions
- Find representative samples of your audio for testing
- Get familiar with Deepgram’s API and understand differences from OpenAI
- Create an API key
- Test your audio
- Create a migration plan
- Create a rollback plan

During Migration
- Configure your response parsing to conform to Deepgram's JSON response
- Swap over traffic to Deepgram API
- Monitor systems

After Migration
- Testing
- Tune downstream processes to Deepgram output
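One way to swap traffic gradually while keeping a rollback path is a percentage-based split. The sketch below is a hypothetical helper, not part of any Deepgram or OpenAI SDK: it hashes a request identifier so the same request always goes to the same provider, and setting the percentage back to 0 acts as the rollback.

```python
import hashlib


def choose_provider(request_id: str, deepgram_percent: int) -> str:
    """Return "deepgram" or "openai" for a request, deterministically.

    deepgram_percent is the share of traffic (0 to 100) sent to Deepgram.
    The function name and the provider labels are illustrative.
    """
    if not 0 <= deepgram_percent <= 100:
        raise ValueError("deepgram_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first four bytes of the hash onto a bucket from 0 to 99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "deepgram" if bucket < deepgram_percent else "openai"
```

Raising `deepgram_percent` in small steps while monitoring your systems lets you compare both providers on live traffic before committing fully.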

Differences

Once you’ve selected your model tier, Deepgram provides many features and capabilities to help you transcribe and classify your audio. However, some capabilities and concepts are implemented differently from OpenAI.

Features & Capabilities | Deepgram | OpenAI
--- | --- | ---
Audio File Types | Deepgram supports over 100 different audio formats and encodings. Some of the most common are: mp3, mp4, mp2, AAC, wav, FLAC, PCM, m4a, Ogg, Opus, WebM, and more. | Limited audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
Word Timings | Yes | V3 - Stand alone model only
Confidence Scoring | Yes | No
Streaming | Yes | No
Models Support | Multiple Domain Models | One
Language Support | Many | Many
Translation | No | Yes
Diarization | Yes | No
Prompting | No | Yes
Transcription Format Options | json, srt, vtt, text | json, text, srt, verbose_json, vtt
Temperature | No | Yes

Detailed Description of Differences

OpenAI

  • OpenAI provides just a single text field in the response.
  • OpenAI allows you to include a prompt in your request body to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt.
  • OpenAI allows you to send a temperature value between 0 and 1 in your request body. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic.
  • If you run your own version of the Whisper v3 model, you can expect to see timestamps/word timings, but this feature currently isn't available in the OpenAI Transcribe API.

Deepgram

  • Deepgram provides a significant number of additional fields in the response that can help you make better use of your transcription output, including:
    • useful metadata about your request
    • an overall transcription confidence score
    • individual word timings
    • individual word confidence scores
  • Deepgram doesn't require a temperature value. Our models are highly trained and highly accurate, and return the best possible result without the user defining a temperature.
  • SRT and VTT formats can be obtained by using our Python or JavaScript Captions package. These can be used as standalone packages and don't require the Deepgram SDK.
  • Deepgram can also return text only. Index into the transcript field of the JSON response to obtain just the text of the transcription.
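The request-side differences above can be sketched as a small translation step. The helper below is hypothetical: it takes the fields you currently send to OpenAI, drops `prompt` and `temperature` (which the comparison above lists as unsupported on Deepgram), and builds a URL for Deepgram's `/v1/listen` endpoint. The default `model` value of `nova` is an assumption taken from the sample response in this guide; check the Model Overview for the value that fits your audio.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"

# OpenAI request fields with no Deepgram counterpart, per the comparison above.
UNSUPPORTED_ON_DEEPGRAM = ("prompt", "temperature")


def to_deepgram_request(openai_params: Dict, model: str = "nova") -> Tuple[str, List[str]]:
    """Translate OpenAI transcription parameters into a Deepgram request URL.

    Returns the URL and the list of OpenAI fields that were dropped, so the
    caller can log what no longer applies. The function name and the default
    model are illustrative assumptions.
    """
    query = {"model": model}
    # Both services accept a language hint, so it is carried over as-is.
    if "language" in openai_params:
        query["language"] = openai_params["language"]
    dropped = [name for name in UNSUPPORTED_ON_DEEPGRAM if name in openai_params]
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(query)}", dropped
```

Returning the dropped fields makes it visible during testing which parts of your existing OpenAI configuration have no effect after the switch.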

OpenAI Default JSON Response

{
  "text": "Yeah, as much as it's worth celebrating the first spacewalk with an all-female team, I think many of us are looking forward to it just being normal. And I think if it signifies anything, it is to honor the women who came before us who were skilled and qualified and didn't get the same opportunities that we have today."
}

Deepgram Default JSON Response

Interim Response
{
  "channels": {
    "alternatives": [
      {
        "transcript": "yeah as as much as",
        "confidence": 0.9970703,
        "words": [
          {
            "word": "yeah",
            "start": 0.0,
            "end": 0.32,
            "confidence": 0.85375977
          },
          {
            "word": "as",
            "start": 0.32,
            "end": 0.82,
            "confidence": 0.99072266
          },
          {
            "word": "as",
            "start": 0.88,
            "end": 1.04,
            "confidence": 0.9394531
          },
          {
            "word": "much",
            "start": 1.04,
            "end": 1.28,
            "confidence": 0.99316406
          },
          {
            "word": "as",
            "start": 1.28,
            "end": 1.68,
            "confidence": 0.81347656
          }
        ]
      }
    ]
  },
  "channel_index": [0, 1],
  "duration": 1.9999375,
  "is_final": false,
  "metadata": {
    "model_uuid": "aa274f3c-e8b3-456a-ac08-dfd797d45514",
    "request_id": "3b851a69-291c-4994-897f-325644e98558"
  },
  "speech_final": false,
  "start": 0.0
}
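
When consuming streaming messages like the interim response above, a common approach is to keep only the messages marked `is_final` and skip interim ones, since interim text may still be revised. The sketch below assumes each message has already been parsed into a Python dict; the function name is illustrative. It reads the `channels` key shown in the sample and also accepts `channel` as a defensive assumption.

```python
def collect_final_transcript(messages: list) -> str:
    """Join the transcripts of finalized streaming messages.

    Interim messages (is_final is false) are skipped because a later
    message covering the same audio may revise their text.
    """
    parts = []
    for message in messages:
        if not message.get("is_final"):
            continue
        # The sample above nests alternatives under "channels"; "channel"
        # is accepted as well, as a defensive assumption.
        channel = message.get("channels") or message.get("channel") or {}
        alternatives = channel.get("alternatives", [])
        if alternatives and alternatives[0].get("transcript"):
            parts.append(alternatives[0]["transcript"])
    return " ".join(parts)
```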

Final Result
{
    "metadata": {
        "transaction_key": "deprecated",
        "request_id": "3b851a69-291c-4994-897f-325644e98558",
        "sha256": "154e291ecfa8be6ab8343560bcc109008fa7853eb5372533e8efdefc9b504c33",
        "created": "2023-11-09T01:01:05.068Z",
        "duration": 25.933313,
        "channels": 1,
        "models": [
            "aa274f3c-e8b3-456a-ac08-dfd797d45514"
        ],
        "model_info": {
            "aa274f3c-e8b3-456a-ac08-dfd797d45514": {
                "name": "general-nova",
                "version": "2023-07-06.22746",
                "arch": "nova"
            }
        }
    },
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "yeah as as much as it's worth celebrating the first space walk with an all female team i think many of us are looking forward to it just being normal and i think if it signifies anything it is to honor the the woman who came before us who were skilled and qualified and didn't get the the same opportunities that we have today",
                        "confidence": 0.9970703,
                        "words": [
                            {
                                "word": "yeah",
                                "start": 0.0,
                                "end": 0.32,
                                "confidence": 0.85375977
                            },
                            {
                                "word": "as",
                                "start": 0.32,
                                "end": 0.82,
                                "confidence": 0.99072266
                            },
                            {
                                "word": "as",
                                "start": 0.88,
                                "end": 1.04,
                                "confidence": 0.9394531
                            },
                            {
                                "word": "much",
                                "start": 1.04,
                                "end": 1.28,
                                "confidence": 0.99316406
                            },
                            {
                                "word": "as",
                                "start": 1.28,
                                "end": 1.68,
                                "confidence": 0.81347656
                            },
                            {
                                "word": "it's",
                                "start": 2.0,
                                "end": 2.32,
                                "confidence": 0.9992676
                            },
                            {
                                "word": "worth",
                                "start": 2.32,
                                "end": 2.72,
                                "confidence": 1.0
                            },
                            {
                                "word": "celebrating",
                                "start": 2.72,
                                "end": 3.22,
                                "confidence": 0.810791
                            },
                            {
                                "word": "the",
                                "start": 4.48,
                                "end": 4.64,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "first",
                                "start": 4.64,
                                "end": 5.14,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "space",
                                "start": 5.2,
                                "end": 5.6,
                                "confidence": 0.22058105
                            },
                            {
                                "word": "walk",
                                "start": 5.6,
                                "end": 6.1,
                                "confidence": 0.8178711
                            },
                            {
                                "word": "with",
                                "start": 6.3199997,
                                "end": 6.56,
                                "confidence": 0.99658203
                            },
                            {
                                "word": "an",
                                "start": 6.56,
                                "end": 6.7999997,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "all",
                                "start": 6.7999997,
                                "end": 6.96,
                                "confidence": 0.99853516
                            },
                            {
                                "word": "female",
                                "start": 6.96,
                                "end": 7.3599997,
                                "confidence": 1.0
                            },
                            {
                                "word": "team",
                                "start": 7.3599997,
                                "end": 7.8599997,
                                "confidence": 0.74890137
                            },
                            {
                                "word": "i",
                                "start": 8.395,
                                "end": 8.555,
                                "confidence": 0.9941406
                            },
                            {
                                "word": "think",
                                "start": 8.555,
                                "end": 8.875,
                                "confidence": 0.99853516
                            },
                            {
                                "word": "many",
                                "start": 8.875,
                                "end": 9.115001,
                                "confidence": 0.99316406
                            },
                            {
                                "word": "of",
                                "start": 9.115001,
                                "end": 9.275001,
                                "confidence": 1.0
                            },
                            {
                                "word": "us",
                                "start": 9.275001,
                                "end": 9.775001,
                                "confidence": 1.0
                            },
                            {
                                "word": "are",
                                "start": 9.835,
                                "end": 10.155001,
                                "confidence": 0.99365234
                            },
                            {
                                "word": "looking",
                                "start": 10.155001,
                                "end": 10.475,
                                "confidence": 0.99853516
                            },
                            {
                                "word": "forward",
                                "start": 10.475,
                                "end": 10.795,
                                "confidence": 1.0
                            },
                            {
                                "word": "to",
                                "start": 10.795,
                                "end": 10.955,
                                "confidence": 0.9995117
                            },
                            {
                                "word": "it",
                                "start": 10.955,
                                "end": 11.115001,
                                "confidence": 0.9980469
                            },
                            {
                                "word": "just",
                                "start": 11.115001,
                                "end": 11.435,
                                "confidence": 0.9165039
                            },
                            {
                                "word": "being",
                                "start": 11.435,
                                "end": 11.915001,
                                "confidence": 0.9995117
                            },
                            {
                                "word": "normal",
                                "start": 11.915001,
                                "end": 12.415001,
                                "confidence": 0.92944336
                            },
                            {
                                "word": "and",
                                "start": 12.715,
                                "end": 13.115,
                                "confidence": 0.92944336
                            },
                            {
                                "word": "i",
                                "start": 13.835001,
                                "end": 13.915001,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "think",
                                "start": 13.915001,
                                "end": 14.155001,
                                "confidence": 1.0
                            },
                            {
                                "word": "if",
                                "start": 14.155001,
                                "end": 14.395,
                                "confidence": 0.97998047
                            },
                            {
                                "word": "it",
                                "start": 14.395,
                                "end": 14.475,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "signifies",
                                "start": 14.475,
                                "end": 14.975,
                                "confidence": 0.9975586
                            },
                            {
                                "word": "anything",
                                "start": 15.035,
                                "end": 15.535,
                                "confidence": 0.88378906
                            },
                            {
                                "word": "it",
                                "start": 15.74,
                                "end": 15.98,
                                "confidence": 0.57714844
                            },
                            {
                                "word": "is",
                                "start": 15.98,
                                "end": 16.3,
                                "confidence": 0.751709
                            },
                            {
                                "word": "to",
                                "start": 16.86,
                                "end": 17.02,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "honor",
                                "start": 17.02,
                                "end": 17.34,
                                "confidence": 1.0
                            },
                            {
                                "word": "the",
                                "start": 17.34,
                                "end": 17.58,
                                "confidence": 0.9970703
                            },
                            {
                                "word": "the",
                                "start": 17.58,
                                "end": 17.74,
                                "confidence": 0.97021484
                            },
                            {
                                "word": "woman",
                                "start": 17.74,
                                "end": 18.06,
                                "confidence": 0.8251953
                            },
                            {
                                "word": "who",
                                "start": 18.06,
                                "end": 18.22,
                                "confidence": 0.9995117
                            },
                            {
                                "word": "came",
                                "start": 18.22,
                                "end": 18.46,
                                "confidence": 1.0
                            },
                            {
                                "word": "before",
                                "start": 18.46,
                                "end": 18.779999,
                                "confidence": 1.0
                            },
                            {
                                "word": "us",
                                "start": 18.779999,
                                "end": 19.279999,
                                "confidence": 0.9995117
                            },
                            {
                                "word": "who",
                                "start": 19.42,
                                "end": 19.92,
                                "confidence": 0.6894531
                            },
                            {
                                "word": "were",
                                "start": 20.14,
                                "end": 20.38,
                                "confidence": 0.42407227
                            },
                            {
                                "word": "skilled",
                                "start": 20.38,
                                "end": 20.88,
                                "confidence": 0.953125
                            },
                            {
                                "word": "and",
                                "start": 20.939999,
                                "end": 21.18,
                                "confidence": 0.9980469
                            },
                            {
                                "word": "qualified",
                                "start": 21.18,
                                "end": 21.68,
                                "confidence": 0.8417969
                            },
                            {
                                "word": "and",
                                "start": 22.38,
                                "end": 22.539999,
                                "confidence": 0.9995117
                            },
                            {
                                "word": "didn't",
                                "start": 22.539999,
                                "end": 22.86,
                                "confidence": 0.9875488
                            },
                            {
                                "word": "get",
                                "start": 22.86,
                                "end": 23.18,
                                "confidence": 0.9975586
                            },
                            {
                                "word": "the",
                                "start": 23.18,
                                "end": 23.5,
                                "confidence": 0.9399414
                            },
                            {
                                "word": "the",
                                "start": 23.5,
                                "end": 23.66,
                                "confidence": 0.70751953
                            },
                            {
                                "word": "same",
                                "start": 23.66,
                                "end": 23.9,
                                "confidence": 0.99853516
                            },
                            {
                                "word": "opportunities",
                                "start": 23.9,
                                "end": 24.4,
                                "confidence": 0.99853516
                            },
                            {
                                "word": "that",
                                "start": 24.46,
                                "end": 24.619999,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "we",
                                "start": 24.619999,
                                "end": 24.779999,
                                "confidence": 1.0
                            },
                            {
                                "word": "have",
                                "start": 24.779999,
                                "end": 25.02,
                                "confidence": 0.99902344
                            },
                            {
                                "word": "today",
                                "start": 25.02,
                                "end": 25.52,
                                "confidence": 0.8835449
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

What to Expect in the JSON Response

The Deepgram response will contain the following fields:

  • transcript (string)
  • start (float, seconds)
  • end (float, seconds)
  • word (string)
  • confidence (float)
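
As a minimal sketch of reading these fields, the helpers below index into a parsed Deepgram final result shaped like the sample above. The function names and the 0.5 threshold are illustrative assumptions; the embedded sample is abridged from the response shown earlier.

```python
def first_alternative(response: dict) -> dict:
    """Return the first alternative of the first channel."""
    return response["results"]["channels"][0]["alternatives"][0]


def transcript_text(response: dict) -> str:
    """Return just the transcript text."""
    return first_alternative(response)["transcript"]


def low_confidence_words(response: dict, threshold: float = 0.5) -> list:
    """Return word entries whose confidence is below the threshold."""
    return [
        word for word in first_alternative(response)["words"]
        if word["confidence"] < threshold
    ]


# Abridged from the final result shown above.
SAMPLE = {
    "results": {"channels": [{"alternatives": [{
        "transcript": "first space walk",
        "confidence": 0.9970703,
        "words": [
            {"word": "first", "start": 4.64, "end": 5.14, "confidence": 0.99902344},
            {"word": "space", "start": 5.2, "end": 5.6, "confidence": 0.22058105},
            {"word": "walk", "start": 5.6, "end": 6.1, "confidence": 0.8178711},
        ],
    }]}]}
}

flagged = [word["word"] for word in low_confidence_words(SAMPLE)]
```

Word-level confidence has no equivalent in the OpenAI response, so a check like this is one way to route uncertain segments to human review.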

The OpenAI response will contain the following fields:

  • text (string)
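
If downstream code already expects the OpenAI shape, a thin adapter can keep it working while you migrate. The sketch below is a hypothetical helper that wraps a Deepgram final result in an OpenAI-style `{"text": ...}` object. Note that the text itself can differ: in the samples above, the Deepgram transcript is lowercase, unpunctuated, and keeps repeated words.

```python
def to_openai_shape(deepgram_response: dict) -> dict:
    """Wrap a Deepgram final result in an OpenAI-style response dict.

    Only the transcript of the first channel's first alternative is kept;
    word timings and confidence scores are discarded.
    """
    alternative = deepgram_response["results"]["channels"][0]["alternatives"][0]
    return {"text": alternative["transcript"]}
```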

Use Case or Domain-specific Models

Deepgram and OpenAI provide speech recognition models that are pre-trained or tuned to identify the words and phrases unique to a specific use case or domain. Deepgram creates its speech recognition models through transfer learning from our highly performant general models. It is important to test multiple models to see which one meets the accuracy, performance, and scalability needs of your use case.

📘

For more details on Deepgram models see Model Overview.

Deepgram provides:

  • General
  • Phone calls
  • Meetings
  • Voicemail
  • Conversational AI
  • Finance
  • Video
  • Whisper Cloud
  • Custom Models

OpenAI provides:

  • Whisper-1 (Whisper v2-large)
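
To test several models against the same audio, it can help to keep the use-case-to-model choice in one place. The sketch below builds a `/v1/listen` URL per use case. The `model` values in the mapping are assumptions based on commonly documented option names and should be confirmed against the Model Overview; Custom Models are omitted because their identifiers are account-specific.

```python
from urllib.parse import urlencode

# Assumed `model` query values for the use cases listed above.
# Verify each against the Model Overview before relying on it.
USE_CASE_MODELS = {
    "general": "general",
    "phone calls": "phonecall",
    "meetings": "meeting",
    "voicemail": "voicemail",
    "conversational ai": "conversationalai",
    "finance": "finance",
    "video": "video",
    "whisper cloud": "whisper",
}


def listen_url(use_case: str) -> str:
    """Build a Deepgram request URL for the given use case."""
    key = use_case.strip().lower()
    if key not in USE_CASE_MODELS:
        raise ValueError(f"No model mapped for use case: {use_case!r}")
    query = urlencode({"model": USE_CASE_MODELS[key]})
    return f"https://api.deepgram.com/v1/listen?{query}"
```

Looping over the keys of `USE_CASE_MODELS` with your representative audio samples gives a simple way to compare accuracy across models.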