Migrating From OpenAI Whisper to Deepgram
Learn how to migrate from OpenAI Whisper to Deepgram. For developers or practitioners who are using OpenAI Whisper for transcription and are considering or actively moving to Deepgram.
Changing audio transcription services can be a challenging task, even for experienced teams. This guide will give you an overview of the process of migrating your transcription services from OpenAI to Deepgram to help you make the transition as quickly and efficiently as possible.
Getting Started
Before you can use Deepgram, you'll need to create a Deepgram account. Signup is free and includes $200 in free credit and access to all of Deepgram's features!
Before you start, you'll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.
Migration Process
During the migration process, you will need to perform the following tasks:
Before Migration | During Migration | After Migration |
---|---|---|
- Identify any upstream dependencies on your transcriptions - Find representative samples of your audio for testing - Get familiar with Deepgram’s API and understand differences from OpenAI - Create an API key - Test your audio - Create a migration plan - Create a rollback plan | - Configure your response parsing to conform to Deepgram's JSON response - Swap over traffic to Deepgram API - Monitor systems | - Testing - Tune downstream processes to Deepgram output |
Differences
Once you’ve selected your model Deepgram provides many features and capabilities to help you transcribe and classify your audio. However, some capabilities and concepts are implemented differently from OpenAI.
Features & Capabilities | Deepgram | OpenAI |
---|---|---|
Audio Files Types | Deepgram supports over 100 different audio formats and encodings some of the most common are: mp3, mp4, mp2, AAC, wav, FLAC, PCM, m4a, Ogg, Opus, WebM, and more! | Limited audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. |
Word Timings | Yes | V3 - Stand alone model only |
Confidence Scoring | Yes | No |
Streaming | Yes | No |
Models Support | Multiple Domain Models | One |
Language Support | Many | Many |
Translation | No | Yes |
Diarization | Yes | No |
Prompting | No | Yes |
Transcription Format Options | json , srt , vtt text | json , text , srt , verbose_json , vtt |
Temperature | No | Yes |
Detailed Description of Differences
Open AI
- OpenAI provide you with a just a single
text
field in the response. - Open AI allows you to use a prompt in your request body to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt.
- Open AI allows you to send a temperature value between 0 to 1 in your request body. A Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- if you run your own version of the Whisper v3 model you can expect to see timestamps/word timings but this feature currently isn't available in the OpenAI Transcribe API.
Deepgram
- Deepgram provides you with a significant number of additional fields in the response that can help you better use your transcription output this includes:
- useful meta data about your request
- an overall transcription confidence score
- individual word timings
- individual word confidence scores
- Deepgram doesn't require a temperature score as our models are highly trained and highly accurate and will return the best possible result without the temperature being defined by the user.
- SRT and VTT formats can be obtained by using our Python or Javascript Captions Package. These can be used as stand alone packages and don't require the Deepgram SDK.
- Deepgram can be used to obtain only text as a transcription format. If you index into the
transcript
JSON field, then you can obtain just the text of the transcription.
Open API Default JSON Response
{
"text": "Yeah, as much as it's worth celebrating the first spacewalk with an all-female team, I think many of us are looking forward to it just being normal. And I think if it signifies anything, it is to honor the women who came before us who were skilled and qualified and didn't get the same opportunities that we have today."
}
Deepgram Default JSON Response
Interim Response
{
"channels": {
"alternatives": [
{
"transcript": "yeah as as much as",
"confidence": 0.9970703,
"words": [
{
"word": "yeah",
"start": 0.0,
"end": 0.32,
"confidence": 0.85375977
},
{
"word": "as",
"start": 0.32,
"end": 0.82,
"confidence": 0.99072266
},
{
"word": "as",
"start": 0.88,
"end": 1.04,
"confidence": 0.9394531
},
{
"word": "much",
"start": 1.04,
"end": 1.28,
"confidence": 0.99316406
},
{
"word": "as",
"start": 1.28,
"end": 1.68,
"confidence": 0.81347656
}
]
}
]
},
"channel_index": [0, 1],
"duration": 1.9999375,
"is_final": false,
"metadata": {
"model_uuid": "aa274f3c-e8b3-456a-ac08-dfd797d45514",
"request_id": "3b851a69-291c-4994-897f-325644e98558"
},
"speech_final": false,
"start": 0.0
}
Final Result
{
"metadata": {
"transaction_key": "deprecated",
"request_id": "3b851a69-291c-4994-897f-325644e98558",
"sha256": "154e291ecfa8be6ab8343560bcc109008fa7853eb5372533e8efdefc9b504c33",
"created": "2023-11-09T01:01:05.068Z",
"duration": 25.933313,
"channels": 1,
"models": [
"aa274f3c-e8b3-456a-ac08-dfd797d45514"
],
"model_info": {
"aa274f3c-e8b3-456a-ac08-dfd797d45514": {
"name": "general-nova",
"version": "2023-07-06.22746",
"arch": "nova"
}
}
},
"results": {
"channels": [
{
"alternatives": [
{
"transcript": "yeah as as much as it's worth celebrating the first space walk with an all female team i think many of us are looking forward to it just being normal and i think if it signifies anything it is to honor the the woman who came before us who were skilled and qualified and didn't get the the same opportunities that we have today",
"confidence": 0.9970703,
"words": [
{
"word": "yeah",
"start": 0.0,
"end": 0.32,
"confidence": 0.85375977
},
{
"word": "as",
"start": 0.32,
"end": 0.82,
"confidence": 0.99072266
},
{
"word": "as",
"start": 0.88,
"end": 1.04,
"confidence": 0.9394531
},
{
"word": "much",
"start": 1.04,
"end": 1.28,
"confidence": 0.99316406
},
{
"word": "as",
"start": 1.28,
"end": 1.68,
"confidence": 0.81347656
},
{
"word": "it's",
"start": 2.0,
"end": 2.32,
"confidence": 0.9992676
},
{
"word": "worth",
"start": 2.32,
"end": 2.72,
"confidence": 1.0
},
{
"word": "celebrating",
"start": 2.72,
"end": 3.22,
"confidence": 0.810791
},
{
"word": "the",
"start": 4.48,
"end": 4.64,
"confidence": 0.99902344
},
{
"word": "first",
"start": 4.64,
"end": 5.14,
"confidence": 0.99902344
},
{
"word": "space",
"start": 5.2,
"end": 5.6,
"confidence": 0.22058105
},
{
"word": "walk",
"start": 5.6,
"end": 6.1,
"confidence": 0.8178711
},
{
"word": "with",
"start": 6.3199997,
"end": 6.56,
"confidence": 0.99658203
},
{
"word": "an",
"start": 6.56,
"end": 6.7999997,
"confidence": 0.99902344
},
{
"word": "all",
"start": 6.7999997,
"end": 6.96,
"confidence": 0.99853516
},
{
"word": "female",
"start": 6.96,
"end": 7.3599997,
"confidence": 1.0
},
{
"word": "team",
"start": 7.3599997,
"end": 7.8599997,
"confidence": 0.74890137
},
{
"word": "i",
"start": 8.395,
"end": 8.555,
"confidence": 0.9941406
},
{
"word": "think",
"start": 8.555,
"end": 8.875,
"confidence": 0.99853516
},
{
"word": "many",
"start": 8.875,
"end": 9.115001,
"confidence": 0.99316406
},
{
"word": "of",
"start": 9.115001,
"end": 9.275001,
"confidence": 1.0
},
{
"word": "us",
"start": 9.275001,
"end": 9.775001,
"confidence": 1.0
},
{
"word": "are",
"start": 9.835,
"end": 10.155001,
"confidence": 0.99365234
},
{
"word": "looking",
"start": 10.155001,
"end": 10.475,
"confidence": 0.99853516
},
{
"word": "forward",
"start": 10.475,
"end": 10.795,
"confidence": 1.0
},
{
"word": "to",
"start": 10.795,
"end": 10.955,
"confidence": 0.9995117
},
{
"word": "it",
"start": 10.955,
"end": 11.115001,
"confidence": 0.9980469
},
{
"word": "just",
"start": 11.115001,
"end": 11.435,
"confidence": 0.9165039
},
{
"word": "being",
"start": 11.435,
"end": 11.915001,
"confidence": 0.9995117
},
{
"word": "normal",
"start": 11.915001,
"end": 12.415001,
"confidence": 0.92944336
},
{
"word": "and",
"start": 12.715,
"end": 13.115,
"confidence": 0.92944336
},
{
"word": "i",
"start": 13.835001,
"end": 13.915001,
"confidence": 0.99902344
},
{
"word": "think",
"start": 13.915001,
"end": 14.155001,
"confidence": 1.0
},
{
"word": "if",
"start": 14.155001,
"end": 14.395,
"confidence": 0.97998047
},
{
"word": "it",
"start": 14.395,
"end": 14.475,
"confidence": 0.99902344
},
{
"word": "signifies",
"start": 14.475,
"end": 14.975,
"confidence": 0.9975586
},
{
"word": "anything",
"start": 15.035,
"end": 15.535,
"confidence": 0.88378906
},
{
"word": "it",
"start": 15.74,
"end": 15.98,
"confidence": 0.57714844
},
{
"word": "is",
"start": 15.98,
"end": 16.3,
"confidence": 0.751709
},
{
"word": "to",
"start": 16.86,
"end": 17.02,
"confidence": 0.99902344
},
{
"word": "honor",
"start": 17.02,
"end": 17.34,
"confidence": 1.0
},
{
"word": "the",
"start": 17.34,
"end": 17.58,
"confidence": 0.9970703
},
{
"word": "the",
"start": 17.58,
"end": 17.74,
"confidence": 0.97021484
},
{
"word": "woman",
"start": 17.74,
"end": 18.06,
"confidence": 0.8251953
},
{
"word": "who",
"start": 18.06,
"end": 18.22,
"confidence": 0.9995117
},
{
"word": "came",
"start": 18.22,
"end": 18.46,
"confidence": 1.0
},
{
"word": "before",
"start": 18.46,
"end": 18.779999,
"confidence": 1.0
},
{
"word": "us",
"start": 18.779999,
"end": 19.279999,
"confidence": 0.9995117
},
{
"word": "who",
"start": 19.42,
"end": 19.92,
"confidence": 0.6894531
},
{
"word": "were",
"start": 20.14,
"end": 20.38,
"confidence": 0.42407227
},
{
"word": "skilled",
"start": 20.38,
"end": 20.88,
"confidence": 0.953125
},
{
"word": "and",
"start": 20.939999,
"end": 21.18,
"confidence": 0.9980469
},
{
"word": "qualified",
"start": 21.18,
"end": 21.68,
"confidence": 0.8417969
},
{
"word": "and",
"start": 22.38,
"end": 22.539999,
"confidence": 0.9995117
},
{
"word": "didn't",
"start": 22.539999,
"end": 22.86,
"confidence": 0.9875488
},
{
"word": "get",
"start": 22.86,
"end": 23.18,
"confidence": 0.9975586
},
{
"word": "the",
"start": 23.18,
"end": 23.5,
"confidence": 0.9399414
},
{
"word": "the",
"start": 23.5,
"end": 23.66,
"confidence": 0.70751953
},
{
"word": "same",
"start": 23.66,
"end": 23.9,
"confidence": 0.99853516
},
{
"word": "opportunities",
"start": 23.9,
"end": 24.4,
"confidence": 0.99853516
},
{
"word": "that",
"start": 24.46,
"end": 24.619999,
"confidence": 0.99902344
},
{
"word": "we",
"start": 24.619999,
"end": 24.779999,
"confidence": 1.0
},
{
"word": "have",
"start": 24.779999,
"end": 25.02,
"confidence": 0.99902344
},
{
"word": "today",
"start": 25.02,
"end": 25.52,
"confidence": 0.8835449
}
]
}
]
}
]
}
}
What to Expect in the JSON Response
The Deepgram response will contain the following fields:
transcript
(string)start_time
(duration)end_time
(duration)word
(string)confidence
(float)
The OpenAI response will contain the following fields:
text
(string)
Use Case or Domain-specific Models
Deepgram and OpenAI provide speech recognition models that are pre-trained or tuned to identify the words and phrases unique to a specific use case or domain. Deepgram creates our speech recognition models through transfer learning from our highly-performant general models. It is important to test multiple models to see which one meets the accuracy, performance, and scalability needs for your use case.
For more details on Deepgram models see Model Overview.
Deepgram provides:
- General
- Phone calls
- Meetings
- Voicemail
- Conversational AI
- Finance
- Video
- Whisper Cloud
- Custom Models
OpenAI provides:
- Whisper-1 (Whisper v2-large)
Updated 4 months ago