Migrating From OpenAI Whisper to Deepgram
Learn how to migrate from OpenAI Whisper to Deepgram. This guide is for developers and practitioners who use OpenAI Whisper for transcription and are considering or actively moving to Deepgram.
Changing audio transcription services can be a challenging task, even for experienced teams. This guide will give you an overview of the process of migrating your transcription services from OpenAI to Deepgram to help you make the transition as quickly and efficiently as possible.
Getting Started
Before you can use Deepgram, you’ll need to create a Deepgram account. Signup is free and includes $200 in free credit and access to all of Deepgram’s features!
Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.
Migration Process
During the migration process, you will need to perform the following tasks:
Differences
Once you’ve selected your model, Deepgram provides many features and capabilities to help you transcribe and classify your audio. However, some capabilities and concepts are implemented differently from OpenAI.
Detailed Description of Differences
OpenAI
- OpenAI provides just a single text field in the response.
- OpenAI allows you to use a prompt in your request body to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt.
- OpenAI allows you to send a temperature value between 0 and 1 in your request body. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic.
- If you run your own version of the Whisper v3 model, you can expect to see timestamps and word timings, but this feature currently isn’t available in the OpenAI Transcribe API.
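The prompt and temperature options described above can be sketched as the text fields of a transcription request. This is a minimal sketch, not OpenAI's client code: the helper name build_whisper_form_fields is hypothetical, and the field names model, prompt, and temperature are assumed to match the form fields the transcription endpoint accepts. No request is sent; the audio file itself would be attached separately.

```python
from typing import Dict, Optional


def build_whisper_form_fields(
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
    model: str = "whisper-1",
) -> Dict[str, str]:
    """Hypothetical helper: build the non-file form fields for a transcription request."""
    fields = {"model": model}
    if prompt is not None:
        # The model tries to match the style of this prompt.
        fields["prompt"] = prompt
    if temperature is not None:
        # The guide describes temperature as a value between 0 and 1.
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields


fields = build_whisper_form_fields(
    prompt="Deepgram, Whisper, diarization",
    temperature=0.2,
)
print(fields)
```

When moving to Deepgram, the temperature field is simply dropped from the request, since Deepgram does not take one.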
Deepgram
- Deepgram provides a significant number of additional fields in the response that can help you make better use of your transcription output, including:
  - useful metadata about your request
  - an overall transcription confidence score
  - individual word timings
  - individual word confidence scores
- Deepgram doesn’t require a temperature value. Our models are trained to return the best possible result without a user-defined temperature.
- SRT and VTT formats can be obtained by using our Python or JavaScript Captions packages. These can be used as standalone packages and don’t require the Deepgram SDK.
- Deepgram can be used to obtain only text as a transcription format. If you index into the transcript JSON field, you can obtain just the text of the transcription.
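Indexing into the transcript field might look like the sketch below. It assumes a pre-recorded response nested as results, then channels, then alternatives; that nesting, the helper name extract_transcript, and the sample values are all assumptions for illustration, so check them against a real response before relying on them.

```python
from typing import Any, Dict


def extract_transcript(response: Dict[str, Any]) -> str:
    """Return only the transcript text from a Deepgram-style response dict."""
    # Assumed layout: first channel, first (best) alternative.
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return alternative["transcript"]


# Hypothetical, trimmed-down response used only to exercise the helper.
sample_response = {
    "metadata": {"request_id": "example-request"},
    "results": {
        "channels": [
            {
                "alternatives": [
                    {"transcript": "hello world", "confidence": 0.98}
                ]
            }
        ]
    },
}

text_only = extract_transcript(sample_response)
print(text_only)
```

The result plays the same role as the single text field in an OpenAI response, which keeps downstream code unchanged during a migration.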
OpenAI Default JSON Response
Deepgram Default JSON Response
Interim Response
Final Result
What to Expect in the JSON Response
The Deepgram response will contain the following fields:
- transcript (string)
- start_time (duration)
- end_time (duration)
- word (string)
- confidence (float)
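The per-word fields listed above can be consumed as in the following sketch, which flattens word entries into rows and flags low-confidence words for review. The words list, the helper names, and the sample values are assumptions. The timing keys follow the names given in this guide (start_time, end_time); some response layouts use the shorter start and end keys, so the sketch accepts either.

```python
from typing import Any, Dict, List, Tuple

WordRow = Tuple[str, float, float, float]


def word_rows(alternative: Dict[str, Any]) -> List[WordRow]:
    """Flatten word entries into (word, start, end, confidence) tuples."""
    rows = []
    for entry in alternative.get("words", []):
        # Accept either naming of the timing keys (both are assumptions).
        start = entry.get("start_time", entry.get("start"))
        end = entry.get("end_time", entry.get("end"))
        rows.append((entry["word"], start, end, entry["confidence"]))
    return rows


def low_confidence_words(rows: List[WordRow], threshold: float) -> List[str]:
    """Return the words whose confidence falls below the threshold."""
    return [word for word, _start, _end, conf in rows if conf < threshold]


# Hypothetical alternative with two words.
sample_alternative = {
    "transcript": "hello world",
    "words": [
        {"word": "hello", "start_time": 0.0, "end_time": 0.4, "confidence": 0.99},
        {"word": "world", "start_time": 0.5, "end_time": 0.9, "confidence": 0.62},
    ],
}

rows = word_rows(sample_alternative)
flagged = low_confidence_words(rows, threshold=0.8)
print(rows)
print(flagged)
```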
The OpenAI response will contain the following fields:
- text (string)
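While both services are running side by side during a migration, a small shim can hide the difference between the two response shapes. The sketch below is one possible approach: the function name transcript_text is hypothetical, and the Deepgram nesting (results, channels, alternatives) is an assumption to verify against real responses.

```python
from typing import Any, Dict


def transcript_text(response: Dict[str, Any]) -> str:
    """Return plain transcript text from either an OpenAI-style or a Deepgram-style response."""
    if "text" in response:
        # OpenAI-style: a single top-level text field.
        return response["text"]
    if "results" in response:
        # Deepgram-style (assumed nesting): first channel, first alternative.
        return response["results"]["channels"][0]["alternatives"][0]["transcript"]
    raise ValueError("unrecognized transcription response shape")


# Hypothetical sample payloads for each provider.
openai_style = {"text": "hello world"}
deepgram_style = {
    "results": {"channels": [{"alternatives": [{"transcript": "hello world"}]}]}
}

print(transcript_text(openai_style))
print(transcript_text(deepgram_style))
```

Callers that only need text can then switch providers without touching their own parsing code.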
Use Case or Domain-specific Models
Deepgram and OpenAI provide speech recognition models that are pre-trained or tuned to identify the words and phrases unique to a specific use case or domain. Deepgram builds its speech recognition models through transfer learning from its high-performing general models. It is important to test multiple models to see which one meets the accuracy, performance, and scalability needs of your use case.
For more details on Deepgram models see Model Overview.
Deepgram provides:
- General
- Phone calls
- Meetings
- Voicemail
- Conversational AI
- Finance
- Video
- Whisper Cloud
- Custom Models
OpenAI provides:
- Whisper-1 (Whisper v2-large)
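Since the guide recommends testing several models, one way to set up a comparison is to build one request URL per candidate model. This is a sketch under assumptions: it assumes the model is selected with a model query parameter on the https://api.deepgram.com/v1/listen endpoint, and the identifiers general, phonecall, and meeting are assumed spellings of the models listed above. Check the Model Overview for the exact values. No request is sent.

```python
from urllib.parse import urlencode

# Assumed pre-recorded transcription endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def listen_url(model: str, **options: str) -> str:
    """Hypothetical helper: build a request URL that selects a model by query parameter."""
    params = {"model": model, **options}
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


# Assumed identifiers for three of the listed models.
candidate_models = ["general", "phonecall", "meeting"]
urls = [listen_url(name) for name in candidate_models]

for url in urls:
    print(url)
```

Sending the same audio to each URL and comparing the transcripts and confidence scores gives a simple basis for choosing a model.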
What’s Next