Migrating From OpenAI Whisper to Deepgram

Learn how to migrate from OpenAI Whisper to Deepgram. This guide is for developers and practitioners who are using OpenAI Whisper for transcription and are considering or actively moving to Deepgram.

Changing audio transcription services can be a challenging task, even for experienced teams. This guide will give you an overview of the process of migrating your transcription services from OpenAI to Deepgram to help you make the transition as quickly and efficiently as possible.

Getting Started

Before you can use Deepgram, you’ll need to create a Deepgram account. Signup is free and includes $200 in free credit and access to all of Deepgram’s features!

Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.
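Once you have an API key, you can make a first pre-recorded transcription request over plain HTTP without any SDK. The sketch below uses only Python's standard library against Deepgram's `/v1/listen` endpoint; the `DEEPGRAM_API_KEY` environment variable, the audio file name, and the `model`/`smart_format` query parameters are illustrative assumptions, not requirements.

```python
import json
import os
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true"

def build_request(url, api_key, audio_bytes, content_type="audio/wav"):
    """Deepgram authenticates with a `Token` scheme in the Authorization header."""
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={"Authorization": f"Token {api_key}", "Content-Type": content_type},
        method="POST",
    )

def transcribe_file(path, api_key):
    """POST raw audio bytes to /v1/listen and return the parsed JSON response."""
    with open(path, "rb") as f:
        req = build_request(DEEPGRAM_URL, api_key, f.read())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a real key and audio file):
#   body = transcribe_file("your_audio.wav", os.environ["DEEPGRAM_API_KEY"])
#   print(body["results"]["channels"][0]["alternatives"][0]["transcript"])
```

The same request can of course be made with any of the Deepgram SDKs; the raw-HTTP form is shown here only to make the endpoint, header, and payload explicit.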

Migration Process

During the migration process, you will need to perform the following tasks:

Before Migration:
- Identify any upstream dependencies on your transcriptions
- Find representative samples of your audio for testing
- Get familiar with Deepgram’s API and understand differences from OpenAI
- Create an API key
- Test your audio
- Create a migration plan
- Create a rollback plan

During Migration:
- Configure your response parsing to conform to Deepgram’s JSON response
- Swap over traffic to the Deepgram API
- Monitor systems

After Migration:
- Testing
- Tune downstream processes to Deepgram output

Differences

Once you’ve selected your model, Deepgram provides many features and capabilities to help you transcribe and classify your audio. However, some capabilities and concepts are implemented differently from OpenAI.

| Features & Capabilities | Deepgram | OpenAI |
| --- | --- | --- |
| Audio File Types | Over 100 audio formats and encodings, including mp3, mp4, mp2, AAC, wav, FLAC, PCM, m4a, Ogg, Opus, WebM, and more | Limited audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm |
| Word Timings | Yes | Whisper v3 standalone model only |
| Confidence Scoring | Yes | No |
| Streaming | Yes | No |
| Model Support | Multiple domain models | One |
| Language Support | Many | Many |
| Translation | No | Yes |
| Diarization | Yes | No |
| Prompting | No | Yes |
| Transcription Format Options | json, srt, vtt, text | json, text, srt, verbose_json, vtt |
| Temperature | No | Yes |

Detailed Description of Differences

OpenAI

  • OpenAI provides just a single text field in the response.
  • OpenAI allows you to use a prompt in your request body to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt.
  • OpenAI allows you to send a temperature value between 0 and 1 in your request body. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic.
  • If you run your own version of the Whisper v3 model, you can expect to see timestamps/word timings, but this feature currently isn’t available in the OpenAI transcription API.
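To make the `prompt` and `temperature` parameters concrete, here is a minimal sketch of assembling them before a Whisper API call. The helper function, its validation logic, and the example prompt text are our own illustrative assumptions; only `model`, `prompt`, and `temperature` are parameters documented by OpenAI.

```python
def build_transcription_params(prompt=None, temperature=0.0):
    """Assemble the optional Whisper API parameters described above.

    `prompt` nudges the model toward a given style or vocabulary;
    `temperature` (0 to 1) trades determinism for randomness.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    params = {"model": "whisper-1", "temperature": temperature}
    if prompt:
        params["prompt"] = prompt
    return params

# With the official OpenAI Python SDK these would be passed along as, e.g.:
#
#   client.audio.transcriptions.create(
#       file=open("meeting.mp3", "rb"),
#       **build_transcription_params(prompt="Deepgram, diarization, SRT",
#                                    temperature=0.2),
#   )
```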

Deepgram

  • Deepgram provides a significant number of additional fields in the response that can help you make better use of your transcription output, including:

    • useful metadata about your request
    • an overall transcription confidence score
    • individual word timings
    • individual word confidence scores
  • Deepgram doesn’t require a temperature value: our models are highly trained and highly accurate, and will return the best possible result without a temperature being defined by the user.

  • SRT and VTT formats can be obtained by using our Python or JavaScript Captions package. These can be used as standalone packages and don’t require the Deepgram SDK.

  • Deepgram can also be used to obtain only text as a transcription format: index into the transcript field of the JSON response to obtain just the text of the transcription.
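The last point can be shown with a few lines of Python. This is a minimal sketch against the Deepgram response shape shown later in this guide; the helper name and the trimmed sample response are ours.

```python
def extract_transcript(dg_response):
    """Pull the plain text, overall confidence, and word timings out of a
    Deepgram pre-recorded response (first channel, top alternative)."""
    alt = dg_response["results"]["channels"][0]["alternatives"][0]
    return alt["transcript"], alt["confidence"], alt["words"]

# Trimmed example response:
sample = {
    "results": {"channels": [{"alternatives": [{
        "transcript": "yeah as as much as",
        "confidence": 0.9970703,
        "words": [{"word": "yeah", "start": 0.0, "end": 0.32,
                   "confidence": 0.85375977}],
    }]}]}
}

text, confidence, words = extract_transcript(sample)
print(text)  # prints: yeah as as much as
```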

OpenAI Default JSON Response

JSON
{
  "text": "Yeah, as much as it's worth celebrating the first spacewalk with an all-female team, I think many of us are looking forward to it just being normal. And I think if it signifies anything, it is to honor the women who came before us who were skilled and qualified and didn't get the same opportunities that we have today."
}

Deepgram Default JSON Response

Interim Response
JSON
{
  "channels": {
    "alternatives": [
      {
        "transcript": "yeah as as much as",
        "confidence": 0.9970703,
        "words": [
          { "word": "yeah", "start": 0.0, "end": 0.32, "confidence": 0.85375977 },
          { "word": "as", "start": 0.32, "end": 0.82, "confidence": 0.99072266 },
          { "word": "as", "start": 0.88, "end": 1.04, "confidence": 0.9394531 },
          { "word": "much", "start": 1.04, "end": 1.28, "confidence": 0.99316406 },
          { "word": "as", "start": 1.28, "end": 1.68, "confidence": 0.81347656 }
        ]
      }
    ]
  },
  "channel_index": [0, 1],
  "duration": 1.9999375,
  "is_final": false,
  "metadata": {
    "model_uuid": "aa274f3c-e8b3-456a-ac08-dfd797d45514",
    "request_id": "3b851a69-291c-4994-897f-325644e98558"
  },
  "speech_final": false,
  "start": 0.0
}
Final Result
JSON
{
  "metadata": {
    "transaction_key": "deprecated",
    "request_id": "3b851a69-291c-4994-897f-325644e98558",
    "sha256": "154e291ecfa8be6ab8343560bcc109008fa7853eb5372533e8efdefc9b504c33",
    "created": "2023-11-09T01:01:05.068Z",
    "duration": 25.933313,
    "channels": 1,
    "models": ["aa274f3c-e8b3-456a-ac08-dfd797d45514"],
    "model_info": {
      "aa274f3c-e8b3-456a-ac08-dfd797d45514": {
        "name": "general-nova",
        "version": "2023-07-06.22746",
        "arch": "nova"
      }
    }
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "yeah as as much as it's worth celebrating the first space walk with an all female team i think many of us are looking forward to it just being normal and i think if it signifies anything it is to honor the the woman who came before us who were skilled and qualified and didn't get the the same opportunities that we have today",
            "confidence": 0.9970703,
            "words": [
              { "word": "yeah", "start": 0.0, "end": 0.32, "confidence": 0.85375977 },
              { "word": "as", "start": 0.32, "end": 0.82, "confidence": 0.99072266 },
              { "word": "as", "start": 0.88, "end": 1.04, "confidence": 0.9394531 },
              { "word": "much", "start": 1.04, "end": 1.28, "confidence": 0.99316406 },
              { "word": "as", "start": 1.28, "end": 1.68, "confidence": 0.81347656 },
              { "word": "it's", "start": 2.0, "end": 2.32, "confidence": 0.9992676 },
              { "word": "worth", "start": 2.32, "end": 2.72, "confidence": 1.0 },
              { "word": "celebrating", "start": 2.72, "end": 3.22, "confidence": 0.810791 },
              { "word": "the", "start": 4.48, "end": 4.64, "confidence": 0.99902344 },
              { "word": "first", "start": 4.64, "end": 5.14, "confidence": 0.99902344 },
              { "word": "space", "start": 5.2, "end": 5.6, "confidence": 0.22058105 },
              { "word": "walk", "start": 5.6, "end": 6.1, "confidence": 0.8178711 },
              { "word": "with", "start": 6.3199997, "end": 6.56, "confidence": 0.99658203 },
              { "word": "an", "start": 6.56, "end": 6.7999997, "confidence": 0.99902344 },
              { "word": "all", "start": 6.7999997, "end": 6.96, "confidence": 0.99853516 },
              { "word": "female", "start": 6.96, "end": 7.3599997, "confidence": 1.0 },
              { "word": "team", "start": 7.3599997, "end": 7.8599997, "confidence": 0.74890137 },
              { "word": "i", "start": 8.395, "end": 8.555, "confidence": 0.9941406 },
              { "word": "think", "start": 8.555, "end": 8.875, "confidence": 0.99853516 },
              { "word": "many", "start": 8.875, "end": 9.115001, "confidence": 0.99316406 },
              { "word": "of", "start": 9.115001, "end": 9.275001, "confidence": 1.0 },
              { "word": "us", "start": 9.275001, "end": 9.775001, "confidence": 1.0 },
              { "word": "are", "start": 9.835, "end": 10.155001, "confidence": 0.99365234 },
              { "word": "looking", "start": 10.155001, "end": 10.475, "confidence": 0.99853516 },
              { "word": "forward", "start": 10.475, "end": 10.795, "confidence": 1.0 },
              { "word": "to", "start": 10.795, "end": 10.955, "confidence": 0.9995117 },
              { "word": "it", "start": 10.955, "end": 11.115001, "confidence": 0.9980469 },
              { "word": "just", "start": 11.115001, "end": 11.435, "confidence": 0.9165039 },
              { "word": "being", "start": 11.435, "end": 11.915001, "confidence": 0.9995117 },
              { "word": "normal", "start": 11.915001, "end": 12.415001, "confidence": 0.92944336 },
              { "word": "and", "start": 12.715, "end": 13.115, "confidence": 0.92944336 },
              { "word": "i", "start": 13.835001, "end": 13.915001, "confidence": 0.99902344 },
              { "word": "think", "start": 13.915001, "end": 14.155001, "confidence": 1.0 },
              { "word": "if", "start": 14.155001, "end": 14.395, "confidence": 0.97998047 },
              { "word": "it", "start": 14.395, "end": 14.475, "confidence": 0.99902344 },
              { "word": "signifies", "start": 14.475, "end": 14.975, "confidence": 0.9975586 },
              { "word": "anything", "start": 15.035, "end": 15.535, "confidence": 0.88378906 },
              { "word": "it", "start": 15.74, "end": 15.98, "confidence": 0.57714844 },
              { "word": "is", "start": 15.98, "end": 16.3, "confidence": 0.751709 },
              { "word": "to", "start": 16.86, "end": 17.02, "confidence": 0.99902344 },
              { "word": "honor", "start": 17.02, "end": 17.34, "confidence": 1.0 },
              { "word": "the", "start": 17.34, "end": 17.58, "confidence": 0.9970703 },
              { "word": "the", "start": 17.58, "end": 17.74, "confidence": 0.97021484 },
              { "word": "woman", "start": 17.74, "end": 18.06, "confidence": 0.8251953 },
              { "word": "who", "start": 18.06, "end": 18.22, "confidence": 0.9995117 },
              { "word": "came", "start": 18.22, "end": 18.46, "confidence": 1.0 },
              { "word": "before", "start": 18.46, "end": 18.779999, "confidence": 1.0 },
              { "word": "us", "start": 18.779999, "end": 19.279999, "confidence": 0.9995117 },
              { "word": "who", "start": 19.42, "end": 19.92, "confidence": 0.6894531 },
              { "word": "were", "start": 20.14, "end": 20.38, "confidence": 0.42407227 },
              { "word": "skilled", "start": 20.38, "end": 20.88, "confidence": 0.953125 },
              { "word": "and", "start": 20.939999, "end": 21.18, "confidence": 0.9980469 },
              { "word": "qualified", "start": 21.18, "end": 21.68, "confidence": 0.8417969 },
              { "word": "and", "start": 22.38, "end": 22.539999, "confidence": 0.9995117 },
              { "word": "didn't", "start": 22.539999, "end": 22.86, "confidence": 0.9875488 },
              { "word": "get", "start": 22.86, "end": 23.18, "confidence": 0.9975586 },
              { "word": "the", "start": 23.18, "end": 23.5, "confidence": 0.9399414 },
              { "word": "the", "start": 23.5, "end": 23.66, "confidence": 0.70751953 },
              { "word": "same", "start": 23.66, "end": 23.9, "confidence": 0.99853516 },
              { "word": "opportunities", "start": 23.9, "end": 24.4, "confidence": 0.99853516 },
              { "word": "that", "start": 24.46, "end": 24.619999, "confidence": 0.99902344 },
              { "word": "we", "start": 24.619999, "end": 24.779999, "confidence": 1.0 },
              { "word": "have", "start": 24.779999, "end": 25.02, "confidence": 0.99902344 },
              { "word": "today", "start": 25.02, "end": 25.52, "confidence": 0.8835449 }
            ]
          }
        ]
      }
    ]
  }
}
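In the streaming case, the is_final flag shown in the interim response tells you when a transcript segment will no longer change. A minimal sketch of consuming a stream of such messages, assuming the interim message shape shown above (the function name and sample messages are ours):

```python
def collect_final_transcript(messages):
    """Accumulate streaming results, keeping only messages flagged is_final;
    interim (is_final=false) messages are superseded by later messages
    covering the same audio, so they are skipped here."""
    parts = []
    for msg in messages:
        if msg.get("is_final"):
            alt = msg["channels"]["alternatives"][0]
            if alt["transcript"]:
                parts.append(alt["transcript"])
    return " ".join(parts)

# Example with one interim and two final messages:
stream = [
    {"is_final": False, "channels": {"alternatives": [{"transcript": "yeah as"}]}},
    {"is_final": True, "channels": {"alternatives": [{"transcript": "yeah as as much as"}]}},
    {"is_final": True, "channels": {"alternatives": [{"transcript": "it's worth celebrating"}]}},
]
print(collect_final_transcript(stream))  # prints: yeah as as much as it's worth celebrating
```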
What to Expect in the JSON Response

The Deepgram response will contain the following fields:

  • transcript (string)
  • start (number, in seconds)
  • end (number, in seconds)
  • word (string)
  • confidence (float)

The OpenAI response will contain the following fields:

  • text (string)
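During a migration you will often want a single adapter that returns plain text no matter which API produced the response, so downstream consumers don't need to change twice. A minimal sketch, assuming the two default response shapes shown above (the function name is ours):

```python
def normalize_transcript(response):
    """Return the transcript text from either provider's default JSON response."""
    if "text" in response:
        # OpenAI shape: a single text field.
        return response["text"]
    # Deepgram shape: first channel, top alternative.
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]
```

This kind of shim is useful while traffic is being swapped over, since both providers can feed the same downstream pipeline.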

Use Case or Domain-specific Models

Deepgram and OpenAI provide speech recognition models that are pre-trained or tuned to identify the words and phrases unique to a specific use case or domain. Deepgram creates our speech recognition models through transfer learning from our highly-performant general models. It is important to test multiple models to see which one meets the accuracy, performance, and scalability needs for your use case.

For more details on Deepgram models see Model Overview.

Deepgram provides:

  • General
  • Phone calls
  • Meetings
  • Voicemail
  • Conversational AI
  • Finance
  • Video
  • Whisper Cloud
  • Custom Models

OpenAI provides:

  • Whisper-1 (Whisper v2-large)

What’s Next
