Getting Started | Deepgram's Docs

Try this feature out in our API Playground.

This guide will walk you through how to transcribe pre-recorded audio with the Deepgram API. We provide two scenarios to try: transcribe a remote file and transcribe a local file.

Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.

CURL

Start by making a request with cURL. Replace YOUR_DEEPGRAM_API_KEY with your own API key, then run the following examples in a terminal or your favorite API client.

If you run the “Local File CURL Example,” be sure to change @youraudio.wav to the path and filename of an audio file on your computer. (Read more about supported audio formats here.)

Remote File CURL Example

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url":"https://dpgr.am/spacewalk.wav"}' \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true'

Local File CURL Example

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true'

The examples above include the parameter model=nova-3, which tells the API to use Deepgram’s latest model. If you remove this parameter, the API uses the default model, which is currently model=base.

It also includes Deepgram’s Smart Formatting feature, smart_format=true. This will format currency amounts, phone numbers, email addresses, and more for enhanced transcript readability.
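
For readers who prefer Node.js to a terminal, the two cURL requests can be sketched as plain request objects. This is a minimal illustration rather than an official client: the function names (buildListenUrl, buildRemoteRequest, buildLocalRequest) are hypothetical, and the code only builds the requests without sending them.

```javascript
// Sketch: build the same requests as the cURL examples, without sending them.
// All function names here are illustrative, not part of any Deepgram library.
const LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen";

// Turn an options object into the query string used by the examples.
function buildListenUrl(options = {}) {
  const url = new URL(LISTEN_ENDPOINT);
  for (const [name, value] of Object.entries(options)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Remote file: JSON body containing the audio URL.
function buildRemoteRequest(apiKey, audioUrl, options) {
  return {
    url: buildListenUrl(options),
    init: {
      method: "POST",
      headers: {
        Authorization: `Token ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url: audioUrl }),
    },
  };
}

// Local file: raw audio bytes as the body, with the matching MIME type.
function buildLocalRequest(apiKey, audioBytes, contentType, options) {
  return {
    url: buildListenUrl(options),
    init: {
      method: "POST",
      headers: {
        Authorization: `Token ${apiKey}`,
        "Content-Type": contentType,
      },
      body: audioBytes,
    },
  };
}

const remote = buildRemoteRequest(
  "YOUR_DEEPGRAM_API_KEY",
  "https://dpgr.am/spacewalk.wav",
  { model: "nova-3", smart_format: true }
);
console.log(remote.url);

// To send it (Node 18+ ships a global fetch):
//   const response = await fetch(remote.url, remote.init);
//   const json = await response.json();
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any audio leaves your machine.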

SDKs

To transcribe pre-recorded audio using one of Deepgram’s SDKs, follow these steps.

Install the SDK

Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.

# Install the Deepgram JS SDK
# https://github.com/deepgram/deepgram-js-sdk

npm install @deepgram/sdk

Add Dependencies

# Install dotenv to protect your API key

npm install dotenv
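
As a rough sketch of how the key might then be read, the helper below takes DEEPGRAM_API_KEY from the environment and fails early if it is missing. The name getApiKey is hypothetical, and the sketch assumes a .env file in your project root containing a line such as DEEPGRAM_API_KEY=your_key that dotenv has already loaded.

```javascript
// Hypothetical helper (not part of the Deepgram SDK): read the API key from
// an environment-like object and fail early with a clear message if missing.
function getApiKey(env = process.env) {
  const key = env.DEEPGRAM_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("DEEPGRAM_API_KEY is not set; add it to your .env file");
  }
  return key.trim();
}

// Usage, assuming dotenv has populated process.env:
//   require("dotenv").config();
//   const deepgram = createClient(getApiKey());
```

Failing at startup gives a clearer message than an authentication error returned later by the API.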

Transcribe a Remote File

This example shows how to analyze a remote audio file (a URL that hosts your audio file) using Deepgram’s SDKs. In your terminal, create a new file in your project’s location, and populate it with the code.

// index.js (node example)

const { createClient } = require("@deepgram/sdk");
require("dotenv").config();

const transcribeUrl = async () => {
  // STEP 1: Create a Deepgram client using the API key
  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

  // STEP 2: Call the transcribeUrl method with the audio payload and options
  const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
    {
      url: "https://dpgr.am/spacewalk.wav",
    },
    // STEP 3: Configure Deepgram options for audio analysis
    {
      model: "nova-3",
      smart_format: true,
    }
  );

  if (error) throw error;
  // STEP 4: Print the results
  console.dir(result, { depth: null });
};

transcribeUrl();

Transcribe a Local File

This example shows how to analyze a local audio file (an audio file on your computer) using Deepgram’s SDKs. In your terminal, create a new file in your project’s location, and populate it with the code. (Be sure to replace the audio filename with a path/filename of an audio file on your computer.)

// index.js (node example)

const { createClient } = require("@deepgram/sdk");
const fs = require("fs");
// Load DEEPGRAM_API_KEY from your .env file, as in the remote file example
require("dotenv").config();

const transcribeFile = async () => {
  // STEP 1: Create a Deepgram client using the API key
  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

  // STEP 2: Call the transcribeFile method with the audio payload and options
  const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
    // path to the audio file
    fs.readFileSync("spacewalk.mp3"),
    // STEP 3: Configure Deepgram options for audio analysis
    {
      model: "nova-3",
      smart_format: true,
    }
  );

  if (error) throw error;
  // STEP 4: Print the results
  console.dir(result, { depth: null });
};

transcribeFile();

Non-SDK Code Examples

If you would like to make a Deepgram speech-to-text request in a specific programming language without using Deepgram’s SDKs, we offer a library of code samples in this GitHub repo. However, we recommend trying our SDKs first.

Results

To see the results from Deepgram, run your application from the terminal. Your transcripts will appear in your shell.

# Run your application using the file you created in the previous step
# Example: node index.js

node YOUR_PROJECT_NAME.js

Deepgram does not store transcripts, so the Deepgram API response is the only opportunity to retrieve the transcript. Make sure to save output or return transcriptions to a callback URL for custom processing.
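
One simple way to keep the output is to write the full response to disk as soon as it arrives. The sketch below is only an illustration: saveTranscript is a hypothetical helper, the sample object is a trimmed stand-in for a real response, and the temporary-directory path is an arbitrary choice.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: persist the full API response as pretty-printed JSON
// so the transcript can be retrieved later.
function saveTranscript(result, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(result, null, 2), "utf8");
  return filePath;
}

// Example with a trimmed-down stand-in for a real response.
const sampleResult = {
  metadata: { request_id: "2479c8c8-8185-40ac-9ac6-f0874419f793" },
  results: { channels: [{ alternatives: [{ transcript: "Yeah." }] }] },
};
const outputPath = path.join(
  os.tmpdir(),
  `${sampleResult.metadata.request_id}.json`
);
saveTranscript(sampleResult, outputPath);
console.log(`Saved response to ${outputPath}`);
```

Naming the file after the request_id keeps each saved response traceable to the request that produced it.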

Analyze the Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response:

{
  "metadata": {
    "transaction_key": "deprecated",
    "request_id": "2479c8c8-8185-40ac-9ac6-f0874419f793",
    "sha256": "154e291ecfa8be6ab8343560bcc109008fa7853eb5372533e8efdefc9b504c33",
    "created": "2024-02-06T19:56:16.180Z",
    "duration": 25.933313,
    "channels": 1,
    "models": [
      "30089e05-99d1-4376-b32e-c263170674af"
    ],
    "model_info": {
      "30089e05-99d1-4376-b32e-c263170674af": {
        "name": "2-general-nova",
        "version": "2024-01-09.29447",
        "arch": "nova-3"
      }
    }
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "Yeah. As as much as, it's worth celebrating, the first, spacewalk, with an all female team, I think many of us are looking forward to it just being normal. And, I think if it signifies anything, It is, to honor the the women who came before us who, were skilled and qualified, and didn't get the the same opportunities that we have today.",
            "confidence": 0.99902344,
            "words": [
              {
                "word": "yeah",
                "start": 0.08,
                "end": 0.32,
                "confidence": 0.9975586,
                "punctuated_word": "Yeah."
              },
              {
                "word": "as",
                "start": 0.32,
                "end": 0.79999995,
                "confidence": 0.9921875,
                "punctuated_word": "As"
              },
              {
                "word": "as",
                "start": 0.79999995,
                "end": 1.04,
                "confidence": 0.96777344,
                "punctuated_word": "as"
              },
              {
                "word": "much",
                "start": 1.04,
                "end": 1.28,
                "confidence": 1,
                "punctuated_word": "much"
              },
              {
                "word": "as",
                "start": 1.28,
                "end": 1.5999999,
                "confidence": 0.9926758,
                "punctuated_word": "as,"
              },
              {
                "word": "it's",
                "start": 2,
                "end": 2.24,
                "confidence": 1,
                "punctuated_word": "it's"
              },
              {
                "word": "worth",
                "start": 2.24,
                "end": 2.74,
                "confidence": 1,
                "punctuated_word": "worth"
              },
              {
                "word": "celebrating",
                "start": 2.8,
                "end": 3.3,
                "confidence": 0.97143555,
                "punctuated_word": "celebrating,"
              },
              {
                "word": "the",
                "start": 4.4,
                "end": 4.64,
                "confidence": 0.9980469,
                "punctuated_word": "the"
              },
              {
                "word": "first",
                "start": 4.64,
                "end": 5.04,
                "confidence": 0.80200195,
                "punctuated_word": "first,"
              },
              {
                "word": "spacewalk",
                "start": 5.2799997,
                "end": 5.7799997,
                "confidence": 0.9468994,
                "punctuated_word": "spacewalk,"
              },
              {
                "word": "with",
                "start": 6.3199997,
                "end": 6.56,
                "confidence": 1,
                "punctuated_word": "with"
              },
              {
                "word": "an",
                "start": 6.56,
                "end": 6.72,
                "confidence": 0.99902344,
                "punctuated_word": "an"
              },
              {
                "word": "all",
                "start": 6.72,
                "end": 6.96,
                "confidence": 0.9980469,
                "punctuated_word": "all"
              },
              {
                "word": "female",
                "start": 6.96,
                "end": 7.3599997,
                "confidence": 1,
                "punctuated_word": "female"
              },
              {
                "word": "team",
                "start": 7.3599997,
                "end": 7.8599997,
                "confidence": 0.91625977,
                "punctuated_word": "team,"
              },
              {
                "word": "i",
                "start": 8.395,
                "end": 8.555,
                "confidence": 0.94384766,
                "punctuated_word": "I"
              },
              {
                "word": "think",
                "start": 8.555,
                "end": 8.875,
                "confidence": 0.99902344,
                "punctuated_word": "think"
              },
              {
                "word": "many",
                "start": 8.875,
                "end": 9.115001,
                "confidence": 0.9838867,
                "punctuated_word": "many"
              },
              {
                "word": "of",
                "start": 9.115001,
                "end": 9.3550005,
                "confidence": 1,
                "punctuated_word": "of"
              },
              {
                "word": "us",
                "start": 9.3550005,
                "end": 9.8550005,
                "confidence": 1,
                "punctuated_word": "us"
              },
              {
                "word": "are",
                "start": 9.995001,
                "end": 10.235001,
                "confidence": 0.9633789,
                "punctuated_word": "are"
              },
              {
                "word": "looking",
                "start": 10.235001,
                "end": 10.475,
                "confidence": 0.9980469,
                "punctuated_word": "looking"
              },
              {
                "word": "forward",
                "start": 10.475,
                "end": 10.795,
                "confidence": 1,
                "punctuated_word": "forward"
              },
              {
                "word": "to",
                "start": 10.795,
                "end": 10.955,
                "confidence": 1,
                "punctuated_word": "to"
              },
              {
                "word": "it",
                "start": 10.955,
                "end": 11.115001,
                "confidence": 0.99902344,
                "punctuated_word": "it"
              },
              {
                "word": "just",
                "start": 11.115001,
                "end": 11.3550005,
                "confidence": 0.9980469,
                "punctuated_word": "just"
              },
              {
                "word": "being",
                "start": 11.3550005,
                "end": 11.8550005,
                "confidence": 0.9980469,
                "punctuated_word": "being"
              },
              {
                "word": "normal",
                "start": 11.995001,
                "end": 12.495001,
                "confidence": 0.98535156,
                "punctuated_word": "normal."
              },
              {
                "word": "and",
                "start": 12.715,
                "end": 13.115,
                "confidence": 0.9555664,
                "punctuated_word": "And,"
              },
              {
                "word": "i",
                "start": 13.915001,
                "end": 13.995001,
                "confidence": 0.99902344,
                "punctuated_word": "I"
              },
              {
                "word": "think",
                "start": 13.995001,
                "end": 14.235001,
                "confidence": 1,
                "punctuated_word": "think"
              },
              {
                "word": "if",
                "start": 14.235001,
                "end": 14.395,
                "confidence": 0.9902344,
                "punctuated_word": "if"
              },
              {
                "word": "it",
                "start": 14.395,
                "end": 14.555,
                "confidence": 0.9892578,
                "punctuated_word": "it"
              },
              {
                "word": "signifies",
                "start": 14.555,
                "end": 15.055,
                "confidence": 1,
                "punctuated_word": "signifies"
              },
              {
                "word": "anything",
                "start": 15.115,
                "end": 15.615,
                "confidence": 0.98217773,
                "punctuated_word": "anything,"
              },
              {
                "word": "it",
                "start": 15.82,
                "end": 15.98,
                "confidence": 0.88671875,
                "punctuated_word": "It"
              },
              {
                "word": "is",
                "start": 15.98,
                "end": 16.38,
                "confidence": 0.9008789,
                "punctuated_word": "is,"
              },
              {
                "word": "to",
                "start": 16.779999,
                "end": 17.02,
                "confidence": 1,
                "punctuated_word": "to"
              },
              {
                "word": "honor",
                "start": 17.02,
                "end": 17.34,
                "confidence": 1,
                "punctuated_word": "honor"
              },
              {
                "word": "the",
                "start": 17.34,
                "end": 17.58,
                "confidence": 1,
                "punctuated_word": "the"
              },
              {
                "word": "the",
                "start": 17.58,
                "end": 17.74,
                "confidence": 0.99316406,
                "punctuated_word": "the"
              },
              {
                "word": "women",
                "start": 17.74,
                "end": 18.06,
                "confidence": 0.93603516,
                "punctuated_word": "women"
              },
              {
                "word": "who",
                "start": 18.06,
                "end": 18.22,
                "confidence": 1,
                "punctuated_word": "who"
              },
              {
                "word": "came",
                "start": 18.22,
                "end": 18.46,
                "confidence": 1,
                "punctuated_word": "came"
              },
              {
                "word": "before",
                "start": 18.46,
                "end": 18.7,
                "confidence": 1,
                "punctuated_word": "before"
              },
              {
                "word": "us",
                "start": 18.7,
                "end": 19.2,
                "confidence": 1,
                "punctuated_word": "us"
              },
              {
                "word": "who",
                "start": 19.42,
                "end": 19.82,
                "confidence": 0.8569336,
                "punctuated_word": "who,"
              },
              {
                "word": "were",
                "start": 20.22,
                "end": 20.46,
                "confidence": 0.97314453,
                "punctuated_word": "were"
              },
              {
                "word": "skilled",
                "start": 20.46,
                "end": 20.86,
                "confidence": 1,
                "punctuated_word": "skilled"
              },
              {
                "word": "and",
                "start": 20.86,
                "end": 21.18,
                "confidence": 0.99609375,
                "punctuated_word": "and"
              },
              {
                "word": "qualified",
                "start": 21.18,
                "end": 21.68,
                "confidence": 0.9848633,
                "punctuated_word": "qualified,"
              },
              {
                "word": "and",
                "start": 22.3,
                "end": 22.619999,
                "confidence": 1,
                "punctuated_word": "and"
              },
              {
                "word": "didn't",
                "start": 22.619999,
                "end": 22.86,
                "confidence": 0.9655762,
                "punctuated_word": "didn't"
              },
              {
                "word": "get",
                "start": 22.86,
                "end": 23.18,
                "confidence": 1,
                "punctuated_word": "get"
              },
              {
                "word": "the",
                "start": 23.18,
                "end": 23.42,
                "confidence": 0.7626953,
                "punctuated_word": "the"
              },
              {
                "word": "the",
                "start": 23.42,
                "end": 23.66,
                "confidence": 0.625,
                "punctuated_word": "the"
              },
              {
                "word": "same",
                "start": 23.66,
                "end": 23.98,
                "confidence": 0.99902344,
                "punctuated_word": "same"
              },
              {
                "word": "opportunities",
                "start": 23.98,
                "end": 24.46,
                "confidence": 1,
                "punctuated_word": "opportunities"
              },
              {
                "word": "that",
                "start": 24.46,
                "end": 24.619999,
                "confidence": 1,
                "punctuated_word": "that"
              },
              {
                "word": "we",
                "start": 24.619999,
                "end": 24.779999,
                "confidence": 1,
                "punctuated_word": "we"
              },
              {
                "word": "have",
                "start": 24.779999,
                "end": 25.02,
                "confidence": 1,
                "punctuated_word": "have"
              },
              {
                "word": "today",
                "start": 25.02,
                "end": 25.52,
                "confidence": 0.97680664,
                "punctuated_word": "today."
              }
            ],
            "paragraphs": {
              "transcript": "\nYeah. As as much as, it's worth celebrating, the first, spacewalk, with an all female team, I think many of us are looking forward to it just being normal. And, I think if it signifies anything, It is, to honor the the women who came before us who, were skilled and qualified, and didn't get the the same opportunities that we have today.",
              "paragraphs": [
                {
                  "sentences": [
                    {
                      "text": "Yeah.",
                      "start": 0.08,
                      "end": 0.32
                    },
                    {
                      "text": "As as much as, it's worth celebrating, the first, spacewalk, with an all female team, I think many of us are looking forward to it just being normal.",
                      "start": 0.32,
                      "end": 12.495001
                    },
                    {
                      "text": "And, I think if it signifies anything, It is, to honor the the women who came before us who, were skilled and qualified, and didn't get the the same opportunities that we have today.",
                      "start": 12.715,
                      "end": 25.52
                    }
                  ],
                  "num_words": 63,
                  "start": 0.08,
                  "end": 25.52
                }
              ]
            }
          }
        ]
      }
    ]
  }
}

In this default response, we see:

transcript: the transcript for the audio segment being processed.
confidence: a floating-point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
words: an array of objects, one per word in the transcript, each with the word’s start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.

Because we passed the smart_format: true option to the transcribeUrl (or transcribeFile) method, each word object also includes its punctuated_word value, which contains the transformed word after punctuation and capitalization are applied.
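
To make these fields concrete, here is a minimal sketch that pulls them out of a response object. The function name summarizeResponse and the 0.8 confidence threshold are illustrative choices, and the sample is a hand-trimmed object that reuses three word entries from the response above with a shortened transcript string.

```javascript
// Sketch: pull the commonly used fields out of a pre-recorded response.
// summarizeResponse and the 0.8 threshold are illustrative choices.
function summarizeResponse(response, minConfidence = 0.8) {
  const alternative = response?.results?.channels?.[0]?.alternatives?.[0];
  if (!alternative) {
    throw new Error("Response contains no transcription alternatives");
  }
  const words = alternative.words ?? [];
  return {
    transcript: alternative.transcript,
    confidence: alternative.confidence,
    wordCount: words.length,
    // Words the model was less sure about, with their start times in seconds.
    lowConfidenceWords: words
      .filter((w) => w.confidence < minConfidence)
      .map((w) => ({ word: w.punctuated_word ?? w.word, start: w.start })),
  };
}

// Hand-trimmed sample; the transcript is shortened to match the three words.
const sample = {
  results: {
    channels: [
      {
        alternatives: [
          {
            transcript: "the the same",
            confidence: 0.99902344,
            words: [
              { word: "the", start: 23.18, end: 23.42, confidence: 0.7626953, punctuated_word: "the" },
              { word: "the", start: 23.42, end: 23.66, confidence: 0.625, punctuated_word: "the" },
              { word: "same", start: 23.66, end: 23.98, confidence: 0.99902344, punctuated_word: "same" },
            ],
          },
        ],
      },
    ],
  },
};

console.log(summarizeResponse(sample));
```

Flagging low-confidence words with their timestamps is one way to decide which parts of the audio are worth reviewing by hand.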

The transaction_key in the metadata field can be ignored. The result will always be "transaction_key": "deprecated".

Limits

There are a few limits to be aware of when making a pre-recorded speech-to-text request.

File Size

The maximum file size is limited to 2 GB.
For large video files, extract the audio stream and upload only the audio to Deepgram. This reduces the file size significantly.
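
A pre-flight size check can catch oversized files before any upload starts. The sketch below is an assumption-laden illustration: isWithinSizeLimit is a hypothetical helper, and it treats "2 GB" as 2 * 1024^3 bytes because this guide does not state the exact byte count the API enforces.

```javascript
const fs = require("fs");

// Assumption: "2 GB" is treated here as 2 * 1024^3 bytes; the exact byte
// count enforced by the API is not stated in this guide.
const MAX_UPLOAD_BYTES = 2 * 1024 ** 3;

// Hypothetical pre-flight check: true when the file fits within the limit.
function isWithinSizeLimit(filePath, maxBytes = MAX_UPLOAD_BYTES) {
  const { size } = fs.statSync(filePath);
  return size <= maxBytes;
}

// Usage:
//   if (!isWithinSizeLimit("youraudio.wav")) {
//     throw new Error("File exceeds the 2 GB limit; extract the audio track first");
//   }
```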

Rate Limits

Nova, Base, and Enhanced Models:

Maximum of 100 concurrent requests per project.
For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.

Whisper Model:

Paid plan: 15 concurrent requests.
Pay-as-you-go plan: 5 concurrent requests.

Exceeding these limits will result in a 429: Too Many Requests error.
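
A common way to cope with a 429 response is to wait and retry with exponential backoff. The following is a minimal sketch of that general technique, not a Deepgram-recommended policy: withRetryOn429 and backoffDelayMs are hypothetical names, and the retry count and 500 ms base delay are arbitrary defaults.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt, baseDelayMs = 500) {
  return baseDelayMs * 2 ** attempt;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: `sendRequest` is any async function that resolves to
// an object with a numeric `status` (for example, a fetch Response).
async function withRetryOn429(
  sendRequest,
  { maxRetries = 3, baseDelayMs = 500 } = {}
) {
  for (let attempt = 0; ; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response; // success, a different error, or retries exhausted
    }
    await sleep(backoffDelayMs(attempt, baseDelayMs));
  }
}

// Usage with fetch (Node 18+), where url and init describe your request:
//   const response = await withRetryOn429(() => fetch(url, init));
```

If retries are exhausted, the wrapper returns the last 429 response instead of throwing, so the caller decides how to report it.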

Maximum Processing Time

Fast Transcription Models (Nova, Base, and Enhanced)

These models offer extremely fast transcription.
Maximum processing time: 10 minutes.

Slower Transcription Model (Whisper)

Whisper transcribes more slowly than the other models.
Maximum processing time: 20 minutes.

Timeout Policy

If a request exceeds the maximum processing time, it will be canceled.
In such cases, a 504: Gateway Timeout error will be returned.
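
When handling responses in code, it can help to turn the two limit-related status codes from this section into readable messages. The helper below is a small hypothetical sketch; describeLimitError is not part of any Deepgram library, and the wording of the messages is illustrative.

```javascript
// Sketch: map the limit-related status codes described in this section to
// short messages. Returns null for any other status.
function describeLimitError(status) {
  switch (status) {
    case 429:
      return "Too Many Requests: concurrency limit reached; wait and retry";
    case 504:
      return "Gateway Timeout: request exceeded the maximum processing time and was canceled";
    default:
      return null; // not one of the limit errors covered in this section
  }
}

console.log(describeLimitError(504));
```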

What’s Next?

Now that you’ve transcribed pre-recorded audio, enhance your knowledge by exploring the following areas.

Read the Feature Guides

Deepgram’s features help you to customize your transcripts.

Language: Learn how to transcribe audio in other languages.
Profanity Filtering and Redaction: Discover how to remove profanity or redact personal information like credit card numbers.
Feature Overview: Review the list of features available for pre-recorded speech-to-text. Then, dive into individual guides for more details.

Explore Use Cases

Learn about the different ways you can use Deepgram products to help you meet your business objectives. Explore Deepgram’s use cases.

Transcribe Streaming Audio

Now that you know how to transcribe pre-recorded audio, check out how you can use Deepgram to transcribe streaming audio in real time. To learn more, see Getting Started with Streaming Audio.