Getting Started

An introduction to getting transcription data from pre-recorded audio files.


This guide walks you through transcribing pre-recorded audio with the Deepgram API using cURL or one of Deepgram’s SDKs.

Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key. If you plan to use a Deepgram SDK, that guide also covers configuring your environment.

cURL

Replace YOUR_DEEPGRAM_API_KEY with your API key and run the following in a terminal or API client.

Remote file

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url":"https://dpgr.am/spacewalk.wav"}' \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true'

Local file

Replace youraudio.wav with the path to an audio file on your computer. Keep the leading @, which tells cURL to send the file's contents as the request body. See Supported Audio Formats for accepted formats.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true'

The examples above set model=nova-3, which tells the API to use Deepgram’s latest model. If you omit this parameter, the API defaults to model=base.

They also include Deepgram’s Smart Formatting feature (smart_format=true), which formats currency amounts, phone numbers, email addresses, and more for enhanced readability.

SDKs

To transcribe pre-recorded audio using one of Deepgram’s SDKs, follow these steps.

Install the SDK and dependencies

Open your terminal, navigate to your project directory, and install the Deepgram SDK along with any required dependencies.

# Install the Deepgram JS SDK and dotenv
# https://github.com/deepgram/deepgram-js-sdk

npm install @deepgram/sdk dotenv
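The SDK example that follows reads your key from process.env.DEEPGRAM_API_KEY, which dotenv populates from a .env file in your project directory (for example, a line reading DEEPGRAM_API_KEY=YOUR_DEEPGRAM_API_KEY). If you want an explicit error before any request is made, a minimal sketch of a fail-fast check might look like this; getApiKey is a hypothetical helper, not part of the Deepgram SDK.

```javascript
// Hypothetical helper (not part of the Deepgram SDK): read the API key and
// fail fast with a clear message if it is missing or blank.
// Assumes require("dotenv").config() has already loaded your .env file.
function getApiKey(env = process.env) {
  const key = (env.DEEPGRAM_API_KEY || "").trim();
  if (!key) {
    throw new Error(
      "DEEPGRAM_API_KEY is not set. Add DEEPGRAM_API_KEY=... to your .env file."
    );
  }
  return key;
}
```

You could then pass getApiKey() to createClient instead of reading process.env directly.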

Transcribe a remote file

Create a new file in your project and add the following code to transcribe a remote audio file by URL:

// index.js (node example)

const { createClient } = require("@deepgram/sdk");
require("dotenv").config();

const transcribeUrl = async () => {
  // STEP 1: Create a Deepgram client using the API key
  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

  // STEP 2: Call the transcribeUrl method with the audio payload and options
  const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
    {
      url: "https://dpgr.am/spacewalk.wav",
    },
    // STEP 3: Configure Deepgram options for audio analysis
    {
      model: "nova-3",
      smart_format: true,
    }
  );

  // Surface any API error reported in `error`
  if (error) throw error;

  // STEP 4: Print the results
  console.dir(result, { depth: null });
};

// Report failures and set a non-zero exit code instead of leaving the rejection unhandled
transcribeUrl().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});

To transcribe a local file instead of a remote URL, use the transcribeFile (JavaScript), transcribe_file (Python), TranscribeFile (C#), or FromFile (Go) method. Pass the file’s binary content and the same options. See the Pre-Recorded Audio API reference for details.
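For JavaScript, a minimal sketch of the local-file variant, assuming the same @deepgram/sdk and dotenv setup as the remote example and a file path you supply. It passes a Buffer from fs.readFileSync to transcribeFile; transcribeLocalFile is a hypothetical wrapper name.

```javascript
const fs = require("fs");

// Hypothetical wrapper: transcribe an audio file on disk with the JS SDK.
// Assumes @deepgram/sdk and dotenv are installed and DEEPGRAM_API_KEY is set.
const transcribeLocalFile = async (path) => {
  // Load the SDK and .env values when the function runs
  const { createClient } = require("@deepgram/sdk");
  require("dotenv").config();

  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

  // Pass the file's binary content (a Buffer) plus the same options as before
  const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
    fs.readFileSync(path),
    { model: "nova-3", smart_format: true }
  );

  if (error) throw error;
  return result;
};
```

For example: transcribeLocalFile("youraudio.wav").then((result) => console.dir(result, { depth: null })).catch(console.error);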

Non-SDK code examples

For language-specific examples without Deepgram’s SDKs, see the code-samples repository. We recommend trying the SDKs first.
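As an illustration of what those samples do, the cURL requests above translate directly to any HTTP client. A minimal sketch, assuming Node.js 18 or later for the built-in fetch; buildListenRequest and transcribe are hypothetical helper names, not part of any Deepgram library.

```javascript
const fs = require("fs");

// Build the URL and fetch options for POST /v1/listen, mirroring the cURL examples.
// `source` is either { url } for a remote file or { path, contentType } for a
// local file (a hypothetical shape chosen for this sketch).
function buildListenRequest(apiKey, source, params = { model: "nova-3", smart_format: true }) {
  const endpoint = `https://api.deepgram.com/v1/listen?${new URLSearchParams(params)}`;
  const headers = { Authorization: `Token ${apiKey}` };
  let body;
  if (source.url) {
    // Remote file: send a JSON body containing the audio URL
    headers["Content-Type"] = "application/json";
    body = JSON.stringify({ url: source.url });
  } else {
    // Local file: send the raw bytes as the request body
    headers["Content-Type"] = source.contentType || "audio/wav";
    body = fs.readFileSync(source.path);
  }
  return { endpoint, init: { method: "POST", headers, body } };
}

async function transcribe(apiKey, source) {
  const { endpoint, init } = buildListenRequest(apiKey, source);
  const res = await fetch(endpoint, init);
  // Treat non-2xx responses (such as the 504 described under Limits) as errors
  if (!res.ok) {
    throw new Error(`Deepgram request failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```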

Results

Run your application from the terminal. Your transcript appears in your shell.

node index.js

Deepgram does not store transcripts, so the API response is your only chance to retrieve a transcript. Save the output, or have results sent to a callback URL for custom processing.
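A minimal sketch of saving the response to disk, assuming `result` is the object returned by transcribeUrl or transcribeFile; saveTranscript and the default file name are hypothetical choices for illustration.

```javascript
const fs = require("fs");

// Hypothetical helper: persist the full API response as pretty-printed JSON,
// since Deepgram does not keep a copy. Returns the path that was written.
function saveTranscript(result, outPath = "transcript.json") {
  fs.writeFileSync(outPath, JSON.stringify(result, null, 2));
  return outPath;
}
```

In index.js, you could call saveTranscript(result) alongside or instead of the console.dir call.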

Analyze the response

When the file finishes processing (often after only a few seconds), you receive a JSON response:

JSON
{
  "metadata": {
    "transaction_key": "deprecated",
    "request_id": "2479c8c8-8185-40ac-9ac6-f0874419f793",
    "sha256": "154e291ecfa8be6ab8343560bcc109008fa7853eb5372533e8efdefc9b504c33",
    "created": "2024-02-06T19:56:16.180Z",
    "duration": 25.933313,
    "channels": 1,
    "models": [
      "30089e05-99d1-4376-b32e-c263170674af"
    ],
    "model_info": {
      "30089e05-99d1-4376-b32e-c263170674af": {
        "name": "2-general-nova",
        "version": "2024-01-09.29447",
        "arch": "nova-3"
      }
    }
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "Yeah. As as much as, it's worth celebrating, the first, spacewalk, with an all female team, I think many of us are looking forward to it just being normal. And, I think if it signifies anything, It is, to honor the the women who came before us who, were skilled and qualified, and didn't get the the same opportunities that we have today.",
            "confidence": 0.99902344,
            "words": [
              {
                "word": "yeah",
                "start": 0.08,
                "end": 0.32,
                "confidence": 0.9975586,
                "punctuated_word": "Yeah."
              },
              {
                "word": "as",
                "start": 0.32,
                "end": 0.79999995,
                "confidence": 0.9921875,
                "punctuated_word": "As"
              }
            ],
            "paragraphs": {
              "transcript": "\nYeah. As as much as, it's worth celebrating...",
              "paragraphs": [
                {
                  "sentences": [
                    {
                      "text": "Yeah.",
                      "start": 0.08,
                      "end": 0.32
                    }
                  ],
                  "num_words": 63,
                  "start": 0.08,
                  "end": 25.52
                }
              ]
            }
          }
        ]
      }
    ]
  }
}

The response above is truncated for brevity. The full response includes a words entry for every word in the transcript and all sentences in the paragraphs object.
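The paragraphs object groups the transcript into sentences with start and end times, which is handy for captions or readable output. A minimal sketch, assuming the response shape above and a single-channel file; listSentences is a hypothetical helper.

```javascript
// Hypothetical helper: return one line per sentence, prefixed with its
// start and end time in seconds, from the first channel's top alternative.
function listSentences(result) {
  const alt = result.results.channels[0].alternatives[0];
  // Fall back to an empty list if the response has no paragraphs object
  const paragraphs = (alt.paragraphs && alt.paragraphs.paragraphs) || [];
  return paragraphs.flatMap((p) =>
    p.sentences.map((s) => `[${s.start.toFixed(2)}s - ${s.end.toFixed(2)}s] ${s.text}`)
  );
}
```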

In this response:

  • transcript: the transcript for the audio segment being processed.
  • confidence: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: an array of objects, one per word in the transcript, each with its start and end time (in seconds from the beginning of the audio) and a confidence value.
    • Because we passed the smart_format: true option, each word object also includes its punctuated_word value, which contains the transformed word after punctuation and capitalization are applied.
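To read these fields in code, you can index into the first channel's top alternative. A minimal sketch, assuming a single-channel file; summarizeResult and its 0.9 review threshold are hypothetical choices for illustration, not Deepgram recommendations.

```javascript
// Hypothetical helper: extract the fields described above and flag words
// whose confidence falls below `minConfidence` for manual review.
function summarizeResult(result, minConfidence = 0.9) {
  const alt = result.results.channels[0].alternatives[0];
  return {
    transcript: alt.transcript,
    confidence: alt.confidence,
    lowConfidenceWords: alt.words
      .filter((w) => w.confidence < minConfidence)
      // Prefer punctuated_word (present when smart_format is on), else the raw word
      .map((w) => ({
        word: w.punctuated_word ?? w.word,
        start: w.start,
        end: w.end,
        confidence: w.confidence,
      })),
  };
}
```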

You can ignore the transaction_key field in metadata; its value is always "deprecated".

Limits

  • File size: Maximum 2 GB. For large video files, extract the audio stream first.
  • Rate limits: Up to 100 concurrent requests per project for Nova, Base, and Enhanced models. For full details, see API Rate Limits.
  • Processing time: Requests exceeding 10 minutes (Nova/Base/Enhanced) or 20 minutes (Whisper) return a 504: Gateway Timeout error.
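If you transcribe many files, cap the number of in-flight requests so you stay within the concurrency limit. A minimal sketch of a generic promise pool; runWithConcurrency is a hypothetical helper, and its default of 100 simply mirrors the Nova/Base/Enhanced limit listed above (check API Rate Limits for your plan and model).

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight.
// `tasks` is an array of functions returning promises, for example
// () => deepgram.listen.prerecorded.transcribeUrl({ url }, options).
// Results keep the input order; if any task rejects, the returned promise rejects.
async function runWithConcurrency(tasks, limit = 100) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unstarted task until none remain
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```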

What’s next?

  • Feature overview: Review the full list of features available for pre-recorded speech-to-text.
  • Language: Transcribe audio in other languages.
  • Streaming audio: Transcribe audio in real time.
  • Use cases: Explore ways to use Deepgram products.