Getting Started

An introduction to getting transcription data from pre-recorded audio files.


This guide walks you through transcribing pre-recorded audio with the Deepgram API using cURL or one of Deepgram’s SDKs.

Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.

cURL

Replace YOUR_DEEPGRAM_API_KEY with your API key and run the following in a terminal or API client.

Remote file

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url":"https://dpgr.am/spacewalk.wav"}' \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true'
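If you want to make the same remote-file request from Node.js without an SDK, here is a minimal sketch that mirrors the cURL command with the global fetch function (this assumes Node 18 or later). The helper names buildRemoteRequest and transcribeRemote are hypothetical. The endpoint, headers, JSON body, and query parameters come from the cURL example.

```javascript
// Sketch: the remote-file cURL request expressed with Node's built-in fetch (Node 18+).
// buildRemoteRequest and transcribeRemote are hypothetical helper names, not SDK APIs.

const LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen";

// Build the request URL and fetch options without sending anything.
function buildRemoteRequest(apiKey, audioUrl, options = { model: "nova-3", smart_format: true }) {
  const url = new URL(LISTEN_ENDPOINT);
  for (const [key, value] of Object.entries(options)) {
    // Query parameters are strings, so booleans become "true" or "false".
    url.searchParams.set(key, String(value));
  }
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: {
        Authorization: `Token ${apiKey}`,
        "Content-Type": "application/json",
      },
      // The remote file is referenced by URL inside a JSON body.
      body: JSON.stringify({ url: audioUrl }),
    },
  };
}

// Send the request and return the parsed JSON response.
async function transcribeRemote(apiKey, audioUrl, options) {
  const { url, init } = buildRemoteRequest(apiKey, audioUrl, options);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Deepgram request failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}
```

To try it, call transcribeRemote(process.env.DEEPGRAM_API_KEY, "https://dpgr.am/spacewalk.wav") and print the resolved value. Building the request separately from sending it lets you inspect the exact URL and headers before anything goes over the network.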

Local file

Replace @youraudio.wav with the path to an audio file on your computer. See Supported Audio Formats for accepted formats.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true'
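The following is a comparable sketch for the local-file request. It also assumes Node 18 or later for the global fetch. It reads the whole file into memory, which is the simplest equivalent of --data-binary but not ideal for very large files. buildLocalFileRequest and transcribeLocalFile are hypothetical helpers. The default Content-Type of audio/wav is an assumption, so change it to match your file's format.

```javascript
// Sketch: the local-file cURL request expressed with Node's built-in fetch (Node 18+).
// buildLocalFileRequest and transcribeLocalFile are hypothetical helper names, not SDK APIs.
const fs = require("fs");

const LISTEN_URL = "https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true";

// Read the file and build fetch options, similar to curl --data-binary @file.
function buildLocalFileRequest(apiKey, filePath, contentType = "audio/wav") {
  return {
    url: LISTEN_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Token ${apiKey}`,
        // Assumed default; set this to the file's actual audio format.
        "Content-Type": contentType,
      },
      // Send the raw bytes of the file, not JSON.
      body: fs.readFileSync(filePath),
    },
  };
}

// Send the file and return the parsed JSON response.
async function transcribeLocalFile(apiKey, filePath, contentType) {
  const { url, init } = buildLocalFileRequest(apiKey, filePath, contentType);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Deepgram request failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}
```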

The cURL examples above include model=nova-3, which tells the API to use Deepgram’s latest model. If you omit this parameter, the API defaults to model=base.

They also include Deepgram’s Smart Formatting feature (smart_format=true), which formats currency amounts, phone numbers, email addresses, and more for enhanced readability.

SDKs

To transcribe pre-recorded audio using one of Deepgram’s SDKs, follow these steps.

Install the SDK and dependencies

Open your terminal, navigate to your project directory, and install the Deepgram SDK along with any required dependencies.

# Install the Deepgram JS SDK and dotenv
# https://github.com/deepgram/deepgram-js-sdk

npm install @deepgram/sdk dotenv
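The SDK examples read your key from the DEEPGRAM_API_KEY environment variable. dotenv can load this variable from a .env file in your project directory that contains a line like DEEPGRAM_API_KEY=your_key. You can use a small guard like the hypothetical requireApiKey helper below to fail fast with a clear message when the key is missing, instead of sending a request without credentials.

```javascript
// Sketch: fail fast if the API key is missing. requireApiKey is a hypothetical helper.
// In your app, call require("dotenv").config() first so values from .env are in process.env.

function requireApiKey(env = process.env) {
  // Treat an unset or whitespace-only value as missing.
  const key = (env.DEEPGRAM_API_KEY || "").trim();
  if (!key) {
    throw new Error(
      "DEEPGRAM_API_KEY is not set. Add it to a .env file or export it in your shell."
    );
  }
  return key;
}
```

For example, you could create the client with new DeepgramClient({ apiKey: requireApiKey() }).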

Transcribe a remote file

Create a new file in your project and add the following code to transcribe a remote audio file by URL:

// index.js (node example)

const { DeepgramClient } = require("@deepgram/sdk");
require("dotenv").config();

const transcribeUrl = async () => {
  // STEP 1: Create a Deepgram client using the API key
  const deepgram = new DeepgramClient({ apiKey: process.env.DEEPGRAM_API_KEY });

  // STEP 2: Call the transcribeUrl method with the audio payload and options
  // STEP 3: Configure Deepgram options for audio analysis
  const result = await deepgram.listen.v1.media.transcribeUrl({
    url: "https://dpgr.am/spacewalk.wav",
    model: "nova-3",
    smart_format: true,
  });

  // STEP 4: Print the results
  console.dir(result, { depth: null });
};

transcribeUrl();

To transcribe a local file instead of a remote URL, use the transcribeFile (JavaScript), transcribe_file (Python), TranscribeFile (C#), or FromFile (Go) method. Pass the file’s binary content and the same options. See the Pre-Recorded Audio API reference for details.

Transcribe a local file

// index.js (node example)

const { DeepgramClient } = require("@deepgram/sdk");
const fs = require("fs");
require("dotenv").config();

const transcribeFile = async () => {
  // STEP 1: Create a Deepgram client using the API key
  const deepgram = new DeepgramClient({ apiKey: process.env.DEEPGRAM_API_KEY });

  // STEP 2: Call the transcribeFile method with the audio payload and options
  // STEP 3: Configure Deepgram options for audio analysis
  const result = await deepgram.listen.v1.media.transcribeFile(
    // Stream the audio file from disk (replace with the path to your file)
    fs.createReadStream("spacewalk.mp3"),
    {
      model: "nova-3",
      smart_format: true,
    }
  );

  // STEP 4: Print the results
  console.dir(result, { depth: null });
};

transcribeFile();

Non-SDK Code Examples

If you want to make a Deepgram speech-to-text request in a specific programming language without using Deepgram’s SDKs, see the code-samples repository on GitHub. We recommend trying the SDKs first.

Results

Run your application from the terminal. Your transcript appears in your shell.

node index.js

Deepgram does not store transcripts, so the API response is your only opportunity to retrieve the transcript. Save the output, or have transcriptions sent to a callback URL for custom processing.
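Because the response is the only copy of your transcript, you may want to write it to disk as soon as it arrives. This sketch assumes that result is the parsed JSON response and has the same shape as the example under Analyze the response. saveTranscript is a hypothetical helper. It saves the full JSON and a plain-text copy of the first channel's top transcript.

```javascript
// Sketch: persist the full API response and a plain-text transcript.
// saveTranscript is a hypothetical helper; `result` is the parsed JSON response.
const fs = require("fs");
const path = require("path");

function saveTranscript(result, outDir, baseName = "transcript") {
  fs.mkdirSync(outDir, { recursive: true });

  // Keep the complete response so no field is lost.
  const jsonPath = path.join(outDir, `${baseName}.json`);
  fs.writeFileSync(jsonPath, JSON.stringify(result, null, 2));

  // Also save the first channel's top alternative as plain text (empty if absent).
  const transcript = result?.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
  const txtPath = path.join(outDir, `${baseName}.txt`);
  fs.writeFileSync(txtPath, transcript);

  return { jsonPath, txtPath };
}
```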

Analyze the response

When the file finishes processing (often after only a few seconds), you receive a JSON response:

{
  "metadata": {
    "transaction_key": "deprecated",
    "request_id": "2479c8c8-8185-40ac-9ac6-f0874419f793",
    "sha256": "154e291ecfa8be6ab8343560bcc109008fa7853eb5372533e8efdefc9b504c33",
    "created": "2024-02-06T19:56:16.180Z",
    "duration": 25.933313,
    "channels": 1,
    "models": [
      "30089e05-99d1-4376-b32e-c263170674af"
    ],
    "model_info": {
      "30089e05-99d1-4376-b32e-c263170674af": {
        "name": "2-general-nova",
        "version": "2024-01-09.29447",
        "arch": "nova-3"
      }
    }
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "Yeah. As as much as, it's worth celebrating, the first, spacewalk, with an all female team, I think many of us are looking forward to it just being normal. And, I think if it signifies anything, It is, to honor the the women who came before us who, were skilled and qualified, and didn't get the the same opportunities that we have today.",
            "confidence": 0.99902344,
            "words": [
              {
                "word": "yeah",
                "start": 0.08,
                "end": 0.32,
                "confidence": 0.9975586,
                "punctuated_word": "Yeah."
              },
              {
                "word": "as",
                "start": 0.32,
                "end": 0.79999995,
                "confidence": 0.9921875,
                "punctuated_word": "As"
              }
            ],
            "paragraphs": {
              "transcript": "\nYeah. As as much as, it's worth celebrating...",
              "paragraphs": [
                {
                  "sentences": [
                    {
                      "text": "Yeah.",
                      "start": 0.08,
                      "end": 0.32
                    }
                  ],
                  "num_words": 63,
                  "start": 0.08,
                  "end": 25.52
                }
              ]
            }
          }
        ]
      }
    ]
  }
}

The response above is truncated for brevity. The full response includes a words entry for every word in the transcript and all sentences in the paragraphs object.

In this response:

  • transcript: the transcript for the audio segment being processed.
  • confidence: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: an array with one object per word in the transcript. Each object contains the word's start and end times (in seconds) from the beginning of the audio and a confidence value.
    • Because we passed the smart_format: true option, each word object also includes its punctuated_word value, which contains the transformed word after punctuation and capitalization are applied.
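You can work with these fields in code using a sketch like the one below. It reads the first channel's top alternative and returns the transcript, the overall confidence, per-word timings, and the sentences from the paragraphs object. summarizeResponse is a hypothetical helper, and it assumes a single-channel response like the example above.

```javascript
// Sketch: extract the fields described above from a pre-recorded response.
// summarizeResponse is a hypothetical helper; it reads only the first channel's top alternative.

function summarizeResponse(result) {
  const alt = result?.results?.channels?.[0]?.alternatives?.[0];
  if (!alt) {
    throw new Error("Response has no transcript alternatives");
  }

  // One entry per word, with times in seconds from the start of the audio.
  const words = (alt.words ?? []).map((w) => ({
    // punctuated_word is present when smart_format is enabled; otherwise fall back to word.
    text: w.punctuated_word ?? w.word,
    start: w.start,
    end: w.end,
    confidence: w.confidence,
  }));

  // Flatten paragraphs into a single list of sentences, if paragraphs are present.
  const sentences = (alt.paragraphs?.paragraphs ?? []).flatMap((p) => p.sentences ?? []);

  return { transcript: alt.transcript, confidence: alt.confidence, words, sentences };
}
```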

You can ignore the transaction_key field in metadata. Its value is always "deprecated".

Limits

  • File size: Maximum 2 GB. For large video files, extract the audio stream first.
  • Rate limits: Up to 100 concurrent requests per project for Nova, Base, and Enhanced models. For full details, see API Rate Limits.
  • Processing time: Requests that take longer than 10 minutes to process (Nova, Base, and Enhanced models) or 20 minutes (Whisper) return a 504: Gateway Timeout error.
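If you send many files, you can check against these limits on the client before Deepgram rejects a request. The sketch below checks the file size before upload and limits how many requests run at once. checkFileSize and mapWithConcurrency are hypothetical helpers. Treating 2 GB as 2,000,000,000 bytes is a conservative assumption, not the documented byte count.

```javascript
// Sketch: enforce the documented limits on the client side.
// checkFileSize and mapWithConcurrency are hypothetical helpers, not SDK APIs.
const fs = require("fs");

// Assumption: 2 GB read conservatively as 2 * 10^9 bytes.
const MAX_FILE_BYTES = 2 * 1000 ** 3;

// Throw before uploading a file that exceeds the size limit; return its size otherwise.
function checkFileSize(filePath, maxBytes = MAX_FILE_BYTES) {
  const { size } = fs.statSync(filePath);
  if (size > maxBytes) {
    throw new Error(`${filePath} is ${size} bytes; the limit is ${maxBytes} bytes`);
  }
  return size;
}

// Run `task` over `items` with at most `limit` tasks in flight; results keep input order.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, mapWithConcurrency(filePaths, 10, (p) => transcribeOne(p)) keeps at most 10 requests in flight. Here transcribeOne is a hypothetical stand-in for whatever function sends a single file, such as one built on the SDK's transcribeFile method. A cap well below the per-project limit leaves room for other requests in the same project.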

What’s next?

  • Feature overview: Review the full list of features available for pre-recorded speech-to-text.
  • Language: Transcribe audio in other languages.
  • Streaming audio: Transcribe audio in real time.
  • Use cases: Explore ways to use Deepgram products.