Transcribe Pre-recorded Audio

Last updated 08/03/2021

Deepgram gives you streamlined access to automatic transcription from its off-the-shelf and trained speech recognition models. The service is fast, accepts nearly every audio format, and is customizable.

In this guide, you'll learn how to automatically transcribe pre-recorded audio from both a local file and a remote file using Deepgram's off-the-shelf, general-purpose AI model. You'll also learn how to use some basic parameters to customize your transcript.

The examples in this guide use cURL rather than Deepgram SDKs. To learn how to transcribe pre-recorded audio with Deepgram using our SDKs, see Quickstart: Get Started with Pre-recorded Audio.

Prerequisites

Before you can use Deepgram products, you'll need to create a Deepgram account. Signup is free and includes:

  • $150 in credit, which gives you access to:
    • all base models
    • pre-recorded and streaming functionality
    • all features

Create a Deepgram API Key

To access Deepgram’s API, you'll need to [create a Deepgram API Key](/getting-started/create-api-key). Make note of your API Key; you will need it later.

Transcribe Audio

Deepgram’s API allows you to process both local files and remote files that are publicly accessible.

If you don’t have an audio file of your own, you can use our sample WAV file.

Depending on the location of the audio you would like to transcribe, run one of the following sample cURL commands. Be sure to replace the placeholder YOUR_DEEPGRAM_API_KEY with the Deepgram API Key you created earlier in this guide.

For easy-to-read output, we highly recommend that you pipe the output of these requests through jq.

Transcribe a Local File

To transcribe audio from a file on your computer, run the following in a terminal or your favorite API client:

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen'

Transcribe a Remote File

To transcribe audio from a publicly accessible remote file (e.g., hosted in AWS S3 or on another server), run the following in a terminal or your favorite API client:

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url":"https://static.deepgram.com/examples/interview_speech-analytics.wav"}' \
  --url 'https://api.deepgram.com/v1/listen'
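
As one way to apply the jq recommendation above, the remote-file request can be wrapped in a small shell function that pipes the response through jq. This is only a sketch, not part of Deepgram's tooling: the function name transcribe_remote and the DEEPGRAM_API_KEY environment variable are assumptions made for illustration, and it assumes curl and jq are installed.

```shell
# Sketch: send a remote-file transcription request and pretty-print the reply.
# Assumptions (not from Deepgram's docs): the function name and the
# DEEPGRAM_API_KEY environment variable holding your API Key.
# The audio URL is inserted into the JSON body as-is, so it must not
# contain double quotes or backslashes.
transcribe_remote() {
  curl --silent \
    --request POST \
    --header "Authorization: Token ${DEEPGRAM_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data "{\"url\":\"$1\"}" \
    --url 'https://api.deepgram.com/v1/listen' \
  | jq '.'
}
```

Calling transcribe_remote 'https://static.deepgram.com/examples/interview_speech-analytics.wav' sends the same request as the command above and prints indented JSON. The --silent flag suppresses curl's progress meter, which would otherwise appear when the output is piped.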

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response:

{
  "metadata": {
    "transaction_key": "iqIt",
    "request_id": "m9IBwpEGuLOd5fDVUcwRoojQeVDIc4wU",
    "sha256": "6b198da276e1108a87e15674ba5e68f4893f85aa584ea96c2b0b5fe32e756bd9",
    "created": "2020-05-01T18:19:17.153Z",
    "duration": 2705.3577,
    "channels": 1
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "hey natalie just joined",
            "confidence": 0.87023026,
            "words": [
              {
                "word": "hey",
                "start": 35.61904,
                "end": 35.77853,
                "confidence": 0.54808563
              },
              {
                "word": "natalie",
                "start": 35.77853,
                "end": 36.27853,
                "confidence": 0.41259128
              },
              ...
              }
            ]
          }
        ]
      }
    ]
  }
}

In this default response, we see:

  • transcript: the transcript for the audio segment being processed.
  • confidence: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: an array of objects, one for each word in the transcript, each containing the word's start time and end time (in seconds from the beginning of the audio stream) and a confidence value.
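
If you have saved the response to a file, the fields described above can be pulled out with jq. The following is a minimal sketch: the function names and the response.json filename are hypothetical, and the paths assume the response shape shown above, reading only the first channel and its first alternative.

```shell
# Print only the transcript text from a saved response.
# Usage: print_transcript response.json
print_transcript() {
  jq -r '.results.channels[0].alternatives[0].transcript' "$1"
}

# Print one line per word: word, start, end, confidence (tab-separated).
# Usage: print_words response.json
print_words() {
  jq -r '.results.channels[0].alternatives[0].words[]
         | [.word, .start, .end, .confidence]
         | @tsv' "$1"
}
```

For the sample response above, print_transcript would print the transcript string on its own line, which is convenient when you only need the text and not the timing data.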

By default, Deepgram applies its general AI model, a general-purpose model suited to everyday situations.

Customize Transcripts

To customize transcripts, you can add a variety of parameters to the query string.

For example, if you would like to use the phonecall AI model rather than the general AI model, you can add ?model=phonecall to the URL in the previous examples:

"https://api.deepgram.com/v1/listen?model=phonecall"

Similarly, if you'd like to get a transcript with punctuation and capitalization, you can add ?punctuate=true to the URL in the previous examples:

"https://api.deepgram.com/v1/listen?punctuate=true"
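
Several parameters can be combined in one query string by joining them with &, following standard URL syntax. The helper below is an illustrative sketch (listen_url is a hypothetical name, not part of any Deepgram tool) that assembles such a URL from key=value arguments.

```shell
# Build a /v1/listen URL from zero or more key=value query parameters.
# Values are appended as-is (no URL encoding), which is sufficient for
# simple values such as the ones used in this guide.
listen_url() {
  base='https://api.deepgram.com/v1/listen'
  query=''
  for param in "$@"; do
    if [ -z "$query" ]; then
      query="?${param}"      # first parameter starts the query string
    else
      query="${query}&${param}"  # later parameters are joined with &
    fi
  done
  printf '%s%s\n' "$base" "$query"
}
```

For example, listen_url model=phonecall punctuate=true prints https://api.deepgram.com/v1/listen?model=phonecall&punctuate=true. When passing such a URL to curl, keep it quoted, as in --url "$(listen_url model=phonecall punctuate=true)", so the shell does not interpret the & character.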

To learn more about the customization possible with Deepgram's API, check out the Deepgram API Reference.