Transcribe Pre-recorded Audio

Last updated 05/06/2021

Deepgram gives you streamlined access to automatic transcription from Deepgram's off-the-shelf and trained speech recognition models. It is very fast, understands nearly every audio format available, and is customizable.

In this guide, you'll learn how to automatically transcribe pre-recorded audio from both a local file and a remote file using Deepgram's off-the-shelf, general-purpose AI model. You'll also learn how to use some basic parameters to customize your transcript.

Prerequisites

Before you can use Deepgram products, you'll need to create a Deepgram account.

Your account comes preloaded with:

  • 20 audio hours per month of Automatic Speech Recognition
  • Access to 3 of Deepgram’s off-the-shelf Beginner models

Transcribe Audio

Deepgram’s API allows you to process both local files and remote files that are publicly accessible.

If you don’t have an audio file of your own, you can use our sample WAV file.

Depending on the location of the audio you would like to transcribe, run one of the following sample curl commands. Be sure to replace the USERNAME and PASSWORD placeholders with the email address you used to create your Deepgram account and your Deepgram password.

For easy-to-read output, we highly recommend that you pipe the output of these requests through jq.
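One way to follow the jq recommendation above, sketched under a few assumptions, is to wrap the remote-file request in a small shell function and pipe the response to jq. The function name transcribe_url and the environment variables DEEPGRAM_USERNAME and DEEPGRAM_PASSWORD are hypothetical names chosen for this example; they are not part of Deepgram's API. The sketch also assumes that curl and jq are installed.

```shell
# Hypothetical wrapper: sends a remote-file request and pretty-prints the JSON.
# DEEPGRAM_USERNAME and DEEPGRAM_PASSWORD are assumed environment variable
# names; set them to your account email address and password beforehand.
transcribe_url() {
  # $1 is the publicly accessible audio URL. It is inserted into the JSON
  # body as-is, so this sketch assumes it contains no double quotes.
  curl -s \
    -X POST \
    -u "${DEEPGRAM_USERNAME}:${DEEPGRAM_PASSWORD}" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$1\"}" \
    "https://brain.deepgram.com/v2/listen" \
    | jq .
}

# Example call (not run here because it needs valid credentials):
#   transcribe_url "https://static.deepgram.com/examples/interview_speech-analytics.wav"
```

The -s flag only silences curl's progress meter so that jq receives clean JSON. To keep a copy of the response for later inspection, redirect the function's output to a file.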

Transcribe a Local File

To transcribe audio from a file on your computer, run the following in a terminal or your favorite API client:

curl \
  -X POST \
  -u USERNAME:PASSWORD \
  -H "Content-Type: audio/wav" \
  --data-binary @myaudio.wav \
  "https://brain.deepgram.com/v2/listen"

Transcribe a Remote File

To transcribe audio from a publicly accessible remote file (e.g., hosted in AWS S3 or on another server), run the following in a terminal or your favorite API client:

curl \
  -X POST \
  -u USERNAME:PASSWORD \
  -H "Content-Type: application/json" \
  -d '{"url": "https://static.deepgram.com/examples/interview_speech-analytics.wav"}' \
  "https://brain.deepgram.com/v2/listen"

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response:

{
  "metadata": {
    "transaction_key": "iqIt",
    "request_id": "m9IBwpEGuLOd5fDVUcwRoojQeVDIc4wU",
    "sha256": "6b198da276e1108a87e15674ba5e68f4893f85aa584ea96c2b0b5fe32e756bd9",
    "created": "2020-05-01T18:19:17.153Z",
    "duration": 2705.3577,
    "channels": 1
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "hey natalie just joined",
            "confidence": 0.87023026,
            "words": [
              {
                "word": "hey",
                "start": 35.61904,
                "end": 35.77853,
                "confidence": 0.54808563
              },
              {
                "word": "natalie",
                "start": 35.77853,
                "end": 36.27853,
                "confidence": 0.41259128
              },
              ...
              }
            ]
          }
        ]
      }
    ]
  }
}

In this default response, we see:

  • transcript: the transcript for the audio segment being processed.
  • confidence: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: an array of objects, one for each word in the transcript, each containing the word, its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
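The fields listed above can be pulled out of a saved response with jq. The following is a minimal sketch, assuming the response was written to a file and has the shape shown in the sample; the function names dg_transcript, dg_confidence, and dg_words are hypothetical helpers, not Deepgram tooling.

```shell
# Hypothetical helpers for reading a saved response file with jq.
# Each takes the path to the JSON file as its only argument and reads the
# first alternative of the first channel, matching the sample response.

# Print the transcript text (-r strips the surrounding JSON quotes).
dg_transcript() {
  jq -r '.results.channels[0].alternatives[0].transcript' "$1"
}

# Print the overall confidence value for that transcript.
dg_confidence() {
  jq -r '.results.channels[0].alternatives[0].confidence' "$1"
}

# Print one tab-separated line per word: start, end, confidence, word.
dg_words() {
  jq -r '.results.channels[0].alternatives[0].words[]
         | [.start, .end, .confidence, .word] | @tsv' "$1"
}
```

For example, if the response was saved as response.json (an illustrative file name), dg_transcript response.json prints only the transcript, and dg_words response.json prints a simple word-level timing table.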

By default, Deepgram applies its general AI model, which is a good general-purpose model for everyday situations.

Customize Transcripts

To customize transcripts, you can add a variety of parameters to the query string.

For example, if you would like to use the phonecall AI model rather than the general AI model, you can add ?model=phonecall to the URL in the previous examples:

"https://brain.deepgram.com/v2/listen?model=phonecall"

Similarly, if you'd like to get a transcript with punctuation and capitalization, you can add ?punctuate=true to the URL in the previous examples:

"https://brain.deepgram.com/v2/listen?punctuate=true"

To learn more about the customization possible with Deepgram's API, check out the Speech Recognition API Reference.
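The two query-string examples above each add a single parameter. In a standard URL query string, the first parameter follows ? and each later one follows &, so the sketch below builds a URL that carries both. It assumes that model and punctuate can be sent together in one request, which this guide does not state explicitly; build_listen_url is a hypothetical helper name.

```shell
# Hypothetical helper: joins key=value pairs into a query string and
# appends it to the endpoint used throughout this guide.
build_listen_url() {
  local base="https://brain.deepgram.com/v2/listen"
  local query=""
  local pair
  for pair in "$@"; do
    if [ -z "$query" ]; then
      query="?${pair}"          # first parameter starts the query string
    else
      query="${query}&${pair}"  # later parameters are joined with &
    fi
  done
  printf '%s%s\n' "$base" "$query"
}

# Prints: https://brain.deepgram.com/v2/listen?model=phonecall&punctuate=true
build_listen_url model=phonecall punctuate=true
```

When pasting a combined URL into a curl command, keep it inside double quotes as in the earlier examples; otherwise the shell treats & as a request to run the command in the background.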