Quickstart: Get Started with Pre-recorded Audio

Last updated 08/03/2021

In this quickstart, you'll learn how to automatically transcribe pre-recorded audio using Deepgram's SDKs.

Not a developer? Check out the Deepgram Console for a no-code way to get started with Deepgram's API.

The examples in this guide use Deepgram SDKs. To learn how to transcribe pre-recorded audio with Deepgram using cURL examples, see Transcribe Pre-recorded Audio.

Before You Begin

Before you run the code, you'll need to do a few things:

Before you can use Deepgram products, you'll need to create a Deepgram account. Signup is free and includes:

  • $150 in credit, which gives you access to:
    • all base models
    • pre-recorded and streaming functionality
    • all features

To access Deepgram’s API, you'll need to create a Deepgram API Key. Make note of your API Key; you will need it later.
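The sample scripts later in this guide paste the API Key directly into the source file. As a hedged alternative, the sketch below reads the key from an environment variable so that it stays out of your code. The variable name DEEPGRAM_API_KEY and the load_api_key helper are assumptions made for illustration; neither is defined by the Deepgram SDK.

```python
import os

# Hypothetical variable name chosen for this sketch; the SDK does not require it.
API_KEY_ENV_VAR = "DEEPGRAM_API_KEY"


def load_api_key(env=os.environ):
    """Return the API Key stored in the environment, or raise if it is missing."""
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first.")
    return key
```

With this in place, a script could call load_api_key() wherever the samples use the 'YOUR_DEEPGRAM_API_KEY' placeholder.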

We provide sample scripts in Python and Node.js and assume you have already configured either a Python or Node development environment.

If you get stuck at any point, help is just a click away! Contact Support.

Transcribe Audio

Once you have your API Key, it's time to transcribe audio! The instructions below will guide you through the process of creating a sample application, installing the Deepgram SDK, configuring code with your own Deepgram API Key and pre-recorded audio to transcribe, and finally, building and running the application.

  1. Create a Sample Application

    Open your terminal and navigate to the location on your drive where you want to create your project. If you are using Node.js, initialize a new application there; if you are using Python, a project directory is all you need:

    # Initialize a new application
    npm init
    
  2. Choose an Audio File

    Download our sample audio file, or record your own using your device’s microphone. Make sure downloaded files are in your project directory.

  3. Install the SDK

    In your terminal, install the Deepgram SDK:

    # Install the Deepgram Python SDK
    # https://github.com/deepgram/python-sdk
    pip install deepgram-sdk
    
    # Install the Deepgram Node.js SDK
    # https://github.com/deepgram/node-sdk
    npm install @deepgram/sdk
    
  4. Write the Code

    In your terminal, create a new file and populate it with code.

    Create a new file called deepgram_test.py in your project's location. Populate this file:

    from deepgram import Deepgram
    import asyncio, json
    
    # Your Deepgram API Key
    DEEPGRAM_API_KEY = 'YOUR_DEEPGRAM_API_KEY'
    
    # Name and extension of the file you downloaded (e.g., sample.wav)
    PATH_TO_FILE = 'FILENAME_TO_TRANSCRIBE'
    
    async def main():
      # Initialize the Deepgram SDK
      dg_client = Deepgram(DEEPGRAM_API_KEY)
      # Open the audio file
      with open(PATH_TO_FILE, 'rb') as audio:
        # Replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/wav'}
        response = await dg_client.transcription.prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))
        
    asyncio.run(main())
    

    Create a new file called index.js in your project's location. Populate this file:

    const fs = require('fs');
    const { Deepgram } = require('@deepgram/sdk');
    
    /** Your Deepgram API Key*/
    const deepgramApiKey = 'YOUR_DEEPGRAM_API_KEY';
    
    /** Name and extension of the file you downloaded (e.g., sample.wav) */
    const pathToFile = 'FILENAME_TO_TRANSCRIBE';
    
    /** Initialize the Deepgram SDK */
    const deepgram = new Deepgram(deepgramApiKey);
    
    /** Load file into a buffer */
    const fileBuffer = fs.readFileSync(pathToFile);
    
    deepgram.transcription.preRecorded({ 
      buffer: fileBuffer, 
      mimetype: 'audio/wav' // or appropriate mimetype of your file 
    }, { 
      punctuate: true 
    })
    .then((transcription) => {
      console.log(transcription);
    })
    .catch((err) => {
      console.log(err);
    })
    

    Be sure to replace YOUR_DEEPGRAM_API_KEY and FILENAME_TO_TRANSCRIBE with your Deepgram API Key and the name of the file you downloaded.

  5. Start the Application

    Run the application from the terminal:

    python deepgram_test.py
    
    node index.js
    
  6. See Results

    Your transcripts will appear in your terminal, because both sample scripts print the response to standard output.
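The samples in step 4 hard-code audio/wav and ask you to replace the mimetype as appropriate. If you transcribe files in several formats, one possible approach is to guess the mimetype from the filename with Python's standard mimetypes module. This is only a sketch: guess_audio_mimetype is a hypothetical helper, and the standard library may return a variant such as audio/x-wav for .wav files, so check that the value it produces is one the API accepts for your audio.

```python
import mimetypes


def guess_audio_mimetype(path, default="audio/wav"):
    """Guess an audio mimetype from a filename, falling back to `default`."""
    guessed, _encoding = mimetypes.guess_type(path)
    # Only trust the guess when it is an audio type; otherwise use the default.
    if guessed and guessed.startswith("audio/"):
        return guessed
    return default
```

In the Python sample, the source dictionary could then be built as {'buffer': audio, 'mimetype': guess_audio_mimetype(PATH_TO_FILE)}.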

Analyze the Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response:


{
  "metadata":{
    "transaction_key":"Ha0aVG...",
    "request_id":"se24UY...",
    "sha256":"2d5b81...",
    "created":"2021-07-08T09:11:38.593Z",
    "duration":19.0,
    "channels":1
  },
  "results":{
    "channels":[
      {
        "alternatives":[
          {
            "transcript":"Yep. I said it before, and I'll say it again. Life moves pretty fast. You don't stop and look around once in a while. You could miss it. Thank.",
            "confidence":0.9757011,
            "words":[
              {
                "word":"yep",
                "start":5.66,
                "end":5.94,
                "confidence":0.994987,
                "punctuated_word":"Yep."
              },
              {
                "word":"i",
                "start":7.2344832,
                "end":7.434014,
                "confidence":0.8217165,
                "punctuated_word":"I"
              },
              {
                "word":"said",
                "start":7.434014,
                "end":7.5537324,
                "confidence":0.979774,
                "punctuated_word":"said"
              },
              {
                "word":"it",
                "start":7.5537324,
                "end":7.833075,
                "confidence":0.9672828,
                "punctuated_word":"it"
              },
              {
                "word":"before",
                "start":7.833075,
                "end":7.952793,
                "confidence":0.9994516,
                "punctuated_word":"before,"
              },
              {
                "word":"and",
                "start":8.032605,
                "end":8.112417,
                "confidence":0.9968953,
                "punctuated_word":"and"
              },
              {
                "word":"i'll",
                "start":8.152324,
                "end":8.351854,
                "confidence":0.8254508,
                "punctuated_word":"I'll"
              },
              {
                "word":"say",
                "start":8.351854,
                "end":8.471573,
                "confidence":0.9983234,
                "punctuated_word":"say"
              },
              {
                "word":"it",
                "start":8.471573,
                "end":8.711009,
                "confidence":0.9934151,
                "punctuated_word":"it"
              },
              {
                "word":"again",
                "start":8.711009,
                "end":8.950446,
                "confidence":0.99837565,
                "punctuated_word":"again."
              },
              ...
            ]
          }
        ]
      }
    ]
  }
}

In this default response, we see:

  • transcript: the transcript for the audio segment being processed.

  • confidence: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.

  • words: an array of objects, one per word in the transcript, each with the word's start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.

    Because we passed the punctuate: true option to the transcription.prerecorded method (transcription.preRecorded in Node.js), each word object also includes its punctuated_word value, which contains the transformed word after punctuation and capitalization are applied.
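The fields described above can be read with ordinary dictionary and list access once the response has been parsed. The sketch below assumes the response has the shape shown in this guide; the helper names and the 0.9 threshold are illustrative choices, not part of the SDK. The sample data reuses values from the response above, with the words list trimmed to its first three entries.

```python
def first_alternative(response, channel=0):
    """Return the first transcription alternative for the given channel."""
    return response["results"]["channels"][channel]["alternatives"][0]


def low_confidence_words(response, threshold=0.9):
    """List the words whose confidence falls below `threshold`."""
    words = first_alternative(response).get("words", [])
    return [w["word"] for w in words if w["confidence"] < threshold]


def punctuated_text(response):
    """Rebuild readable text from each word's punctuated_word (or word) value."""
    words = first_alternative(response).get("words", [])
    return " ".join(w.get("punctuated_word", w["word"]) for w in words)


# Values copied from the response in this guide; the words list is trimmed.
sample = {
    "results": {
        "channels": [{
            "alternatives": [{
                "transcript": "Yep. I said it before, and I'll say it again. "
                              "Life moves pretty fast. You don't stop and look "
                              "around once in a while. You could miss it. Thank.",
                "confidence": 0.9757011,
                "words": [
                    {"word": "yep", "start": 5.66, "end": 5.94,
                     "confidence": 0.994987, "punctuated_word": "Yep."},
                    {"word": "i", "start": 7.2344832, "end": 7.434014,
                     "confidence": 0.8217165, "punctuated_word": "I"},
                    {"word": "said", "start": 7.434014, "end": 7.5537324,
                     "confidence": 0.979774, "punctuated_word": "said"},
                ],
            }]
        }]
    }
}

print(first_alternative(sample)["transcript"])
print(punctuated_text(sample))
print(low_confidence_words(sample))
```

Flagging words below a threshold is one simple way to decide which parts of a transcript deserve a manual review; a suitable threshold depends on your audio.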

By default, Deepgram applies its general AI model, which is a good general-purpose model for everyday situations.

What's Next?

Now that you've gotten a transcript for pre-recorded audio, enhance your knowledge by exploring the following areas.

Customize Transcripts

To customize the transcripts you receive, you can send a variety of parameters to the Deepgram API.

For example, to use the phonecall model rather than the general model, pass the model option with the value phonecall to the transcription.prerecorded method (transcription.preRecorded in Node.js) from the previous examples:

    # Python
    response = await dg_client.transcription.prerecorded(source, {'punctuate': True, 'model': 'phonecall'})

    // Node.js
    deepgram.transcription.preRecorded({
      buffer: fileBuffer,
      mimetype: 'audio/wav' // or appropriate mimetype of your file
    }, {
      punctuate: true,
      model: 'phonecall'
    })

To learn more about the many ways you can customize your results with Deepgram's API, check out the Deepgram API Reference.

Explore Use Cases

Time to learn about the different ways you can use Deepgram products to help you meet your business objectives. Explore Deepgram's use cases.

Transcribe Streaming Audio

Now that you know how to transcribe pre-recorded audio, check out how you can use Deepgram to transcribe streaming audio in real time. To learn more, see Quickstart: Get Started with Streaming Audio.