Getting Started
An introduction to getting transcription data from pre-recorded audio files.
This guide walks you through transcribing pre-recorded audio with the Deepgram API. It covers two scenarios: transcribing a remote file and transcribing a local file.
Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.
API Playground
First, quickly explore Deepgram Speech to Text in our API Playground.
cURL
Next, try it with cURL. Replace YOUR_DEEPGRAM_API_KEY with your own API key, then run the following examples in a terminal or your favorite API client.
If you run the Local File cURL Example, be sure to change @youraudio.wav to the path and filename of an audio file on your computer. (Read more about supported audio formats in the Deepgram documentation.)
Remote File cURL Example
Local File cURL Example
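If you would rather script the request than run cURL, the Python sketch below builds a request comparable to the Remote File cURL Example using only the standard library. It assumes the pre-recorded endpoint https://api.deepgram.com/v1/listen, an Authorization: Token YOUR_DEEPGRAM_API_KEY header, and a JSON body with a url field. The helper names build_remote_request and send are hypothetical.

```python
import json
import urllib.request

# Assumed pre-recorded endpoint, with the two query parameters discussed below.
LISTEN_URL = "https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true"


def build_remote_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Deepgram to fetch and transcribe audio_url."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        LISTEN_URL,
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # your Deepgram API key goes here
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, send(build_remote_request("YOUR_DEEPGRAM_API_KEY", "https://example.com/audio.wav")) would return the parsed JSON response; the audio URL here is only a placeholder.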
The examples above include the parameter model=nova-3, which tells the API to use Deepgram’s latest model. If you remove this parameter, the API uses the default model, which is currently model=base.
They also enable Deepgram’s Smart Formatting feature with smart_format=true, which formats currency amounts, phone numbers, email addresses, and more to improve transcript readability.
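Both options travel in the query string. The sketch below assumes the same https://api.deepgram.com/v1/listen endpoint and uses a hypothetical build_listen_url helper to show how leaving out model lets the API fall back to its default model while smart_format=true stays on.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.deepgram.com/v1/listen"  # assumed pre-recorded endpoint


def build_listen_url(model: Optional[str] = "nova-3", smart_format: bool = True) -> str:
    """Compose the request URL; model=None leaves the choice to the API's default model."""
    params = {}
    if model is not None:
        params["model"] = model
    if smart_format:
        # Lowercase "true", matching smart_format=true in the examples (not Python's "True").
        params["smart_format"] = "true"
    query = urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL
```

Calling build_listen_url(model=None) drops the model parameter entirely, which per the text above means the API uses its default model.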
SDKs
To transcribe pre-recorded audio using one of Deepgram’s SDKs, follow these steps.
Install the SDK
Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.
Add Dependencies
Transcribe a Remote File
This example shows how to analyze a remote audio file (a URL that hosts your audio file) using Deepgram’s SDKs. In your terminal, create a new file in your project’s location, and populate it with the code.
Transcribe a Local File
This example shows how to analyze a local audio file (an audio file on your computer) using Deepgram’s SDKs. In your terminal, create a new file in your project’s location, and populate it with the code. (Be sure to replace the audio filename with a path/filename of an audio file on your computer.)
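Without an SDK, a local file is typically sent as the raw request body rather than as a URL. This Python sketch reads the file and guesses its Content-Type from the file extension with mimetypes. It is not SDK code: the endpoint, the Token header format, the application/octet-stream fallback, and the helper name build_local_request are all assumptions for illustration.

```python
import mimetypes
import urllib.request
from pathlib import Path

# Assumed pre-recorded endpoint and query parameters.
LISTEN_URL = "https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true"


def build_local_request(api_key: str, audio_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST that uploads the raw bytes of a local audio file."""
    audio_bytes = Path(audio_path).read_bytes()
    # Guess the MIME type from the extension; the fallback value is an assumption.
    content_type, _ = mimetypes.guess_type(audio_path)
    return urllib.request.Request(
        LISTEN_URL,
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type or "application/octet-stream",
        },
        method="POST",
    )
```

Replace the path you pass in with an audio file on your computer, just as you would replace youraudio.wav in the cURL example.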
Non-SDK Code Examples
If you would like to make a Deepgram speech-to-text request in a specific programming language without using Deepgram’s SDKs, we offer a library of code samples in this GitHub repo. However, we recommend trying our SDKs first.
Results
To see the results from Deepgram, run your application from the terminal. Your transcripts will appear in your shell.
Deepgram does not store transcripts, so the Deepgram API response is your only opportunity to retrieve the transcript. Make sure to save the output, or have transcriptions sent to a callback URL for custom processing.
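Because the response is the only copy, it is worth persisting it before doing anything else with it. This sketch writes the full JSON response to disk. Naming the file after metadata.request_id is an assumption about the response shape (it falls back to transcript.json when that field is absent), and save_response is a hypothetical helper.

```python
import json
from pathlib import Path


def save_response(response: dict, out_dir: str) -> Path:
    """Write the full JSON response to out_dir, since Deepgram keeps no copy."""
    # Assumed field: metadata.request_id; fall back to a fixed name if it is missing.
    request_id = response.get("metadata", {}).get("request_id", "transcript")
    path = Path(out_dir) / f"{request_id}.json"
    path.write_text(json.dumps(response, indent=2), encoding="utf-8")
    return path
```

Saving the raw response, rather than only the transcript string, keeps word timings and confidence values available for later processing.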
Analyze the Response
When the file is finished processing (often after only a few seconds), you’ll receive a JSON response:
In this default response, we see:
- transcript: the transcript for the audio segment being processed.
- confidence: a floating-point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
- words: an array with one object per word in the transcript, giving the word, its start and end times (in seconds) from the beginning of the audio stream, and a confidence value.
Because the request enabled smart_format=true, each word object also includes a punctuated_word value, which contains the word after punctuation and capitalization are applied.
The transaction_key in the metadata field can be ignored. Its value will always be "transaction_key": "deprecated".
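To pull these fields out in code, the sketch below reads them from results.channels[0].alternatives[0], which is an assumption about the nesting for single-channel audio. EXAMPLE_RESPONSE is invented sample data for illustration, not real Deepgram output, and first_alternative and word_timings are hypothetical helpers.

```python
# Invented sample data showing the assumed response shape; real responses contain more fields.
EXAMPLE_RESPONSE = {
    "metadata": {"transaction_key": "deprecated"},
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Hello world.",
                        "confidence": 0.98,
                        "words": [
                            {"word": "hello", "start": 0.08, "end": 0.4,
                             "confidence": 0.99, "punctuated_word": "Hello"},
                            {"word": "world", "start": 0.4, "end": 0.9,
                             "confidence": 0.97, "punctuated_word": "world."},
                        ],
                    }
                ]
            }
        ]
    },
}


def first_alternative(response: dict) -> dict:
    """Return the top alternative of the first channel."""
    return response["results"]["channels"][0]["alternatives"][0]


def word_timings(response: dict) -> list:
    """Return (text, start, end) tuples, preferring punctuated_word when it is present."""
    words = first_alternative(response)["words"]
    return [(w.get("punctuated_word", w["word"]), w["start"], w["end"]) for w in words]


if __name__ == "__main__":
    alt = first_alternative(EXAMPLE_RESPONSE)
    print(alt["transcript"], alt["confidence"])
    for text, start, end in word_timings(EXAMPLE_RESPONSE):
        print(f"{start:6.2f}-{end:6.2f}s  {text}")
```

Falling back to word keeps word_timings usable for requests made without smart_format, where punctuated_word is not expected.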
Limits
There are a few limits to be aware of when making a pre-recorded speech-to-text request.
File Size
- The maximum file size is limited to 2 GB.
- For large video files, extract the audio stream and upload only the audio to Deepgram. This reduces the file size significantly.
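A pre-upload size check and an audio-only extraction step might look like the sketch below. It treats 2 GB as 2,000,000,000 bytes, a conservative assumption about how the limit is counted, and it assumes ffmpeg is installed. Re-encoding to FLAC is one illustrative choice, so confirm the output format against Deepgram’s supported formats. The helper names are hypothetical.

```python
import os
import subprocess

# Assumption: count the 2 GB limit conservatively as 2 * 10**9 bytes.
MAX_UPLOAD_BYTES = 2 * 10**9


def fits_upload_limit(path: str) -> bool:
    """Return True if the file is no larger than the assumed upload limit."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES


def extract_audio_command(video_path: str, audio_path: str) -> list:
    """Build an ffmpeg argv that drops the video stream (-vn) and encodes the audio as FLAC."""
    return ["ffmpeg", "-i", video_path, "-vn", "-c:a", "flac", audio_path]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises if ffmpeg is not on PATH or exits with an error."""
    subprocess.run(extract_audio_command(video_path, audio_path), check=True)
```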
Rate Limits
Nova, Base, and Enhanced Models:
- Maximum of 100 concurrent requests per project.
Whisper Model:
- Paid plan: 15 concurrent requests.
- Pay-as-you-go plan: 5 concurrent requests.
Exceeding these limits will result in a 429: Too Many Requests error.
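One way to stay under these caps is to bound the number of in-flight requests on the client and back off when a 429 does arrive. In this sketch a ThreadPoolExecutor limits concurrency, and with_429_retry retries with exponential backoff and jitter. The retry count, delays, helper names, and CONCURRENCY_LIMITS keys are assumptions, not Deepgram recommendations; worker stands for whatever function sends one request.

```python
import random
import time
import urllib.error
from concurrent.futures import ThreadPoolExecutor

# Per-project limits from the list above; the key names are hypothetical labels.
CONCURRENCY_LIMITS = {"nova_base_enhanced": 100, "whisper_paid": 15, "whisper_payg": 5}


def with_429_retry(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on HTTP 429, sleep with exponential backoff plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a 429, and the final 429.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")


def run_all(jobs, worker, max_concurrent):
    """Apply worker to every job with at most max_concurrent calls in flight, in input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(lambda job: with_429_retry(lambda: worker(job)), jobs))
```

Because the limits apply per project, a cap inside one process does not account for other clients sending requests under the same project.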
Maximum Processing Time
Fast Transcription Models (Nova, Base, and Enhanced)
- These models offer extremely fast transcription.
- Maximum processing time: 10 minutes.
Slower Transcription Model (Whisper)
- Whisper transcribes more slowly than the other models.
- Maximum processing time: 20 minutes.
Timeout Policy
- If a request exceeds the maximum processing time, it will be canceled.
- In such cases, a 504: Gateway Timeout error will be returned.
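On the client side, a socket timeout shorter than the server’s cap can abandon a request that Deepgram would still have completed. The sketch below sets urlopen’s timeout a little above the documented maximum for the model and turns a 504 into a TimeoutError. Detecting Whisper by a "whisper" model-name prefix, the 30-second margin, and the helper names are assumptions. Note that urlopen’s timeout applies to each blocking socket operation, not to the request as a whole.

```python
import json
import urllib.error
import urllib.request

# Maximum server-side processing times from the list above, in seconds.
MAX_PROCESSING_SECONDS = {"whisper": 20 * 60, "default": 10 * 60}
CLIENT_MARGIN_SECONDS = 30  # assumed headroom for upload and network latency


def client_timeout(model: str) -> int:
    """Return a socket timeout a little above Deepgram's processing cap for the model."""
    # Assumption: Whisper model names start with "whisper".
    key = "whisper" if model.startswith("whisper") else "default"
    return MAX_PROCESSING_SECONDS[key] + CLIENT_MARGIN_SECONDS


def send_with_timeout(request: urllib.request.Request, model: str) -> dict:
    """Send a prepared request; report a 504 as a processing-time TimeoutError."""
    try:
        # urlopen's timeout bounds each blocking socket operation, not the whole call.
        with urllib.request.urlopen(request, timeout=client_timeout(model)) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 504:
            raise TimeoutError("request exceeded Deepgram's maximum processing time") from err
        raise
```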
What’s Next?
Now that you’ve transcribed pre-recorded audio, enhance your knowledge by exploring the following areas.
Try the Starter Apps
- Clone and run one of our Starter App repositories to see a full application with a frontend UI and a backend server sending audio to Deepgram.
Read the Feature Guides
Deepgram’s features help you to customize your transcripts.
- Language: Learn how to transcribe audio in other languages.
- Profanity Filtering and Redaction: Discover how to remove profanity or redact personal information like credit card numbers.
- Feature Overview: Review the list of features available for pre-recorded speech-to-text. Then, dive into individual guides for more details.
Explore Use Cases
- Learn about the different ways you can use Deepgram products to help you meet your business objectives. Explore Deepgram’s use cases.
Transcribe Streaming Audio
- Now that you know how to transcribe pre-recorded audio, check out how you can use Deepgram to transcribe streaming audio in real time. To learn more, see Getting Started with Streaming Audio.