Getting Started
An introduction to getting transcription data from pre-recorded audio files.
This guide walks you through transcribing pre-recorded audio with the Deepgram API using cURL or one of Deepgram’s SDKs.
Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.
cURL
Replace YOUR_DEEPGRAM_API_KEY with your API key and run the following in a terminal or API client.
Remote file
Local file
Replace @youraudio.wav with the path to an audio file on your computer. See Supported Audio Formats for accepted formats.
The above examples include model=nova-3, which tells the API to use Deepgram’s latest model. Removing this parameter defaults to model=base.
They also include Deepgram’s Smart Formatting feature (smart_format=true), which formats currency amounts, phone numbers, email addresses, and more for enhanced readability.
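For readers who want to see the request spelled out in code, here is a minimal Python sketch of the remote-file and local-file requests, using only the standard library. The endpoint URL, the Authorization: Token header, and the JSON body with a url field are assumptions based on Deepgram's public API and are not shown in the text above. The function names are hypothetical. The sketch only builds the requests; sending them is shown in a trailing comment.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the pre-recorded transcription endpoint (not shown in this guide's text).
API_URL = "https://api.deepgram.com/v1/listen"


def build_url(model="nova-3", smart_format=True):
    """Return the endpoint URL with the model and smart_format query parameters."""
    query = urllib.parse.urlencode(
        {"model": model, "smart_format": "true" if smart_format else "false"}
    )
    return f"{API_URL}?{query}"


def build_remote_request(api_key, audio_url):
    """Build a POST request that asks the API to fetch audio from a URL."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        build_url(),
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_local_request(api_key, audio_bytes, content_type="audio/wav"):
    """Build a POST request that uploads raw audio bytes read from a local file."""
    return urllib.request.Request(
        build_url(),
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",  # assumed header format
            "Content-Type": content_type,
        },
        method="POST",
    )


# To send either request (requires network access and a valid API key):
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before spending any API credit.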
SDKs
To transcribe pre-recorded audio using one of Deepgram’s SDKs, follow these steps.
Install the SDK and dependencies
Open your terminal, navigate to your project directory, and install the Deepgram SDK along with any required dependencies.
Transcribe a remote file
Create a new file in your project and add the following code to transcribe a remote audio file by URL:
To transcribe a local file instead of a remote URL, use the transcribeFile (JavaScript), transcribe_file (Python), TranscribeFile (C#), or FromFile (Go) method. Pass the file’s binary content and the same options. See the Pre-Recorded Audio API reference for details.
Non-SDK code examples
For language-specific examples without Deepgram’s SDKs, see the code-samples repository. We recommend trying the SDKs first.
Results
Run your application from the terminal. Your transcript appears in your shell.
Deepgram does not store transcripts, so the API response is your only opportunity to retrieve the transcript. Save the output, or have transcriptions sent to a callback URL for custom processing.
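Because the response is the only copy of the transcript, one simple approach is to write it to disk as soon as it arrives. The following is a minimal sketch that assumes the response has already been parsed into a Python dict; save_response and the file name are hypothetical and not part of any Deepgram SDK.

```python
import json
from pathlib import Path


def save_response(response, path):
    """Write the parsed API response to a JSON file and return the file path."""
    target = Path(path)
    # ensure_ascii=False keeps non-English transcripts readable in the saved file.
    target.write_text(
        json.dumps(response, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


def load_response(path):
    """Read a previously saved response back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, save_response(result, "transcript.json") stores the full response, including word timings, so it can be reprocessed later without another API call.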
Analyze the response
When the file finishes processing (often after only a few seconds), you receive a JSON response:
The response above is truncated for brevity. The full response includes a words entry for every word in the transcript and all sentences in the paragraphs object.
In this response:
- transcript: the transcript for the audio segment being processed.
- confidence: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
- words: an object containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
  - Because we passed the smart_format: true option, each word object also includes its punctuated_word value, which contains the transformed word after punctuation and capitalization are applied.
The transaction_key in the metadata field can be ignored. The result is always "transaction_key": "deprecated".
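The fields described above can be pulled out of a parsed response with a few lines of Python. This is a minimal sketch: the nesting path results, channels, alternatives is an assumption about the response layout, which this page does not show, and summarize is a hypothetical helper. The sample dict is hand-written for illustration and is not real API output.

```python
def summarize(response):
    """Return (transcript, confidence, words) from the first channel's first alternative."""
    # Assumed layout: results -> channels[0] -> alternatives[0]
    alternative = response["results"]["channels"][0]["alternatives"][0]
    words = [
        {
            # punctuated_word is present when smart_format is enabled;
            # fall back to the plain word otherwise.
            "text": w.get("punctuated_word", w["word"]),
            "start": w["start"],
            "end": w["end"],
            "confidence": w["confidence"],
        }
        for w in alternative.get("words", [])
    ]
    return alternative["transcript"], alternative["confidence"], words


# Hand-written illustrative data, not a real API response.
sample = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Hello, world.",
                        "confidence": 0.98,
                        "words": [
                            {"word": "hello", "start": 0.0, "end": 0.4,
                             "confidence": 0.99, "punctuated_word": "Hello,"},
                            {"word": "world", "start": 0.5, "end": 0.9,
                             "confidence": 0.97},
                        ],
                    }
                ]
            }
        ]
    }
}

transcript, confidence, words = summarize(sample)
```

The fallback to word keeps the helper usable for requests made without smart_format, where punctuated_word may be absent.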
Limits
- File size: Maximum 2 GB. For large video files, extract the audio stream first.
- Rate limits: Up to 100 concurrent requests per project for Nova, Base, and Enhanced models. For full details, see API Rate Limits.
- Processing time: Requests exceeding 10 minutes (Nova/Base/Enhanced) or 20 minutes (Whisper) return a 504: Gateway Timeout error.
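A client can check some of these limits before and after a request. The sketch below is one possible pre-flight check, with hypothetical helper names. It assumes 2 GB means 2 * 1024**3 bytes; this page does not say whether the limit is binary or decimal, so adjust the constant if needed.

```python
import os

# Assumption: "2 GB" interpreted as binary gigabytes.
MAX_FILE_BYTES = 2 * 1024**3

GATEWAY_TIMEOUT = 504  # status returned when processing exceeds the time limit


def check_file_size(path, limit=MAX_FILE_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes, which exceeds the {limit}-byte limit; "
            "for video files, consider extracting the audio stream first"
        )
    return size


def is_processing_timeout(status_code):
    """Return True when the HTTP status indicates the processing-time limit was hit."""
    return status_code == GATEWAY_TIMEOUT
```

Checking the size locally avoids uploading a large file only to have the request rejected.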
What’s next?
- Feature overview: Review the full list of features available for pre-recorded speech-to-text.
- Language: Transcribe audio in other languages.
- Streaming audio: Transcribe audio in real time.
- Use cases: Explore ways to use Deepgram products.