Getting Started
An introduction to getting transcription data from pre-recorded audio files.
This guide walks you through transcribing pre-recorded audio with the Deepgram API using cURL or one of Deepgram’s SDKs.
Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.
Replace YOUR_DEEPGRAM_API_KEY with your API key and run the following in a terminal or API client.
Replace @youraudio.wav with the path to an audio file on your computer. See Supported Audio Formats for accepted formats.
The above examples include model=nova-3, which tells the API to use Deepgram’s latest model. If you remove this parameter, the API defaults to model=base.
They also include Deepgram’s Smart Formatting feature (smart_format=true), which formats currency amounts, phone numbers, email addresses, and more for enhanced readability.
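The local-file request described above can also be sketched in Python using only the standard library. This is a minimal sketch, not an official Deepgram example: the helper name build_file_request is hypothetical, and the endpoint https://api.deepgram.com/v1/listen and the "Authorization: Token ..." header are assumptions based on Deepgram's public REST API that you should confirm against the API reference.

```python
import urllib.parse
import urllib.request

# Assumed pre-recorded audio endpoint; confirm against the API reference.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_file_request(api_key, audio_bytes, content_type="audio/wav",
                       model="nova-3", smart_format=True):
    """Build (but do not send) a POST request carrying raw audio bytes.

    build_file_request is a hypothetical helper, not part of any Deepgram SDK.
    """
    # model and smart_format travel as query parameters, as in the guide.
    query = urllib.parse.urlencode({
        "model": model,
        "smart_format": "true" if smart_format else "false",
    })
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio_bytes,  # the file's binary content is the request body
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


# Usage (performs a real network call, so it is left commented out):
# import json
# with open("youraudio.wav", "rb") as f:
#     request = build_file_request("YOUR_DEEPGRAM_API_KEY", f.read())
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it lets you inspect the URL and headers before any audio leaves your machine.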
To transcribe pre-recorded audio using one of Deepgram’s SDKs, follow these steps.
Open your terminal, navigate to your project directory, and install the Deepgram SDK along with any required dependencies.
Create a new file in your project and add the following code to transcribe a remote audio file by URL:
To transcribe a local file instead of a remote URL, use the transcribeFile (JavaScript), transcribe_file (Python), TranscribeFile (C#), FromFile (Go), or transcribeFile (Java) method. Pass the file’s binary content and the same options. See the Pre-Recorded Audio API reference for details.
For language-specific examples of Deepgram speech-to-text requests that do not use Deepgram’s SDKs, see the code-samples repository on GitHub. We recommend trying the SDKs first.
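As one illustration of a request made without an SDK, a remote audio file can be submitted by URL with Python's standard library. This is a sketch under stated assumptions: build_url_request and send are hypothetical helper names, https://example.com/audio.wav is a placeholder, and the JSON body shape {"url": ...} and the endpoint reflect Deepgram's public REST API as commonly documented, so verify them against the Pre-Recorded Audio API reference.

```python
import json
import urllib.parse
import urllib.request

# Assumed pre-recorded audio endpoint; confirm against the API reference.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_url_request(api_key, audio_url, model="nova-3", smart_format=True):
    """Build a POST request asking the API to fetch and transcribe audio_url.

    Hypothetical helper; not part of any Deepgram SDK.
    """
    query = urllib.parse.urlencode({
        "model": model,
        "smart_format": "true" if smart_format else "false",
    })
    # For remote audio the body is JSON naming the file's URL (assumed shape).
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Usage (performs a real network call, so it is left commented out):
# request = build_url_request("YOUR_DEEPGRAM_API_KEY",
#                             "https://example.com/audio.wav")  # placeholder URL
# print(send(request))
```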
Run your application from the terminal. Your transcript appears in your shell.
Deepgram does not store transcripts, so the API response is the only opportunity to retrieve the transcript. Save output or return transcriptions to a callback URL for custom processing.
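Because the response is the only copy of the transcript, it can help to persist it as soon as it arrives. The sketch below shows one way to do that and how a callback URL might be attached to the request URL. Both helper names are hypothetical, the output path is up to you, and the callback query parameter name is an assumption to check against the API reference.

```python
import json
import urllib.parse
from pathlib import Path


def save_response(response, path):
    """Write the parsed JSON response to disk and return the path used.

    Hypothetical helper: response is the dict decoded from the API reply.
    """
    path = Path(path)
    path.write_text(json.dumps(response, indent=2), encoding="utf-8")
    return path


def listen_url_with_callback(callback_url, model="nova-3"):
    """Build a request URL that asks for results to be sent to callback_url.

    Assumes the API accepts a `callback` query parameter.
    """
    query = urllib.parse.urlencode({"model": model, "callback": callback_url})
    return f"https://api.deepgram.com/v1/listen?{query}"


# Usage:
# save_response(response_dict, "transcript.json")
# url = listen_url_with_callback("https://example.com/hook")  # placeholder URL
```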
When the file finishes processing (often after only a few seconds), you receive a JSON response:
The response above is truncated for brevity. The full response includes a words entry for every word in the transcript and all sentences in the paragraphs object.
In this response:
transcript: the transcript for the audio segment being processed.
confidence: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
words: an array with one object per word in the transcript, each containing the word's start time and end time (in seconds) from the beginning of the audio stream, and a confidence value. With the smart_format: true option, each word object also includes its punctuated_word value, which contains the transformed word after punctuation and capitalization are applied.
The transaction_key in the metadata field can be ignored. The result is always "transaction_key": "deprecated".
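The fields described above can be read out of the parsed response with a few dictionary lookups. This is a minimal sketch: summarize is a hypothetical helper, the nesting results > channels > alternatives is an assumption about the response layout that you should check against your own output, and the sample dictionary is hand-written for illustration rather than real API output.

```python
def summarize(response):
    """Extract the transcript, its confidence, and per-word timings.

    Assumes the first alternative of the first channel holds the result.
    """
    alternative = response["results"]["channels"][0]["alternatives"][0]
    words = [
        {
            # punctuated_word is only present when smart formatting is on,
            # so fall back to the plain word otherwise.
            "text": w.get("punctuated_word", w["word"]),
            "start": w["start"],
            "end": w["end"],
            "confidence": w["confidence"],
        }
        for w in alternative["words"]
    ]
    return {
        "transcript": alternative["transcript"],
        "confidence": alternative["confidence"],
        "words": words,
    }


# Hand-written illustrative data, NOT real API output.
sample = {
    "results": {
        "channels": [{
            "alternatives": [{
                "transcript": "Hello, world.",
                "confidence": 0.99,
                "words": [
                    {"word": "hello", "start": 0.0, "end": 0.4,
                     "confidence": 0.99, "punctuated_word": "Hello,"},
                    {"word": "world", "start": 0.5, "end": 0.9,
                     "confidence": 0.98},
                ],
            }]
        }]
    }
}

summary = summarize(sample)
```

The fallback to word keeps the helper usable on responses requested without smart_format=true.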
504: Gateway Timeout error.