
Getting Started with Speech to Text

With Deepgram’s STT API, you can transcribe both pre-recorded files and real-time streams, choose models optimized for different domains, and integrate transcription directly into your apps for use cases like conversational AI, live captions, and agent assist.

Before you start, consider your use case. Deepgram STT offers three main paths:

Streaming audio

Real-time, turn-based transcription for voice agents

Benefits: Model-integrated end-of-turn detection, configurable turn-taking dynamics


Examples: Contact center agents, customer support bots, real-time assistants.




This path is currently available for English only. For other languages, use our general-use Streaming API.

Real-time transcription for meetings and events

Benefits: Real-time transcripts, broader language availability, optional speaker diarization


Examples: Captions, live event transcription, monitoring audio feeds.


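As a rough illustration of the streaming path, the sketch below only assembles a WebSocket URL with transcription options as query parameters; it does not open a connection. The endpoint `wss://api.deepgram.com/v1/listen`, the `nova-3` default, and the helper name `build_streaming_url` are assumptions for illustration, so check them against the API reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for the general-use streaming API; confirm in the API reference.
STREAMING_ENDPOINT = "wss://api.deepgram.com/v1/listen"


def build_streaming_url(model="nova-3", language="en", diarize=False):
    """Return a WebSocket URL carrying transcription options as query parameters.

    build_streaming_url is a hypothetical helper, not part of any Deepgram SDK.
    """
    params = {
        "model": model,
        "language": language,
        # Boolean options are sent as the lowercase strings "true"/"false".
        "diarize": str(diarize).lower(),
    }
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"


# Example: request diarized transcripts for a live meeting feed.
print(build_streaming_url(diarize=True))
```

A WebSocket client would connect to the resulting URL with an `Authorization: Token <API key>` header and send audio as binary frames; that part is omitted here to keep the sketch dependency-free.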

Pre-recorded audio

Pre-recorded file transcription

Benefits: Simple implementation, broader language availability, cost-efficient


Examples: Transcribing interviews, podcasts, meetings, support calls.


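To make the pre-recorded path concrete, here is a minimal sketch using only the Python standard library. It builds a POST request for a hosted audio file and extracts the transcript from a parsed response. The endpoint `https://api.deepgram.com/v1/listen`, the `Token` authorization scheme, the `model` and `smart_format` parameters, and the response shape are assumptions based on the public API; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for pre-recorded transcription; confirm in the API reference.
PRERECORDED_ENDPOINT = "https://api.deepgram.com/v1/listen"


def build_prerecorded_request(api_key, audio_url, model="nova-3"):
    """Build (but do not send) a POST request to transcribe a hosted audio file."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{PRERECORDED_ENDPOINT}?model={model}&smart_format=true",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(response):
    """Return the first channel's top transcript from a parsed JSON response."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


# Sending the request requires a valid API key and network access:
# req = build_prerecorded_request("YOUR_API_KEY", "https://example.com/audio.wav")
# with urllib.request.urlopen(req) as resp:
#     print(extract_transcript(json.load(resp)))
```

Separating request construction from sending keeps the example inspectable offline; in an application you would more likely use an official SDK or an HTTP client with retries and timeouts.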