
Determining Your Audio Format for Live Streaming Audio

Learn how to determine if your audio is containerized or raw, and what this means for correctly formatting your requests to Deepgram's API.

Before you start streaming audio to Deepgram, it’s important that you understand whether your audio is containerized or raw, so you can correctly form your API request.

The difference between containerized and raw audio relates to how much information about the audio is included within the data:

  • Containerized audio stream: A series of bits is passed along with a header that specifies information about the audio. Containerized audio generally includes enough additional information to allow Deepgram to decode it automatically.
  • Raw audio stream: The series of bits is passed with no further information. Deepgram needs you to manually provide information about the characteristics of raw audio.

Streaming Raw Audio

If you’re streaming raw audio to Deepgram, you must provide the encoding and sample rate of your audio stream in your request. Otherwise, Deepgram will be unable to decode the audio and will fail to return a transcript.

An example of a Deepgram API request to stream raw audio:

wss://api.deepgram.com/v1/listen?encoding=ENCODING_VALUE&sample_rate=SAMPLE_RATE_VALUE
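As a minimal sketch, the request URL above could be assembled in Python as follows. The helper name `build_raw_audio_url` and the example values (`linear16`, `16000`) are illustrative assumptions; substitute the encoding and sample rate that actually match your stream.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request above.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_raw_audio_url(encoding: str, sample_rate: int) -> str:
    """Return the streaming URL with the two parameters raw audio requires.

    Hypothetical helper for illustration; not part of any Deepgram SDK.
    """
    if not encoding:
        raise ValueError("encoding is required for raw audio")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive integer")
    query = urlencode({"encoding": encoding, "sample_rate": sample_rate})
    return f"{DEEPGRAM_LISTEN_URL}?{query}"


if __name__ == "__main__":
    # Example values only; use the ones that describe your audio.
    print(build_raw_audio_url("linear16", 16000))
```

Validating both values before connecting surfaces a missing parameter as a local error rather than as a stream that never returns a transcript.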

To see a list of raw audio encodings that Deepgram supports, check out our Encoding documentation.

Streaming Containerized Audio

If you’re streaming containerized audio to Deepgram, you should not set the encoding and sample rate of your audio stream. Instead, Deepgram will read the container’s header and get the correct information for your stream automatically.

An example of a Deepgram API request to stream containerized audio:

wss://api.deepgram.com/v1/listen

Deepgram supports over 100 different audio formats and encodings. You can see some of the most popular ones at Supported Audio Format.

Determining Your Audio Format

If you’re not sure whether your audio is raw or containerized, you can identify the audio format in a few different ways.

Check Documentation

Start by checking any available documentation for your audio source. Often, it will provide details related to audio format. Specifically, check for any mentions of encodings like Opus, Vorbis, PCM, mu-law, A-law, s16, or linear16.

If your audio source is a web API stream, in many cases it will already be containerized. For example, the audio may be raw Opus audio wrapped in an Ogg container or raw PCM audio wrapped in a WAV container.

Automatically Detect Audio Format

If you’re still not sure whether or not your audio is containerized, you can write an audio stream to disk and try listening to it with a program like VLC. If your audio is containerized, VLC will be able to play it back without any additional configuration.
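As a complement to the playback check, you can also inspect the first few bytes of the captured stream, since many containers begin with a recognizable signature. The sketch below is a heuristic rather than a Deepgram feature: the function name `sniff_container` is hypothetical, and the signature list covers only a handful of common containers.

```python
from typing import Optional


def sniff_container(header: bytes) -> Optional[str]:
    """Guess the container from the leading bytes of a stream.

    Returns a short label, or None when no known signature matches.
    Hypothetical helper; the list of signatures is deliberately small.
    """
    # WAV: "RIFF" at offset 0 and "WAVE" at offset 8.
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Ogg (often carrying Opus or Vorbis): every page starts with "OggS".
    if header.startswith(b"OggS"):
        return "ogg"
    # FLAC: stream marker "fLaC".
    if header.startswith(b"fLaC"):
        return "flac"
    # WebM / Matroska: EBML header magic number.
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm/matroska"
    # MP3 files that begin with an ID3v2 tag.
    if header.startswith(b"ID3"):
        return "mp3 (ID3 tag)"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 12 bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(12))
```

A result of `None` does not prove the audio is raw; it only means none of these signatures matched, so treat it as a hint to check further.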

Alternatively, you can use ffprobe to inspect a file and detect its audio format. ffprobe is part of the ffmpeg package, a cross-platform tool that records, converts, and streams audio and video.

To use ffprobe, from a terminal, run:

Shell
$ ffprobe PATH_TO_FILE

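The final lines of the output from this command (typically the lines beginning with Stream) will include any data ffprobe is able to determine about the file’s audio format, such as the codec, sample rate, and number of channels.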
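If you would rather read these values programmatically than scan the terminal output, ffprobe can emit JSON. The following is a minimal sketch that assumes ffprobe is installed and on your PATH; the function names `parse_ffprobe_json` and `probe_audio` are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List


def parse_ffprobe_json(output: str) -> List[Dict[str, Any]]:
    """Extract codec, sample rate, and channel count for each audio stream."""
    data = json.loads(output)
    streams = []
    for stream in data.get("streams", []):
        if stream.get("codec_type") != "audio":
            continue
        # ffprobe reports sample_rate as a string, so convert it when present.
        rate = stream.get("sample_rate")
        streams.append({
            "codec_name": stream.get("codec_name"),
            "sample_rate": int(rate) if rate else None,
            "channels": stream.get("channels"),
        })
    return streams


def probe_audio(path: str) -> List[Dict[str, Any]]:
    """Run ffprobe on a file and return its audio stream details."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if ffprobe cannot read the file
    )
    return parse_ffprobe_json(result.stdout)
```

If ffprobe fails or reports no audio streams for a capture you know contains audio, that is a hint (not proof) that the data is raw and needs an explicit encoding and sample rate.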

Using Raw Audio with Encoding & Sample Rate

When using raw audio, make sure to set the encoding and the sample rate. Both parameters are required for Deepgram to be able to decode your stream.