Flux: Expanded Audio Format Support

Flux now supports audio formats beyond linear16, giving you more flexibility in how you send audio to our conversational speech recognition model. This update adds support for additional raw audio encodings and for containerized audio formats.

Raw Audio Format Support

Flux now accepts the following audio encodings, set through the encoding parameter:

  • linear16 - 16-bit signed little-endian PCM (existing)
  • linear32 - 32-bit signed little-endian PCM (new)
  • mulaw - Mu-law encoding (new)
  • alaw - A-law encoding (new)
  • opus - Opus codec (new)
  • ogg-opus - Opus in an Ogg container (new)

When using raw audio formats, you must specify both the encoding and sample_rate parameters in your API request.
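mulaw and alaw are the 8-bit companded encodings defined in ITU-T G.711 and are common in telephony. Each 16-bit sample is compressed to a single byte, so the same audio takes half the bytes it would as linear16. To make that difference concrete, here is a minimal sketch of the standard G.711 mu-law encoder applied to linear16 input. It is a reference illustration only, not part of Flux or any Deepgram SDK; in production you would normally rely on an existing audio library or let your telephony provider deliver mu-law directly.

```python
import struct

BIAS = 0x84   # 132: standard G.711 mu-law bias added before segmenting
CLIP = 32635  # largest magnitude allowed before the bias is added


def linear16_sample_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    # The segment (exponent) is the position of the highest set bit in bits 7..14.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # Four mantissa bits sit just below the highest set bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def linear16_to_mulaw(pcm: bytes) -> bytes:
    """Convert little-endian linear16 bytes to mu-law bytes (half the size)."""
    if len(pcm) % 2:
        raise ValueError("linear16 data must contain whole 2-byte samples")
    return bytes(linear16_sample_to_mulaw(s) for (s,) in struct.iter_unpack("<h", pcm))
```

Silence (sample value 0) encodes to 0xFF, and a stream converted this way would be sent with encoding=mulaw and the original sample rate.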

Containerized Audio Support

Flux now also accepts containerized audio formats, so you no longer need to set the encoding and sample_rate parameters yourself:

  • WAV containers with linear16 encoding
  • Ogg containers with opus encoding

When sending containerized audio, omit the encoding and sample_rate parameters. Flux automatically detects them from the container metadata.
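On the client side you may want to decide, per stream, whether to include encoding and sample_rate. One way is to sniff the first bytes of the audio: a WAV file starts with a RIFF chunk whose form type is "WAVE", and every Ogg page begins with "OggS". The sketch below uses those standard signatures, plus Python's built-in wave module to show the format fields a WAV header carries. It is a hypothetical client-side helper (detect_container and wav_format are illustrative names), not a description of how Flux itself performs detection.

```python
import io
import wave
from typing import Optional


def detect_container(data: bytes) -> Optional[str]:
    """Return "wav" or "ogg" if data starts with that container's header, else None."""
    # WAV: "RIFF" at bytes 0-3, chunk size at 4-7, "WAVE" at 8-11.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Ogg: every page starts with the capture pattern "OggS".
    if data[:4] == b"OggS":
        return "ogg"
    return None


def wav_format(data: bytes) -> dict:
    """Read the format fields stored in a PCM WAV header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            # 2-byte PCM samples correspond to linear16.
            "is_linear16": wav.getsampwidth() == 2,
        }
```

If detect_container returns None, the stream is most likely raw audio, and you would send both encoding and sample_rate.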

Supported Sample Rates

For raw audio, Flux supports the following sample rates:

  • 8000 Hz
  • 16000 Hz (recommended)
  • 24000 Hz
  • 44100 Hz
  • 48000 Hz
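With raw audio, the encoding and sample rate together determine how many bytes cover a given span of time, which is handy when sizing the chunks you write to the WebSocket. The sketch below checks the rate against the list above and computes bytes per chunk for the fixed-width encodings. Assumptions: the chunk duration is your own choice (the 80 ms in the usage note is an arbitrary example, not a documented recommendation), audio is mono unless you pass channels, and opus/ogg-opus are excluded because compressed frames have no fixed size per sample.

```python
# Sample rates listed for raw audio in this changelog.
SUPPORTED_SAMPLE_RATES = frozenset({8000, 16000, 24000, 44100, 48000})

# Bytes per sample for the fixed-width raw encodings.
BYTES_PER_SAMPLE = {"linear16": 2, "linear32": 4, "mulaw": 1, "alaw": 1}


def chunk_size_bytes(encoding: str, sample_rate: int, chunk_ms: int, channels: int = 1) -> int:
    """Return how many bytes of raw audio cover chunk_ms milliseconds."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"no fixed sample size for encoding: {encoding}")
    samples, remainder = divmod(sample_rate * chunk_ms, 1000)
    if remainder:
        raise ValueError("chunk_ms does not map to a whole number of samples")
    return samples * channels * BYTES_PER_SAMPLE[encoding]
```

For example, chunk_size_bytes("linear16", 16000, 80) returns 2560: 16000 samples/s × 0.08 s = 1280 samples, at 2 bytes each.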

Implementation

For raw audio:

wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000

For containerized audio:

wss://api.deepgram.com/v2/listen?model=flux-general-en
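The two URLs above differ only in whether encoding and sample_rate are present. A small helper can enforce the rule from this note: both parameters for raw audio, neither for containerized audio. This is a sketch, and build_flux_url is a hypothetical name. It only builds the URL string; opening the WebSocket and authenticating with your API key are not shown, and the validation sets simply mirror the lists in this changelog.

```python
from typing import Optional
from urllib.parse import urlencode

FLUX_URL = "wss://api.deepgram.com/v2/listen"
RAW_ENCODINGS = {"linear16", "linear32", "mulaw", "alaw", "opus", "ogg-opus"}
RAW_SAMPLE_RATES = {8000, 16000, 24000, 44100, 48000}


def build_flux_url(
    encoding: Optional[str] = None,
    sample_rate: Optional[int] = None,
    model: str = "flux-general-en",
) -> str:
    """Build a Flux listen URL for raw (both params) or containerized (no params) audio."""
    params = {"model": model}
    if encoding is None and sample_rate is None:
        # Containerized audio: format comes from the container metadata.
        return f"{FLUX_URL}?{urlencode(params)}"
    if encoding is None or sample_rate is None:
        raise ValueError("raw audio requires both encoding and sample_rate")
    if encoding not in RAW_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if sample_rate not in RAW_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    params["encoding"] = encoding
    params["sample_rate"] = sample_rate
    return f"{FLUX_URL}?{urlencode(params)}"
```

Calling build_flux_url("linear16", 16000) reproduces the raw-audio URL above, and build_flux_url() reproduces the containerized one. Failing fast when only one parameter is set catches the most likely mistake before a connection is attempted.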

For detailed information about Flux audio requirements, see our Flux documentation.