Flux: Expanded Audio Format Support
Flux now supports a wider range of audio formats beyond linear16, giving you more flexibility in how you send audio to our conversational speech recognition model. This update adds support for additional raw audio encodings and containerized audio formats.
Raw Audio Format Support
Flux now accepts the following raw (non-containerized) audio encodings:
- linear16: 16-bit signed little-endian PCM (existing)
- linear32: 32-bit signed little-endian PCM (new)
- mulaw: Mu-law encoding (new)
- alaw: A-law encoding (new)
- opus: Opus codec (new)
- ogg-opus: Opus in Ogg container (new)
When using raw audio formats, you must specify both the encoding and sample_rate parameters in your API request.
Containerized Audio Support
Flux now also accepts containerized audio formats, eliminating the need to manually specify encoding and sample rate parameters:
- WAV containers with linear16 encoding
- Ogg containers with opus encoding
When sending containerized audio, omit the encoding and sample_rate parameters. Flux will automatically detect these from the container metadata.
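If your application handles both raw and containerized input, it may help to check the first bytes of the audio before deciding whether to send encoding and sample_rate. The sketch below is a client-side illustration only, not part of Flux: the function name sniff_container is hypothetical, and it only recognizes the container magic bytes (RIFF/WAVE and OggS). It does not verify that the codec inside is linear16 or opus.

```python
from typing import Optional


def sniff_container(header: bytes) -> Optional[str]:
    """Return 'wav', 'ogg', or None based on the leading magic bytes.

    Hypothetical helper for illustration; it inspects the container
    signature only and does not check the codec stored inside.
    """
    # A WAV file starts with a RIFF chunk whose form type is 'WAVE'.
    if len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Every Ogg page starts with the capture pattern 'OggS'.
    if header[0:4] == b"OggS":
        return "ogg"
    # Anything else is treated as raw audio that needs explicit parameters.
    return None
```

When this returns None, the audio should be treated as raw, so encoding and sample_rate need to be supplied explicitly.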
Supported Sample Rates
For raw audio, Flux supports the following sample rates:
- 8000 Hz
- 16000 Hz (recommended)
- 24000 Hz
- 44100 Hz
- 48000 Hz
Implementation
For raw audio:
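As a minimal sketch, the snippet below validates the two required values against the lists in this note and builds a parameter string from them. It assumes the parameters are passed as URL query parameters; the helper name raw_audio_query and the constant names are hypothetical, and the endpoint URL is left to you.

```python
from urllib.parse import urlencode

# Values copied from the lists in this note.
RAW_ENCODINGS = {"linear16", "linear32", "mulaw", "alaw", "opus", "ogg-opus"}
RAW_SAMPLE_RATES = {8000, 16000, 24000, 44100, 48000}


def raw_audio_query(encoding: str, sample_rate: int) -> str:
    """Build the query string for a raw-audio request.

    Hypothetical helper: assumes encoding and sample_rate are sent
    as URL query parameters.
    """
    if encoding not in RAW_ENCODINGS:
        raise ValueError(f"unsupported raw encoding: {encoding!r}")
    if sample_rate not in RAW_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    # Both parameters are required for raw (non-containerized) audio.
    return urlencode({"encoding": encoding, "sample_rate": sample_rate})
```

For example, raw_audio_query("linear16", 16000) yields a string that can be appended to your Flux endpoint URL after a "?" (or after "&" if other parameters are already present).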
For containerized audio:
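One way to produce containerized input is to wrap existing linear16 samples in a WAV container, so the sample rate travels in the header instead of in request parameters. The sketch below uses only Python's standard wave module; the function name pcm16_to_wav is hypothetical, and how the resulting bytes are sent to Flux is not shown.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap 16-bit little-endian PCM samples in a WAV container.

    Hypothetical helper for illustration. The WAV header records the
    channel count, sample width, and sample rate.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. linear16
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still usable.
    return buf.getvalue()
```

With audio packaged this way, the request would omit encoding and sample_rate.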
For detailed information about Flux audio requirements, see our Flux documentation.