STT Models
An overview of the speech-to-text models you can use with the Voice Agent API.
The Voice Agent API uses Deepgram speech-to-text. Two model families are supported. The agent selects the matching STT endpoint from the version field of agent.listen.provider, so you do not manage endpoint URLs yourself.
- Flux for conversational voice agents that need model-integrated end-of-turn detection and ultra-low latency.
- Nova for conventional streaming transcription with the broadest feature set: smart formatting, language detection, multilingual code-switching, and custom keyterms.
You can set your Voice Agent’s speech-to-text model in the Settings Message. See the docs for more information.
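As a minimal sketch of where this setting lives, the Python snippet below builds a Settings message that contains only agent.listen.provider and serializes it to JSON. The helper name build_settings is hypothetical, and the "type", "version", and "model" values shown are assumptions for illustration. A real Settings message carries further sections that are omitted here, so check the Settings message reference before relying on any of these values.

```python
import json


def build_settings(listen_provider: dict) -> dict:
    """Wrap a listen provider in a minimal Settings message.

    Hypothetical helper: only the agent.listen.provider path is populated;
    every other section of a real Settings message is left out.
    """
    return {
        "type": "Settings",
        "agent": {"listen": {"provider": listen_provider}},
    }


# Assumed values for illustration; confirm them against the official docs.
provider = {
    "type": "deepgram",
    "version": "v1",   # the field the agent uses to pick the STT endpoint
    "model": "nova-3",
}

settings = build_settings(provider)
payload = json.dumps(settings, indent=2)  # JSON text to send as the Settings message
print(payload)
```

Only the provider dictionary changes when you switch model families; the surrounding message structure stays the same.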
Choosing a model family
For a deeper comparison, see Flux vs Nova-3.
Flux
Flux delivers first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines. See Flux Feature Overview for details.
Example
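A Flux listen configuration might look like the following sketch, written as a Python dictionary and printed as JSON. The version value "v2" and the model name "flux-general-en" are assumptions for illustration and are not confirmed by this page; take the exact values from the Flux documentation.

```python
import json

# Sketch of agent.listen for a Flux model.
# "v2" and "flux-general-en" are assumed values, not confirmed by this page.
flux_listen = {
    "provider": {
        "type": "deepgram",
        "version": "v2",             # assumed version that routes to the Flux endpoint
        "model": "flux-general-en",  # assumed Flux model name
    }
}

flux_fragment = json.dumps({"agent": {"listen": flux_listen}}, indent=2)
print(flux_fragment)
```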
Multilingual example
For multilingual prompting strategies and examples, see Flux Language Prompting.
Nova
For the full list of Nova models and supported languages, see Models & Languages Overview.
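A Nova listen configuration with custom keyterms might look like the sketch below. The version value "v1", the model name "nova-3", the keyterms field name, and the sample terms are all assumptions for illustration. Verify the field names and the features available for your chosen model in the Nova documentation.

```python
import json

# Sketch of agent.listen for a Nova model with custom keyterms.
# Field names and values here are assumed, not confirmed by this page.
nova_listen = {
    "provider": {
        "type": "deepgram",
        "version": "v1",                       # assumed version that routes to the Nova endpoint
        "model": "nova-3",                     # assumed Nova model name
        "keyterms": ["Deepgram", "Voice Agent"],  # assumed field; sample terms only
    }
}

nova_fragment = json.dumps({"agent": {"listen": nova_listen}}, indent=2)
print(nova_fragment)
```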