STT Models
An overview of the speech-to-text models you can use with the Voice Agent API.
The Voice Agent API uses Deepgram speech-to-text. Two model families are supported, Flux and Nova. The agent picks the right STT endpoint based on the version field of agent.listen.provider, so you do not manage endpoint URLs yourself.
You can set your Voice Agent’s speech-to-text model in the Settings Message. See the docs for more information.
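As a rough illustration of where the model selection lives, the sketch below builds a Settings message as a Python dictionary and serializes it to JSON. Only the agent.listen.provider path and its version field come from the text above. The "type": "Settings" envelope, the "type" and "model" keys inside the provider, and the example values "deepgram", "nova-3", and "v1" are assumptions made for illustration, and build_settings is a hypothetical helper; check the Settings Message reference for the exact schema.

```python
import json


def build_settings(model: str, version: str) -> dict:
    """Build a Settings message that selects the STT model (illustrative sketch)."""
    return {
        "type": "Settings",  # assumed message type name
        "agent": {
            "listen": {
                "provider": {
                    "type": "deepgram",  # assumed provider identifier
                    "version": version,  # field named in the text; its values are assumed
                    "model": model,      # assumed key for the model name
                }
            }
        },
    }


# Hypothetical values, used only to show the shape of the message.
settings = build_settings(model="nova-3", version="v1")
print(json.dumps(settings, indent=2))
```

Because the endpoint is chosen from the version field, switching model families in this sketch would mean changing only the two arguments, not any connection URL.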
For a deeper comparison, see Flux vs Nova-3.
Flux delivers first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines. See Flux Feature Overview for details.
For multilingual prompting strategies and examples, see Flux Language Prompting.
For the full list of Nova models and supported languages, see Models & Languages Overview.