STT Models

An overview of the speech-to-text models you can use with the Voice Agent API.

The Voice Agent API uses Deepgram speech-to-text. Two model families are supported. The agent selects the right STT endpoint based on the version field of agent.listen.provider, so you do not manage endpoint URLs yourself.

  • Flux for conversational voice agents that need model-integrated end-of-turn detection and ultra-low latency.
  • Nova for conventional streaming transcription with the broadest feature set: smart formatting, language detection, multilingual code-switching, custom keyterms.

You set your Voice Agent's speech-to-text model in the Settings message, under agent.listen.provider. See the Settings message documentation for the full schema.

Choosing a model family

| | Flux (V2) | Nova (V1) |
|---|---|---|
| Best for | low-latency voice agents | broadest STT feature set |
| End-of-turn detection | model-integrated | application-level (VAD) |
| Smart formatting | no | yes |
| Custom keyterms | yes | yes |
| Multilingual | flux-general-multi with language_hint | language: multi for code-switching |
| provider.version | v2 (required) | v1 (default) |

For a deeper comparison see Flux vs Nova-3.
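The version-based routing in the table can be mirrored on the client, for example to log which family a given config will use. The sketch below is a hypothetical helper (stt_family is not part of any Deepgram SDK). It assumes only the documented rule: version v2 selects Flux, and v1 or an omitted version selects Nova.

```python
def stt_family(provider: dict) -> str:
    """Return which STT model family a listen provider config selects.

    Hypothetical helper mirroring the documented rule:
    "v2" selects Flux; "v1" or an omitted version selects Nova.
    """
    version = provider.get("version", "v1")  # v1 is the default when omitted
    if version == "v2":
        return "flux"
    if version == "v1":
        return "nova"
    raise ValueError(f"unsupported provider.version: {version!r}")


print(stt_family({"type": "deepgram", "version": "v2", "model": "flux-general-en"}))  # flux
print(stt_family({"type": "deepgram", "model": "nova-3"}))  # nova
```

Raising on unknown versions, rather than silently falling back to Nova, keeps typos such as "V2" from quietly selecting the wrong family.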

Flux

Flux delivers first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines. See Flux Feature Overview for details.

| Parameter | Type | Description |
|---|---|---|
| agent.listen.provider.type | String | Must be deepgram. |
| agent.listen.provider.version | String | Must be v2. |
| agent.listen.provider.model | String | Flux model id: flux-general-en or flux-general-multi. |
| agent.listen.provider.language_hint | String or array of strings | BCP-47 codes that bias the multilingual model toward specific languages. Only valid with flux-general-multi. Without hints, the model auto-detects the spoken language. |
| agent.listen.provider.keyterms | Array of strings | Bias recognition toward important phrases. See Keyterm Prompting. |
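The constraints in this table are simple to check before sending a Settings message. The following is a minimal sketch of a hypothetical client-side validator (validate_flux_provider is an illustrative name, not a Deepgram API). It encodes only the rules listed here and does not verify that language codes are well-formed BCP-47 tags.

```python
FLUX_MODELS = {"flux-general-en", "flux-general-multi"}


def validate_flux_provider(provider: dict) -> list[str]:
    """Check a Flux listen provider against the documented constraints.

    Hypothetical helper: returns a list of problems (empty if none found).
    """
    errors = []
    if provider.get("type") != "deepgram":
        errors.append("type must be 'deepgram'")
    if provider.get("version") != "v2":
        errors.append("version must be 'v2' for Flux")
    model = provider.get("model")
    if model not in FLUX_MODELS:
        errors.append(f"model must be one of {sorted(FLUX_MODELS)}")
    if "language_hint" in provider:
        hint = provider["language_hint"]
        if model != "flux-general-multi":
            errors.append("language_hint is only valid with flux-general-multi")
        # language_hint may be a single string or a list of strings
        is_str_list = isinstance(hint, list) and all(isinstance(h, str) for h in hint)
        if not (isinstance(hint, str) or is_str_list):
            errors.append("language_hint must be a string or a list of strings")
    keyterms = provider.get("keyterms", [])
    if not (isinstance(keyterms, list) and all(isinstance(k, str) for k in keyterms)):
        errors.append("keyterms must be a list of strings")
    return errors


print(validate_flux_provider(
    {"type": "deepgram", "version": "v2", "model": "flux-general-en", "language_hint": "en"}
))  # ['language_hint is only valid with flux-general-multi']
```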

Example

```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-en",
        "keyterms": ["Deepgram", "Aura"]
      }
    }
  }
}
```

Multilingual example

```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-multi",
        "language_hint": ["en", "es"]
      }
    }
  }
}
```

For multilingual prompting strategies and examples see Flux Language Prompting.
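A small builder makes the auto-detect behavior of the multilingual example explicit. It includes language_hint only when hints are supplied, so omitting them leaves language detection to the model. flux_multi_provider is a hypothetical helper. The nested agent.listen.provider path matches the examples in this section, but a complete Settings message may contain other fields not covered on this page.

```python
import json


def flux_multi_provider(language_hint=None, keyterms=None) -> dict:
    """Build a flux-general-multi listen provider (hypothetical helper).

    language_hint may be None, a BCP-47 string, or a list of BCP-47 strings.
    When it is None the field is omitted, so the model auto-detects the language.
    """
    provider = {"type": "deepgram", "version": "v2", "model": "flux-general-multi"}
    if language_hint is not None:
        provider["language_hint"] = language_hint
    if keyterms:
        provider["keyterms"] = list(keyterms)
    return provider


# Reproduces the multilingual example: bias toward English and Spanish.
settings_fragment = {"agent": {"listen": {"provider": flux_multi_provider(["en", "es"])}}}
print(json.dumps(settings_fragment, indent=2))

# No hints: language_hint is left out entirely.
print(json.dumps(flux_multi_provider()))
```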

Nova

| Parameter | Type | Description |
|---|---|---|
| agent.listen.provider.type | String | Must be deepgram. |
| agent.listen.provider.version | String | Optional. Defaults to v1 when omitted. |
| agent.listen.provider.model | String | Nova model id, for example nova-3 or nova-2. |
| agent.listen.provider.language | String | BCP-47 language tag (en, en-US, es, etc.) or multi for code-switching. |
| agent.listen.provider.keyterms | Array of strings | Bias recognition toward important phrases. See Keyterm Prompting. |
| agent.listen.provider.smart_format | Boolean | Apply smart formatting to transcripts. Defaults to false. |

For the full list of Nova models and supported languages see Models & Languages Overview.
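When version and smart_format are omitted, the documented defaults apply (v1 and false). The sketch below fills in those defaults so you can inspect the effective configuration. effective_nova_provider is a hypothetical helper. Based on the version routing described earlier, it assumes that a Nova model should not be paired with version v2.

```python
NOVA_DEFAULTS = {"version": "v1", "smart_format": False}


def effective_nova_provider(provider: dict) -> dict:
    """Return a Nova listen provider with the documented defaults filled in.

    Hypothetical helper: version defaults to "v1" and smart_format to False
    when omitted; values present in `provider` take precedence.
    """
    if provider.get("type") != "deepgram":
        raise ValueError("type must be 'deepgram'")
    merged = {**NOVA_DEFAULTS, **provider}  # explicit settings override defaults
    if merged["version"] != "v1":
        # Assumption: version v2 routes to Flux, so it cannot select a Nova model.
        raise ValueError("Nova models use version 'v1'")
    return merged


print(effective_nova_provider({"type": "deepgram", "model": "nova-3", "language": "multi"}))
```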

Example

```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3",
        "language": "en-US",
        "smart_format": true,
        "keyterms": ["Deepgram", "Aura"]
      }
    }
  }
}
```
