STT Models

An overview of the speech-to-text models you can use with the Voice Agent API.

The Voice Agent API uses Deepgram speech-to-text. Two model families are supported, and the agent picks the right STT endpoint based on the version field of agent.listen.provider — you do not manage endpoint URLs yourself.

  • Flux for conversational voice agents that need model-integrated end-of-turn detection and ultra-low latency.
  • Nova for conventional streaming transcription with the broadest feature set: smart formatting, language detection, multilingual code-switching, custom keyterms.

You can set your Voice Agent’s speech-to-text model in the Settings Message. See the docs for more information.
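As a rough illustration of where the speech-to-text settings sit, the sketch below wraps a listen provider block in a Settings-style message and serializes it. The helper name `settings_message` is hypothetical, the `"type": "Settings"` field is an assumption not stated on this page, and a real Settings message carries other sections that are omitted here; check the Settings Message documentation for the actual shape.

```python
import json


def settings_message(listen_provider: dict) -> str:
    """Wrap a listen provider block in a Settings-style message (hypothetical helper)."""
    message = {
        # Assumed message type name; confirm against the Settings Message docs.
        "type": "Settings",
        # Only the STT portion is shown; other sections are intentionally omitted.
        "agent": {"listen": {"provider": listen_provider}},
    }
    return json.dumps(message, indent=2)


# Minimal Nova provider block using only fields described on this page.
print(settings_message({"type": "deepgram", "model": "nova-3"}))
```

The point is only that the provider block lives under `agent.listen.provider`; everything else in the message is outside the scope of this page.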

Choosing a model family

| | Flux (V2) | Nova (V1) |
| --- | --- | --- |
| Best for | low-latency voice agents | broadest STT feature set |
| End-of-turn detection | model-integrated | application-level (VAD) |
| Smart formatting | no | yes |
| Custom keyterms | yes | yes |
| Multilingual | flux-general-multi with language_hints | language: multi for code-switching |
| provider.version | v2 (required) | v1 (default) |

For a deeper comparison see Flux vs Nova-3.
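The version rule can be mirrored client-side, for example to log which family a given configuration will use. This is a minimal sketch: `stt_family` is a hypothetical helper, and the actual endpoint selection is done by the agent, not by your code.

```python
def stt_family(provider: dict) -> str:
    """Return the model family implied by agent.listen.provider.version."""
    # Per this page, version defaults to v1 when omitted.
    version = provider.get("version", "v1")
    if version == "v2":
        return "flux"
    if version == "v1":
        return "nova"
    raise ValueError(f"unsupported provider.version: {version!r}")


print(stt_family({"type": "deepgram", "model": "nova-3"}))
print(stt_family({"type": "deepgram", "version": "v2", "model": "flux-general-en"}))
```

Rejecting unknown versions locally is a design choice of this sketch; it surfaces typos such as `"V2"` before a session is opened.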

Flux

Flux delivers first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines. See Flux Feature Overview for details.

| Parameter | Type | Description |
| --- | --- | --- |
| agent.listen.provider.type | String | Must be deepgram |
| agent.listen.provider.version | String | Must be v2 |
| agent.listen.provider.model | String | Flux model id: flux-general-en or flux-general-multi |
| agent.listen.provider.language_hints | Array of String | BCP-47 codes that bias the multilingual model toward specific languages. Only valid with flux-general-multi. Without hints, the model auto-detects the spoken language. |
| agent.listen.provider.keyterms | Array of String | Bias recognition toward important phrases. See Keyterm Prompting. |

Example

JSON
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-en",
        "keyterms": ["Deepgram", "Aura"]
      }
    }
  }
}

Multilingual example

JSON
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-multi",
        "language_hints": ["en", "es"]
      }
    }
  }
}

For multilingual prompting strategies and examples see Flux Language Prompting.
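The Flux constraints listed in the parameter table can be checked before a configuration is sent. The sketch below is one way to do that under the rules stated on this page; `validate_flux_provider` and `FLUX_MODELS` are hypothetical names, and the server may enforce additional rules not covered here.

```python
# Flux model ids named on this page.
FLUX_MODELS = {"flux-general-en", "flux-general-multi"}


def validate_flux_provider(provider: dict) -> list[str]:
    """Return a list of problems found in a Flux listen provider block."""
    errors = []
    if provider.get("type") != "deepgram":
        errors.append("type must be 'deepgram'")
    # Flux requires an explicit v2; omitting version falls back to v1 (Nova).
    if provider.get("version") != "v2":
        errors.append("version must be 'v2'")
    model = provider.get("model")
    if model not in FLUX_MODELS:
        errors.append("model must be flux-general-en or flux-general-multi")
    # language_hints is documented as valid only with the multilingual model.
    if "language_hints" in provider and model != "flux-general-multi":
        errors.append("language_hints is only valid with flux-general-multi")
    return errors


config = {
    "type": "deepgram",
    "version": "v2",
    "model": "flux-general-multi",
    "language_hints": ["en", "es"],
}
print(validate_flux_provider(config))
```

An empty list means the block satisfies the constraints described above; it does not guarantee the server will accept it.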

Nova

| Parameter | Type | Description |
| --- | --- | --- |
| agent.listen.provider.type | String | Must be deepgram |
| agent.listen.provider.version | String | Optional. Defaults to v1 when omitted. |
| agent.listen.provider.model | String | Nova model id, for example nova-3 or nova-2. |
| agent.listen.provider.language | String | BCP-47 language tag (en, en-US, es, etc.) or multi for code-switching. |
| agent.listen.provider.keyterms | Array of String | Bias recognition toward important phrases. See Keyterm Prompting. |
| agent.listen.provider.smart_format | Boolean | Apply smart formatting to transcripts. Defaults to false. |

For the full list of Nova models and supported languages see Models & Languages Overview.

Example

JSON
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3",
        "language": "en-US",
        "smart_format": true,
        "keyterms": ["Deepgram", "Aura"]
      }
    }
  }
}
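A Nova provider block like the one above can also be assembled in code. The following is a minimal sketch: `nova_provider` is a hypothetical helper that omits `version` (relying on the documented v1 default) and only emits optional fields when they differ from their defaults.

```python
import json


def nova_provider(model="nova-3", language=None, keyterms=None, smart_format=False) -> dict:
    """Build an agent.listen.provider block for a Nova model."""
    # version is left out on purpose: it defaults to v1 when omitted.
    provider = {"type": "deepgram", "model": model}
    if language is not None:
        provider["language"] = language  # BCP-47 tag, or "multi" for code-switching
    if smart_format:
        provider["smart_format"] = True  # documented default is false
    if keyterms:
        provider["keyterms"] = list(keyterms)
    return provider


provider = nova_provider(language="en-US", smart_format=True, keyterms=["Deepgram", "Aura"])
print(json.dumps({"agent": {"listen": {"provider": provider}}}, indent=2))
```

Calling `nova_provider(language="multi")` would produce the code-switching variant described in the parameter table.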

What’s Next

  • Configure the Voice Agent
  • Multilingual Voice Agents
  • Models & Languages Overview