
Multilingual Voice Agents

Pick the right STT and TTS models for a voice agent that speaks more than one language.

A multilingual voice agent involves two model decisions: which STT model transcribes the user, and which TTS model voices the agent's replies. Choose each one based on what your agent needs to do at runtime.

Pick your STT model

| Your situation | Use this |
|---|---|
| Conversational agent in one of the 10 Flux Multilingual languages, with turn awareness and barge-in | Flux Multilingual (flux-general-multi) |
| A single known language on every call | Flux Multilingual with a single-entry language_hints array, or the language-specific Nova model |
| Multilingual support center where calls arrive in different languages | Flux Multilingual with multiple entries in the language_hints array |
| Code-switching mid-conversation, with no need for turn awareness | Nova-3 with language: "multi" |

Flux Multilingual is the default recommendation. It handles turn awareness and interruption (barge-in) with the same low latency as flux-general-en. Use Nova-3 only when you need code-switching but not these conversational features.

Flux Multilingual configuration

  • agent.listen.provider.type: deepgram
  • agent.listen.provider.version: v2
  • agent.listen.provider.model: flux-general-multi
  • agent.listen.provider.language_hints: array of one or more BCP-47 codes (optional)
```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-multi",
        "language_hints": ["en", "es"]
      }
    }
  }
}
```

The language_hints parameter biases the model toward the listed languages. This improves accuracy when you know which languages to expect. Without hints, the model auto-detects the spoken language. Pass one hint for calls in a known language, and pass multiple hints for multilingual support centers. See Flux Multilingual & Language Prompting for the full hint reference and the list of supported languages.
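For a line where every call arrives in the same language, the hints array shrinks to a single entry. The fragment below is a minimal sketch for an all-Spanish line and assumes es is among the supported Flux Multilingual languages, as the multi-hint example above suggests. Check the language reference before relying on it.

```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-multi",
        "language_hints": ["es"]
      }
    }
  }
}
```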

When you use flux-general-multi, ConversationText events for user messages include the languages_hinted and languages fields. See Conversation Text.

Nova-3 multi configuration

  • agent.listen.provider.model: nova-3
  • agent.listen.provider.language: multi
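Assembled into the same fragment style as the Flux example, a Nova-3 listen block might look like the sketch below. It assumes the provider type is deepgram, as in the Flux configuration. No version field is shown because this page only specifies version v2 for Flux. Confirm the full provider schema in the API Reference.

```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3",
        "language": "multi"
      }
    }
  }
}
```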

Pick your TTS model

| Your situation | Use this |
|---|---|
| Bilingual English/Spanish agent | A Deepgram Aura codeswitching voice |
| Any other multilingual mix | Cartesia, OpenAI, or ElevenLabs with language: "multi" |

Deepgram Aura codeswitching (English/Spanish)

Aura ships five voices that switch between English and Spanish naturally inside one response: Aquila, Carina, Diana, Javier, and Selena.

  • agent.speak.provider.type: deepgram
  • agent.speak.provider.model: aura-2-aquila-es (or aura-2-carina-es, aura-2-diana-es, aura-2-javier-es, aura-2-selena-es)
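A minimal speak block using one of these voices is sketched below in the same fragment style as the listen examples. To use a different voice, swap the model string for any of the other four codeswitching voices listed above.

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-aquila-es"
      }
    }
  }
}
```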

These voices handle mixed-language responses without switching providers. See TTS Models for the full Spanish voice catalog.

Third-party multilingual TTS

For other language combinations, set the speak provider to Cartesia, OpenAI, or ElevenLabs and pass agent.speak.provider.language: "multi". For ElevenLabs, this parameter maps to the provider's language_code parameter.

Prompt for the language behavior you want

How the LLM chooses its response language varies by provider. To get consistent behavior, state the language strategy you want explicitly in the prompt.

| Goal | Prompt pattern |
|---|---|
| Mirror the user's language turn by turn (English ↔ Spanish ↔ English) | “Match the language of each user message independently.” |
| Force the agent to speak one language regardless of user input | “Always respond in English, even if the user speaks another language.” |
| Default to one language but mix in another when relevant | “Respond in English unless the user speaks Spanish; if Spanish, mix Spanish and English naturally.” |
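The sketch below shows how the pieces might fit together for a bilingual English/Spanish agent. It combines Flux Multilingual hinted for both languages, an Aura codeswitching voice, and the mirror-the-user prompt pattern. It assumes the system prompt lives in agent.think.prompt, which this page does not cover. The opening persona sentence is illustrative. The fragment also omits agent.think.provider and every other required setting, so fill those in from the Configure pages and the API Reference.

```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-multi",
        "language_hints": ["en", "es"]
      }
    },
    "think": {
      "prompt": "You are a bilingual support agent. Match the language of each user message independently."
    },
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-carina-es"
      }
    }
  }
}
```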