Multilingual Voice Agents

Pick the right STT and TTS models for a voice agent that speaks more than one language.

A multilingual voice agent has two model decisions: which STT model transcribes the user, and which TTS model speaks the agent. Pick each one based on what your agent needs to do at runtime.

Pick your STT model

| Your situation | Use this |
| --- | --- |
| Conversational agent in one of the 10 Flux Multilingual languages, with turn awareness and barge-in | Flux Multilingual (flux-general-multi) |
| Single known language, all calls | Flux Multilingual with one language_hint, or the language-specific Nova model |
| Multilingual support center, calls arrive in different languages | Flux Multilingual with multiple language_hint values |
| Code-switching mid-conversation, no need for turn awareness | Nova-3 with language: "multi" |

Flux Multilingual is the default recommendation. It handles turn awareness and interruption with the same low latency as flux-general-en. Use Nova-3 only when you need code-switching but not the conversational features.

Flux Multilingual configuration

  • agent.listen.provider.type: deepgram
  • agent.listen.provider.version: v2
  • agent.listen.provider.model: flux-general-multi
  • agent.listen.provider.language_hint: one or more BCP-47 codes (optional)
```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "version": "v2",
        "model": "flux-general-multi",
        "language_hint": ["en", "es"]
      }
    }
  }
}
```

The language_hint parameter biases the model toward specific languages and improves accuracy. With no hints, the model auto-detects the spoken language. Pass one hint for known-language calls and multiple hints for multilingual support centers. See Flux Multilingual & Language Prompting for the full hint reference and supported languages.

When you use flux-general-multi, user ConversationText events include languages_hinted and languages fields. See Conversation Text.

Nova-3 multi configuration

  • agent.listen.provider.model: nova-3
  • agent.listen.provider.language: multi
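
A minimal listen block for this setup might look as follows. This is a sketch assuming the same request shape as the Flux example above; note that language replaces language_hint here, and the version field is omitted since v2 applies to Flux.

```json
{
  "agent": {
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3",
        "language": "multi"
      }
    }
  }
}
```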

Pick your TTS model

| Your situation | Use this |
| --- | --- |
| Bilingual English/Spanish agent | Deepgram Aura codeswitching voice |
| Any other multilingual mix | Cartesia, OpenAI, or Eleven Labs with language: "multi" |

Deepgram Aura codeswitching (English/Spanish)

Aura ships five voices that switch between English and Spanish naturally inside one response: Aquila, Carina, Diana, Javier, and Selena.

  • agent.speak.provider.type: deepgram
  • agent.speak.provider.model: aura-2-aquila-es (or aura-2-carina-es, aura-2-diana-es, aura-2-javier-es, aura-2-selena-es)

These voices handle mixed-language responses without switching providers. See TTS Models for the full Spanish voice catalog.
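
Putting the two settings above together, a speak block for the Aquila voice might look like this (a sketch; the other four voices substitute their own model names):

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-aquila-es"
      }
    }
  }
}
```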

Third-party multilingual TTS

For other language combinations, set the speak provider to OpenAI, Eleven Labs, or Cartesia and pass agent.speak.provider.language: "multi". For Eleven Labs, this parameter maps to language_code.
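
As a sketch, a third-party speak block could take the following shape. The provider type string cartesia is an assumption here, and each third-party provider also takes its own model/voice fields, which are omitted; check the provider's configuration reference for the exact keys.

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "language": "multi"
      }
    }
  }
}
```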

Prompt for the language behavior you want

LLM behavior varies by provider. The prompt steers the agent toward a specific language strategy.

| Goal | Prompt pattern |
| --- | --- |
| Mirror the user's language turn by turn (English ↔ Spanish ↔ English) | "Match the language of each user message independently." |
| Force the agent to speak one language regardless of user input | "Always respond in English, even if the user speaks another language." |
| Default to one language but mix in another when relevant | "Respond in English unless the user speaks Spanish; if Spanish, mix Spanish and English naturally." |
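
One of these patterns can be wired into the agent settings directly. The sketch below assumes the agent's instructions are configured at agent.think.prompt; adjust the path if your settings payload places the system prompt elsewhere.

```json
{
  "agent": {
    "think": {
      "prompt": "Match the language of each user message independently."
    }
  }
}
```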