TTS Models

An overview of Text-to-Speech providers and models you can use with the Voice Agent API.

By default, Deepgram Text-to-Speech is used with the Voice Agent API. You can also use Deepgram's native Cartesia support, or opt to use another provider's TTS model with your Agent by applying the following settings.

You can set your Text-to-Speech model in the Settings Message for your Voice Agent. See the docs for more information.
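As a rough illustration, the `speak` configuration travels inside the Settings message sent when the WebSocket session opens. The sketch below only constructs the message payload; the `"Settings"` type name matches the Voice Agent docs, but treat the exact envelope (and the other sections a real Settings message carries) as something to verify against the Settings Message reference.

```python
import json

def build_settings(speak_provider: dict) -> str:
    """Assemble a minimal Settings message carrying a TTS provider choice.

    A real Settings message also configures audio, listen, and think;
    only the speak section is sketched here.
    """
    return json.dumps({
        "type": "Settings",
        "agent": {
            "speak": {"provider": speak_provider},
        },
    })

payload = build_settings({"type": "deepgram", "model": "aura-2-thalia-en"})
```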

Deepgram TTS models

For a complete list of Deepgram TTS models see TTS Voice Selection.

| Parameter | Type | Description |
| --- | --- | --- |
| agent.speak.provider.type | String | Must be deepgram |
| agent.speak.provider.model | String | The TTS model to use |
| agent.speak.provider.speed | Float | Speaking rate multiplier. Range: 0.7 to 1.5. Defaults to 1.0. See TTS voice controls for details. |

The speed parameter is in Early Access and is only available for Deepgram TTS. To request access, contact your Account Executive or reach out to sales@deepgram.com.

Example

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-thalia-en",
        "speed": 0.9
      }
    }
  }
}
```
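Since speed only accepts values from 0.7 to 1.5, a client-side check before sending the settings can surface out-of-range values early rather than waiting for the API to reject them. The helper below is a hypothetical convenience, not part of any SDK.

```python
def validated_speed(speed: float) -> float:
    """Reject speaking-rate multipliers outside the documented 0.7-1.5 range."""
    if not 0.7 <= speed <= 1.5:
        raise ValueError(f"speed must be between 0.7 and 1.5, got {speed}")
    return speed
```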

Deepgram-managed Cartesia TTS models

Deepgram also provides managed support for Cartesia TTS. For a complete list of Cartesia TTS models, visit Cartesia's TTS Docs. Cartesia is included in the Standard pricing tier.

| Parameter | Type | Description |
| --- | --- | --- |
| agent.speak.provider.type | String | Must be cartesia |
| agent.speak.provider.model_id | String | The TTS model ID to use |
| agent.speak.provider.voice | Object | Cartesia voice configuration |
| agent.speak.provider.voice.mode | String | The voice mode to use |
| agent.speak.provider.voice.id | String | The voice ID to use |

Example

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "model_id": "sonic-2",
        "voice": {
          "mode": "id",
          "id": "a167e0f3-df7e-4d52-a9c3-f949145efdab"
        }
      }
    }
  }
}
```

BYO Third Party TTS models

To use a third-party TTS voice, specify the TTS provider and the required parameters.

OpenAI

For OpenAI you can refer to this article on how to find your voice ID.

| Parameter | Type | Description |
| --- | --- | --- |
| agent.speak.provider.type | String | Must be open_ai |
| agent.speak.provider.model | String | The TTS model to use |
| agent.speak.provider.voice | String | The voice to use |
| agent.speak.endpoint | Object | Required; must include url and headers |
| agent.speak.endpoint.url | String | Your OpenAI API endpoint URL |
| agent.speak.endpoint.headers | Object | Required headers for authentication |

Example

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "open_ai",
        "model": "tts-1",
        "voice": "alloy"
      },
      "endpoint": {
        "url": "https://api.openai.com/v1/audio/speech",
        "headers": {
          "authorization": "Bearer {{OPENAI_API_KEY}}"
        }
      }
    }
  }
}
```

Eleven Labs

For ElevenLabs you can refer to this article on how to find your Voice ID or use their API to retrieve it. See their TTS Docs for more information.
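If you prefer to retrieve voice IDs programmatically, ElevenLabs exposes a GET /v1/voices endpoint authenticated with the same xi-api-key header used in the example below. This sketch keeps the response parsing separate from the network call; the response shape (a voices array of objects with a voice_id field) is taken from ElevenLabs' API docs and is worth verifying against the current version.

```python
VOICES_URL = "https://api.elevenlabs.io/v1/voices"

def extract_voice_ids(payload: dict) -> list[str]:
    """Pull the voice IDs out of a /v1/voices response body."""
    return [v["voice_id"] for v in payload.get("voices", [])]

# Usage with a real key (e.g. via the requests package):
#   resp = requests.get(VOICES_URL, headers={"xi-api-key": ELEVEN_LABS_API_KEY}, timeout=10)
#   voice_ids = extract_voice_ids(resp.json())
```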

We support any of ElevenLabs' Turbo 2.5 voices to ensure low-latency interactions.

| Parameter | Type | Description |
| --- | --- | --- |
| agent.speak.provider.type | String | Must be eleven_labs |
| agent.speak.provider.model_id | String | The model ID to use |
| agent.speak.provider.language_code | String | Optional language code |
| agent.speak.endpoint | Object | Must include url and headers |
| agent.speak.endpoint.url | String | Your ElevenLabs API endpoint URL |
| agent.speak.endpoint.headers | Object | Headers for authentication |

Example

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "eleven_labs",
        "model_id": "eleven_turbo_v2_5",
        "language_code": "en-US"
      },
      "endpoint": {
        "url": "wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/multi-stream-input",
        "headers": {
          "xi-api-key": "{{ELEVEN_LABS_API_KEY}}"
        }
      }
    }
  }
}
```

Cartesia

For Cartesia you can use their API to retrieve a voice ID. See their TTS API Docs for more information.

| Parameter | Type | Description |
| --- | --- | --- |
| agent.speak.provider.type | String | Must be cartesia |
| agent.speak.provider.model_id | String | The model ID to use |
| agent.speak.provider.voice | Object | Cartesia voice configuration |
| agent.speak.provider.voice.mode | String | The voice mode to use |
| agent.speak.provider.voice.id | String | The voice ID to use |
| agent.speak.provider.language | String | Language setting |
| agent.speak.endpoint | Object | Must include url and headers |
| agent.speak.endpoint.url | String | Your Cartesia API endpoint URL |
| agent.speak.endpoint.headers | Object | Headers for authentication |

Example

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "cartesia",
        "model_id": "sonic-2",
        "voice": {
          "mode": "id",
          "id": "a167e0f3-df7e-4d52-a9c3-f949145efdab"
        },
        "language": "en"
      },
      "endpoint": {
        "url": "https://api.cartesia.ai/tts/bytes",
        "headers": {
          "x-api-key": "{{CARTESIA_API_KEY}}"
        }
      }
    }
  }
}
```

Amazon (AWS) Polly

For Amazon (AWS) Polly you can refer to this article for a list of available voices.

If no engine is specified, Amazon (AWS) Polly defaults to Standard. If the chosen voice doesn't support Standard, you'll get an error like: "Standard engine not supported for {voice}." In that case, you must explicitly specify the correct engine.
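To avoid that round-trip failure, a client-side guard can refuse a Standard-engine configuration for voices you know require another engine. The helper below is purely illustrative; the neural-only voice set is an example, not an authoritative list, so consult the AWS Polly voice documentation for the real engine support matrix.

```python
# Example set only; check the AWS Polly voice list for actual engine support.
NEURAL_ONLY_VOICES = frozenset({"Ruth", "Stephen"})

def polly_provider(voice: str, language_code: str = "en-US",
                   engine: str = "standard") -> dict:
    """Build an aws_polly provider block, failing fast on a bad engine choice."""
    if engine == "standard" and voice in NEURAL_ONLY_VOICES:
        raise ValueError(
            f"Standard engine not supported for {voice}; set engine explicitly"
        )
    return {
        "type": "aws_polly",
        "voice": voice,
        "language_code": language_code,
        "engine": engine,
    }
```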

| Parameter | Type | Description |
| --- | --- | --- |
| agent.speak.provider.type | String | Must be aws_polly |
| agent.speak.provider.language_code | String | The language code to use |
| agent.speak.provider.voice | String | The voice to use |
| agent.speak.provider.engine | String | The engine to use |
| agent.speak.provider.credentials | Object | The credentials to use |

STS Example

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "aws_polly",
        "language_code": "en-US",
        "voice": "Matthew",
        "engine": "standard",
        "credentials": {
          "type": "sts",
          "region": "us-west-2",
          "access_key_id": "{{AWS_ACCESS_KEY_ID}}",
          "secret_access_key": "{{AWS_SECRET_ACCESS_KEY}}",
          "session_token": "{{AWS_SESSION_TOKEN}}"
        }
      },
      "endpoint": {
        "url": "https://polly.us-west-2.amazonaws.com/v1/speech"
      }
    }
  }
}
```

IAM Example

```json
{
  "agent": {
    "speak": {
      "provider": {
        "type": "aws_polly",
        "voice": "Joanna",
        "language_code": "en-US",
        "engine": "standard",
        "credentials": {
          "type": "iam",
          "region": "us-east-2",
          "access_key_id": "{{AWS_ACCESS_KEY_ID}}",
          "secret_access_key": "{{AWS_SECRET_ACCESS_KEY}}"
        }
      },
      "endpoint": {
        "url": "https://polly.us-east-2.amazonaws.com/v1/speech"
      }
    }
  }
}
```

Using multiple TTS providers

The speak object accepts both a single provider and an array of providers. When you supply an array, the Voice Agent uses the providers as an ordered fallback chain: it sends each TTS request to the first provider in the list and automatically falls back to the next provider if the request fails.

How fallback works

  1. The agent sends the request to the first provider in the array.
  2. If that provider returns an error or times out, the agent sends a SPEAK_REQUEST_FAILED warning over the WebSocket and retries with the next provider.
  3. This continues through every provider in the array.
  4. If all providers fail, the agent sends a FAILED_TO_SPEAK error and the turn produces no audio response.

The fallback is per-request: each new agent utterance starts again from the first provider. Provider order matters, so place your preferred provider first and your most reliable fallback last.

Fallback providers do not need to use the same provider.type. You can mix providers (for example, deepgram primary with an open_ai fallback) to maximize availability across independent infrastructure.
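The chain described above amounts to a try-in-order loop. The sketch below simulates it client-side; `request` is a stand-in for the real TTS call, and the comments name the warning and error events the agent emits over the WebSocket.

```python
from typing import Callable

def synthesize_with_fallback(providers: list[dict],
                             request: Callable[[dict], bytes]) -> bytes:
    """Try each TTS provider in order, mimicking the agent's fallback chain."""
    for provider in providers:
        try:
            return request(provider)  # first successful provider wins
        except Exception:
            # The agent emits a SPEAK_REQUEST_FAILED warning here, then moves on
            continue
    # Every provider failed: the agent emits FAILED_TO_SPEAK and the turn is silent
    raise RuntimeError("FAILED_TO_SPEAK: all TTS providers failed")
```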

Example

```json
{
  "agent": {
    "speak": [
      {
        "provider": {
          "type": "deepgram",
          "model": "aura-2-zeus-en"
        }
      },
      {
        "provider": {
          "type": "open_ai",
          "model": "tts-1",
          "voice": "shimmer"
        },
        "endpoint": {
          "url": "https://api.openai.com/v1/audio/speech",
          "headers": {
            "authorization": "Bearer {{OPENAI_API_KEY}}"
          }
        }
      }
    ]
  }
}
```

What's Next