TTS Models
An overview of Text-to-Speech providers and models you can use with the Voice Agent API.
By default, Deepgram Text-to-Speech will be used with the Voice Agent API. You can also use Deepgram's native Cartesia support or opt to use another provider's TTS model with your Agent by applying the following settings.
You can set your Text-to-Speech model in the Settings Message for your Voice Agent. See the docs for more information.
Deepgram TTS models
For a complete list of Deepgram TTS models see TTS Voice Selection.
The speed parameter is in Early Access and is only available for Deepgram TTS. To request access, contact your Account Executive or reach out to sales@deepgram.com.
Example
Deepgram-managed Cartesia TTS models
Deepgram also provides managed support for Cartesia TTS. For a complete list of Cartesia TTS models, visit Cartesia's TTS Docs. Cartesia is included in the Standard pricing tier.
Example
BYO Third Party TTS models
To use a third-party TTS voice, specify the TTS provider and its required parameters.
OpenAI
For OpenAI you can refer to this article on how to find your voice ID.
Example
ElevenLabs
For ElevenLabs you can refer to this article on how to find your Voice ID or use their API to retrieve it. See their TTS Docs for more information.
We support any of ElevenLabs' Turbo 2.5 voices to ensure low-latency interactions.
Example
Cartesia
For Cartesia you can use their API to retrieve a voice ID. See their TTS API Docs for more information.
Example
Amazon (AWS) Polly
For Amazon (AWS) Polly you can refer to this article for a list of available voices.
If no engine is specified, Amazon (AWS) Polly defaults to Standard. If the chosen voice doesn't support Standard, you'll get an error like: "Standard engine not supported for {voice}." In that case, you must explicitly specify the correct engine.
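The engine default described above can be modeled locally to see when the error appears. This is a minimal sketch, not Deepgram's or Amazon's implementation: the function name `resolve_engine`, the voice names, and the `SUPPORTED_ENGINES` table are all hypothetical placeholders, and the lowercase engine identifiers are an assumption.

```python
from typing import Dict, Optional, Set

# Hypothetical voice table for illustration only. Real engine support
# must be looked up in Amazon Polly's list of available voices.
SUPPORTED_ENGINES: Dict[str, Set[str]] = {
    "example_voice_a": {"standard", "neural"},
    "example_voice_b": {"neural"},
}


def resolve_engine(voice: str, engine: Optional[str] = None) -> str:
    """Return the engine that would be used, or raise if the voice lacks it."""
    # When no engine is specified, fall back to Standard, as described above.
    chosen = engine or "standard"
    if chosen not in SUPPORTED_ENGINES[voice]:
        raise ValueError(f"{chosen.capitalize()} engine not supported for {voice}.")
    return chosen


if __name__ == "__main__":
    print(resolve_engine("example_voice_a"))            # engine omitted
    print(resolve_engine("example_voice_b", "neural"))  # engine given explicitly
```

In this model, calling `resolve_engine("example_voice_b")` without an engine raises, which mirrors the case where you must set the engine explicitly.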
STS Example
IAM Example
Using multiple TTS providers
The speak object accepts both a single provider and an array of providers. When you supply an array, the Voice Agent uses the providers as an ordered fallback chain: it sends each TTS request to the first provider in the list and automatically falls back to the next provider if the request fails.
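Because speak may be either a single provider or an array, client code that inspects its own settings often normalizes both shapes into one ordered list first. The sketch below is illustrative only: the helper name `provider_chain` is hypothetical, and it assumes each entry nests its provider under a `provider` key with a `type` field. Real settings carry additional provider-specific fields that are omitted here.

```python
from typing import Any, Dict, List, Union

SpeakEntry = Dict[str, Any]


def provider_chain(speak: Union[SpeakEntry, List[SpeakEntry]]) -> List[str]:
    """Return provider types in the order the fallback chain would try them."""
    # A single object is treated as a chain of length one.
    entries = speak if isinstance(speak, list) else [speak]
    return [entry["provider"]["type"] for entry in entries]


if __name__ == "__main__":
    single = {"provider": {"type": "deepgram"}}
    chain = [
        {"provider": {"type": "deepgram"}},
        {"provider": {"type": "open_ai"}},
    ]
    print(provider_chain(single))
    print(provider_chain(chain))
```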
How fallback works
- The agent sends the request to the first provider in the array.
- If that provider returns an error or times out, the agent sends a SPEAK_REQUEST_FAILED warning over the WebSocket and retries with the next provider.
- This continues through every provider in the array.
- If all providers fail, the agent sends a FAILED_TO_SPEAK error and the turn produces no audio response.
The fallback is per-request: each new agent utterance starts again from the first provider. Provider order matters, so place your preferred provider first and your most reliable fallback last.
Fallback providers do not need to use the same provider.type. You can mix providers (for example, deepgram primary with an open_ai fallback) to maximize availability across independent infrastructure.
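The fallback behavior can be simulated locally to reason about which messages a client should expect. This is a sketch of the described behavior, not the agent's actual implementation: `speak_with_fallback` and `make_fake_synthesizer` are hypothetical names, the synthesizer is a stand-in for a real TTS call, and it is an assumption that a warning is emitted for every failed attempt, including the last one.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Provider = Dict[str, str]  # e.g. {"type": "deepgram"}


def speak_with_fallback(
    text: str,
    providers: List[Provider],
    synthesize: Callable[[Provider, str], bytes],
) -> Tuple[Optional[bytes], List[str]]:
    """Try providers in order; return (audio or None, emitted message codes)."""
    events: List[str] = []
    for provider in providers:
        try:
            return synthesize(provider, text), events
        except Exception:
            # Assumption: one warning per failed attempt.
            events.append("SPEAK_REQUEST_FAILED")
    # Every provider failed, so the turn produces no audio.
    events.append("FAILED_TO_SPEAK")
    return None, events


def make_fake_synthesizer(down: Set[str]) -> Callable[[Provider, str], bytes]:
    """Build a stand-in TTS call that fails for provider types listed in `down`."""
    def synthesize(provider: Provider, text: str) -> bytes:
        if provider["type"] in down:
            raise TimeoutError(f"{provider['type']} timed out")
        return f"{provider['type']}:{text}".encode()
    return synthesize


if __name__ == "__main__":
    chain = [{"type": "deepgram"}, {"type": "open_ai"}]
    print(speak_with_fallback("hello", chain, make_fake_synthesizer({"deepgram"})))
    print(speak_with_fallback("hello", chain, make_fake_synthesizer({"deepgram", "open_ai"})))
```

Each call starts from the first provider again, which matches the per-request behavior described above.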
Example
What's Next