TTS Models
An overview of Text-to-Speech providers and models you can use with the Voice Agent API.
By default, the Voice Agent API uses Deepgram Text-to-Speech. You can also use Deepgram’s native Cartesia support, or use another provider’s TTS model with your Agent by applying the following settings.
You can set your Text-to-Speech model in the Settings Message for your Voice Agent. See the docs for more information.
For a complete list of Deepgram TTS models see TTS Voice Selection.
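As a rough illustration, a Settings message that selects a Deepgram TTS model could be assembled as below. This is a sketch under assumptions: the `agent.speak.provider` layout, the `model` field, and the model name `aura-2-thalia-en` are not confirmed by this page, and `build_settings` is a hypothetical helper. Check the Settings Message docs for the exact schema.

```python
import json


def build_settings(tts_model: str) -> dict:
    """Return a Settings message that selects a Deepgram TTS model.

    Assumed layout: agent.speak.provider with "type" and "model" keys.
    """
    return {
        "type": "Settings",
        "agent": {
            "speak": {
                "provider": {
                    "type": "deepgram",
                    "model": tts_model,  # placeholder model name, see TTS Voice Selection
                }
            }
        },
    }


settings = build_settings("aura-2-thalia-en")
# Serialize to JSON text before sending it over the Voice Agent WebSocket.
payload = json.dumps(settings)
```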
The Deepgram TTS speed parameter is in Early Access. To request access, contact your Account Executive or reach out to sales@deepgram.com.
Deepgram also provides managed support for Cartesia TTS. For a complete list of Cartesia TTS models, visit Cartesia’s TTS Docs. Cartesia is included in the Standard pricing tier.
To use a third-party TTS voice, specify the TTS provider and required parameters.
For OpenAI you can refer to this article on how to find your voice ID.
For ElevenLabs you can refer to this article on how to find your Voice ID, or use their API to retrieve it. See their TTS Docs for more information.
We support any of ElevenLabs’ Turbo 2.5 voices to ensure low-latency interactions.
ElevenLabs does not support WebSocket streaming for the eleven_v3 model. For that model, use the HTTPS REST endpoint instead.
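A possible shape for an eleven_v3 configuration is sketched below. Everything beyond the model name is an assumption: the `eleven_labs` type identifier, the `model_id` and `endpoint` fields, the REST URL pattern, and the `xi-api-key` header should all be verified against the Deepgram and ElevenLabs docs. `eleven_v3_speak` is a hypothetical helper.

```python
def eleven_v3_speak(voice_id: str, api_key: str) -> dict:
    """Build a speak entry that targets ElevenLabs over HTTPS, not WebSocket.

    Field names and the URL pattern are assumptions for illustration.
    """
    return {
        "provider": {
            "type": "eleven_labs",    # assumed provider identifier
            "model_id": "eleven_v3",
        },
        "endpoint": {
            # An https:// REST URL rather than a wss:// streaming URL.
            "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream",
            "headers": {"xi-api-key": api_key},
        },
    }


speak = eleven_v3_speak("YOUR_VOICE_ID", "YOUR_ELEVENLABS_API_KEY")
```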
For Cartesia you can use their API to retrieve a voice ID. See their TTS API Docs for more information.
For Amazon (AWS) Polly you can refer to this article for a list of available voices.
If no engine is specified, Amazon (AWS) Polly defaults to Standard. If the chosen voice doesn’t support Standard, you’ll get an error like: “Standard engine not supported for {voice}.” In that case, you must explicitly specify the correct engine.
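One way to avoid that error is to always set the engine explicitly when building the provider entry. The sketch below assumes an `aws_polly` type identifier and `voice`, `language_code`, and `engine` field names, none of which are confirmed by this page. `polly_provider` is a hypothetical helper, and credentials are left out for brevity.

```python
from typing import Optional


def polly_provider(voice: str, language_code: str, engine: Optional[str] = None) -> dict:
    """Build an Amazon Polly provider entry (assumed field names).

    When engine is None the key is omitted, and per the note above
    Polly then falls back to the Standard engine.
    """
    provider = {
        "type": "aws_polly",          # assumed provider identifier
        "voice": voice,
        "language_code": language_code,
    }
    if engine is not None:
        provider["engine"] = engine   # e.g. "standard" or "neural"
    return provider


# Explicit engine: safe for voices that do not support Standard.
explicit = polly_provider("Matthew", "en-US", engine="neural")
# No engine: relies on the Standard default.
implicit = polly_provider("Matthew", "en-US")
```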
The speak object accepts both a single provider and an array of providers. When you supply an array, the Voice Agent uses the providers as an ordered fallback chain: it sends each TTS request to the first provider in the list and automatically falls back to the next provider if the request fails.
If a provider fails, the agent sends a SPEAK_REQUEST_FAILED warning over the WebSocket and retries with the next provider.
If all providers fail, the agent sends a FAILED_TO_SPEAK error and the turn produces no audio response.
The fallback is per-request: each new agent utterance starts again from the first provider. Provider order matters, so place your preferred provider first and your most reliable fallback last.
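The ordering rule can be modelled with a small local simulation. This is not Deepgram's implementation, only an illustration of the described behavior: `speak_with_fallback`, `fake_synthesize`, and the `notify` callback are hypothetical names, and emitting the warning only when a later provider remains is an assumption.

```python
from typing import Callable, Optional, Sequence


def speak_with_fallback(
    providers: Sequence[dict],
    synthesize: Callable[[dict], bytes],
    notify: Callable[[str, str], None],
) -> Optional[bytes]:
    """Try each provider in order and return the first successful audio."""
    for position, provider in enumerate(providers):
        try:
            return synthesize(provider)
        except Exception:
            is_last = position == len(providers) - 1
            if not is_last:
                # A later provider remains, so warn and move on to it.
                notify("SPEAK_REQUEST_FAILED", provider["type"])
    # Every provider failed (or the list was empty): no audio for this turn.
    notify("FAILED_TO_SPEAK", "")
    return None


events = []


def fake_synthesize(provider: dict) -> bytes:
    """Stand-in TTS call: the deepgram entry always fails in this demo."""
    if provider["type"] == "deepgram":
        raise RuntimeError("simulated outage")
    return ("audio from " + provider["type"]).encode()


chain = [{"type": "deepgram"}, {"type": "open_ai"}]
audio = speak_with_fallback(
    chain, fake_synthesize, lambda code, who: events.append((code, who))
)
```

Each call to `speak_with_fallback` starts again from the first entry, which mirrors the per-request behavior: a failure on one utterance does not demote the primary provider for the next one.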
Fallback providers do not need to use the same provider.type. You can mix providers (for example, deepgram primary with an open_ai fallback) to maximize availability across independent infrastructure.
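A mixed chain with a deepgram primary and an open_ai fallback might be expressed as follows. Only the `speak` array and the two `provider.type` values come from the text above. The remaining field names, the model and voice values, the endpoint URL, and the header format are assumptions to verify against the provider docs, and `build_speak_chain` is a hypothetical helper.

```python
def build_speak_chain(openai_api_key: str) -> list:
    """Return an ordered fallback chain: Deepgram first, OpenAI second."""
    return [
        {
            # Primary provider: tried first on every utterance.
            "provider": {"type": "deepgram", "model": "aura-2-thalia-en"},
        },
        {
            # Fallback provider on independent infrastructure.
            "provider": {"type": "open_ai", "model": "tts-1", "voice": "alloy"},
            "endpoint": {
                "url": "https://api.openai.com/v1/audio/speech",
                "headers": {"authorization": f"Bearer {openai_api_key}"},
            },
        },
    ]


speak = build_speak_chain("YOUR_OPENAI_API_KEY")
settings = {"type": "Settings", "agent": {"speak": speak}}
```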
What’s Next