Use Deepgram's Aura text-to-speech API to transform streaming text to speech via a websocket.
The TTS Websocket API endpoint allows you to stream text into the websocket and stream audio output. If you are using TTS with LLMs, this is a helpful endpoint that allows you to stream LLM outputs into our TTS directly.
To learn more about working with real-time streaming data and results, see Get Started with Streaming Text to Speech.
Endpoint
Production WebSocket server for Deepgram's streaming text to speech. TLS encryption will protect your connection and data. We support a minimum of TLS 1.2.
Path: wss://api.deepgram.com/v1/speak |
Accepts
Type | Description |
---|---|
Text | UTF-8 characters only. |
Messages | JSON formatted operations. |
Headers
Header | Value | Description |
---|---|---|
Sec-WebSocket-Protocol | token, <DEEPGRAM_API_KEY> | Used to establish a WebSocket connection with a specific protocol, include your Deepgram API key for authentication. |
content-type | application/json | Specifies that the content being sent is in JSON format. |
Body Params
Parameter | Type | Description |
---|---|---|
text | string | required Send text as a string or text/plain. |
Query Params
Parameter | Type | Description |
---|---|---|
encoding | string | Allows you to specify the expected encoding of your audio output. Learn More. |
model | enum | AI model used to synthesize text into speech. Default: aura-asteria-en . Learn More. |
sample_rate | string | Specifies the sample rate for the output audio. Learn More. |
Messages
Text Message
Optional
You can send a Text Input Message to our endpoint to get an audio stream containing the spoken text.
{
"type": "Speak",
"text": "<input text>"
}
Flush
Required
Very frequent flushes can affect audio output quality.
Flush
forces the generation of audio from the internal text buffer. Learn More.
{
"type": "Flush"
}
Clear
Optional
Clear
will clear out Deepgram's internal text buffer. Learn More.
{
"type": "Clear"
}
Close
Optional
Close
will close the websocket connection. Learn More.
{
"type": "Close"
}
Responses
Refer to API Errors for more information.
Status | Description |
---|---|
200 | Text successfully submitted for conversion. |
400 | Error parsing query parameters. |
400 | Invalid callback URL. |
400 | Unsupported audio format requested in query parameters. |
400 | No such model/version combination found. |
401 | Invalid Authorization. |
402 | Payment Required, insufficient credits. |
403 | Insufficient permissions. |
429 | Rate limit exceeded. |
503 | Internal server error if the server is temporarily unable to serve requests. |
Metadata Response
The metadata
response includes information such as its request ID. Deepgram will send a metadata message immediately after completing the websocket handshake when you establish a websocket connection.
{
"type": "Metadata",
"request_id": "<uuid>",
"model_name": "<string>",
"model_version": "<string>",
"model_uuid": "<uuid>",
}
Keeping Connections Open
Our TTS websocket does not support
keepAlive
messages. Sending akeepAlive
message will cause a closure of the websocket.
Ping Pong Messages
Our websocket supports ping/pong messages to keep our connection open. To do so, send a ping
message to our websocket.
Use One WebSocket Per Conversation
If you are building for conversational AI use cases where a human is talking to a TTS agent, a single websocket per conversation is required. After you establish a connection, you will not be able to change the voice or media output settings.
Errors and Warnings
If Deepgram encounters an error during streamng text to speech, we will return a WebSocket Close frame. The body of the Close frame will indicate the reason for closing using one of the specification’s pre-defined status codes followed by a UTF-8-encoded payload that represents the reason for the error.
Current codes and payloads in use include:
Code | Payload | Description |
---|---|---|
1000 | N/A | Normal closure. |
1003 | MESSAGE-0000 | Input message isn't a supported websocket message type. |
1008 | DATA-0000 | Input message isn't recognized as a valid command. |
1009 | BIG-0000 | Input message is too large. |
1009 | BIG-0001 | Input text has too many characters. |
1011 | NET-0000 | Internal server error. |
1011 | NET-0001 | Failed to receive message. |
1011 | NET-0002 | Failed to send message. |
1011 | NET-0003 | Time limit exceeded. |
Warnings
Warnings will not cause a closure of the websocket connection. A warning message looks like this:
{
"type": "Warning",
"warn_code": "<CODE>",
"warn_msg": "<descriptive message>",
}