Use Deepgram's Aura text-to-speech API to transform streaming text to speech via a websocket.

The TTS WebSocket endpoint lets you stream text into the connection and receive audio output back over it. If you are pairing TTS with an LLM, you can forward the LLM's output to the endpoint as it is generated instead of waiting for the complete response.

To learn more about working with real-time streaming data and results, see Get Started with Streaming Text to Speech.

Endpoint

Production WebSocket server for Deepgram's streaming text to speech. Connections are protected by TLS encryption, and TLS 1.2 is the minimum supported version.

Path: wss://api.deepgram.com/v1/speak
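If your client builds its own TLS context, you can enforce the same TLS 1.2 floor on your side. A minimal sketch with Python's standard `ssl` module; the helper name `make_tls_context` is hypothetical, and many WebSocket libraries accept a context like this one as an option.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client TLS context that refuses protocol versions older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # State the floor explicitly instead of relying on platform defaults.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```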

Accepts

Type | Description
Text | UTF-8 characters only.
Messages | JSON-formatted operations.

Headers

Header | Value | Description
Sec-WebSocket-Protocol | token, <DEEPGRAM_API_KEY> | Establishes the WebSocket connection with a specific protocol. Include your Deepgram API key here for authentication.
content-type | application/json | Specifies that the content being sent is in JSON format.
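A minimal sketch of assembling the handshake headers from the table above. The function name `build_handshake_headers` is hypothetical, and how you pass the result (a header dictionary, a subprotocol list, or something else) depends on your WebSocket client library.

```python
from typing import Dict


def build_handshake_headers(api_key: str) -> Dict[str, str]:
    """Return the handshake headers listed in the Headers table."""
    if not api_key:
        raise ValueError("a Deepgram API key is required")
    return {
        # The API key travels as the second entry of the protocol value.
        "Sec-WebSocket-Protocol": f"token, {api_key}",
        "content-type": "application/json",
    }
```

Reading the key from an environment variable rather than hard-coding it is a common choice; the variable name is up to you.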

Body Params

Parameter | Type | Description
text | string | Required. Send text as a string or as text/plain.

Query Params

Parameter | Type | Description
encoding | string | Specifies the expected encoding of your audio output. Learn More.
model | enum | AI model used to synthesize text into speech. Default: aura-asteria-en. Learn More.
sample_rate | string | Specifies the sample rate of the output audio. Learn More.
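The query parameters are appended to the endpoint path as an ordinary URL query string. A sketch using Python's standard library; `build_speak_url` is a hypothetical helper, and the `linear16` / `24000` values in the usage line are illustrative assumptions, so check the encoding and sample-rate pages for the combinations Deepgram actually accepts.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

BASE_URL = "wss://api.deepgram.com/v1/speak"


def build_speak_url(
    model: str = "aura-asteria-en",
    encoding: Optional[str] = None,
    sample_rate: Optional[int] = None,
) -> str:
    """Append the query parameters from the table to the endpoint path."""
    params: Dict[str, str] = {"model": model}  # documented default model
    if encoding is not None:
        params["encoding"] = encoding
    if sample_rate is not None:
        # Query values are strings on the wire, matching the table's type.
        params["sample_rate"] = str(sample_rate)
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_speak_url(encoding="linear16", sample_rate=24000)` returns `wss://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=linear16&sample_rate=24000`. Since settings cannot change after the connection opens, this URL fixes the voice and output format for the whole session.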

Messages

Text Message

Optional

Send a Speak message to the endpoint to receive an audio stream of the spoken text.

{
  "type": "Speak",
  "text": "<input text>"
}
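Building the Speak message with a JSON serializer, rather than string formatting, keeps quotes, newlines, and non-ASCII characters in the input text valid. A minimal sketch; the helper name `speak_message` is hypothetical, and the returned string would be sent as a WebSocket text frame.

```python
import json


def speak_message(text: str) -> str:
    """Serialize a Speak message; json.dumps escapes quotes and newlines."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    return json.dumps({"type": "Speak", "text": text})
```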

Flush

Required

🚧

Very frequent flushes can affect audio output quality.

Flush forces Deepgram to generate audio from the text currently in its internal buffer. Learn More.

{
    "type": "Flush"
}
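When streaming LLM output, one way to respect the warning about frequent flushes is to send Speak messages as text arrives and a single Flush at the end of each response. A sketch under that assumption; grouping tokens at sentence boundaries is our own heuristic, not a Deepgram requirement, and `messages_for_response` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def messages_for_response(tokens: Iterable[str]) -> Iterator[str]:
    """Yield Speak messages for one LLM response, then a single Flush.

    Tokens are grouped into sentence-sized chunks (a heuristic), and Flush
    is sent once per response rather than once per token.
    """
    pending = ""
    for token in tokens:
        pending += token
        if pending.rstrip().endswith(SENTENCE_END):
            yield json.dumps({"type": "Speak", "text": pending})
            pending = ""
    if pending.strip():
        # Send any trailing text that did not end with punctuation.
        yield json.dumps({"type": "Speak", "text": pending})
    # Ask Deepgram to synthesize whatever remains in its text buffer.
    yield json.dumps({"type": "Flush"})
```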

Clear

Optional

Clear empties Deepgram's internal text buffer. Learn More.

{
    "type": "Clear"
}

Close

Optional

Close will close the websocket connection. Learn More.

{
  "type": "Close"
}
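Flush, Clear, and Close carry no fields besides `type`, so a single validated helper can produce all three. A minimal sketch; `control_message` is a hypothetical name.

```python
import json

CONTROL_TYPES = frozenset({"Flush", "Clear", "Close"})


def control_message(kind: str) -> str:
    """Serialize one of the field-less control messages."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message type: {kind!r}")
    return json.dumps({"type": kind})
```

For instance, a voice agent that detects the user talking over it might send `control_message("Clear")` so that text still waiting in Deepgram's buffer is discarded. Audio already generated may still be in flight, so the client would likely also need to drop its own playback queue.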

Responses

Refer to API Errors for more information.

Status | Description
200 | Text successfully submitted for conversion.
400 | Error parsing query parameters.
400 | Invalid callback URL.
400 | Unsupported audio format requested in query parameters.
400 | No such model/version combination found.
401 | Invalid authorization.
402 | Payment required: insufficient credits.
403 | Insufficient permissions.
429 | Rate limit exceeded.
503 | Service unavailable: the server is temporarily unable to serve requests.

Metadata Response

The Metadata response includes information about the request, such as its request ID and the model used. Deepgram sends this message immediately after the WebSocket handshake completes.

{
    "type": "Metadata",
    "request_id": "<uuid>",
    "model_name": "<string>",
    "model_version": "<string>",
    "model_uuid": "<uuid>"
}
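Because the Metadata message arrives right after the handshake, a client can read it first and keep the request ID for logging. A sketch with a hypothetical `Metadata` dataclass that mirrors the fields above; raising on any other message type is a strictness choice you may want to relax.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    request_id: str
    model_name: str
    model_version: str
    model_uuid: str


def parse_metadata(raw: str) -> Metadata:
    """Parse the Metadata text frame sent after the handshake."""
    data = json.loads(raw)
    if data.get("type") != "Metadata":
        raise ValueError(f"expected a Metadata message, got {data.get('type')!r}")
    return Metadata(
        request_id=data["request_id"],
        model_name=data["model_name"],
        model_version=data["model_version"],
        model_uuid=data["model_uuid"],
    )
```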

Keeping Connections Open

📘

The TTS WebSocket does not support keepAlive messages. Sending a keepAlive message closes the connection.

Ping Pong Messages

The WebSocket supports ping/pong messages for keeping the connection open. To use them, send a ping message to the WebSocket.
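A ping is a WebSocket control frame defined by RFC 6455, not a JSON message like Speak or Flush. Most client libraries send pings for you, often on a configurable interval. To show what goes on the wire, here is a sketch of a client-side ping frame; `build_ping_frame` is a hypothetical name.

```python
import os
import struct


def build_ping_frame(payload: bytes = b"") -> bytes:
    """Build a masked client-to-server ping frame (RFC 6455)."""
    if len(payload) > 125:
        # Control frames may carry at most 125 bytes of payload.
        raise ValueError("ping payload must be 125 bytes or fewer")
    mask = os.urandom(4)  # clients must mask every frame they send
    masked = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    # 0x89 = FIN bit set + opcode 0x9 (ping); 0x80 = MASK bit set.
    return struct.pack("!BB", 0x89, 0x80 | len(payload)) + mask + masked
```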

Use One WebSocket Per Conversation

For conversational AI use cases, where a human talks to a TTS agent, a single WebSocket per conversation is required. Voice and media output settings are fixed when the connection is established and cannot be changed afterward.

Errors and Warnings

If Deepgram encounters an error while streaming text to speech, it returns a WebSocket Close frame. The body of the Close frame indicates the reason for closing with one of the specification's predefined status codes, followed by a UTF-8-encoded payload that describes the error.

Current codes and payloads in use include:

Code | Payload | Description
1000 | N/A | Normal closure.
1003 | MESSAGE-0000 | Input message isn't a supported WebSocket message type.
1008 | DATA-0000 | Input message isn't recognized as a valid command.
1009 | BIG-0000 | Input message is too large.
1009 | BIG-0001 | Input text has too many characters.
1011 | NET-0000 | Internal server error.
1011 | NET-0001 | Failed to receive message.
1011 | NET-0002 | Failed to send message.
1011 | NET-0003 | Time limit exceeded.
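Under RFC 6455, a Close frame body is a 2-byte big-endian status code followed by the UTF-8 reason. Many client libraries split these for you; if yours hands you the raw body, a sketch like the following recovers both parts and looks up the payload meanings from the table. It assumes the reason text begins with the payload code (for example `BIG-0001`); this page does not say whether extra text follows it. The function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

# Payload codes and meanings, copied from the table above.
CLOSE_PAYLOADS = {
    "MESSAGE-0000": "Input message isn't a supported WebSocket message type.",
    "DATA-0000": "Input message isn't recognized as a valid command.",
    "BIG-0000": "Input message is too large.",
    "BIG-0001": "Input text has too many characters.",
    "NET-0000": "Internal server error.",
    "NET-0001": "Failed to receive message.",
    "NET-0002": "Failed to send message.",
    "NET-0003": "Time limit exceeded.",
}


def parse_close_body(body: bytes) -> Tuple[int, str]:
    """Split a Close frame body into its status code and UTF-8 reason."""
    if not body:
        return 1005, ""  # RFC 6455: no status code was present
    if len(body) == 1:
        raise ValueError("malformed Close frame body")
    (code,) = struct.unpack("!H", body[:2])  # network byte order
    return code, body[2:].decode("utf-8")


def explain_close(body: bytes) -> Tuple[int, Optional[str]]:
    """Return the status code and, if recognized, the table's description."""
    code, reason = parse_close_body(body)
    for prefix, meaning in CLOSE_PAYLOADS.items():
        # Assumption: the reason text starts with the payload code.
        if reason.startswith(prefix):
            return code, meaning
    return code, None
```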

Warnings

Warnings do not close the WebSocket connection. A warning message looks like this:

{
  "type": "Warning",
  "warn_code": "<CODE>",
  "warn_msg": "<descriptive message>"
}
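Since warnings leave the connection open, a receive loop can log them and keep going. A sketch of a frame router, assuming audio arrives as binary frames and status messages such as Metadata and Warning arrive as JSON text frames; `handle_frame` and the logger name are hypothetical.

```python
import json
import logging
from typing import List, Optional, Union

logger = logging.getLogger("tts_client")  # hypothetical logger name


def handle_frame(frame: Union[str, bytes], audio_chunks: List[bytes]) -> Optional[str]:
    """Route one received frame; return the message type for text frames."""
    if isinstance(frame, (bytes, bytearray)):
        # Assumption: binary frames carry synthesized audio.
        audio_chunks.append(bytes(frame))
        return None
    message = json.loads(frame)
    kind = message.get("type")
    if kind == "Warning":
        # Warnings do not close the connection, so log and keep streaming.
        logger.warning("%s: %s", message.get("warn_code"), message.get("warn_msg"))
    return kind
```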