Text To Speech Streaming
An overview of the Deepgram Python SDK and Deepgram streaming text-to-speech.
The Deepgram `Speak` Clients allow you to generate human-like audio from text.
This SDK supports both the Threaded and Async/Await Clients as described in the Threaded and Async IO Task Support section. The code blocks contain a tab for Threaded and Async to show examples for `websocket` versus `asyncwebsocket`, respectively. The difference between Threaded and Async is subtle.
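For instance, the two clients are obtained the same way; only the accessor differs. A minimal sketch, assuming the versioned `v("1")` accessors the SDK uses elsewhere:

```python
from deepgram import DeepgramClient

deepgram = DeepgramClient()  # reads DEEPGRAM_API_KEY from the environment

# Threaded client
dg_connection = deepgram.speak.websocket.v("1")

# Async/Await client
async_dg_connection = deepgram.speak.asyncwebsocket.v("1")
```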
Installing the SDK
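If you haven't already, install the SDK from PyPI:

```bash
pip install deepgram-sdk
```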
Make a Deepgram Text-to-Speech Request
The Deepgram `Speak` Clients allow you to generate audio from provided text and stream the audio bytes through your `AudioData` callback function. You can also subscribe to other important events, such as:

- `Metadata` - obtain information about the audio stream.
- `Flushed` - receive an acknowledgment when all the text has been received by Deepgram to convert to audio. This is the result of sending the `Flush` message via the `flush()` function.
- `Cleared` - receive an acknowledgment that the text buffer has been cleared on the Deepgram server. This is the result of sending the `Clear` message via the `clear()` function.
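Putting these pieces together, the sketch below opens a Threaded websocket connection, registers callbacks, sends text, and flushes. It is a minimal sketch based on the SDK's own examples; the option and method names (`SpeakWSOptions`, `send_text()`) and the model name `aura-asteria-en` are assumptions that may vary across SDK versions:

```python
import time

from deepgram import DeepgramClient, SpeakWebSocketEvents, SpeakWSOptions

deepgram = DeepgramClient()  # reads DEEPGRAM_API_KEY from the environment
dg_connection = deepgram.speak.websocket.v("1")

audio_chunks = []

def on_audio_data(self, data, **kwargs):
    # Raw, container-less audio bytes for the converted text
    audio_chunks.append(data)

def on_flushed(self, flushed, **kwargs):
    # All text sent so far has been received by Deepgram for conversion
    print("Flushed received")

dg_connection.on(SpeakWebSocketEvents.AudioData, on_audio_data)
dg_connection.on(SpeakWebSocketEvents.Flushed, on_flushed)

options = SpeakWSOptions(
    model="aura-asteria-en",  # assumed model name; pick any Aura voice
    encoding="linear16",
    sample_rate=16000,
)

if dg_connection.start(options) is False:
    raise RuntimeError("Failed to start the websocket connection")

dg_connection.send_text("Hello world!")
dg_connection.flush()  # sends the Flush message; a Flushed event follows

time.sleep(5)  # crude wait for audio; production code should key off Flushed
dg_connection.finish()
```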
Audio Output Streaming
The audio bytes representing the converted text are streamed to the client via the `AudioData` event above, using your callback function.
It should be noted that these audio bytes are:

- Container-less audio. Depending on the `encoding` value chosen, only the raw audio data is sent. For example, if you choose `linear16` as your `encoding`, a `WAV` header will not be sent (a sketch for adding one yourself follows this list). Please see the Tips and Tricks for more information.
- Not of standard size/length when received by the client. This is because the text is broken down into sounds representing the speech. Certain sounds chained together to form fragments of spoken words differ in length and content.
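For instance, if you requested `linear16` audio, you can wrap the raw bytes in a `WAV` container yourself with Python's standard `wave` module. This is an illustrative sketch; the `save_linear16_as_wav` helper is hypothetical, and the mono/16 kHz settings must match the `sample_rate` you requested:

```python
import wave

def save_linear16_as_wav(raw_audio: bytes, path: str, sample_rate: int = 16000) -> None:
    """Illustrative helper: wrap raw linear16 PCM bytes in a WAV container."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono audio
        wav_file.setsampwidth(2)        # linear16 = 16-bit samples = 2 bytes each
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(raw_audio)

# e.g. after the connection finishes:
# save_linear16_as_wav(b"".join(audio_chunks), "output.wav")
```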
Depending on your use case for the generated audio bytes, please visit one of these guides to make the best use of them:
- Sending LLM Outputs to a WebSocket
- Text Chunking for Streaming TTS Optimization
- Handling Audio Issues in Text To Speech
Where to Find Additional Examples
The SDK repository has a good collection of text-to-speech examples, and the README contains links to them. Each example below demonstrates different options for generating audio from text.
Some Examples:
- Threaded Client speaking “Hello World” - examples/text-to-speech/websocket/complete
If the Async Client suits your use case better:
- Async Client speaking “Hello World” - examples/text-to-speech/websocket/async_complete