Text To Speech REST
An overview of the Deepgram Python SDK and Deepgram text-to-speech.
The Deepgram Speak Clients allow you to generate human-like audio from text.
This SDK supports both the Threaded and Async/Await Clients, as described in the Threaded and Async IO Task Support section. The code blocks contain a tab for Threaded and a tab for Async, showing examples for the rest and asyncrest clients, respectively. The difference between Threaded and Async is subtle.
Installing the SDK
Make a Deepgram Text-to-Speech Request
The Deepgram Speak Clients allow you to create audio generated from provided text, also known as text-to-speech.

There are two functions provided to convert text to speech:

- stream() - takes the resulting audio and holds it in memory via io.BytesIO.
- save() or file() - saves the audio stream to a user-provided filename.
Audio Output Streaming
Deepgram’s TTS API allows you to start playing the audio once the first byte is received. This section provides examples to help you stream the audio output efficiently.
Single Text Source Payload
The following example demonstrates how to stream the audio as soon as the first byte arrives for a single text source.
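One way to sketch this, using the requests library against Deepgram's REST speak endpoint directly (the endpoint URL and model name follow Deepgram's REST API; the stream_tts helper name and filenames are illustrative). iter_content yields bytes as they arrive, so writing or playback can begin with the first byte rather than waiting for the full response:

```python
import os

import requests  # third-party HTTP client, assumed available

# Endpoint and model per Deepgram's REST TTS API; adjust the model as needed.
DEEPGRAM_URL = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"


def stream_tts(api_key: str, text: str, outfile: str = "output.mp3") -> None:
    """Stream TTS audio to a file, writing bytes as soon as they are received."""
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    }
    with requests.post(
        DEEPGRAM_URL, stream=True, headers=headers, json={"text": text}
    ) as resp:
        resp.raise_for_status()
        with open(outfile, "wb") as f:
            # Each chunk is written (or could be piped to a player) as it
            # arrives, instead of buffering the whole response first.
            for chunk in resp.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)


if __name__ == "__main__":
    stream_tts(os.environ["DEEPGRAM_API_KEY"], "Hello, this is a text-to-speech example.")
```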
Chunked Text Source Payload
This example shows how to chunk the text source by sentence boundaries and stream the audio for each chunk consecutively.
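A sketch of the chunked approach: a simple regex-based sentence splitter (segment_text_by_sentence and synthesize_chunks are illustrative helper names, not SDK functions), with each sentence sent as its own request and the audio appended to one output file in order:

```python
import os
import re

import requests  # third-party HTTP client, assumed available

# Endpoint and model per Deepgram's REST TTS API; adjust the model as needed.
DEEPGRAM_URL = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"


def segment_text_by_sentence(text: str) -> list[str]:
    """Split text at sentence-ending punctuation, keeping the punctuation."""
    return [m.group().strip() for m in re.finditer(r"[^.!?]+[.!?]", text)]


def synthesize_chunks(api_key: str, text: str, outfile: str = "chunked_output.mp3") -> None:
    """Stream audio for each sentence consecutively into a single file."""
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    }
    with open(outfile, "wb") as out:
        for sentence in segment_text_by_sentence(text):
            # One request per sentence; chunks are appended as they arrive.
            with requests.post(
                DEEPGRAM_URL, stream=True, headers=headers, json={"text": sentence}
            ) as resp:
                resp.raise_for_status()
                for chunk in resp.iter_content(chunk_size=1024):
                    if chunk:
                        out.write(chunk)


if __name__ == "__main__":
    synthesize_chunks(
        os.environ["DEEPGRAM_API_KEY"],
        "Hello world. This is a chunked example. Each sentence streams separately!",
    )
```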
Where to Find Additional Examples
The SDK repository has a good collection of text-to-speech examples, and the README contains links to them. Each example below demonstrates different options for generating speech from a text source.
Some Examples:
- Threaded Client speaking “Hello World” - examples/text-to-speech/rest/file/hello_world
If the Async Client suits your use case better:
- Async Client speaking “Hello World” - examples/text-to-speech/rest/file/async_hello_world