Text To Speech REST
An overview of the Deepgram Python SDK and Deepgram text-to-speech.
The Deepgram Speak Clients
allows you to generate human-like audio from text.
This SDK supports both the Threaded and Async/Await Clients as described in the Threaded and Async IO Task Support section. The code blocks contain a tab for
Threaded
andAsync
to show examples forrest
versusasyncrest
, respectively. The difference betweenThreaded
andAsync
is subtle.
Installing the SDK
# Install the Deepgram Python SDK
# https://github.com/deepgram/deepgram-python-sdk
pip install deepgram-sdk==3.*
Make a Deepgram Text-to-Speech Request
The Deepgram Speak Clients
allow you to create audio generated from provided text, also known as text-to-speech.
There are two functions provided to translate text-to-speech:
stream()
- This takes the resulting audio and holds it in memory viaio.BytesIO
.save()
orfile()
- These save the audio stream to a user provided filename.
import os
from deepgram import DeepgramClient, SpeakOptions
SPEAK_OPTIONS = {"text": "Hello, how can I help you today?"}
filename = "output.mp3"
def main():
try:
# STEP 1: Create a Deepgram client.
# By default, the DEEPGRAM_API_KEY environment variable will be used for the API Key
deepgram = DeepgramClient()
# STEP 2: Configure the options (such as model choice, audio configuration, etc.)
options = SpeakOptions(
model="aura-asteria-en",
)
# STEP 3: Call the save method on the speak property
response = deepgram.speak.rest.v("1").save(filename, SPEAK_OPTIONS, options)
print(response.to_json(indent=4))
except Exception as e:
print(f"Exception: {e}")
if __name__ == "__main__":
main()
import asyncio
from deepgram import DeepgramClient, SpeakOptions
SPEAK_OPTIONS = {"text": "Hello, how can I help you today?"}
filename = "output.mp3"
async def main():
try:
# STEP 1: Create a Deepgram client.
# By default, the DEEPGRAM_API_KEY environment variable will be used for the API Key
deepgram = DeepgramClient()
# STEP 2: Configure the options (such as model choice, audio configuration, etc.)
options = SpeakOptions(
model="aura-asteria-en",
)
# STEP 3: Call the save method on the speak property
response = await deepgram.speak.asyncrest.v("1").save(filename, SPEAK_OPTIONS, options)
print(response.to_json(indent=4))
except Exception as e:
print(f"Exception: {e}")
if __name__ == "__main__":
asyncio.run(main())
Audio Output Streaming
Deepgram's TTS API allows you to start playing the audio once the first byte is received. This section provides examples to help you stream the audio output efficiently.
Single Text Source Payload
The following example demonstrates how to stream the audio as soon as the first byte arrives for a single text source.
import re
from deepgram import Deepgram
import requests
DEEPGRAM_API_KEY = 'YOUR_DEEPGRAM_API_KEY'
deepgram = Deepgram(DEEPGRAM_API_KEY)
inputText = "Your long text goes here..."
def segmentTextBySentence(text):
return re.findall(r"[^.!?]+[.!?]", text)
def synthesizeAudio(text):
options = SpeakOptions(
model="aura-asteria-en",
)
response = deepgram.speak.rest.v("1").stream(filename, SPEAK_OPTIONS, options)
if response.status_code == 200:
return response.content
else:
raise Exception('Error generating audio')
def main():
segments = segmentTextBySentence(inputText)
for i, segment in enumerate(segments):
try:
audioData = synthesizeAudio(segment)
with open(f"output_{i}.wav", 'wb') as f:
f.write(audioData)
print(f"Audio stream finished for segment: {segment}")
except Exception as error:
print(f"Error synthesizing audio: {error}")
print("Audio file creation completed.")
main()
Chunked Text Source Payload
This example shows how to chunk the text source by sentence boundaries and stream the audio for each chunk consecutively.
import re
from deepgram import Deepgram
import requests
DEEPGRAM_API_KEY = 'YOUR_DEEPGRAM_API_KEY'
deepgram = Deepgram(DEEPGRAM_API_KEY)
inputText = "Your long text goes here..."
def segmentTextBySentence(text):
return re.findall(r"[^.!?]+[.!?]", text)
def synthesizeAudio(text):
options = SpeakOptions(
model="aura-asteria-en",
)
response = deepgram.speak.rest.v("1").stream(filename, SPEAK_OPTIONS, options)
if response.status_code == 200:
return response.content
else:
raise Exception('Error generating audio')
def main():
segments = segmentTextBySentence(inputText)
for i, segment in enumerate(segments):
try:
audioData = synthesizeAudio(segment)
with open(f"output_{{i}}.wav", 'wb') as f:
f.write(audioData)
print(f"Audio stream finished for segment: {{segment}}")
except Exception as error:
print(f"Error synthesizing audio: {{error}}")
print("Audio file creation completed.")
main()
Where to Find Additional Examples
The SDK repository has a good collection of text-to-speech examples. The README contains links to them. Each example below attempts to provide different options for transcribing an audio source.
Some Examples:
- Threaded Client speaking "Hello World" - examples/text-to-speech/rest/file/hello_world
If the Async Client suits your use case better:
- Threaded Client speaking "Hello World" - examples/text-to-speech/rest/file/async_hello_world
Updated about 2 months ago