Getting Started
An introduction to using Deepgram's Aura Text-to-Speech REST API to convert text into audio.
This guide will walk you through how to turn text into speech with Deepgram's text-to-speech REST API.
Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you choose to use a Deepgram SDK.
API Playground
First, quickly explore Deepgram Text to Speech in our API Playground.
CURL
Next, try it with CURL. Replace DEEPGRAM_API_KEY with your own API key, then run the following example in a terminal or your favorite API client.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"Hello, how can I help you today?"}' \
--url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
This will result in an MP3 audio file being streamed back to you by Deepgram. You can play the audio as soon as you receive the first byte, or you can wait until the entire MP3 file has arrived.
The audio file will contain the voice of the selected model saying the words that you sent in your request.
If you do not specify a model, the default voice model aura-asteria-en will be used. You can find all of our available voices here.
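The same request can also be assembled in Python. The following is a minimal sketch using only the standard library; the helper names build_speak_request and send_and_save are hypothetical, while the endpoint, headers, and default model are taken from the CURL example above.

```python
import json
import urllib.request

# Endpoint taken from the CURL example above.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(text, api_key, model="aura-asteria-en"):
    """Build (but do not send) a POST request mirroring the CURL example.

    build_speak_request is a hypothetical helper, not part of any Deepgram SDK.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{SPEAK_URL}?model={model}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


def send_and_save(request, out_path):
    """Send the request and write the returned audio bytes to out_path.

    Requires network access and a valid API key.
    """
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Calling send_and_save(build_speak_request("Hello, how can I help you today?", api_key), "your_output_file.mp3") should then produce the same file as the CURL command, assuming the key is valid.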
Send Error Messages to Terminal
If your request results in an error, the error message can be seen by opening the output audio file in a text editor.
To see the error message in your terminal, add this to your CURL request:
--fail-with-body \
--silent \
|| (jq . your_output_file.mp3 && rm your_output_file.mp3)
This example uses jq to print the error message and then removes the output file (your_output_file.mp3) automatically.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"Hello, how can I help you today?"}' \
--url 'https://api.deepgram.com/v1/speak?model=testing_error' \
--fail-with-body \
--silent \
|| (jq . your_output_file.mp3 && rm your_output_file.mp3)
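The same idea carries over to code: on failure, the response body holds an error message rather than audio. The sketch below is one way to handle that in Python. The function name save_audio_or_get_error is hypothetical, and nothing is assumed about the fields inside the error JSON.

```python
import json
import os


def save_audio_or_get_error(status, body, out_path):
    """Write body to out_path on success; otherwise return the parsed error.

    Hypothetical helper: status is the HTTP status code and body is the raw
    response bytes. Returns None on success.
    """
    if 200 <= status < 300:
        with open(out_path, "wb") as f:
            f.write(body)
        return None
    # Mirror the `rm` step: make sure no stale output file is left behind.
    if os.path.exists(out_path):
        os.remove(out_path)
    try:
        return json.loads(body)
    except ValueError:
        # Body was not JSON: return the raw text so nothing is lost.
        return {"raw": body.decode("utf-8", errors="replace")}
```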
SDKs
Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram TTS request.
Install the SDK
Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.
# Install the Deepgram JS SDK
# https://github.com/deepgram/deepgram-js-sdk
npm install @deepgram/sdk
# Install the Deepgram Python SDK
# https://github.com/deepgram/deepgram-python-sdk
pip install deepgram-sdk
# Install the Deepgram Go SDK
# https://github.com/deepgram/deepgram-go-sdk
go get github.com/deepgram/deepgram-go-sdk
# Install the Deepgram .NET SDK
# https://github.com/deepgram/deepgram-dotnet-sdk
dotnet add package Deepgram
Add Dependencies
# Install dotenv to protect your api key
npm install dotenv
# Install python-dotenv to protect your api key
pip install python-dotenv
# Importing the Deepgram Go SDK should pull in all dependencies required
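The point of the dotenv packages is to keep the API key out of your source code. As a rough sketch of that pattern in Python, the hypothetical helper below reads DEEPGRAM_API_KEY from the environment and fails with a clear message if it is missing. If you use python-dotenv, calling its load_dotenv() first would populate the environment from a .env file.

```python
import os


def get_api_key(env=os.environ):
    """Return the Deepgram API key from the environment.

    get_api_key is a hypothetical helper; env defaults to the process
    environment but can be any mapping, which makes it easy to test.
    """
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; export it or add it to a .env file"
        )
    return key
```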
Make the Request with the SDK
require("dotenv").config(); // load DEEPGRAM_API_KEY from a local .env file
const { createClient } = require("@deepgram/sdk");
const fs = require("fs");

// STEP 1: Create a Deepgram client with your API key
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

const text = "Hello, how can I help you today?";

const getAudio = async () => {
  // STEP 2: Make a request and configure the request with options (such as model choice, audio configuration, etc.)
  const response = await deepgram.speak.request(
    { text },
    {
      model: "aura-asteria-en",
      encoding: "linear16",
      container: "wav",
    }
  );

  // STEP 3: Get the audio stream and headers from the response
  const stream = await response.getStream();
  const headers = await response.getHeaders();

  if (stream) {
    // STEP 4: Convert the stream to an audio buffer
    const buffer = await getAudioBuffer(stream);

    // STEP 5: Write the audio buffer to a file
    fs.writeFile("output.wav", buffer, (err) => {
      if (err) {
        console.error("Error writing audio to file:", err);
      } else {
        console.log("Audio file written to output.wav");
      }
    });
  } else {
    console.error("Error generating audio:", stream);
  }

  if (headers) {
    console.log("Headers:", headers);
  }
};

// Helper function to convert a stream to an audio buffer
const getAudioBuffer = async (response) => {
  const reader = response.getReader();
  const chunks = [];

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }

  const dataArray = chunks.reduce(
    (acc, chunk) => Uint8Array.from([...acc, ...chunk]),
    new Uint8Array(0)
  );

  return Buffer.from(dataArray.buffer);
};

getAudio();
from dotenv import load_dotenv
from deepgram import (
    DeepgramClient,
    SpeakOptions,
)

# Load DEEPGRAM_API_KEY from a local .env file, if one exists
load_dotenv()

SPEAK_TEXT = {"text": "Hello world!"}
filename = "test.mp3"


def main():
    try:
        # STEP 1: Create a Deepgram client using the API key from environment variables
        deepgram = DeepgramClient()

        # STEP 2: Call the save method on the speak property
        options = SpeakOptions(
            model="aura-asteria-en",
        )

        response = deepgram.speak.rest.v("1").save(filename, SPEAK_TEXT, options)
        print(response.to_json(indent=4))

    except Exception as e:
        print(f"Exception: {e}")


if __name__ == "__main__":
    main()
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "os"

    prettyjson "github.com/hokaccha/go-prettyjson"

    speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
    interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
    client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
    textToSpeech string = "Hello, how can I help you today?"
    filePath     string = "./output.wav"
)

func main() {
    // STEP 1: init Deepgram client library
    client.InitWithDefault()

    // STEP 2: define context to manage the lifecycle of the request
    ctx := context.Background()

    // STEP 3: define options for the request
    options := interfaces.SpeakOptions{
        Model:     "aura-asteria-en",
        Encoding:  "linear16",
        Container: "wav",
    }

    // STEP 4: create a Deepgram client using default settings
    // NOTE: you can set your API KEY in your bash profile by typing the following line in your shell:
    // export DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"
    c := client.NewWithDefaults()
    dg := speak.New(c)

    // STEP 5: send the text to Deepgram and save the audio to a file
    res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
    if err != nil {
        fmt.Printf("ToSave failed. Err: %v\n", err)
        os.Exit(1)
    }

    // STEP 6: get the JSON response
    data, err := json.Marshal(res)
    if err != nil {
        fmt.Printf("json.Marshal failed. Err: %v\n", err)
        os.Exit(1)
    }

    // STEP 7: make the JSON pretty
    prettyJson, err := prettyjson.Format(data)
    if err != nil {
        fmt.Printf("prettyjson.Format failed. Err: %v\n", err)
        os.Exit(1)
    }
    fmt.Printf("\n\nResult:\n%s\n\n", prettyJson)
}
using Deepgram.Models.Speak.v1.REST;

namespace SampleApp
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Initialize Library with default logging
            // Normal logging is "Info" level
            Library.Initialize();

            // Use the client factory with an API key set with the "DEEPGRAM_API_KEY" environment variable
            var deepgramClient = ClientFactory.CreateSpeakRESTClient();

            var response = await deepgramClient.ToFile(
                new TextSource("Hello World!"),
                "test.mp3",
                new SpeakSchema()
                {
                    Model = "aura-asteria-en",
                });

            Console.WriteLine(response);
            Console.ReadKey();

            // Teardown Library
            Library.Terminate();
        }
    }
}
To learn more about how you can customize the audio file to meet the needs of your use case, take a look at this Audio Format Combinations table.
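For direct REST calls, audio options are typically passed as query parameters alongside model. The sketch below assumes the REST endpoint accepts encoding and container under the same names the SDK examples use. The helper name speak_url is hypothetical, and valid combinations should be checked against the Audio Format Combinations table.

```python
from urllib.parse import urlencode

# Endpoint taken from the CURL example earlier in this guide.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def speak_url(model="aura-asteria-en", **audio_options):
    """Build a request URL with optional audio settings as query parameters.

    Assumption: options such as encoding and container are accepted as query
    parameters by the REST endpoint. Options set to None are skipped.
    """
    params = {"model": model}
    params.update({k: v for k, v in audio_options.items() if v is not None})
    return f"{SPEAK_URL}?{urlencode(params)}"


# Same settings as the JavaScript and Go SDK examples above.
print(speak_url(encoding="linear16", container="wav"))
```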
Non-SDK Code Examples
If you would like to try out making a Deepgram text-to-speech request in a specific language (but not using Deepgram's SDKs), we offer a library of code samples in this GitHub repo. However, we recommend first trying out our SDKs, which we presented in the previous section.
Results
Upon successful processing of the request, you will receive an audio file containing the synthesized text-to-speech output, along with response headers providing additional information.
The audio file is streamed back to you, so you may begin playback as soon as the first byte arrives. Read the guide Streaming Audio Outputs to learn how to begin playing the stream immediately versus waiting for the entire file to arrive.
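Because the audio arrives in chunks, a client can process each chunk as it is read instead of buffering the whole file. The following is a minimal sketch of that loop. The function copy_stream and its on_first_chunk callback are hypothetical, and src can be any object with a read(n) method, such as the response returned by urllib.request.urlopen.

```python
import io


def copy_stream(src, dst, chunk_size=4096, on_first_chunk=None):
    """Copy src to dst chunk by chunk and return the number of bytes copied.

    on_first_chunk, if given, is called once with the first chunk, which is
    where a player could start playback.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if total == 0 and on_first_chunk is not None:
            on_first_chunk(chunk)
        dst.write(chunk)
        total += len(chunk)
    return total


# Demo with in-memory streams standing in for the HTTP response and a file.
source = io.BytesIO(b"\x00" * 10000)
target = io.BytesIO()
copied = copy_stream(source, target)
```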
Example Response Headers
HTTP/1.1 200 OK
< content-type: audio/mpeg
< dg-model-name: aura-asteria-en
< dg-model-uuid: e4979ab0-8475-4901-9d66-0a562a4949bb
< dg-char-count: 32
< dg-request-id: bf6fc5c7-8f84-479f-b70a-602cf5bf18f3
< transfer-encoding: chunked
< date: Thu, 29 Feb 2024 19:20:48 GMT
To see these response headers when making a CURL request, add -v or --verbose to your request.
The response headers include:
- content-type: Specifies the media type of the resource, in this case audio/mpeg, indicating the format of the audio file returned.
- dg-request-id: A unique identifier for the request, useful for debugging and tracking purposes.
- dg-model-uuid: The unique identifier of the model that processed the request.
- dg-char-count: Indicates the number of characters that were in the input text for the text-to-speech process.
- dg-model-name: The name of the model used to process the request.
- transfer-encoding: Specifies the form of encoding used to safely transfer the payload to the recipient.
- date: The date and time the response was sent.
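If you capture verbose CURL output, the headers can be turned into a dictionary for logging or debugging. The sketch below uses a hypothetical helper, parse_verbose_headers, and runs it on the example headers shown above.

```python
def parse_verbose_headers(text):
    """Parse header lines in the style of `curl -v` output into a dict.

    Lines may start with "<". Lines without a colon, such as the status
    line, are skipped. Header names are lower-cased.
    """
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("<"):
            line = line[1:].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip():
            headers[name.strip().lower()] = value.strip()
    return headers


# The example response headers from this guide.
SAMPLE = """HTTP/1.1 200 OK
< content-type: audio/mpeg
< dg-model-name: aura-asteria-en
< dg-model-uuid: e4979ab0-8475-4901-9d66-0a562a4949bb
< dg-char-count: 32
< dg-request-id: bf6fc5c7-8f84-479f-b70a-602cf5bf18f3
< transfer-encoding: chunked
< date: Thu, 29 Feb 2024 19:20:48 GMT
"""

parsed = parse_verbose_headers(SAMPLE)
```

Here parsed["dg-char-count"] is the string "32", which matches the length of the example sentence "Hello, how can I help you today?".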
Limits
Keep these limits in mind when making a Deepgram text-to-speech request.
Input Text Limit
- Maximum characters: 2000.
- Sending a text payload longer than 2000 characters (2001 or more) will result in an error, and the audio file will not be created.
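One way to stay under this limit is to split long text into several requests. The following is a rough sketch, not an official approach: the hypothetical helper split_text breaks text into chunks of at most 2000 characters, preferring whitespace boundaries and falling back to a hard split when a single run has no whitespace.

```python
def split_text(text, limit=2000):
    """Split text into chunks of at most `limit` characters.

    Breaks at the last space, newline, or tab inside the allowed window when
    one exists; otherwise splits exactly at the limit.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Index of the last whitespace character at or before position `limit`.
        cut = max(remaining.rfind(ws, 0, limit + 1) for ws in (" ", "\n", "\t"))
        if cut <= 0:
            cut = limit  # no whitespace found: hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own request. Splitting at sentence boundaries instead would likely give more natural-sounding audio, at the cost of a slightly more involved helper.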
Rate Limits
- Pay As You Go Plan: 480 requests per minute.
- Growth Plan: 720 requests per minute.
These rate limits apply per project. Learn more at Deepgram's pricing page or share your feedback on our TTS rate limits.
Handling Rate Limits
- If the number of in-progress requests for a project meets or exceeds the rate limit, new requests will receive a 429: Too Many Requests error.
- With a typical response time of 250 ms, two concurrent requests work out to roughly the Pay As You Go limit:
2 (concurrent requests) / 0.250 (250 ms response time) * 60 (seconds/minute) = 480 requests per minute.
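The arithmetic above, together with one common way to react to a 429 response, can be sketched as follows. Both helpers are hypothetical. call_with_retry assumes send is a zero-argument callable that performs the request and returns the HTTP status code, and the exponential backoff schedule is an illustrative choice, not a documented recommendation.

```python
import time


def requests_per_minute(concurrent_requests, response_time_s):
    """Sustained request rate for a given concurrency and response time."""
    return concurrent_requests / response_time_s * 60


def call_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    Waits base_delay, then 2x, 4x, and so on between attempts, and returns
    the last status code seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status


# 2 concurrent requests at 250 ms each, as in the formula above.
rate = requests_per_minute(2, 0.250)
```

The sleep parameter is injectable so the retry logic can be exercised without actually waiting.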
What's Next?
Now that you've transformed text into speech with Deepgram's API, enhance your knowledge by exploring the following areas.
Starter Apps
- Clone and run one of our Starter App repositories to see a full application with a frontend UI and a backend server sending text to Deepgram to be converted into audio.
Read the Feature Guides
Deepgram's features help you to customize your request to produce the output that works best for your use case.
- Media Output Settings: Learn how to customize the audio file that is returned.
- Callback: Discover how to provide a callback URL, so that your audio can be processed asynchronously.
- Feature Overview: Review the list of features available for text-to-speech. Then, dive into individual guides for more details.
Transcribe Speech-to-Text
- Check out how you can use Deepgram to turn audio into text. Read the Pre-Recorded Speech-To-Text guide or the Streaming Speech-To-Text Guide.
Try the Conversational AI Demo
- This demo shows how you can build a Conversational AI application that engages users in natural, human-like conversation using Deepgram and OpenAI ChatGPT.
Watch This Video
- See how you can use Deepgram Aura with Groq to build a blazing fast Conversational AI application.
We'd love to get your feedback on Deepgram's Aura text-to-speech. You will receive $50 in additional console credits within two weeks after filling out the form, and you may be invited to join a group of users with access to the latest private releases. To fill out the form, click here.