Getting Started
An introduction to using Deepgram's Aura Text-to-Speech API to convert text into audio.
This guide walks you through turning text into speech with Deepgram's text-to-speech API.
Before you run the code, we suggest following the steps in the Make Your First API Request guide to create a Deepgram account, get a Deepgram API key, and, if you choose to use a Deepgram SDK, configure your environment.
We'd love to get your feedback on Deepgram's Aura text-to-speech. You will receive $50 in additional console credits within two weeks of filling out the form, and you may be invited to join a group of users with access to the latest private releases. To fill out the form, click here.
CURL
First, try it with CURL. Replace DEEPGRAM_API_KEY with your own API key, then run the following example in a terminal or your favorite API client.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"Hello, how can I help you today?"}' \
--url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
This will result in an MP3 audio file being streamed back to you by Deepgram. You can play the audio as soon as you receive the first byte, or you can wait until the entire MP3 file has arrived.
The audio file will contain the voice of the selected model saying the words that you sent in your request.
If you do not specify a model, the default voice model, aura-asteria-en, will be used. You can find all of our available voices here.
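As a rough Python equivalent of the CURL request above, the sketch below builds the same POST request using only the standard library. The helper name `build_speak_request` and the `DEEPGRAM_API_KEY` environment variable are assumptions made for illustration; the endpoint, headers, and JSON body are taken from the CURL example. Leaving `model` unset omits the query parameter so that the default voice applies.

```python
import json
import os
import urllib.parse
import urllib.request

SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(text, api_key, model=None):
    """Build (but do not send) a POST request for the /v1/speak endpoint."""
    url = SPEAK_URL
    if model:
        # Only add ?model=... when a voice is chosen explicitly.
        url += "?" + urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Assumption: the key is stored in an environment variable of this name.
    key = os.environ.get("DEEPGRAM_API_KEY", "")
    req = build_speak_request("Hello, how can I help you today?", key)
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request easy to inspect first.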
Send Error Messages to Terminal
If your request results in an error, the error message can be seen by opening the output audio file in a text editor.
To see the error message in your terminal, add this to your CURL request:
--fail-with-body \
--silent \
|| (jq . your_output_file.mp3 && rm your_output_file.mp3)
This example will print the error message using jq and remove the output file (your_output_file.mp3) automatically.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"Hello, how can I help you today?"}' \
--url 'https://api.deepgram.com/v1/speak?model=testing_error' \
--fail-with-body \
--silent \
|| (jq . your_output_file.mp3 && rm your_output_file.mp3)
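The same check can be done in code. This is a minimal Python sketch, assuming (as described above) that a failed request leaves a JSON error message in the output file instead of audio. The function name `read_error_from_output` is hypothetical, and no particular fields inside the error object are assumed.

```python
import json


def read_error_from_output(path):
    """Return the parsed JSON error if the file holds one, else None."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Binary audio data is not valid JSON, so treat it as a real audio file.
        return None
    # Only a JSON object is treated as an error message.
    return parsed if isinstance(parsed, dict) else None
```

A caller could print the returned object and delete the file when the result is not `None`, mirroring what the jq one-liner does.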
Language-Specific Implementations
Now try making a Deepgram text-to-speech request in one of these languages. Be sure to install any required dependencies.
Don't see your favorite language among the choices below? Check out our code samples GitHub repo, which contains examples in many different languages.
import requests
# Define the API endpoint
url = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
# Set your Deepgram API key
api_key = "DEEPGRAM_API_KEY"
# Define the headers
headers = {
    "Authorization": f"Token {api_key}",
    "Content-Type": "application/json"
}
# Define the payload
payload = {
    "text": "Hello, how can I help you today?"
}
# Make the POST request
response = requests.post(url, headers=headers, json=payload)
# Check if the request was successful
if response.status_code == 200:
    # Save the response content to a file
    with open("your_output_file.mp3", "wb") as f:
        f.write(response.content)
    print("File saved successfully.")
else:
    print(f"Error: {response.status_code} - {response.text}")
const https = require("https");
const fs = require("fs");
const url = "https://api.deepgram.com/v1/speak?model=aura-asteria-en";
// Set your Deepgram API key
const apiKey = "DEEPGRAM_API_KEY";
// Define the payload
const data = JSON.stringify({
text: "Hello, how can I help you today?",
});
// Define the options for the HTTP request
const options = {
method: "POST",
headers: {
Authorization: `Token ${apiKey}`,
"Content-Type": "application/json",
},
};
// Make the POST request
const req = https.request(url, options, (res) => {
// Check if the response is successful
if (res.statusCode !== 200) {
console.error(`HTTP error! Status: ${res.statusCode}`);
return;
}
// Save the response content to a file
const dest = fs.createWriteStream("output.mp3");
res.pipe(dest);
dest.on("finish", () => {
console.log("File saved successfully.");
});
});
// Handle potential errors
req.on("error", (error) => {
console.error("Error:", error);
});
// Send the request with the payload
req.write(data);
req.end();
package main
import (
"fmt"
"io"
"net/http"
"os"
"strings"
)
func main() {
// Define the API endpoint
url := "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
// Set your Deepgram API key
apiKey := "DEEPGRAM_API_KEY"
// Define the request body
payload := strings.NewReader(`{"text": "Hello, how can I help you today?"}`)
// Create a new HTTP request
client := &http.Client{}
req, err := http.NewRequest("POST", url, payload)
if err != nil {
fmt.Println("Error creating request:", err)
return
}
// Set the request headers
req.Header.Set("Authorization", "Token "+apiKey)
req.Header.Set("Content-Type", "application/json")
// Make the HTTP request
resp, err := client.Do(req)
if err != nil {
fmt.Println("Error making request:", err)
return
}
defer resp.Body.Close()
// Check if the response status code is OK
if resp.StatusCode != http.StatusOK {
fmt.Printf("HTTP error! Status: %d\n", resp.StatusCode)
return
}
// Create a new file to save the response body
outputFile, err := os.Create("your_output_file.mp3")
if err != nil {
fmt.Println("Error creating output file:", err)
return
}
defer outputFile.Close()
// Copy the response body to the output file
_, err = io.Copy(outputFile, resp.Body)
if err != nil {
fmt.Println("Error copying response body:", err)
return
}
fmt.Println("File saved successfully.")
}
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.IO;
class Program
{
static async Task Main(string[] args)
{
// Define your JSON object
string json = "{\"text\": \"Hello, how can I help you today?\"}";
// URL to which you want to send the request
string url = "https://api.deepgram.com/v1/speak"; // Replace with your actual endpoint URL
// API Key
string apiKey = "YOUR_DEEPGRAM_API_KEY"; // Replace with your actual API key
// Create an instance of HttpClient
using (HttpClient httpClient = new HttpClient())
{
try
{
// Prepare the HTTP request content
HttpContent content = new StringContent(json, Encoding.UTF8, "application/json");
// Add Authorization header
httpClient.DefaultRequestHeaders.Add("Authorization", "Token " + apiKey);
// Send the POST request
HttpResponseMessage response = await httpClient.PostAsync(url, content);
// Check if the request was successful
if (response.IsSuccessStatusCode)
{
// Read and save the response as binary data
using (Stream audioStream = await response.Content.ReadAsStreamAsync())
{
// Specify where you want to save the audio file
string filePath = "your_output_file.mp3";
using (FileStream fileStream = File.Create(filePath))
{
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
// Copy the binary data from the response stream to the file stream
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = await audioStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
writer.Write(buffer, 0, bytesRead);
}
}
}
Console.WriteLine("Audio file saved successfully.");
}
}
else
{
Console.WriteLine("Request failed with status code: " + response.StatusCode);
}
}
catch (Exception ex)
{
Console.WriteLine("Error: " + ex.Message);
}
}
}
}
SDKs
Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram TTS request.
Install the SDK
Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.
# Install the Deepgram Python SDK
# https://github.com/deepgram/deepgram-python-sdk
pip install deepgram-sdk
# Install the Deepgram JS SDK
# https://github.com/deepgram/deepgram-js-sdk
npm install @deepgram/sdk
# Install the Deepgram Go SDK
# https://github.com/deepgram/deepgram-go-sdk
go get github.com/deepgram/deepgram-go-sdk
Add Dependencies
# Install python-dotenv to protect your API key
pip install python-dotenv
# Install dotenv to protect your API key
npm install dotenv
# Importing the Deepgram Go SDK should pull in all dependencies required
Make the Request with the SDK
import os
from dotenv import load_dotenv
from deepgram import (
    DeepgramClient,
    SpeakOptions,
)

load_dotenv()

SPEAK_OPTIONS = {"text": "Hello, how can I help you today?"}
filename = "output.wav"


def main():
    try:
        # STEP 1: Create a Deepgram client using the API key from environment variables
        deepgram = DeepgramClient(api_key=os.getenv("DG_API_KEY"))
        # STEP 2: Configure the options (such as model choice, audio configuration, etc.)
        options = SpeakOptions(
            model="aura-asteria-en",
            encoding="linear16",
            container="wav"
        )
        # STEP 3: Call the save method on the speak property
        response = deepgram.speak.v("1").save(filename, SPEAK_OPTIONS, options)
        print(response.to_json(indent=4))
    except Exception as e:
        print(f"Exception: {e}")


if __name__ == "__main__":
    main()
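// Load DEEPGRAM_API_KEY from a .env file (uses the dotenv package installed above)
require("dotenv").config();
const { createClient } = require("@deepgram/sdk");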
const fs = require("fs");
// STEP 1: Create a Deepgram client with your API key
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);
const text = "Hello, how can I help you today?";
const getAudio = async () => {
// STEP 2: Make a request and configure the request with options (such as model choice, audio configuration, etc.)
const response = await deepgram.speak.request(
{ text },
{
model: "aura-asteria-en",
encoding: "linear16",
container: "wav",
}
);
// STEP 3: Get the audio stream and headers from the response
const stream = await response.getStream();
const headers = await response.getHeaders();
if (stream) {
// STEP 4: Convert the stream to an audio buffer
const buffer = await getAudioBuffer(stream);
// STEP 5: Write the audio buffer to a file
fs.writeFile("output.wav", buffer, (err) => {
if (err) {
console.error("Error writing audio to file:", err);
} else {
console.log("Audio file written to output.wav");
}
});
} else {
console.error("Error generating audio:", stream);
}
if (headers) {
console.log("Headers:", headers);
}
};
// helper function to convert stream to audio buffer
const getAudioBuffer = async (response) => {
const reader = response.getReader();
const chunks = [];
while (true) {
const { done, value } = await reader.read();
if (done) break;
chunks.push(value);
}
const dataArray = chunks.reduce(
(acc, chunk) => Uint8Array.from([...acc, ...chunk]),
new Uint8Array(0)
);
return Buffer.from(dataArray.buffer);
};
getAudio();
package main
import (
"context"
"encoding/json"
"fmt"
"os"
prettyjson "github.com/hokaccha/go-prettyjson"
speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)
const (
textToSpeech string = "Hello, how can I help you today?"
filePath string = "./output.wav"
)
func main() {
// STEP 1: init Deepgram client library
client.InitWithDefault()
// STEP 2: define context to manage the lifecycle of the request
ctx := context.Background()
// STEP 3: define options for the request
options := interfaces.SpeakOptions{
Model: "aura-asteria-en",
Encoding: "linear16",
Container: "wav",
}
// STEP 4: create a Deepgram client using default settings
// NOTE: you can set your API KEY in your bash profile by typing the following line in your shell:
// export DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"
c := client.NewWithDefaults()
dg := speak.New(c)
// STEP 5: send/process file to Deepgram
res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
if err != nil {
fmt.Printf("ToSave failed. Err: %v\n", err)
os.Exit(1)
}
// STEP 6: get the JSON response
data, err := json.Marshal(res)
if err != nil {
fmt.Printf("json.Marshal failed. Err: %v\n", err)
os.Exit(1)
}
// STEP 7: make the JSON pretty
prettyJson, err := prettyjson.Format(data)
if err != nil {
fmt.Printf("prettyjson.Format failed. Err: %v\n", err)
os.Exit(1)
}
fmt.Printf("\n\nResult:\n%s\n\n", prettyJson)
}
To learn more about how you can customize the audio file to meet the needs of your use case, take a look at this Audio Format Combinations table.
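For direct REST calls, the sketch below shows one way the same options could be passed. It assumes the endpoint accepts `encoding` and `container` as query parameters under the same names the SDK options use above; the helper `speak_url_with_format` is hypothetical, and valid combinations should be checked against the Audio Format Combinations table.

```python
from urllib.parse import urlencode

SPEAK_URL = "https://api.deepgram.com/v1/speak"


def speak_url_with_format(model, encoding=None, container=None):
    """Build a /v1/speak URL, adding only the options that were set."""
    params = {"model": model}
    if encoding:
        params["encoding"] = encoding
    if container:
        params["container"] = container
    return SPEAK_URL + "?" + urlencode(params)


# Same options as the SDK examples: 16-bit linear PCM in a WAV container.
print(speak_url_with_format("aura-asteria-en", encoding="linear16", container="wav"))
```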
Results
Upon successful processing of the request, you will receive an audio file containing the synthesized text-to-speech output, along with response headers providing additional information.
The audio file is streamed back to you, so you may begin playback as soon as the first byte arrives. Read the guide Streaming Audio Outputs to learn how to begin playing the stream immediately versus waiting for the entire file to arrive.
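To illustrate consuming the response incrementally, here is a hedged Python sketch using only the standard library. The function names `copy_chunks` and `stream_speech_to_file` and the 8192-byte chunk size are illustrative choices, not part of Deepgram's API; the endpoint and headers match the earlier examples.

```python
import json
import urllib.request


def copy_chunks(src, dst, chunk_size=8192):
    """Copy a file-like object to another in chunks; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Each chunk is usable as soon as it arrives; a player could be fed here.
        dst.write(chunk)
        total += len(chunk)
    return total


def stream_speech_to_file(text, api_key, out_path):
    """POST text to /v1/speak and write the audio to disk as it arrives."""
    req = urllib.request.Request(
        "https://api.deepgram.com/v1/speak?model=aura-asteria-en",
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        return copy_chunks(resp, out)
```

Writing chunk by chunk avoids holding the whole file in memory, and the same loop could hand each chunk to an audio player instead of a file.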
Example Response Headers
< HTTP/1.1 200 OK
< content-type: audio/mpeg
< dg-model-name: aura-asteria-en
< dg-model-uuid: e4979ab0-8475-4901-9d66-0a562a4949bb
< dg-char-count: 32
< dg-request-id: bf6fc5c7-8f84-479f-b70a-602cf5bf18f3
< transfer-encoding: chunked
< date: Thu, 29 Feb 2024 19:20:48 GMT
To see these response headers when making a CURL request, add -v or --verbose to your request.
The response headers include:
content-type: Specifies the media type of the resource, in this case audio/mpeg, indicating the format of the audio file returned.
dg-request-id: A unique identifier for the request, useful for debugging and tracking purposes.
dg-model-uuid: The unique identifier of the model that processed the request.
dg-char-count: Indicates the number of characters that were in the input text for the text-to-speech process.
dg-model-name: The name of the model used to process the request.
transfer-encoding: Specifies the form of encoding used to safely transfer the payload to the recipient.
date: The date and time the response was sent.
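As a small illustration of working with these headers, the Python sketch below picks the Deepgram-specific fields out of a header mapping. The function name `summarize_tts_headers` and the output key names are hypothetical; the header names and sample values come from the example response above.

```python
def summarize_tts_headers(headers):
    """Pick the fields described above out of a response-header mapping."""
    # HTTP header names are case-insensitive, so normalize to lowercase first.
    lowered = {name.lower(): value for name, value in headers.items()}
    char_count = lowered.get("dg-char-count")
    return {
        "content_type": lowered.get("content-type"),
        "request_id": lowered.get("dg-request-id"),
        "model_name": lowered.get("dg-model-name"),
        "model_uuid": lowered.get("dg-model-uuid"),
        # Header values arrive as strings; convert the count to an integer.
        "char_count": int(char_count) if char_count is not None else None,
    }


example = {
    "content-type": "audio/mpeg",
    "dg-model-name": "aura-asteria-en",
    "dg-model-uuid": "e4979ab0-8475-4901-9d66-0a562a4949bb",
    "dg-char-count": "32",
    "dg-request-id": "bf6fc5c7-8f84-479f-b70a-602cf5bf18f3",
}
print(summarize_tts_headers(example)["char_count"])  # prints 32
```

Logging the request ID alongside application errors can make it easier to trace a specific request later.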
Limits
Keep these limits in mind when making a Deepgram text-to-speech request.
Input Text Limit
The input limit is currently 2000 characters. This means that if the length of the string sent as the text payload is 2001 characters or more, you will receive an error and the audio file will not be created.
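One way to stay under this limit is to split longer text before sending it. The Python sketch below is only an illustration, not a Deepgram recommendation: the function name `split_for_tts` and the sentence-boundary strategy are assumptions, and each resulting piece would be sent as its own request.

```python
import re

MAX_CHARS = 2000  # input limit stated above


def split_for_tts(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Prefers sentence boundaries; a single sentence longer than the
    limit is cut at the limit as a last resort.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > limit:
            # Flush what we have, then hard-cut the oversized sentence.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would exceed the limit; start a new piece.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries keeps each audio segment ending on a natural pause, which tends to sound better when the segments are played back to back.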
Rate Limits
The current rate limit per project is 480 requests per minute for Pay As You Go and Growth plans. Learn more at Deepgram's pricing page or share your feedback on our TTS rate limits.
If the number of in-progress requests for a given project is equal to or greater than the concurrency limit, a new request will receive a 429 response. With a concurrency limit of 2 requests and a typical response time of 250 ms, users of our Pay As You Go or Growth plans can achieve a request rate of roughly:
2 (concurrent requests) / 0.250 (250 ms response time) * 60 (seconds/minute) = 480 requests per minute.
At 480 requests per minute, if your conversational AI makes 4 TTS requests per conversation, you can support approximately 120 concurrent conversations in a given minute (480 / 4 = 120).
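The arithmetic above can be reproduced with a short Python sketch, which makes it easy to plug in your own measured response time. The function names are hypothetical; the inputs (2 concurrent requests, 250 ms, 4 requests per conversation) are the figures from the text.

```python
def max_requests_per_minute(concurrency, response_time_s):
    """Upper bound on requests per minute when every slot stays busy."""
    return concurrency / response_time_s * 60


def supported_conversations(requests_per_minute, requests_per_conversation):
    """Conversations per minute that fit inside the request budget."""
    return requests_per_minute // requests_per_conversation


rpm = max_requests_per_minute(concurrency=2, response_time_s=0.250)
print(rpm)  # 480.0
print(supported_conversations(rpm, 4))  # 120.0
```

Because the bound scales inversely with response time, a slower average response of 500 ms would halve it to 240 requests per minute.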
What's Next?
Now that you've transformed text into speech with Deepgram's API, enhance your knowledge by exploring the following areas.
Read the Feature Guides
Deepgram's features help you customize your request to produce the output that works best for your use case. Would you like to customize the audio file that is returned? Read the Media Output Settings guide. Do you want to provide a callback URL so that your audio can be processed asynchronously? Check out the Callback guide.
Take a glance at our Feature Overview for text-to-speech to see the list of all the features available. Then read more about each feature in its individual guide.
Transcribe Speech-to-Text
Check out how you can use Deepgram to turn audio into text. Read the Pre-Recorded speech-to-text guide or the Streaming speech-to-text Guide.
Try the Conversational AI Demo
The purpose of this demo is to showcase how you can build a Conversational AI application that engages users in natural language interactions, mimicking human conversation through natural language processing using Deepgram and OpenAI ChatGPT.
Watch This Video
See how you can use Deepgram Aura with Groq to build a blazing fast Conversational AI application.