Getting Started

An introduction to using Deepgram's Aura Text-to-Speech API to convert text into audio.

This guide will walk you through how to turn text into speech with Deepgram's text-to-speech API.

Before you run the code, follow the steps in the Make Your First API Request guide to create a Deepgram account, get a Deepgram API key, and, if you plan to use a Deepgram SDK, configure your environment.

🌈

We'd love to get your feedback on Deepgram's Aura text-to-speech. You will receive $50 in additional console credits within two weeks after filling out the form, and you may be invited to join a group of users with access to the latest private releases. To fill out the form, Click Here.


CURL

First, try it with CURL. Replace DEEPGRAM_API_KEY with your own API key, then run the following example in a terminal or your favorite API client.

curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"Hello, how can I help you today?"}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

This will result in an MP3 audio file being streamed back to you by Deepgram. You can play the audio as soon as you receive the first byte, or you can wait until the entire MP3 file has arrived.

The audio file will contain the voice of the selected model saying the words that you sent in your request.
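As a minimal sketch of handling the streamed response incrementally (an illustration, not Deepgram's official client), the helpers below build the same request as the CURL example with Python's standard `urllib.request` and copy a file-like response to disk chunk by chunk. The names `build_request` and `stream_to_file` and the 4096-byte chunk size are assumptions made for this example.

```python
import json
import urllib.request

SPEAK_URL = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"


def build_request(text, api_key):
    """Build the same POST request as the CURL example above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        SPEAK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def stream_to_file(response, path, chunk_size=4096):
    """Copy a file-like response to `path` chunk by chunk; return bytes written."""
    total = 0
    with open(path, "wb") as out:
        while True:
            # Each chunk could also be handed to an audio player here.
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

With a real key, passing the object returned by `urllib.request.urlopen(build_request("Hello", api_key))` to `stream_to_file` would write the audio as it arrives.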

📘

If you do not specify a model, the default voice model aura-asteria-en will be used. You can find all of our available voices here.

Send Error Messages to Terminal

If your request results in an error, the error message can be seen by opening the output audio file in a text editor.

To see the error message in your terminal, add this to your CURL request:

--fail-with-body \
--silent \
|| (jq . your_output_file.mp3 && rm your_output_file.mp3)

This example prints the error message with jq and automatically removes the output file (your_output_file.mp3).

curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"Hello, how can I help you today?"}' \
     --url 'https://api.deepgram.com/v1/speak?model=testing_error' \
     --fail-with-body \
     --silent \
     || (jq . your_output_file.mp3 && rm your_output_file.mp3)
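
The same check can be scripted. The sketch below is an illustration rather than part of Deepgram's tooling: it inspects a saved output file and returns the parsed body if the file holds a JSON object instead of audio. The helper name `read_error_body` is hypothetical, and no particular error field names are assumed.

```python
import json


def read_error_body(path):
    """Return the parsed JSON object if `path` holds JSON text, else None."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Binary audio (or an empty file) is not valid JSON, so there is no error body.
        return None
    # Only a JSON object is treated as an error payload.
    return parsed if isinstance(parsed, dict) else None
```

Calling `read_error_body("your_output_file.mp3")` after a request returns `None` when the file contains audio.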

Language-Specific Implementations

Now try making a Deepgram text-to-speech request in one of these languages. Be sure to install any required dependencies.

📘

Don't see your favorite language among the choices below? Check out our code samples GitHub repo, which contains examples in many different languages.

import requests

# Define the API endpoint
url = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

# Set your Deepgram API key
api_key = "DEEPGRAM_API_KEY"

# Define the headers
headers = {
    "Authorization": f"Token {api_key}",
    "Content-Type": "application/json"
}

# Define the payload
payload = {
    "text": "Hello, how can I help you today?"
}

# Make the POST request
response = requests.post(url, headers=headers, json=payload)

# Check if the request was successful
if response.status_code == 200:
    # Save the response content to a file
    with open("your_output_file.mp3", "wb") as f:
        f.write(response.content)
    print("File saved successfully.")
else:
    print(f"Error: {response.status_code} - {response.text}")

const https = require("https");
const fs = require("fs");

const url = "https://api.deepgram.com/v1/speak?model=aura-asteria-en";

// Set your Deepgram API key
const apiKey = "DEEPGRAM_API_KEY";

// Define the payload
const data = JSON.stringify({
  text: "Hello, how can I help you today?",
});

// Define the options for the HTTP request
const options = {
  method: "POST",
  headers: {
    Authorization: `Token ${apiKey}`,
    "Content-Type": "application/json",
  },
};

// Make the POST request
const req = https.request(url, options, (res) => {
  // Check if the response is successful
  if (res.statusCode !== 200) {
    console.error(`HTTP error! Status: ${res.statusCode}`);
    return;
  }
  // Save the response content to a file
  const dest = fs.createWriteStream("your_output_file.mp3");
  res.pipe(dest);
  dest.on("finish", () => {
    console.log("File saved successfully.");
  });
});

// Handle potential errors
req.on("error", (error) => {
  console.error("Error:", error);
});

// Send the request with the payload
req.write(data);
req.end();

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

func main() {
	// Define the API endpoint
	url := "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

	// Set your Deepgram API key
	apiKey := "DEEPGRAM_API_KEY"

	// Define the request body
	payload := strings.NewReader(`{"text": "Hello, how can I help you today?"}`)

	// Create a new HTTP request
	client := &http.Client{}
	req, err := http.NewRequest("POST", url, payload)
	if err != nil {
		fmt.Println("Error creating request:", err)
		return
	}

	// Set the request headers
	req.Header.Set("Authorization", "Token "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	// Make the HTTP request
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Error making request:", err)
		return
	}
	defer resp.Body.Close()

	// Check if the response status code is OK
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("HTTP error! Status: %d\n", resp.StatusCode)
		return
	}

	// Create a new file to save the response body
	outputFile, err := os.Create("your_output_file.mp3")
	if err != nil {
		fmt.Println("Error creating output file:", err)
		return
	}
	defer outputFile.Close()

	// Copy the response body to the output file
	_, err = io.Copy(outputFile, resp.Body)
	if err != nil {
		fmt.Println("Error copying response body:", err)
		return
	}

	fmt.Println("File saved successfully.")
}

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.IO;

class Program
{
    static async Task Main(string[] args)
    {
        // Define your JSON object
        string json = "{\"text\": \"Hello, how can I help you today?\"}";

        // Deepgram text-to-speech endpoint and voice model
        string url = "https://api.deepgram.com/v1/speak?model=aura-asteria-en";

        // API Key
        string apiKey = "DEEPGRAM_API_KEY"; // Replace with your actual API key

        // Create an instance of HttpClient
        using (HttpClient httpClient = new HttpClient())
        {
            try
            {
                // Prepare the HTTP request content
                HttpContent content = new StringContent(json, Encoding.UTF8, "application/json");

                // Add Authorization header
                httpClient.DefaultRequestHeaders.Add("Authorization", "Token " + apiKey);

                // Send the POST request
                HttpResponseMessage response = await httpClient.PostAsync(url, content);

                // Check if the request was successful
                if (response.IsSuccessStatusCode)
                {
                    // Read and save the response as binary data
                    using (Stream audioStream = await response.Content.ReadAsStreamAsync())
                    {
                        // Specify where you want to save the audio file
                        string filePath = "your_output_file.mp3";
                        using (FileStream fileStream = File.Create(filePath))
                        {
                            using (BinaryWriter writer = new BinaryWriter(fileStream))
                            {
                                // Copy the binary data from the response stream to the file stream
                                byte[] buffer = new byte[8192];
                                int bytesRead;
                                while ((bytesRead = await audioStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                                {
                                    writer.Write(buffer, 0, bytesRead);
                                }
                            }
                        }
                        Console.WriteLine("Audio file saved successfully.");
                    }
                }
                else
                {
                    Console.WriteLine("Request failed with status code: " + response.StatusCode);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error: " + ex.Message);
            }
        }
    }
}

SDKs

Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram TTS request.

Install the SDK

Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.

# Install the Deepgram Python SDK
# https://github.com/deepgram/deepgram-python-sdk

pip install deepgram-sdk

# Install the Deepgram JS SDK
# https://github.com/deepgram/deepgram-js-sdk

npm install @deepgram/sdk

# Install the Deepgram Go SDK
# https://github.com/deepgram/deepgram-go-sdk

go get github.com/deepgram/deepgram-go-sdk

Add Dependencies

# Install python-dotenv to protect your API key

pip install python-dotenv

# Install dotenv to protect your API key

npm install dotenv

# Importing the Deepgram Go SDK should pull in all required dependencies
Make the Request with the SDK

import os
from dotenv import load_dotenv

from deepgram import (
    DeepgramClient,
    SpeakOptions,
)

load_dotenv()

SPEAK_OPTIONS = {"text": "Hello, how can I help you today?"}
filename = "output.wav"


def main():
    try:
        # STEP 1: Create a Deepgram client using the API key from environment variables
        deepgram = DeepgramClient(api_key=os.getenv("DG_API_KEY"))

        # STEP 2: Configure the options (such as model choice, audio configuration, etc.)
        options = SpeakOptions(
            model="aura-asteria-en",
            encoding="linear16",
            container="wav"
        )

        # STEP 3: Call the save method on the speak property
        response = deepgram.speak.v("1").save(filename, SPEAK_OPTIONS, options)
        print(response.to_json(indent=4))

    except Exception as e:
        print(f"Exception: {e}")


if __name__ == "__main__":
    main()

const { createClient } = require("@deepgram/sdk");
const fs = require("fs");

// STEP 1: Create a Deepgram client with your API key
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

const text = "Hello, how can I help you today?";

const getAudio = async () => {
  // STEP 2: Make a request and configure the request with options (such as model choice, audio configuration, etc.)
  const response = await deepgram.speak.request(
    { text },
    {
      model: "aura-asteria-en",
      encoding: "linear16",
      container: "wav",
    }
  );
  // STEP 3: Get the audio stream and headers from the response
  const stream = await response.getStream();
  const headers = await response.getHeaders();
  if (stream) {
    // STEP 4: Convert the stream to an audio buffer
    const buffer = await getAudioBuffer(stream);
    // STEP 5: Write the audio buffer to a file
    fs.writeFile("output.wav", buffer, (err) => {
      if (err) {
        console.error("Error writing audio to file:", err);
      } else {
        console.log("Audio file written to output.wav");
      }
    });
  } else {
    console.error("Error generating audio:", stream);
  }

  if (headers) {
    console.log("Headers:", headers);
  }
};

// helper function to convert stream to audio buffer
const getAudioBuffer = async (response) => {
  const reader = response.getReader();
  const chunks = [];

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    chunks.push(value);
  }

  const dataArray = chunks.reduce(
    (acc, chunk) => Uint8Array.from([...acc, ...chunk]),
    new Uint8Array(0)
  );

  return Buffer.from(dataArray.buffer);
};

getAudio();

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, how can I help you today?"
	filePath     string = "./output.wav"
)

func main() {
	// STEP 1: init Deepgram client library
	client.InitWithDefault()

	// STEP 2: define context to manage the lifecycle of the request
	ctx := context.Background()

	// STEP 3: define options for the request
	options := interfaces.SpeakOptions{
		Model:     "aura-asteria-en",
		Encoding:  "linear16",
		Container: "wav",
	}

	// STEP 4: create a Deepgram client using default settings
	// NOTE: you can set your API KEY in your bash profile by typing the following line in your shell:
	// export DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"
	c := client.NewWithDefaults()
	dg := speak.New(c)

	// STEP 5: send/process file to Deepgram
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("ToSave failed. Err: %v\n", err)
		os.Exit(1)
	}

	// STEP 6: get the JSON response
	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// STEP 7: make the JSON pretty
	prettyJson, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Format failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJson)
}

📘

To learn more about how you can customize the audio file to meet the needs of your use case, take a look at this Audio Format Combinations table.
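
For requests made without an SDK, one way to express the same options is as URL query parameters. The sketch below builds the endpoint URL from whichever options are set. It assumes that `encoding` and `container` are accepted as query parameters in the same way as `model`, mirroring the SDK options shown above; the helper name `speak_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.deepgram.com/v1/speak"


def speak_url(model=None, encoding=None, container=None):
    """Return the speak endpoint URL, adding only the options that are set."""
    # Assumption: encoding and container are query parameters, like model.
    options = {"model": model, "encoding": encoding, "container": container}
    query = urlencode({k: v for k, v in options.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(speak_url(model="aura-asteria-en", encoding="linear16", container="wav"))
```

Leaving every option unset returns the bare endpoint, which relies on the default voice model.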

Results

Upon successful processing of the request, you will receive an audio file containing the synthesized text-to-speech output, along with response headers providing additional information.

📘

The audio file is streamed back to you, so you may begin playback as soon as the first byte arrives. Read the guide Streaming Audio Outputs to learn how to begin playing the stream immediately versus waiting for the entire file to arrive.

Example Response Headers

< HTTP/1.1 200 OK
< content-type: audio/mpeg
< dg-model-name: aura-asteria-en
< dg-model-uuid: e4979ab0-8475-4901-9d66-0a562a4949bb
< dg-char-count: 32
< dg-request-id: bf6fc5c7-8f84-479f-b70a-602cf5bf18f3
< transfer-encoding: chunked
< date: Thu, 29 Feb 2024 19:20:48 GMT

📘

To see these response headers when making a CURL request, add -v or --verbose to your request.

The response headers include:

  • content-type: Specifies the media type of the resource, in this case, audio/mpeg, indicating the format of the audio file returned.
  • dg-request-id: A unique identifier for the request, useful for debugging and tracking purposes.
  • dg-model-uuid: The unique identifier of the model that processed the request.
  • dg-char-count: Indicates the number of characters that were in the input text for the text-to-speech process.
  • dg-model-name: The name of the model used to process the request.
  • transfer-encoding: Specifies the form of encoding used to safely transfer the payload to the recipient.
  • date: The date and time the response was sent.
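
As a small illustration of working with these headers, the sketch below pulls the Deepgram-specific fields out of a header mapping, such as the `headers` attribute of a `requests` response. The function name `summarize_headers` and the output keys are assumptions made for this example.

```python
def summarize_headers(headers):
    """Pick the Deepgram-specific fields out of a header mapping."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    char_count = lowered.get("dg-char-count")
    return {
        "request_id": lowered.get("dg-request-id"),
        "model_name": lowered.get("dg-model-name"),
        "model_uuid": lowered.get("dg-model-uuid"),
        "char_count": int(char_count) if char_count is not None else None,
        "content_type": lowered.get("content-type"),
    }


example = {
    "content-type": "audio/mpeg",
    "dg-model-name": "aura-asteria-en",
    "dg-model-uuid": "e4979ab0-8475-4901-9d66-0a562a4949bb",
    "dg-char-count": "32",
    "dg-request-id": "bf6fc5c7-8f84-479f-b70a-602cf5bf18f3",
}
print(summarize_headers(example)["char_count"])  # 32
```

Logging `request_id` alongside each request makes it easier to reference a specific call when debugging.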

Limits

Keep these limits in mind when making a Deepgram text-to-speech request.

Input Text Limit

The input limit is currently 2000 characters. This means that if the length of the string sent as the text payload is 2001 characters or more, you will receive an error and the audio file will not be created.
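
One way to stay under this limit is to split long input before sending it. The sketch below is an illustrative approach, not a Deepgram feature: it breaks text at sentence boundaries where possible and hard-splits any single sentence that exceeds the limit. The helper name `split_text` is hypothetical, and whitespace between sentences is normalized to a single space.

```python
import re

MAX_CHARS = 2000  # input limit stated above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit, so close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, and the resulting audio segments played or joined in order.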

Rate Limits

📘

The current rate limit is 2 concurrent requests per project with 480 requests per minute for Pay As You Go and Growth plans. Learn more at Deepgram's pricing page.
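
A client can avoid exceeding the concurrency limit by capping its own in-flight requests. The sketch below uses a semaphore from Python's standard library for this. It only limits requests made from a single process, whereas the limit above applies per project; the names `run_limited` and `MAX_CONCURRENT` are assumptions made for this example.

```python
import threading

MAX_CONCURRENT = 2  # concurrency limit stated above

# Each in-flight request holds one slot until it completes.
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)


def run_limited(task, *args, **kwargs):
    """Run `task` while holding one of the MAX_CONCURRENT slots."""
    with _slots:
        return task(*args, **kwargs)
```

Wrapping each text-to-speech call in `run_limited` makes additional threads wait for a free slot instead of sending a request that would be rejected.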

If the number of in-progress requests for a given project is equal to or greater than the concurrency limit, a new request will receive a 429 response. With a typical response time of 250ms, users of our Pay As You Go or Growth plans can achieve a request rate of roughly:

2 (concurrent requests) / 0.250 (250 ms response time) * 60 (seconds/minute) = 480 requests per minute.

At 480 requests per minute, a conversational AI application that makes 4 TTS requests per conversation can support approximately 120 conversations per minute (480 / 4 = 120).
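
The arithmetic above can be reproduced with a short calculation, which is handy for estimating throughput under a different response time or request pattern. The function names are illustrative.

```python
def requests_per_minute(concurrency, response_time_s):
    """Throughput when each concurrent slot completes one request per response time."""
    return concurrency / response_time_s * 60


def conversations_per_minute(rpm, requests_per_conversation):
    """Number of conversations a given request rate can serve each minute."""
    return rpm / requests_per_conversation


rpm = requests_per_minute(concurrency=2, response_time_s=0.250)
print(rpm)  # 480.0
print(conversations_per_minute(rpm, requests_per_conversation=4))  # 120.0
```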

What's Next?

Now that you've transformed text into speech with Deepgram's API, enhance your knowledge by exploring the following areas.

Read the Feature Guides

Deepgram's features help you customize your request to produce the output that works best for your use case. Would you like to customize the audio file that is returned? Read the Media Output Settings guide. Do you want to provide a callback URL so that your audio can be processed asynchronously? Check out the Callback guide.

Take a glance at our Feature Overview for text-to-speech to see the list of all the features available. Then read more about each feature in its individual guide.

Transcribe Speech-to-Text

Check out how you can use Deepgram to turn audio into text. Read the Pre-Recorded speech-to-text guide or the Streaming speech-to-text guide.

Try the Conversational AI Demo

This demo shows how to use Deepgram and OpenAI ChatGPT to build a Conversational AI application that engages users in natural, human-like conversation.

Watch This Video

See how you can use Deepgram Aura with Groq to build a blazing fast Conversational AI application.

