Getting Started

An introduction to using Deepgram's Aura Text-to-Speech API to convert text into audio.

This guide will walk you through how to turn text into speech with Deepgram's text-to-speech API.

🌈

We'd love to get your feedback on Deepgram's Aura text-to-speech. You will receive $50 in additional console credits within two weeks after filling out the form, and you may be invited to join a group of users with access to the latest private releases. To fill out the form, Click Here.

📘

Before you start, you'll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.

CURL

First, try it with CURL. Add your own API key where it says YOUR_DEEPGRAM_API_KEY and then run the following example in a terminal or your favorite API client.

curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"Hello, how can I help you today?"}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

This will result in an MP3 audio file being streamed back to you by Deepgram. You can play the audio as soon as you receive the first byte, or you can wait until the entire MP3 file has arrived.

The audio file will contain the voice of the selected model saying the words that you sent in your request.

📘

If you do not specify a model, the default voice model aura-asteria-en will be used. You can find all of our available voices here.

Send Error Messages to Terminal

If your request results in an error, the error message can be seen by opening the output audio file in a text editor.

To see the error message in your terminal, add this to your CURL request:

--fail-with-body \
--silent \
|| (jq . your_output_file.mp3 && rm your_output_file.mp3)

This example will capture the error message using jq and remove the output file (tts.mp3) automatically.

curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"Hello, how can I help you today?"}' \
     --url 'https://api.deepgram.com/v1/speak?model=testing_error' \
     --fail-with-body \
     --silent \
     || (jq . your_output_file.mp3 && rm your_output_file.mp3)

SDKs

Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram TTS request.

Install the SDK

Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.

# Install the Deepgram JS SDK
# https://github.com/deepgram/deepgram-js-sdk

npm install @deepgram/sdk
# Install the Deepgram Python SDK
# https://github.com/deepgram/deepgram-python-sdk

pip install deepgram-sdk
# Install the Deepgram Go SDK
# https://github.com/deepgram/deepgram-go-sdk

go get github.com/deepgram/deepgram-go-sdk
dotnet add package Deepgram

Add Dependencies

# Install dotenv to protect your api key

npm install dotenv
# Install python-dotenv to protect your api key

pip install python-dotenv
# Importing the Deepgram Go SDK should pull in all dependencies required
# Importing the Deepgram Go SDK should pull in all dependencies required

Make the Request with the SDK

const { createClient } = require("@deepgram/sdk");
const fs = require("fs");

// STEP 1: Create a Deepgram client with your API key
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

const text = "Hello, how can I help you today?";

const getAudio = async () => {
  // STEP 2: Make a request and configure the request with options (such as model choice, audio configuration, etc.)
  const response = await deepgram.speak.request(
    { text },
    {
      model: "aura-asteria-en",
      encoding: "linear16",
      container: "wav",
    }
  );
  // STEP 3: Get the audio stream and headers from the response
  const stream = await response.getStream();
  const headers = await response.getHeaders();
  if (stream) {
    // STEP 4: Convert the stream to an audio buffer
    const buffer = await getAudioBuffer(stream);
    // STEP 5: Write the audio buffer to a file
    fs.writeFile("output.wav", buffer, (err) => {
      if (err) {
        console.error("Error writing audio to file:", err);
      } else {
        console.log("Audio file written to output.wav");
      }
    });
  } else {
    console.error("Error generating audio:", stream);
  }

  if (headers) {
    console.log("Headers:", headers);
  }
};

// helper function to convert stream to audio buffer
const getAudioBuffer = async (response) => {
  const reader = response.getReader();
  const chunks = [];

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    chunks.push(value);
  }

  const dataArray = chunks.reduce(
    (acc, chunk) => Uint8Array.from([...acc, ...chunk]),
    new Uint8Array(0)
  );

  return Buffer.from(dataArray.buffer);
};

getAudio();
import os
from dotenv import load_dotenv

from deepgram import (
    DeepgramClient,
    SpeakOptions,
)

load_dotenv()

SPEAK_OPTIONS = {"text": "Hello, how can I help you today?"}
filename = "output.wav"


def main():
    try:
        # STEP 1: Create a Deepgram client using the API key from environment variables
        deepgram = DeepgramClient(api_key=os.getenv("DG_API_KEY"))

        # STEP 2: Configure the options (such as model choice, audio configuration, etc.)
        options = SpeakOptions(
            model="aura-asteria-en",
            encoding="linear16",
            container="wav"
        )

        # STEP 3: Call the save method on the speak property
        response = deepgram.speak.v("1").save(filename, SPEAK_OPTIONS, options)
        print(response.to_json(indent=4))

    except Exception as e:
        print(f"Exception: {e}")


if __name__ == "__main__":
    main()
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, how can I help you today?"
	filePath     string = "./output.wav"
)

func main() {
	// STEP 1: init Deepgram client library
	client.InitWithDefault()

	// STEP 2: define context to manage the lifecycle of the request
	ctx := context.Background()

	// STEP 3: define options for the request
	options := interfaces.SpeakOptions{
		Model:     "aura-asteria-en",
		Encoding:  "linear16",
		Container: "wav",
	}

	// STEP 4: create a Deepgram client using default settings
	// NOTE: you can set your API KEY in your bash profile by typing the following line in your shell:
	// export DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"
	c := client.NewWithDefaults()
	dg := speak.New(c)

	// STEP 5: send/process file to Deepgram
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("FromStream failed. Err: %v\n", err)
		os.Exit(1)
	}

	// STEP 6: get the JSON response
	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// STEP 8: make the JSON pretty
	prettyJson, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJson)
}

using System.Text.Json;

using Deepgram;
using Deepgram.Models.Speak.v1;
using Deepgram.Logger;

namespace SampleApp
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Initialize Library with default logging
            Library.Initialize();

            // Set "DEEPGRAM_API_KEY" environment variable to your Deepgram API Key
            var deepgramClient = new SpeakClient();

            var response = await deepgramClient.ToFile(
                new TextSource("Hello World!"),
                "test.mp3",
                new SpeakSchema()
                {
                    Model = "aura-asteria-en",
                });

            Console.WriteLine(response);
            Console.ReadKey();

            // Teardown Library
            Library.Terminate();
        }
    }
}

📘

To learn more about how you can customize the audio file to meet the needs of your use case, take a look at this Audio Format Combinations table.

Non-SDK Code Examples

If you would like to try out making a Deepgram speech-to-text request in a specific language (but not using Deepgram's SDKs), we offer a library of code-samples in this Github repo. However, we recommend first trying out our SDKs, which we presented in the previous section.

Results

Upon successful processing of the request, you will receive an audio file containing the synthesized text-to-speech output, along with response headers providing additional information.

📘

The audio file is streamed back to you, so you may begin playback as soon as the first byte arrives. Read the guide Streaming Audio Outputs to learn how to begin playing the stream immediately versus waiting for the entire file to arrive.

Example Response Headers

HTTP/1.1 200 OK
< content-type: audio/mpeg
< dg-model-name: aura-asteria-en
< dg-model-uuid: e4979ab0-8475-4901-9d66-0a562a4949bb
< dg-char-count: 32
< dg-request-id: bf6fc5c7-8f84-479f-b70a-602cf5bf18f3
< transfer-encoding: chunked
< date: Thu, 29 Feb 2024 19:20:48 GMT

📘

To see these response headers when making a CURL request, add -v or --verbose to your request.

This includes:

  • content-type: Specifies the media type of the resource, in this case, audio/mpeg, indicating the format of the audio file returned.
  • dg-request-id: A unique identifier for the request, useful for debugging and tracking purposes.
  • dg-model-uuid: The unique identifier of the model that processed the request.
  • dg-char-count: Indicates the number of characters that were in the input text for the text-to-speech process.
  • dg-model-name: The name of the model used to process the request.
  • transfer-encoding: Specifies the form of encoding used to safely transfer the payload to the recipient.
  • date: The date and time the response was sent.

Limits

Keep these limits in mind when making a Deepgram text-to-speech request.

Input Text Limit

  • Maximum characters: 2000.
  • Sending a text payload longer than 2000 characters (2001 or more) will result in an error, and the audio file will not be created.

Rate Limits

  • Pay As You Go Plan: 480 requests per minute.
  • Growth Plan: 720 requests per minute.

📘

The current rate limit per project is 480 requests per minute for Pay As You Go and 720 requests per minute for Growth plans. Learn more at Deepgram's pricing page or share your feedback on our TTS rate limits.

Handling Rate Limits

  • If the number of in-progress requests for a project meets or exceeds the rate limit, new requests will receive a 429: Too Many Requests error.
  • With a typical response time of 250ms, users of our Pay As You Go or Growth plans can achieve a request rate of roughly:


    2 (concurrent requests) / 0.250 (250 ms response time) * 60 (seconds/minute) = 480 requests per minute.

What's Next?

Now that you've transformed text into speech with Deepgram's API, enhance your knowledge by exploring the following areas.

Starter Apps

  • Clone and run one of our Starter App repositories to see a full application with a frontend UI and a backend server sending text to Deepgram to be converted into audio.

Read the Feature Guides

Deepgram's features help you to customize your request to produce the output that works best for your use case.

  • Media Output Settings: Learn how to customize the audio file that is returned.
  • Callback: Discover how to provide a callback url, so that your audio can be processed asynchronously.
  • Feature Overview: Review the list of features available for pre-recorded speech-to-text. Then, dive into individual guides for more details.

Transcribe Speech-to-Text

Try the Conversational AI Demo

  • The purpose of this demo is to showcase how you can build a Conversational AI application that engages users in natural language interactions, mimicking human conversation through natural language processing using Deepgram and OpenAI ChatGPT.

Watch This Video

  • See how you can use Deepgram Aura with Groq to build a blazing fast Conversational AI application.

What’s Next