Getting Started
An introduction to using Deepgram’s Aura Text-to-Speech REST API to convert text into audio.
This guide will walk you through how to turn text into speech with Deepgram’s text-to-speech REST API.
Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.
Next, try it with CURL. Add your own API key where it says YOUR_DEEPGRAM_API_KEY and then run the following example in a terminal or your favorite API client.
This will result in an MP3 audio file being streamed back to you by Deepgram. You can play the audio as soon as you receive the first byte, or you can wait until the entire MP3 file has arrived.
The audio file will contain the voice of the selected model saying the words that you sent in your request.
If you do not specify a model, the default voice model aura-asteria-en will be used. You can find all of our available voices here.
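The request described above can also be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint URL, the `Token` authorization scheme, and the `{"text": ...}` JSON body are assumptions that do not appear in this guide, and the helper names `build_speak_request` and `synthesize_to_file` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the TTS REST endpoint lives at this URL.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(api_key, text, model="aura-asteria-en"):
    """Build (but do not send) a POST request asking for `text` to be spoken."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{SPEAK_URL}?{query}",
        data=body,
        headers={
            # Assumption: API keys are sent with the "Token" scheme.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, out_path="tts.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speak_request(api_key, text)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Calling `synthesize_to_file("YOUR_DEEPGRAM_API_KEY", "Hello, how can I help you today?")` would perform the network call; building the request separately makes it easy to inspect the URL and headers first.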
If your request results in an error, the error message can be seen by opening the output audio file in a text editor.
To see the error message in your terminal, add this to your CURL request:
This example captures the error message with the jq command-line JSON processor and automatically removes the output file tts.mp3.
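The same check can be done programmatically. The sketch below assumes that an error response is a JSON object, which is what piping it through a JSON processor implies, and treats anything that does not parse as JSON as audio. The function name `extract_error` is hypothetical, and no particular error field names are assumed.

```python
import json


def extract_error(payload):
    """Return the decoded JSON object if payload is a JSON error body, else None.

    MP3 data is binary and will not parse as JSON, so a None result
    suggests the payload is audio rather than an error message.
    """
    try:
        decoded = json.loads(payload)
    except ValueError:
        # Covers both invalid JSON and bytes that are not valid text.
        return None
    return decoded if isinstance(decoded, dict) else None
```

A caller could run this on the downloaded bytes before saving them, and print the returned object instead of writing an unplayable file.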
Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram TTS request.
Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.
To learn more about how you can customize the audio file to meet the needs of your use case, take a look at this Audio Format Combinations table.
If you would like to make a Deepgram text-to-speech request in a specific language without using Deepgram’s SDKs, we offer a library of code samples in this GitHub repo. However, we recommend first trying our SDKs, which are presented in the previous section.
Upon successful processing of the request, you will receive an audio file containing the synthesized text-to-speech output, along with response headers providing additional information.
The audio file is streamed back to you, so you may begin playback as soon as the first byte arrives. Read the guide Streaming Audio Outputs to learn how to begin playing the stream immediately versus waiting for the entire file to arrive.
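To illustrate consuming the stream incrementally rather than waiting for the whole file, here is a minimal sketch that copies a response body to a sink one chunk at a time. It works with any file-like object that has a `read(n)` method, such as the response returned by `urllib.request.urlopen`. The name `stream_to_sink` and the 4096-byte chunk size are illustrative choices, not values from Deepgram.

```python
import io


def stream_to_sink(source, sink, chunk_size=4096):
    """Copy source to sink chunk by chunk and return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # an empty read means the stream has ended
        sink.write(chunk)  # each chunk is usable as soon as it arrives
        total += len(chunk)
    return total


# Usage with an in-memory stand-in for the HTTP response body.
fake_response = io.BytesIO(b"\x00" * 10000)
buffer = io.BytesIO()
copied = stream_to_sink(fake_response, buffer)
```

In a real player, the sink would be an audio device or decoder that starts playback as soon as the first chunk is written.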
To see these response headers when making a CURL request, add -v or --verbose to your request.
This includes:
content-type: Specifies the media type of the resource, in this case, audio/mpeg, indicating the format of the audio file returned.
dg-request-id: A unique identifier for the request, useful for debugging and tracking purposes.
dg-model-uuid: The unique identifier of the model that processed the request.
dg-char-count: Indicates the number of characters that were in the input text for the text-to-speech process.
dg-model-name: The name of the model used to process the request.
transfer-encoding: Specifies the form of encoding used to safely transfer the payload to the recipient.
date: The date and time the response was sent.

Keep these limits in mind when making a Deepgram text-to-speech request.
Sending a request with a text payload longer than the maximum number of characters can result in a 413: Input Text Exceeds Character Limits error, and the audio file will not be created.
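One way to avoid the 413 error is to split long text into smaller pieces before sending it. The sketch below is an illustrative approach, not a Deepgram utility: the function name `split_text` is hypothetical, and `max_chars` must be supplied by the caller from the documented limit, since no value is assumed here.

```python
def split_text(text, max_chars):
    """Split text on whitespace into chunks of at most max_chars characters.

    Runs of whitespace are collapsed to single spaces in the output.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    current = ""
    for word in text.split():
        if len(word) > max_chars:
            raise ValueError("a single word is longer than max_chars")
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)  # current chunk is full; start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. Splitting on sentence boundaries instead of whitespace would likely sound more natural, at the cost of a slightly more involved splitter.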
A 422: Unprocessable Content error can be returned if the client fails to send the request successfully.
For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.
If the number of in-progress requests for a project meets or exceeds the rate limit, new requests will receive a 429: Too Many Requests error.
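A common way to handle a 429 response is to retry after an exponentially growing delay. The sketch below shows that general pattern under stated assumptions: `send` is a hypothetical caller-supplied function that performs one request and returns a `(status, body)` pair, and the attempt count and delays are illustrative defaults, not values recommended by Deepgram.

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out.

    Returns the last (status, body) pair received.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Adding random jitter to the delay is a frequent refinement when many clients retry at once.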
For suggestions on handling Concurrency Rate Limits, refer to our Working with Concurrency Rate Limits guide.
Now that you’ve transformed text into speech with Deepgram’s API, enhance your knowledge by exploring the following areas.
Deepgram’s features help you to customize your request to produce the output that works best for your use case.