Getting Started
An introduction to using Deepgram’s Aura Text-to-Speech REST API to convert text into audio.
This guide will walk you through how to turn text into speech with Deepgram’s text-to-speech REST API.
Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.
API Playground
First, quickly explore Deepgram Text to Speech in our API Playground.
CURL
Next, try it with CURL. Add your own API key where it says YOUR_DEEPGRAM_API_KEY, and then run the following example in a terminal or your favorite API client.
This will result in an MP3 audio file being streamed back to you by Deepgram. You can play the audio as soon as you receive the first byte, or you can wait until the entire MP3 file has arrived.
The audio file will contain the voice of the selected model saying the words that you sent in your request.
If you do not specify a model, the default voice model aura-asteria-en will be used. You can find all of our available voices here.
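If you work in Python, a comparable request can be sketched with the standard library alone. This sketch assumes the endpoint `https://api.deepgram.com/v1/speak`, a `Token` authorization scheme, a `model` query parameter, and a JSON body of the form `{"text": ...}`; confirm each of these against Deepgram's API reference before relying on them. The helper names `build_speak_request` and `synthesize_to_file` are hypothetical and not part of any Deepgram library.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint: verify against Deepgram's TTS API reference.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(api_key: str, text: str,
                        model: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Deepgram to speak `text`."""
    url = SPEAK_URL
    if model is not None:
        # Omitting `model` lets the server fall back to its default voice.
        url += "?" + urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(api_key: str, text: str, path: str,
                       model: Optional[str] = None) -> None:
    """Send the request and write the returned audio bytes to `path` as they arrive."""
    request = build_speak_request(api_key, text, model)
    # urlopen raises urllib.error.HTTPError for non-2xx responses, so the
    # output file is only opened once a successful response has started.
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(8192):
            out.write(chunk)
```

For example, `synthesize_to_file("YOUR_DEEPGRAM_API_KEY", "Hello, how can I help you today?", "tts.mp3", model="aura-asteria-en")` would write the returned MP3 to `tts.mp3`. Separating request construction from sending keeps the request easy to inspect without making a network call.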
Send Error Messages to Terminal
If your request results in an error, the error message can be seen by opening the output audio file in a text editor.
To see the error message in your terminal, add this to your CURL request:
This example will capture the error message using jq and remove the output file (tts.mp3) automatically.
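The same check can be done in application code. This minimal sketch assumes, as the jq example implies, that error responses carry a JSON body with a JSON `content-type`. Because the exact error fields are not described here, it pretty-prints the whole payload rather than reading a specific key. Unlike the CURL approach, it inspects the body before writing anything, so there is no error file to clean up. The function names are hypothetical.

```python
import json
from typing import Optional


def describe_error(content_type: str, body: bytes) -> Optional[str]:
    """Return a readable error message, or None if the body looks like audio."""
    if "json" not in content_type.lower():
        return None  # e.g. audio/mpeg: a normal audio response
    try:
        payload = json.loads(body)
    except ValueError:
        # Labeled as JSON but did not parse; show the raw text instead.
        return body.decode("utf-8", errors="replace")
    return json.dumps(payload, indent=2, sort_keys=True)


def save_audio_or_raise(content_type: str, body: bytes, path: str) -> None:
    """Write audio to `path`, or raise with the error text without creating a file."""
    message = describe_error(content_type, body)
    if message is not None:
        raise RuntimeError(f"Deepgram returned an error:\n{message}")
    with open(path, "wb") as out:
        out.write(body)
```

With `urllib`, a failed request raises `urllib.error.HTTPError`; its `headers.get("content-type", "")` and `read()` supply the two arguments to `describe_error`.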
SDKs
Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram TTS request.
Install the SDK
Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.
Add Dependencies
Make the Request with the SDK
To learn more about how you can customize the audio file to meet the needs of your use case, take a look at this Audio Format Combinations table.
Non-SDK Code Examples
If you would like to make a Deepgram text-to-speech request in a specific programming language without using Deepgram’s SDKs, we offer a library of code samples in this GitHub repo. However, we recommend trying our SDKs first, as presented in the previous section.
Results
Upon successful processing of the request, you will receive an audio file containing the synthesized text-to-speech output, along with response headers providing additional information.
The audio file is streamed back to you, so you may begin playback as soon as the first byte arrives. Read the guide Streaming Audio Outputs to learn how to begin playing the stream immediately versus waiting for the entire file to arrive.
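A minimal sketch of the "start as soon as the first byte arrives" approach: read the response in small chunks and hand each one to a consumer immediately instead of buffering the whole file. Here `on_chunk` is a hypothetical hook where an audio player would be fed; the sketch itself only measures the time to the first chunk and counts bytes. It works with any object that has a `read(size)` method, such as the response returned by `urllib.request.urlopen`.

```python
import time
from typing import BinaryIO, Callable, Tuple


def consume_stream(stream: BinaryIO, on_chunk: Callable[[bytes], None],
                   chunk_size: int = 4096) -> Tuple[float, int]:
    """Pass each chunk to `on_chunk` as soon as it is read.

    Returns (seconds until the first chunk arrived, total bytes read).
    The latency is -1.0 if the stream was empty.
    """
    start = time.monotonic()
    first_chunk_latency = -1.0
    total = 0
    while chunk := stream.read(chunk_size):
        if first_chunk_latency < 0:
            first_chunk_latency = time.monotonic() - start
        on_chunk(chunk)  # playback could begin here, before the stream ends
        total += len(chunk)
    return first_chunk_latency, total
```

Waiting for the entire file instead amounts to calling `stream.read()` once and playing the result, which is simpler but delays the first audible sound by the full download time.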
Example Response Headers
To see these response headers when making a CURL request, add -v or --verbose to your request.
This includes:
- content-type: Specifies the media type of the resource, in this case audio/mpeg, indicating the format of the audio file returned.
- dg-request-id: A unique identifier for the request, useful for debugging and tracking purposes.
- dg-model-uuid: The unique identifier of the model that processed the request.
- dg-char-count: The number of characters in the input text for the text-to-speech process.
- dg-model-name: The name of the model used to process the request.
- transfer-encoding: Specifies the form of encoding used to safely transfer the payload to the recipient.
- date: The date and time the response was sent.
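In code, these headers are easiest to use once collected into a small record, for example to log `dg-request-id` next to each generated file. This sketch matches header names case-insensitively and converts `dg-char-count` to an integer; the function name and the sample values in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Tuple, Union


def summarize_dg_headers(items: Iterable[Tuple[str, str]]) -> Dict[str, Union[str, int]]:
    """Collect the dg-* response headers (case-insensitively) into a dict."""
    summary: Dict[str, Union[str, int]] = {}
    for name, value in items:
        key = name.lower()
        if key.startswith("dg-"):
            summary[key] = value
    # dg-char-count is numeric, so convert it for easier usage accounting.
    if "dg-char-count" in summary:
        summary["dg-char-count"] = int(summary["dg-char-count"])
    return summary
```

With a `urllib` response, pass `response.headers.items()`. For instance, headers containing `("DG-Char-Count", "42")` and `("dg-request-id", "abc")` would yield `{"dg-char-count": 42, "dg-request-id": "abc"}`.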
Limits
Keep these limits in mind when making a Deepgram text-to-speech request.
Input Text Limit
- Maximum characters: 2000.
- Sending a text payload longer than 2000 characters (2001 or more) will result in an error, and the audio file will not be created.
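One way to stay under the 2000-character limit is to split longer text into chunks and send one request per chunk. This sketch packs whole sentences greedily and hard-splits only a sentence that exceeds the limit on its own. It assumes the limit is counted in Python characters (Unicode code points), and its sentence splitter is a simple regular expression that will mis-handle abbreviations such as "e.g.", so treat it as a starting point.

```python
import re
from typing import List

MAX_CHARS = 2000  # input text limit stated above


def chunk_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split `text` into pieces of at most `limit` characters, preferring sentence breaks."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries keeps each request's audio ending on a natural pause, which usually sounds better when the resulting clips are played back to back.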
Rate Limits
- Pay As You Go Plan: 480 requests per minute.
- Growth Plan: 720 requests per minute.
These limits apply per project. Learn more at Deepgram’s pricing page or share your feedback on our TTS rate limits.
Handling Rate Limits
- If the number of in-progress requests for a project meets or exceeds the rate limit, new requests will receive a 429: Too Many Requests error.
- For example, with a typical response time of 250 ms, a project that keeps 2 requests in progress at all times achieves a request rate of roughly:
2 (concurrent requests) / 0.250 s (response time) × 60 (seconds/minute) = 480 requests per minute, which matches the Pay As You Go limit.
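When a request does receive a 429, a common client-side response is to wait and retry with exponential backoff plus random jitter. This is a generic pattern, not something the Deepgram API prescribes. `send` is a hypothetical callable that performs one request and returns the HTTP status and body, and the delay values are arbitrary starting points.

```python
import random
import time
from typing import Callable, Tuple


def send_with_retry(send: Callable[[], Tuple[int, bytes]],
                    max_attempts: int = 5,
                    base_delay: float = 0.25,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Call `send`, retrying on 429 with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts: return the last 429 to the caller
        delay = base_delay * (2 ** attempt)
        # Random jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, delay))
    return status, body
```

Because the limit counts in-progress requests, capping your own concurrency (for example with a semaphore sized to your plan) avoids most 429s in the first place; backoff then handles the occasional burst.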
What’s Next?
Now that you’ve transformed text into speech with Deepgram’s API, enhance your knowledge by exploring the following areas.
Starter Apps
- Clone and run one of our Starter App repositories to see a full application with a frontend UI and a backend server sending text to Deepgram to be converted into audio.
Read the Feature Guides
Deepgram’s features help you to customize your request to produce the output that works best for your use case.
- Media Output Settings: Learn how to customize the audio file that is returned.
- Callback: Discover how to provide a callback URL so that your audio can be processed asynchronously.
- Feature Overview: Review the list of features available for text-to-speech. Then, dive into individual guides for more details.
Transcribe Speech-to-Text
- Check out how you can use Deepgram to turn audio into text. Read the Pre-Recorded Speech-to-Text guide or the Streaming Speech-to-Text guide.
Try the Conversational AI Demo
- This demo shows how you can build a Conversational AI application that engages users in natural, human-like conversation using Deepgram and OpenAI ChatGPT.
Watch This Video
- See how you can use Deepgram Aura with Groq to build a blazing fast Conversational AI application.
We’d love to get your feedback on Deepgram’s Aura text-to-speech. You will receive $50 in additional console credits within two weeks after filling out the form, and you may be invited to join a group of users with access to the latest private releases. To fill out the form, click here.