Text to Speech Prompting

Adding natural pauses or filler words to your text-to-speech prompts can make your audio sound more natural.

This guide introduces specific techniques for directing Deepgram Aura to produce audio output that sounds more like natural speech. Use punctuation and filler words to include intentional pauses in speech and to adjust the rhythm of speech for better pacing and engagement with users.

Pauses

If you need to insert a longer pause in your audio, use an ellipsis (...).

A comma (,) in your text is treated as a very short pause.

"Hello, how can I help you today? ... Are you there ... Hello?"
curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"Hello, how can I help you today? ... Are you there ... Hello?"}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
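When the prompt text is assembled from separate phrases, the pause marker can be inserted by a script instead of by hand. The following is a minimal sketch, assuming bash; `join_with_pauses` is a hypothetical helper name, not part of any Deepgram tooling. It only joins its arguments with the ellipsis described above.

```shell
# Hypothetical helper: join phrases with " ... " so that each boundary
# is read as a longer pause.
join_with_pauses() {
  local out="$1"
  shift
  local phrase
  for phrase in "$@"; do
    out="$out ... $phrase"
  done
  printf '%s' "$out"
}

# Rebuild the example sentence from its three phrases.
text=$(join_with_pauses "Hello, how can I help you today?" "Are you there" "Hello?")
printf '%s\n' "$text"
```

The resulting string can then be used as the value of the `text` field in the request body.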

Filler words

Filler words such as "um" and "uh" can also make the audio output sound more natural.

"Hello, how can I help you today? um Are you there uh Hello?"
curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"Hello, how can I help you today? um Are you there uh Hello?"}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
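If the prompt text, including any filler words, is built in a script rather than typed into the command, it has to be JSON-escaped before it is passed to `--data`. The following is a minimal sketch, assuming bash; `speak_payload` is a hypothetical helper name. It escapes only backslashes and double quotes, so text containing newlines or other control characters would need a proper JSON tool instead.

```shell
# Hypothetical helper: wrap prompt text in the {"text": ...} request body,
# escaping backslashes and double quotes so the JSON stays valid.
speak_payload() {
  local text="$1"
  text=${text//\\/\\\\}   # escape backslashes first
  text=${text//\"/\\\"}   # then escape double quotes
  printf '{"text":"%s"}' "$text"
}

# Build a body for a prompt that contains filler words and a quoted word.
payload=$(speak_payload 'Hello, um, did you say "uh" just now?')
printf '%s\n' "$payload"
```

The output can be passed to the request shown above as `--data "$payload"`.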

Pronunciation

Numbers

Depending on how you want numbers to be spoken in your audio output, consider using the following prompts for number pronunciation.

To have 1235 spoken as "twelve hundred and thirty-five", write the number out in words and explicitly include the word "and". Otherwise, the model will pronounce it as "twelve thirty-five".

"The total is 1235, or twelve hundred and thirty-five."
curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"The total is 1235, or twelve hundred and thirty-five."}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

Acronyms

In most cases, acronyms can be handled by just providing the letters of the acronym. Some acronyms are pronounced as a word (e.g., NASA), while others are spelled out letter by letter (e.g., NBA); Aura will attempt to pronounce the acronym correctly in your audio output.

"I love watching NBA Basketball."
curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"I love watching NBA Basketball."}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

Words in different languages

If you are having trouble with words rooted in different languages in your text, you can try spelling the word out phonetically and Aura will attempt to pronounce the word correctly in your audio output. However, in some cases Aura can pronounce these words correctly without the phonetic spelling.

"I want to rahn-day-voo with you."
"I want to rendezvous with you."
curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"I want to rahn-day-voo with you."}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
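When several words need phonetic spellings, the substitutions can be applied to the prompt text before it is sent. The following is a minimal sketch, assuming bash; `respell` is a hypothetical helper name, and the word/spelling pairs are supplied by the caller. It performs plain substring replacement, so a short word could also match inside a longer one.

```shell
# Hypothetical helper: replace each word with its phonetic spelling.
# Usage: respell TEXT WORD1 SPELLING1 [WORD2 SPELLING2 ...]
respell() {
  local text="$1"
  shift
  while [ "$#" -ge 2 ]; do
    # Quoting "$1" makes the pattern match literally, not as a glob.
    text=${text//"$1"/$2}
    shift 2
  done
  printf '%s' "$text"
}

# Swap in the phonetic spelling used in the example above.
text=$(respell "I want to rendezvous with you." "rendezvous" "rahn-day-voo")
printf '%s\n' "$text"
```

Because Aura pronounces some of these words correctly on its own, it is worth listening to the unmodified text first and adding a pair only where the result sounds wrong.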
