Text to Speech Prompting
Adding natural pauses or filler words to your text-to-speech prompts can make your audio sound more natural.
This guide introduces techniques for directing Deepgram Aura to produce audio that sounds more like natural speech. Use punctuation and filler words to insert intentional pauses and to adjust the rhythm of speech for better pacing and user engagement.
Pauses
To insert a longer pause in your audio, use an ellipsis (...).
A comma (,) in your text is treated as a very short pause.
"Hello, how can I help you today? ... Are you there ... Hello?"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"Hello, how can I help you today? ... Are you there ... Hello?"}' \
--url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
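The curl examples embed the text directly in a single-quoted JSON string, which breaks as soon as the text contains an apostrophe or a double quote. The sketch below is one way to build the request from a shell variable instead: join_with_pauses puts an ellipsis between phrases, json_escape escapes the text for the JSON body, and speak sends it to the same URL and model as the example. All three function names are hypothetical, and the script assumes your API key is exported as the DEEPGRAM_API_KEY environment variable rather than pasted into the header.

```shell
# Sketch: compose text with long pauses and send it to the /v1/speak endpoint.
# Assumption: the API key is exported in the DEEPGRAM_API_KEY environment variable.
# join_with_pauses, json_escape and speak are hypothetical helper names.

# join_with_pauses PHRASE...: join one or more phrases with " ... " between them.
join_with_pauses() {
  local out="$1" phrase
  shift
  for phrase in "$@"; do
    out="$out ... $phrase"
  done
  printf '%s\n' "$out"
}

# json_escape TEXT: escape backslashes and double quotes so TEXT can sit inside
# a JSON string. Newlines become spaces; other control characters are not handled.
json_escape() {
  printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# speak TEXT OUTFILE: POST TEXT with the same headers, URL and model as the
# curl example and write the returned audio to OUTFILE.
speak() {
  local text="$1" outfile="$2"
  curl --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Token $DEEPGRAM_API_KEY" \
    --output "$outfile" \
    --data "{\"text\":\"$(json_escape "$text")\"}" \
    --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
}

# Example (sends a real request):
# speak "$(join_with_pauses "Hello, how can I help you today?" "Are you there" "Hello?")" your_output_file.mp3
```

Calling join_with_pauses with the three phrases from the example produces exactly the prompt text shown above, so the pause placement stays the same while the text can now contain quotes safely.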
Filler words
Filler words such as um and uh can also make your audio output sound more natural.
"Hello, how can I help you today? um Are you there uh Hello?"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"Hello, how can I help you today? um Are you there uh Hello?"}' \
--url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
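If you generate prompts programmatically, filler words can be placed between phrases the same way as ellipses. This minimal POSIX shell sketch uses a hypothetical join_with_fillers helper that cycles through a space-separated list of fillers; calling it with "um uh" and the three phrases from the example reproduces the prompt text above, which you would then send as the text field of the request.

```shell
# join_with_fillers "FILLER..." PHRASE...: join phrases, placing the next filler
# word (cycling through the list) between consecutive phrases.
# Hypothetical helper; the filler list must be non-empty and single-space separated.
join_with_fillers() {
  local fillers="$1" out="$2" i=0 count phrase filler
  shift 2
  # Number of filler words available to cycle through.
  count=$(( $(printf '%s\n' "$fillers" | wc -w) ))
  for phrase in "$@"; do
    # Pick filler number (i mod count) + 1 from the list.
    filler=$(printf '%s\n' "$fillers" | cut -d ' ' -f "$(( i % count + 1 ))")
    out="$out $filler $phrase"
    i=$(( i + 1 ))
  done
  printf '%s\n' "$out"
}

# Example:
# join_with_fillers "um uh" "Hello, how can I help you today?" "Are you there" "Hello?"
```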
Pronunciation
Numbers
Depending on how you want numbers spoken in your audio output, consider the following techniques for number pronunciation.
To have the model say "twelve hundred and thirty-five", write the number out in words and explicitly include the word "and". Otherwise, the model will pronounce 1235 as "twelve thirty-five".
"The total is 1235, or twelve hundred and thirty-five."
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"The total is 1235, or twelve hundred and thirty-five."}' \
--url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
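When numbers come from data rather than hand-written text, you can generate the spelled-out form, including the explicit "and", before building the request. The sketch below is a minimal illustration with hypothetical helper names; it assumes a non-negative integer below 10000 written without leading zeros. Round thousands come out in the same style (2000 becomes "twenty hundred"), so adjust the logic if you want "two thousand" instead.

```shell
# Sketch: spell a number in the "hundreds" style, e.g. 1235 ->
# "twelve hundred and thirty-five". Hypothetical helpers; input must be a
# non-negative integer below 10000 written without leading zeros.
ONES="zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen"
TENS="- - twenty thirty forty fifty sixty seventy eighty ninety"

# nth LIST N: print the N-th word (counting from 0) of a space-separated LIST.
nth() {
  printf '%s\n' "$1" | cut -d ' ' -f "$(( $2 + 1 ))"
}

# under_hundred N: spell 0 <= N < 100, e.g. 35 -> "thirty-five".
under_hundred() {
  if [ "$1" -lt 20 ]; then
    nth "$ONES" "$1"
  elif [ $(( $1 % 10 )) -eq 0 ]; then
    nth "$TENS" $(( $1 / 10 ))
  else
    printf '%s-%s\n' "$(nth "$TENS" $(( $1 / 10 )))" "$(nth "$ONES" $(( $1 % 10 )))"
  fi
}

# spell_hundreds N: spell N as "<X> hundred and <Y>", with the explicit "and".
spell_hundreds() {
  local n="$1" rest
  if [ "$n" -lt 100 ]; then
    under_hundred "$n"
    return
  fi
  rest=$(( n % 100 ))
  if [ "$rest" -eq 0 ]; then
    printf '%s hundred\n' "$(under_hundred $(( n / 100 )))"
  else
    printf '%s hundred and %s\n' "$(under_hundred $(( n / 100 )))" "$(under_hundred "$rest")"
  fi
}

# Build the same prompt text as the example above.
text="The total is 1235, or $(spell_hundreds 1235)."
printf '%s\n' "$text"
```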
Acronyms
In most cases, you can handle acronyms by simply writing out their letters. Some acronyms are pronounced as a word (e.g., NASA), while others are spelled out letter by letter (e.g., NBA); Aura will attempt to pronounce each acronym correctly in your audio output.
"I love watching NBA Basketball."
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"I love watching NBA Basketball."}' \
--url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
Words in different languages
If Aura has trouble with a word borrowed from another language, try spelling the word out phonetically; Aura will attempt to pronounce the phonetic spelling correctly in your audio output. In some cases, however, Aura pronounces such words correctly without a phonetic spelling.
"I want to rahn-day-voo with you."
"I want to rendezvous with you."
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Token DEEPGRAM_API_KEY" \
--output your_output_file.mp3 \
--data '{"text":"I want to rahn-day-voo with you."}' \
--url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
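Because Aura sometimes pronounces a borrowed word correctly without phonetic spelling, it can help to render both versions and compare them by ear before settling on one. This is a minimal sketch, assuming the API key is exported as DEEPGRAM_API_KEY; render_candidates is a hypothetical helper, and the candidate texts are inserted into the JSON body without escaping, so they must not contain double quotes or backslashes.

```shell
# render_candidates PREFIX TEXT...: request one clip per candidate text and
# save them as PREFIX_1.mp3, PREFIX_2.mp3, ... for side-by-side listening.
# Hypothetical helper; candidate texts are NOT JSON-escaped.
render_candidates() {
  local prefix="$1" i=1 text
  shift
  for text in "$@"; do
    curl --request POST \
      --header "Content-Type: application/json" \
      --header "Authorization: Token $DEEPGRAM_API_KEY" \
      --output "${prefix}_$i.mp3" \
      --data "{\"text\":\"$text\"}" \
      --url "https://api.deepgram.com/v1/speak?model=aura-asteria-en"
    i=$(( i + 1 ))
  done
}

# Example (sends real requests):
# render_candidates rendezvous "I want to rahn-day-voo with you." "I want to rendezvous with you."
```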