TTS Voice Controls

This feature is currently in Early Access. To request access or leave feedback, contact your Account Executive or reach out to sales@deepgram.com.

Aura-2 Controls enable fine-grained adjustments to speech output, allowing you to modify speaking speed and override pronunciation for specific words. These controls are designed for enterprise use cases requiring precise voice quality for industry-specific terminology, brand names, and complex content.

During Early Access, Aura-2 Controls are available via the REST API only. English voices support speed and pronunciation controls; Spanish voices support speed control only.

Speed control

Adjust the speaking rate of generated audio. Speed control modifies the pace of speech while maintaining natural prosody and voice quality.

Parameters

Parameter	Location	Type	Default	Range	Description
`speed`	query	float	`1.0`	`0.7` - `1.5`	Speaking rate multiplier

For Spanish voices, the recommended speed range is 0.9 - 1.5. Values below 0.9 may introduce disfluencies.

Example request

curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Token DEEPGRAM_API_KEY" \
     --output your_output_file.mp3 \
     --data '{"text":"Hello, how can I help you today?"}' \
     --url "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&speed=0.9"

Speed values

Value	Effect	Use Case
`0.7`	30% slower	Language learning, accessibility, legal compliance
`0.8`	20% slower	Complex instructions, elderly users
`0.9`	10% slower	Clear explanations, training content
`1.0`	Normal speed	Default conversational pace
`1.1`	10% faster	Efficient notifications
`1.2`	20% faster	Quick alerts, time-sensitive content
`1.5`	50% faster	Rapid playback, content preview

Speed values outside the 0.7x–1.5x range will return an error.
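
Because out-of-range values are rejected, it can be convenient to check the value on the client before sending a request. The following is a minimal sketch of that check. The helper name `build_speak_url` and the constants are illustrative and are not part of any Deepgram SDK; only the endpoint, the `model` and `speed` query parameters, and the 0.7 to 1.5 range come from this page.

```python
from urllib.parse import urlencode

SPEAK_URL = "https://api.deepgram.com/v1/speak"
MIN_SPEED, MAX_SPEED = 0.7, 1.5  # documented range for the speed parameter


def build_speak_url(model: str, speed: float = 1.0) -> str:
    """Return the request URL, raising early if speed is out of range.

    Hypothetical helper for illustration; not a Deepgram SDK function.
    """
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    return f"{SPEAK_URL}?{urlencode({'model': model, 'speed': speed})}"


print(build_speak_url("aura-2-thalia-en", 0.9))
```

For Spanish voices, the same pattern could use 0.9 as the lower bound to stay within the recommended range.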

Pronunciation control

Override the default pronunciation of specific words using International Phonetic Alphabet (IPA) notation.

Syntax

Pronunciation overrides are specified inline within the text using escaped JSON objects:

\{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\}

Where:

word is the original text (used for billing and display)
pronounce is the IPA phonetic transcription
Curly braces must be escaped with backslashes (\{ and \})
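
Writing the override markup by hand is error-prone, so a small helper can assemble it. This is a sketch under the syntax described above; the function name `pronounce` is hypothetical and not part of any Deepgram SDK.

```python
import json


def pronounce(word: str, ipa: str) -> str:
    """Wrap a word and its IPA transcription in the inline override syntax.

    Hypothetical helper for illustration only.
    """
    # json.dumps handles quoting; ensure_ascii=False keeps IPA symbols readable.
    payload = json.dumps({"word": word, "pronounce": ipa}, ensure_ascii=False)
    # payload looks like '{...}'; swap the outer braces for the escaped forms.
    return "\\{" + payload[1:-1] + "\\}"


text = f"Take {pronounce('dupilumab', 'duːˈpɪljuːmæb')} twice daily."
print(text)
```

The resulting string contains a single backslash before each brace, which is the form the text itself should have before it is serialized into a JSON request body.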

Example request

curl -X POST "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&speed=0.8" \
     -H "Authorization: Token DEEPGRAM_API_KEY" \
     -H "Content-Type: application/json" \
     --output your_output_file.mp3 \
     -d '{"text": "Take \\{\"word\": \"Azathioprine\", \"pronounce\": \"æzəˈθaɪəpriːn\"\\} twice daily with \\{\"word\": \"dupilumab\", \"pronounce\": \"duːˈpɪljuːmæb\"\\}."}'

In the cURL command, the braces are written as \\{ and \\} because the backslash must itself be escaped inside the JSON string.
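
The doubled backslashes come from JSON string escaping rather than from the override syntax itself. The sketch below uses only Python's standard `json` module to show the same text before and after serialization; no Deepgram API is called.

```python
import json

# The text the API should receive: a single backslash before each brace.
text = r'Take \{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\} twice daily.'

# Serializing to JSON escapes each backslash and each inner double quote,
# producing the \\{ ... \\} form seen in the cURL request body.
body = json.dumps({"text": text}, ensure_ascii=False)
print(body)

# Decoding the body recovers the original single-backslash text.
assert json.loads(body)["text"] == text
```

Building the request body with a JSON serializer, instead of typing it by hand, avoids having to count backslashes.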

Common use cases

Category	Word	IPA	Spoken As
Medical	dupilumab	`duːˈpɪljuːmæb`	"doo-PIL-yoo-mab"
Medical	azathioprine	`æzəˈθaɪəpriːn`	"az-uh-THIGH-oh-preen"
Brand	Hermès	`ɛərˈmɛz`	"air-MEZ"
Personal name	Nguyen	`ˈwɪn`	"win"
Technical	SQL	`ˈsiːkwəl`	"sequel"

Validation rules

Rule	Limit
Max pronunciations per request	500
Max IPA string length	128 characters
IPA length ratio	IPA string cannot exceed 10x the length of the source word (with a minimum floor of 15)
Max input text length	2000 characters
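
The per-override rules in this table can be checked on the client before a request is sent. The sketch below is illustrative only. It assumes the ratio rule means the allowed IPA length is 10 times the word length but never less than 15, and that lengths are counted in Unicode code points; the service may count differently. The function name `check_override` is hypothetical.

```python
MAX_IPA_LENGTH = 128  # documented maximum IPA string length


def check_override(word: str, ipa: str) -> list[str]:
    """Return a list of problems with one override (empty if none found).

    Hypothetical client-side check; the server remains the source of truth.
    """
    problems = []
    if len(ipa) > MAX_IPA_LENGTH:
        problems.append(f"IPA for {word!r} exceeds {MAX_IPA_LENGTH} characters")
    # Assumed reading of the ratio rule: 10x the word length, floor of 15.
    allowed = max(15, 10 * len(word))
    if len(ipa) > allowed:
        problems.append(f"IPA for {word!r} exceeds the ratio limit of {allowed}")
    return problems


print(check_override("SQL", "ˈsiːkwəl"))
```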

Combining controls

Speed and pronunciation controls can be used together in the same request.

Healthcare example

from deepgram import DeepgramClient
from deepgram.core.request_options import RequestOptions

client = DeepgramClient(api_key="YOUR_API_KEY")

# Speed control via request_options
request_opts = RequestOptions(additional_query_parameters={"speed": "0.8"})

# Inline IPA replacements with escaped curly braces
text = r'Take \{"word": "Azathioprine", "pronounce": "æzəˈθaɪəpriːn"\} twice daily with \{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\}.'

response = client.speak.v1.audio.generate(
    text=text,
    model="aura-2-thalia-en",
    encoding="mp3",
    request_options=request_opts
)

audio_bytes = b"".join(response)
with open("medical_instructions.mp3", "wb") as f:
    f.write(audio_bytes)

Use raw string (r'...') with escaped braces \{ and \} for pronunciation control in Python.

Brand consistency example

Python

from deepgram import DeepgramClient

client = DeepgramClient(api_key="YOUR_API_KEY")

# Ensure consistent brand pronunciation with escaped braces
text = 'Visit \\{"word": "Hermès", "pronounce": "ɛərˈmɛz"\\} for the latest collection.'

response = client.speak.v1.audio.generate(
    text=text,
    model="aura-2-thalia-en",
    encoding="mp3"
)

audio_bytes = b"".join(response)
with open("brand_pronunciation.mp3", "wb") as f:
    f.write(audio_bytes)

IPA reference

Vowels (American English)

Symbol	Example	As in
`iː`	/biːt/	beat
`ɪ`	/bɪt/	bit
`eɪ`	/beɪt/	bait
`ɛ`	/bɛt/	bet
`æ`	/bæt/	bat
`ɑː`	/fɑːðər/	father
`ɔː`	/kɔːt/	caught
`oʊ`	/boʊt/	boat
`ʊ`	/pʊt/	put
`uː`	/buːt/	boot
`ʌ`	/kʌt/	cut
`ə`	/əˈbaʊt/	about

Consonants

Symbol	Example	As in
`θ`	/θɪŋk/	think
`ð`	/ðæt/	that
`ʃ`	/ʃɪp/	ship
`ʒ`	/ˈvɪʒən/	vision
`tʃ`	/tʃɪp/	chip
`dʒ`	/dʒʌmp/	jump
`ŋ`	/sɪŋ/	sing

Stress markers

Symbol	Meaning	Example
`ˈ`	Primary stress	/ˈæp.əl/ (apple)
`ˌ`	Secondary stress	/ˌɪn.fərˈmeɪ.ʃən/ (information)

Billing

Control	Billing behavior
Speed	Not billed - adjusting rate doesn’t affect billing
Pronunciation	Billed by underlying word - IPA input is not billed

Example: `Hello, \{"word": "Mr.", "pronounce": "ˈmɪstɚ"\} Bond.` is billed as `Hello, Mr. Bond.` (16 characters)
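
To estimate the billed character count locally, each override can be replaced by its `word` value before counting. This is a rough sketch that mirrors the example above. The regular expression and the name `billed_text` are assumptions made for illustration, and the authoritative count is whatever the service reports.

```python
import re

# Matches \{"word": "...", "pronounce": "..."\} and captures the word value.
OVERRIDE = re.compile(r'\\\{"word":\s*"([^"]*)",\s*"pronounce":\s*"[^"]*"\\\}')


def billed_text(text: str) -> str:
    """Replace each inline override with its underlying word (illustrative)."""
    return OVERRIDE.sub(lambda match: match.group(1), text)


text = r'Hello, \{"word": "Mr.", "pronounce": "ˈmɪstɚ"\} Bond.'
print(billed_text(text), len(billed_text(text)))
```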

Response headers

HTTP/1.1 200 OK
content-type: audio/mpeg
dg-request-id: req_xyz789
dg-model-name: aura-2-thalia-en
dg-char-count: 47
dg-pronunciations-applied: 2
dg-speed-used: 0.8

Header	Description
`dg-pronunciations-applied`	Number of pronunciation overrides applied
`dg-speed-used`	Effective speaking rate used
`dg-pronunciation-warnings`	Non-fatal warnings for invalid IPA
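
These headers can be read from any HTTP client's response object. The sketch below works on a plain mapping of header names to values so that it does not depend on a particular client library. The function name `summarize_controls` and the returned keys are hypothetical.

```python
from typing import Mapping


def summarize_controls(headers: Mapping[str, str]) -> dict:
    """Extract the voice-control headers into typed values (illustrative)."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    applied = lowered.get("dg-pronunciations-applied")
    speed = lowered.get("dg-speed-used")
    return {
        "pronunciations_applied": int(applied) if applied is not None else None,
        "speed_used": float(speed) if speed is not None else None,
        "warnings": lowered.get("dg-pronunciation-warnings"),
    }


sample = {"dg-pronunciations-applied": "2", "dg-speed-used": "0.8"}
print(summarize_controls(sample))
```

Comparing `pronunciations_applied` with the number of overrides sent is one way to notice an override that was skipped.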

Error handling

Speed out of range

{"err_code": "speed_out_of_range", "err_msg": "Speed must be between 0.7 and 1.5"}

Invalid pronunciation

{"err_code": "pronunciation_invalid", "err_msg": "Invalid IPA notation for 'azathioprine'"}
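
Both error bodies share the `err_code` and `err_msg` fields, so a client can convert them into a single exception type. The following sketch assumes the error response body is JSON in the shape shown above; the class `VoiceControlError` and the function `parse_error` are hypothetical names, not part of the Deepgram SDK.

```python
import json


class VoiceControlError(Exception):
    """Wraps an error body from the speak endpoint (hypothetical class)."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_error(body: str) -> VoiceControlError:
    """Turn a JSON error body with err_code/err_msg into an exception object."""
    payload = json.loads(body)
    return VoiceControlError(
        payload.get("err_code", "unknown"), payload.get("err_msg", "")
    )


err = parse_error(
    '{"err_code": "speed_out_of_range", "err_msg": "Speed must be between 0.7 and 1.5"}'
)
print(err.code)
```

Calling code can then branch on `err.code`, for example to correct the speed value for `speed_out_of_range` or to drop an override for `pronunciation_invalid`.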

Limits

Limit	Value
Max input text length	2000 characters
Speed range	0.7 - 1.5
Max pronunciations per request	500
Max IPA string length	128 characters

Early access scope

Feature	REST API	WebSocket	Languages
Speed control	Yes	Coming soon	English (en), Spanish (es)
Pronunciation control	Yes	Coming soon	English (en)

Pause control and WebSocket support are planned for future releases.