TTS Voice Controls

Adjust speaking speed and override pronunciation for specific words using Aura-2 controls.

This feature is currently in Early Access. To request access or leave feedback, contact your Account Executive or reach out to sales@deepgram.com.

Aura-2 Controls enable fine-grained adjustments to speech output, allowing you to modify speaking speed and override pronunciation for specific words. These controls are designed for enterprise use cases requiring precise voice quality for industry-specific terminology, brand names, and complex content.

During Early Access, Aura-2 Controls are available only for English voices and only through the REST API.

Speed control

Adjust the speaking rate of generated audio. Speed control modifies the pace of speech while maintaining natural prosody and voice quality.

Parameters

| Parameter | Location | Type | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| speed | query | float | 1.0 | 0.7 - 1.5 | Speaking rate multiplier |

Example request

curl --request POST \
  --header "Content-Type: application/json" \
  --header "Authorization: Token DEEPGRAM_API_KEY" \
  --output your_output_file.mp3 \
  --data '{"text":"Hello, how can I help you today?"}' \
  --url "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&speed=0.9"

Speed values

| Value | Effect | Use Case |
| --- | --- | --- |
| 0.7 | 30% slower | Language learning, accessibility, legal compliance |
| 0.8 | 20% slower | Complex instructions, elderly users |
| 0.9 | 10% slower | Clear explanations, training content |
| 1.0 | Normal speed | Default conversational pace |
| 1.1 | 10% faster | Efficient notifications |
| 1.2 | 20% faster | Quick alerts, time-sensitive content |
| 1.5 | 50% faster | Rapid playback, content preview |

The 0.7x to 1.5x range maintains natural prosody with minimal disfluencies. Speeds outside this range introduce artifacts that degrade the user experience, and requests that use them are rejected with a speed_out_of_range error.
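Because out-of-range speeds are rejected, it can be convenient to check the value before sending a request. The following is a minimal sketch, not part of any Deepgram SDK: the helper name build_speak_url and the constants are hypothetical, and only the endpoint, the model and speed query parameters, and the 0.7 to 1.5 range come from this page.

```python
from urllib.parse import urlencode

SPEED_MIN = 0.7  # documented lower bound
SPEED_MAX = 1.5  # documented upper bound


def build_speak_url(model: str, speed: float = 1.0) -> str:
    """Return a /v1/speak URL, rejecting speeds outside the documented range.

    Hypothetical helper for illustration only.
    """
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(
            f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}"
        )
    query = urlencode({"model": model, "speed": speed})
    return f"https://api.deepgram.com/v1/speak?{query}"


print(build_speak_url("aura-2-thalia-en", 0.9))
```

Failing early on the client avoids a round trip for a request that the API would reject anyway.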

Pronunciation control

Override the default pronunciation of specific words using International Phonetic Alphabet (IPA) notation.

Syntax

Pronunciation overrides are specified inline within the text using escaped JSON objects:

\{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\}

Where:

  • word is the original text (used for billing and display)
  • pronounce is the IPA phonetic transcription
  • Curly braces must be escaped with backslashes (\{ and \})
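Writing the escaped object by hand is error-prone, so a small helper can assemble it. This is a sketch under the syntax described above; the function name pronunciation_override is hypothetical and not part of the Deepgram SDK.

```python
import json


def pronunciation_override(word: str, ipa: str) -> str:
    """Build an inline override of the form \\{"word": ..., "pronounce": ...\\}.

    Hypothetical helper for illustration only.
    """
    # ensure_ascii=False keeps the IPA symbols as-is instead of \uXXXX escapes.
    obj = json.dumps({"word": word, "pronounce": ipa}, ensure_ascii=False)
    # obj starts with "{" and ends with "}"; swap those for the escaped braces.
    return "\\{" + obj[1:-1] + "\\}"


text = f"Take {pronunciation_override('dupilumab', 'duːˈpɪljuːmæb')} twice daily."
print(text)
```

The printed text contains a single backslash before each brace, which is the form the text field is expected to carry once the request body has been decoded.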

Example request

curl -X POST "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&speed=0.8" \
  -H "Authorization: Token DEEPGRAM_API_KEY" \
  -H "Content-Type: application/json" \
  --output your_output_file.mp3 \
  -d '{"text": "Take \\{\"word\": \"Azathioprine\", \"pronounce\": \"æzəˈθaɪəpriːn\"\\} twice daily with \\{\"word\": \"dupilumab\", \"pronounce\": \"duːˈpɪljuːmæb\"\\}."}'

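In the cURL command, write the escaped braces as \\{ and \\}. The request body is JSON, so the backslash in \{ and \} must itself be escaped, and the inner double quotes must be written as \".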
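The extra level of escaping comes from JSON encoding, which a serializer applies automatically. The sketch below uses only the Python standard library to show how text containing \{ and \} turns into the \\{ and \\} form seen in a raw request body.

```python
import json

# The text as the API should receive it: one backslash before each brace.
text = r'Take \{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\} twice daily.'

# JSON encoding escapes each backslash and each inner double quote.
body = json.dumps({"text": text}, ensure_ascii=False)
print(body)

# Decoding the body recovers the original text unchanged.
assert json.loads(body)["text"] == text
```

When the body is built with a JSON serializer, only the single-backslash form needs to appear in your own code.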

Common use cases

| Category | Word | IPA | Spoken As |
| --- | --- | --- | --- |
| Medical | dupilumab | duːˈpɪljuːmæb | "doo-PIL-yoo-mab" |
| Medical | azathioprine | æzəˈθaɪəpriːn | "az-uh-THIGH-oh-preen" |
| Brand | Hermès | ɛərˈmɛz | "air-MEZ" |
| Personal name | Nguyen | ˈwɪn | "win" |
| Technical | SQL | ˈsiːkwəl | "sequel" |

Validation rules

| Rule | Limit |
| --- | --- |
| Max pronunciations per request | 500 |
| Max IPA string length | 128 characters |
| IPA length ratio | Cannot exceed 10x the source word length (min floor = 15) |
| Max input text length | 2000 characters |
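These rules can be checked on the client before a request is sent. The sketch below is illustrative only and rests on two assumptions that this page does not confirm: that the ratio rule allows up to max(15, 10 x word length) IPA characters, and that the 2000-character limit applies to the text as submitted, overrides included. The names validate_text and OVERRIDE_RE are hypothetical.

```python
import json
import re

MAX_TEXT_CHARS = 2000
MAX_PRONUNCIATIONS = 500
MAX_IPA_CHARS = 128

# Matches an inline override of the form \{ ... \} (non-greedy, no nesting).
OVERRIDE_RE = re.compile(r"\\\{(.*?)\\\}")


def validate_text(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Assumption: the limit is measured on the submitted text, overrides included.
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text is {len(text)} characters (max {MAX_TEXT_CHARS})")
    overrides = OVERRIDE_RE.findall(text)
    if len(overrides) > MAX_PRONUNCIATIONS:
        problems.append(f"{len(overrides)} overrides (max {MAX_PRONUNCIATIONS})")
    for inner in overrides:
        try:
            entry = json.loads("{" + inner + "}")
            word, ipa = entry["word"], entry["pronounce"]
        except (json.JSONDecodeError, KeyError):
            problems.append(f"malformed override: {inner!r}")
            continue
        if not isinstance(word, str) or not isinstance(ipa, str):
            problems.append(f"non-string value in override: {inner!r}")
            continue
        if len(ipa) > MAX_IPA_CHARS:
            problems.append(f"IPA for {word!r} exceeds {MAX_IPA_CHARS} characters")
        # Assumed reading of the ratio rule: max(15, 10 * len(word)) characters.
        allowed = max(15, 10 * len(word))
        if len(ipa) > allowed:
            problems.append(f"IPA for {word!r} exceeds {allowed} characters")
    return problems


sample = r'Take \{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\} daily.'
print(validate_text(sample))
```

The server remains the source of truth. A check like this only catches obvious mistakes earlier.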

Combining controls

Speed and pronunciation controls can be used together in the same request.

Healthcare example

from deepgram import DeepgramClient
from deepgram.core.request_options import RequestOptions

client = DeepgramClient(api_key="YOUR_API_KEY")

# Speed control via request_options
request_opts = RequestOptions(additional_query_parameters={"speed": "0.8"})

# Inline IPA replacements with escaped curly braces
text = r'Take \{"word": "Azathioprine", "pronounce": "æzəˈθaɪəpriːn"\} twice daily with \{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\}.'

response = client.speak.v1.audio.generate(
    text=text,
    model="aura-2-thalia-en",
    encoding="mp3",
    request_options=request_opts,
)

# The response is an iterable of byte chunks; join them before writing.
audio_bytes = b"".join(response)
with open("medical_instructions.mp3", "wb") as f:
    f.write(audio_bytes)

Use a raw string (r'...') with escaped braces \{ and \} for pronunciation control in Python.
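A raw string is a convenience rather than a requirement. As a quick check using plain Python, a regular string literal with doubled backslashes produces exactly the same text:

```python
# Raw string: backslashes are kept literally.
raw = r'Visit \{"word": "Hermès", "pronounce": "ɛərˈmɛz"\} today.'

# Regular string: each backslash must be doubled in the source code.
escaped = 'Visit \\{"word": "Hermès", "pronounce": "ɛərˈmɛz"\\} today.'

assert raw == escaped
print(raw.count("\\"))  # 2
```

Writing \{ in a regular, non-raw literal also keeps the backslash, but Python reports it as an invalid escape sequence with a warning, so one of the two forms above is preferable.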

Brand consistency example

from deepgram import DeepgramClient

client = DeepgramClient(api_key="YOUR_API_KEY")

# Ensure consistent brand pronunciation with escaped braces.
# This is a regular (non-raw) string, so each backslash is doubled.
text = 'Visit \\{"word": "Hermès", "pronounce": "ɛərˈmɛz"\\} for the latest collection.'

response = client.speak.v1.audio.generate(
    text=text,
    model="aura-2-thalia-en",
    encoding="mp3",
)

audio_bytes = b"".join(response)
with open("brand_pronunciation.mp3", "wb") as f:
    f.write(audio_bytes)

IPA reference

Vowels (American English)

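| Symbol | Example | As in |
| --- | --- | --- |
| iː | /biːt/ | beat |
| ɪ | /bɪt/ | bit |
| eɪ | /beɪt/ | bait |
| ɛ | /bɛt/ | bet |
| æ | /bæt/ | bat |
| ɑː | /fɑːðər/ | father |
| ɔː | /kɔːt/ | caught |
| oʊ | /boʊt/ | boat |
| ʊ | /pʊt/ | put |
| uː | /buːt/ | boot |
| ʌ | /kʌt/ | cut |
| ə | /əˈbaʊt/ | about |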

Consonants

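| Symbol | Example | As in |
| --- | --- | --- |
| θ | /θɪŋk/ | think |
| ð | /ðæt/ | that |
| ʃ | /ʃɪp/ | ship |
| ʒ | /ˈvɪʒən/ | vision |
| tʃ | /tʃɪp/ | chip |
| dʒ | /dʒʌmp/ | jump |
| ŋ | /sɪŋ/ | sing |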

Stress markers

| Symbol | Meaning | Example |
| --- | --- | --- |
| ˈ | Primary stress | /ˈæp.əl/ (apple) |
| ˌ | Secondary stress | /ˌɪn.fərˈmeɪ.ʃən/ (information) |

Billing

| Control | Billing behavior |
| --- | --- |
| Speed | Not billed - adjusting rate doesn't affect billing |
| Pronunciation | Billed by underlying word - IPA input is not billed |

Example: Hello, \{"word": "Mr.", "pronounce": "ˈmɪstɚ"\} Bond. is billed as Hello, Mr. Bond. (16 characters)
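To estimate the billed character count for a given input, each override can be replaced by its word value before counting. This is a sketch of that idea only; the function name billable_text is hypothetical, and the count the service actually reports should be treated as authoritative.

```python
import json
import re

# Matches an inline override of the form \{ ... \} (non-greedy, no nesting).
OVERRIDE_RE = re.compile(r"\\\{(.*?)\\\}")


def billable_text(text: str) -> str:
    """Replace each inline override with its "word" value."""

    def to_word(match: re.Match) -> str:
        # Re-wrap the captured contents in plain braces and parse as JSON.
        return json.loads("{" + match.group(1) + "}")["word"]

    return OVERRIDE_RE.sub(to_word, text)


text = r'Hello, \{"word": "Mr.", "pronounce": "ˈmɪstɚ"\} Bond.'
plain = billable_text(text)
print(plain, len(plain))  # Hello, Mr. Bond. 16
```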

Response headers

HTTP/1.1 200 OK
content-type: audio/mpeg
dg-request-id: req_xyz789
dg-model-name: aura-2-thalia-en
dg-char-count: 47
dg-pronunciations-applied: 2
dg-speed-used: 0.8

| Header | Description |
| --- | --- |
| dg-pronunciations-applied | Number of pronunciation overrides applied |
| dg-speed-used | Effective speaking rate used |
| dg-pronunciation-warnings | Non-fatal warnings for invalid IPA |
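These headers make it possible to confirm that the controls took effect. The sketch below reads them from a plain dictionary, so it works with whatever HTTP client supplies the headers. The function name summarize_controls is hypothetical, and because this page does not specify the format of dg-pronunciation-warnings, that value is returned as the raw string.

```python
from typing import Optional


def summarize_controls(headers: dict) -> dict:
    """Extract the control-related response headers into typed values."""
    # Header names are case-insensitive, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    speed: Optional[str] = lowered.get("dg-speed-used")
    return {
        "speed_used": float(speed) if speed is not None else None,
        "pronunciations_applied": int(lowered.get("dg-pronunciations-applied", 0)),
        # Format not documented here; keep the raw header value.
        "warnings": lowered.get("dg-pronunciation-warnings"),
    }


example_headers = {
    "content-type": "audio/mpeg",
    "dg-pronunciations-applied": "2",
    "dg-speed-used": "0.8",
}
print(summarize_controls(example_headers))
```

Comparing pronunciations_applied with the number of overrides sent is a simple way to notice an override that was skipped.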

Error handling

Speed out of range

{"err_code": "speed_out_of_range", "err_msg": "Speed must be between 0.7 and 1.5"}

Invalid pronunciation

{"err_code": "pronunciation_invalid", "err_msg": "Invalid IPA notation for 'azathioprine'"}
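Client code can branch on err_code to distinguish the two failures. The following is a minimal sketch that assumes error bodies have the err_code and err_msg fields shown above. The exception classes and the error_from_body function are hypothetical names, not part of the Deepgram SDK.

```python
import json


class SpeedOutOfRange(ValueError):
    """The speed parameter was outside the accepted range."""


class InvalidPronunciation(ValueError):
    """A pronunciation override contained invalid IPA."""


# Map the documented error codes to exception types.
ERROR_TYPES = {
    "speed_out_of_range": SpeedOutOfRange,
    "pronunciation_invalid": InvalidPronunciation,
}


def error_from_body(body: str) -> Exception:
    """Turn a JSON error body into an exception instance (not raised here)."""
    payload = json.loads(body)
    # Unknown codes fall back to a generic RuntimeError.
    exc_type = ERROR_TYPES.get(payload.get("err_code"), RuntimeError)
    return exc_type(payload.get("err_msg", "unknown error"))


body = '{"err_code": "speed_out_of_range", "err_msg": "Speed must be between 0.7 and 1.5"}'
print(repr(error_from_body(body)))
```

Returning the exception instead of raising it leaves the caller free to decide whether to retry, correct the input, or surface the message.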

Limits

| Limit | Value |
| --- | --- |
| Max input text length | 2000 characters |
| Speed range | 0.7 - 1.5 |
| Max pronunciations per request | 500 |
| Max IPA string length | 128 characters |

Early access scope

| Feature | REST API | WebSocket | Languages |
| --- | --- | --- | --- |
| Speed control | Yes | Coming soon | English (en) |
| Pronunciation control | Yes | Coming soon | English (en) |

Pause control and WebSocket support are planned for future releases.