TTS Voice Controls

Adjust speaking speed and override pronunciation for specific words using Aura-2 controls.

Aura-2 Controls enable fine-grained adjustments to speech output, allowing you to modify speaking speed and override pronunciation for specific words. These controls are designed for enterprise use cases requiring precise voice quality for industry-specific terminology, brand names, and complex content.

Availability

| Control | REST | WebSocket | Languages |
| --- | --- | --- | --- |
| Speed control | Yes | Yes | English (en), Spanish (es) |
| Pronunciation control | Yes | Yes | English (en), Spanish (es) |

Speed control

Adjust the speaking rate of generated audio. Speed control modifies the pace of speech while maintaining natural prosody and voice quality.

Parameters

| Parameter | Location | Type | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| speed | query | float | 1.0 | 0.7 - 1.5 | Speaking rate multiplier |

For Spanish voices, the recommended speed range is 0.9 - 1.5. Values below 0.9 may introduce disfluencies.

Example request

curl --request POST \
  --header "Content-Type: application/json" \
  --header "Authorization: Token DEEPGRAM_API_KEY" \
  --output your_output_file.mp3 \
  --data '{"text":"Hello, how can I help you today?"}' \
  --url "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&speed=0.9"

Speed values

| Value | Effect | Use Case |
| --- | --- | --- |
| 0.7 | 30% slower | Language learning, accessibility, legal compliance |
| 0.8 | 20% slower | Complex instructions, elderly users |
| 0.9 | 10% slower | Clear explanations, training content |
| 1.0 | Normal speed | Default conversational pace |
| 1.1 | 10% faster | Efficient notifications |
| 1.2 | 20% faster | Quick alerts, time-sensitive content |
| 1.5 | 50% faster | Rapid playback, content preview |

Speed values outside the 0.7x–1.5x range will return an error.
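A client-side check can catch an out-of-range speed before the request is sent. The sketch below is only an illustration built from the range and the Spanish recommendation stated above; `check_speed` and its constants are hypothetical names, not part of any Deepgram SDK.

```python
# Documented bounds for the `speed` query parameter.
SPEED_MIN, SPEED_MAX = 0.7, 1.5
# Documented recommendation for Spanish voices (advisory, not enforced here).
SPANISH_RECOMMENDED_MIN = 0.9


def check_speed(speed: float, language: str = "en") -> list[str]:
    """Raise ValueError for out-of-range speeds; return advisory warnings otherwise."""
    if not (SPEED_MIN <= speed <= SPEED_MAX):
        raise ValueError(
            f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}"
        )
    warnings = []
    if language == "es" and speed < SPANISH_RECOMMENDED_MIN:
        warnings.append(
            f"speed {speed} is below the recommended minimum of "
            f"{SPANISH_RECOMMENDED_MIN} for Spanish voices"
        )
    return warnings
```

Raising for hard limits while returning warnings for recommendations keeps the two kinds of guidance separate, so callers can decide whether to log or block on the advisory case.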

Pronunciation control

Override the default pronunciation of specific words using International Phonetic Alphabet (IPA) notation.

Syntax

Pronunciation overrides are specified inline within the text using escaped JSON objects:

\{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\}

Where:

  • word is the original text (used for billing and display)
  • pronounce is the IPA phonetic transcription
  • Curly braces must be escaped with backslashes (\{ and \})
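Building the tag programmatically avoids hand-escaping mistakes. This is a minimal sketch, assuming the inline object uses ordinary JSON string quoting; `pronunciation_tag` is a hypothetical helper, not an SDK function.

```python
import json


def pronunciation_tag(word: str, ipa: str) -> str:
    """Return an inline override of the form \\{"word": ..., "pronounce": ...\\}."""
    # ensure_ascii=False keeps IPA symbols as literal characters.
    body = json.dumps({"word": word, "pronounce": ipa}, ensure_ascii=False)
    # json.dumps wraps the object in plain braces; swap them for escaped ones.
    return "\\{" + body[1:-1] + "\\}"


tag = pronunciation_tag("dupilumab", "duːˈpɪljuːmæb")
sentence = f"Take {tag} twice daily."
```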

Example request

curl -X POST "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&speed=0.8" \
  -H "Authorization: Token DEEPGRAM_API_KEY" \
  -H "Content-Type: application/json" \
  --output your_output_file.mp3 \
  -d '{"text": "Take \\{\"word\": \"Azathioprine\", \"pronounce\": \"æzəˈθaɪəpriːn\"\\} twice daily with \\{\"word\": \"dupilumab\", \"pronounce\": \"duːˈpɪljuːmæb\"\\}."}'

In the cURL command, write the braces as \\{ and \\}. The request body is JSON, where a literal backslash must itself be written as \\, so \\{ reaches the API as \{.
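The doubled backslashes do not need to be typed by hand when a JSON encoder produces the body. The following sketch uses only the Python standard library to construct (not send) an equivalent request; `build_speak_request` is a hypothetical helper, and the endpoint, header, and parameter names are taken from the cURL examples on this page.

```python
import json
import urllib.parse
import urllib.request


def build_speak_request(text, api_key, model="aura-2-thalia-en", speed=None):
    """Build a POST request for /v1/speak without sending it."""
    query = {"model": model}
    if speed is not None:
        query["speed"] = str(speed)
    url = "https://api.deepgram.com/v1/speak?" + urllib.parse.urlencode(query)
    # json.dumps turns each single backslash in `text` into \\ in the body.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Single backslashes in the Python raw string...
text = r'Take \{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\} twice daily.'
request = build_speak_request(text, api_key="DEEPGRAM_API_KEY", speed=0.8)
# ...become doubled backslashes in the encoded JSON body.
encoded_body = request.data.decode("utf-8")
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or a real API key.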

Common use cases

| Category | Word | IPA | Spoken As |
| --- | --- | --- | --- |
| Medical | dupilumab | duːˈpɪljuːmæb | "doo-PIL-yoo-mab" |
| Medical | azathioprine | æzəˈθaɪəpriːn | "az-uh-THIGH-oh-preen" |
| Brand | Hermès | ɛərˈmɛz | "air-MEZ" |
| Personal name | Nguyen | ˈwɪn | "win" |
| Technical | SQL | ˈsiːkwəl | "sequel" |
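A table like this can be kept as a lexicon and applied to outgoing text automatically. The sketch below is one possible approach, not a Deepgram feature: `LEXICON` and `apply_lexicon` are hypothetical names, matching is case-sensitive, and only whole words are replaced.

```python
import json
import re

# Word -> IPA, copied from the table above.
LEXICON = {
    "dupilumab": "duːˈpɪljuːmæb",
    "azathioprine": "æzəˈθaɪəpriːn",
    "Hermès": "ɛərˈmɛz",
    "Nguyen": "ˈwɪn",
    "SQL": "ˈsiːkwəl",
}


def apply_lexicon(text: str, lexicon: dict) -> str:
    """Wrap every whole-word lexicon match in an inline pronunciation override."""
    if not lexicon:
        return text
    # Longest words first so a longer entry wins over a shorter prefix.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")

    def to_tag(match):
        word = match.group(0)
        body = json.dumps({"word": word, "pronounce": lexicon[word]}, ensure_ascii=False)
        return "\\{" + body[1:-1] + "\\}"

    # A single pass means text inside an inserted tag is never rescanned.
    return pattern.sub(to_tag, text)
```

Because the match is on word boundaries, a string such as "SQLite" is left untouched while a standalone "SQL" is tagged.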

Sourcing IPA transcriptions

A few rules of thumb for producing IPA for your own vocabulary:

  • Short lists (<20 words): generate with an LLM and validate by ear.
  • Longer lists: use authoritative dictionaries that publish IPA directly:
    • Cambridge Dictionary
    • Collins Dictionary
    • Oxford English Dictionary

Best practices:

  • Always validate by ear. IPA that looks correct on the page can still sound off when synthesized — listen to the output before shipping.
  • Match the dialect. UK and US pronunciations differ (e.g., schedule, aluminum). Make sure the IPA you choose matches the voice and audience you’re targeting.

Validation rules

| Rule | Limit |
| --- | --- |
| Max pronunciations per request | 500 |
| Max IPA string length | 128 characters |
| IPA length ratio | Cannot exceed 10x the source word length (min floor = 15) |
| Max input text length | 2000 characters |
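These rules can be checked locally before sending a request. The sketch below is an interpretation, not the server's implementation: it assumes lengths are counted in Unicode code points and that "min floor = 15" means the ratio limit is never lower than 15 characters, i.e. `max(10 * len(word), 15)`. `check_overrides` is a hypothetical helper, and the input text length rule is left out because the page does not say whether it applies before or after overrides are expanded.

```python
MAX_PRONUNCIATIONS = 500
MAX_IPA_LENGTH = 128
IPA_RATIO = 10        # IPA may be at most 10x the source word length...
IPA_RATIO_FLOOR = 15  # ...assumed never to drop below 15 characters.


def check_overrides(overrides: list[tuple[str, str]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(overrides) > MAX_PRONUNCIATIONS:
        problems.append(
            f"{len(overrides)} overrides exceeds the limit of {MAX_PRONUNCIATIONS}"
        )
    for word, ipa in overrides:
        if len(ipa) > MAX_IPA_LENGTH:
            problems.append(f"IPA for {word!r} is longer than {MAX_IPA_LENGTH} characters")
        allowed = max(IPA_RATIO * len(word), IPA_RATIO_FLOOR)
        if len(ipa) > allowed:
            problems.append(
                f"IPA for {word!r} is {len(ipa)} characters; ratio limit is {allowed}"
            )
    return problems
```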

Combining controls

Speed and pronunciation controls can be used together in the same request.

Healthcare example

from deepgram import DeepgramClient
from deepgram.core.request_options import RequestOptions

client = DeepgramClient(api_key="YOUR_API_KEY")

# Speed control via request_options
request_opts = RequestOptions(additional_query_parameters={"speed": "0.8"})

# Inline IPA replacements with escaped curly braces
text = r'Take \{"word": "Azathioprine", "pronounce": "æzəˈθaɪəpriːn"\} twice daily with \{"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"\}.'

response = client.speak.v1.audio.generate(
    text=text,
    model="aura-2-thalia-en",
    encoding="mp3",
    request_options=request_opts
)

audio_bytes = b"".join(response)
with open("medical_instructions.mp3", "wb") as f:
    f.write(audio_bytes)

In Python, use a raw string (r'...') so the escaped braces \{ and \} can be written with single backslashes.

Brand consistency example

from deepgram import DeepgramClient

client = DeepgramClient(api_key="YOUR_API_KEY")

# Ensure consistent brand pronunciation with escaped braces.
# This is a regular (non-raw) string, so each backslash is doubled.
text = 'Visit \\{"word": "Hermès", "pronounce": "ɛərˈmɛz"\\} for the latest collection.'

response = client.speak.v1.audio.generate(
    text=text,
    model="aura-2-thalia-en",
    encoding="mp3"
)

audio_bytes = b"".join(response)
with open("brand_pronunciation.mp3", "wb") as f:
    f.write(audio_bytes)

IPA reference

Vowels (American English)

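| Symbol | Example | As in |
| --- | --- | --- |
| iː | /biːt/ | beat |
| ɪ | /bɪt/ | bit |
| eɪ | /beɪt/ | bait |
| ɛ | /bɛt/ | bet |
| æ | /bæt/ | bat |
| ɑː | /fɑːðər/ | father |
| ɔː | /kɔːt/ | caught |
| oʊ | /boʊt/ | boat |
| ʊ | /pʊt/ | put |
| uː | /buːt/ | boot |
| ʌ | /kʌt/ | cut |
| ə | /əˈbaʊt/ | about |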

Consonants

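| Symbol | Example | As in |
| --- | --- | --- |
| p | /pɪn/ | pin |
| b | /bɪn/ | bin |
| t | /tɪn/ | tin |
| d | /dɪn/ | din |
| k | /kæt/ | cat |
| ɡ | /ɡɛt/ | get |
| f | /fɪn/ | fin |
| v | /væn/ | van |
| θ | /θɪŋk/ | think |
| ð | /ðæt/ | that |
| s | /sɪt/ | sit |
| z | /zɪp/ | zip |
| ʃ | /ʃɪp/ | ship |
| ʒ | /ˈvɪʒən/ | vision |
| h | /hæt/ | hat |
| tʃ | /tʃɪp/ | chip |
| dʒ | /dʒʌmp/ | jump |
| m | /mæn/ | man |
| n | /nɛt/ | net |
| ŋ | /sɪŋ/ | sing |
| l | /lɛt/ | let |
| r | /rɛd/ | red |
| w | /wɪn/ | win |
| j | /jɛs/ | yes |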

Stress markers

| Symbol | Meaning | Example |
| --- | --- | --- |
| ˈ | Primary stress | /ˈæp.əl/ (apple) |
| ˌ | Secondary stress | /ˌɪn.fərˈmeɪ.ʃən/ (information) |

Billing

| Control | Billing behavior |
| --- | --- |
| Speed | Not billed - adjusting rate doesn’t affect billing |
| Pronunciation | Billed by underlying word - IPA input is not billed |

Example: Hello, \{"word": "Mr.", "pronounce": "ˈmɪstɚ"\} Bond. is billed as Hello, Mr. Bond. (16 characters)
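To estimate the billed character count locally, each override can be collapsed back to its `word` value. This is a rough sketch that mirrors the example above; the actual count is computed by the API, and `billed_text` is a hypothetical helper that assumes every override is well-formed.

```python
import json
import re

# Matches \{ ... \} and captures the contents between the escaped braces.
OVERRIDE = re.compile(r"\\\{(.*?)\\\}")


def billed_text(text: str) -> str:
    """Replace each inline override with its underlying word."""
    def to_word(match):
        # Restore plain braces so the contents parse as a JSON object.
        return json.loads("{" + match.group(1) + "}")["word"]

    return OVERRIDE.sub(to_word, text)


example = r'Hello, \{"word": "Mr.", "pronounce": "ˈmɪstɚ"\} Bond.'
plain = billed_text(example)
billed_characters = len(plain)
```

For the example sentence, `plain` is `Hello, Mr. Bond.` and `billed_characters` is 16, matching the figure quoted above.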

Response headers

HTTP/1.1 200 OK
content-type: audio/mpeg
dg-request-id: req_xyz789
dg-model-name: aura-2-thalia-en
dg-char-count: 47
dg-pronunciations-applied: 2
dg-speed-used: 0.8

| Header | Description |
| --- | --- |
| dg-pronunciations-applied | Number of pronunciation overrides applied |
| dg-speed-used | Effective speaking rate used |
| dg-pronunciation-warnings | Non-fatal warnings for invalid IPA |
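These headers can be read after a request to confirm that the controls took effect. The sketch below works on any mapping of header names to values; `read_control_headers` is a hypothetical helper, and because the format of `dg-pronunciation-warnings` is not described here, its value is passed through as a raw string.

```python
from typing import Mapping


def read_control_headers(headers: Mapping[str, str]) -> dict:
    """Extract voice-control headers, converting numeric values where present."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    applied = lowered.get("dg-pronunciations-applied")
    speed = lowered.get("dg-speed-used")
    return {
        "pronunciations_applied": int(applied) if applied is not None else None,
        "speed_used": float(speed) if speed is not None else None,
        "pronunciation_warnings": lowered.get("dg-pronunciation-warnings"),
    }


info = read_control_headers({
    "content-type": "audio/mpeg",
    "dg-pronunciations-applied": "2",
    "dg-speed-used": "0.8",
})
```

Comparing `pronunciations_applied` with the number of overrides sent is a simple way to detect overrides that were skipped.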

Error handling

Speed out of range

{"err_code": "speed_out_of_range", "err_msg": "Speed must be between 0.7 and 1.5"}

Invalid pronunciation

{"err_code": "pronunciation_invalid", "err_msg": "Invalid IPA notation for 'azathioprine'"}
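Error bodies of this shape can be turned into typed exceptions so calling code can branch on the error code. This is a minimal sketch, assuming the body always carries `err_code` and `err_msg` as shown; `VoiceControlError` and `parse_error` are hypothetical names.

```python
import json


class VoiceControlError(Exception):
    """Carries the err_code and err_msg fields from an error response body."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_error(body: str) -> VoiceControlError:
    """Build an exception from a JSON error body; missing fields get placeholders."""
    data = json.loads(body)
    return VoiceControlError(data.get("err_code", "unknown"), data.get("err_msg", ""))


error = parse_error(
    '{"err_code": "speed_out_of_range", "err_msg": "Speed must be between 0.7 and 1.5"}'
)
```

A caller might check `error.code == "speed_out_of_range"` to correct the speed and retry, and treat `pronunciation_invalid` as a prompt to fix or drop the offending override.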

Limits

| Limit | Value |
| --- | --- |
| Max input text length | 2000 characters |
| Speed range | 0.7 - 1.5 |
| Max pronunciations per request | 500 |
| Max IPA string length | 128 characters |