Improving Aura-2 Formatting
Optimize Aura-2 Text-to-Speech Generation Quality
Aura-2 is currently available for the TTS REST API only. Websocket support is coming soon.
Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.
This guide provides essential formatting techniques to optimize text for Aura-2 text-to-speech conversion. Following these guidelines will produce more natural-sounding speech output with appropriate pacing, intonation, and emphasis.
Note for LLM-Generated Text
If you are using a Large Language Model (LLM) to generate input text for Aura-2, you can prompt the LLM to provide conversational responses. For example, instruct the LLM to “respond in a natural, conversational tone with appropriate punctuation for text-to-speech” to get output that will sound more natural when processed by Aura-2.
Core Principles
Natural Speech Patterns
-
Direct address: Include commas before names
- ✓ “Hello, Maria! We have a special offer today.”
- ✗ “Hello Maria We have a special offer today”
-
Lists: Insert commas between items
- ✓ “Would you like fries, a drink, or an apple pie?”
- ✗ “Would you like fries a drink or an apple pie”
-
Conversational flow: Use short, standalone phrases
- ✓ “One moment. I’m searching for that information.”
- ✗ Long combined sentences without pauses
Special Formatting
Context Adaptation
- Professional tone: “We’ve processed your refund according to company policy.”
- Casual tone: “Good news! We’ve processed your refund - money should be back soon.”
Common Pitfalls
- ❌ Missing punctuation
- ❌ Run-on sentences
- ❌ Inconsistent formatting
- ❌ Unexplained abbreviations
- ❌ Overusing emphasis (!!!, ALL CAPS)
- ❌ No space before ? after URLs/emails
- ❌ Insufficient pauses for complex information
Testing Your Text
- Read your text aloud naturally
- Mark where you naturally pause
- Add punctuation to match these pauses
- Test variations with Aura-2 to find the most natural output
Remember: Natural text input produces natural speech output. The formatting choices you make directly impact how Aura-2 interprets and vocalizes your content.