Formatting Text for Aura-2
Optimize Aura-2 Text-to-Speech Generation Quality
Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.
This guide provides essential formatting techniques to optimize text for Aura-2 text-to-speech conversion. Following these guidelines will produce more natural-sounding speech output with appropriate pacing, intonation, and emphasis.
Note for LLM-Generated Text
If you are using a Large Language Model (LLM) to generate input text for Aura-2, you can prompt the LLM to provide conversational responses. For example, instruct the LLM to โrespond in a natural, conversational tone with appropriate punctuation for text-to-speechโ to get output that will sound more natural when processed by Aura-2.
Core Principles
Natural Speech Patterns
-
Direct address: Include commas before names
- โ โHello, Maria! We have a special offer today.โ
- โ โHello Maria We have a special offer todayโ
-
Lists: Insert commas between items
- โ โWould you like fries, a drink, or an apple pie?โ
- โ โWould you like fries a drink or an apple pieโ
-
Conversational flow: Use short, standalone phrases
- โ โOne moment. Iโm searching for that information.โ
- โ Long combined sentences without pauses
Special Formatting
Context Adaptation
- Professional tone: โWeโve processed your refund according to company policy.โ
- Casual tone: โGood news! Weโve processed your refund - money should be back soon.โ
Common Pitfalls
- โ Missing punctuation
- โ Run-on sentences
- โ Inconsistent formatting
- โ Unexplained abbreviations
- โ Overusing emphasis (!!!, ALL CAPS)
- โ No space before ? after URLs/emails
- โ Insufficient pauses for complex information
Testing Your Text
- Read your text aloud naturally
- Mark where you naturally pause
- Add punctuation to match these pauses
- Test variations with Aura-2 to find the most natural output
Remember: Natural text input produces natural speech output. The formatting choices you make directly impact how Aura-2 interprets and vocalizes your content.