Improving Aura-2 Formatting

Optimize Aura-2 Text-to-Speech Generation Quality

Aura-2 is currently available for the TTS REST API only. Websocket support is coming soon.

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

This guide provides essential formatting techniques to optimize text for Aura-2 text-to-speech conversion. Following these guidelines will produce more natural-sounding speech output with appropriate pacing, intonation, and emphasis.

Note for LLM-Generated Text

If you are using a Large Language Model (LLM) to generate input text for Aura-2, you can prompt the LLM to provide conversational responses. For example, instruct the LLM to “respond in a natural, conversational tone with appropriate punctuation for text-to-speech” to get output that will sound more natural when processed by Aura-2.

Core Principles

PrincipleDo This ✓Not This ✗
End sentences with periodsHello. One moment. I’m looking up your records.Hello One Moment I’m looking up your records
Use question marks for questionsWould you like to add a drink for $1 more?Would you like to add a drink for $1 more
Add exclamation points for enthusiasmThanks for contacting our support team!Thanks for contacting our support team
Use commas for natural pausesYou can reach us by phone, chat, or email.You can reach us by phone chat or email
Put command words in quotesSay “add item” to add more to your order.Say add item to add more to your order

Natural Speech Patterns

  • Direct address: Include commas before names

    • ✓ “Hello, Maria! We have a special offer today.”
    • ✗ “Hello Maria We have a special offer today”
  • Lists: Insert commas between items

    • ✓ “Would you like fries, a drink, or an apple pie?”
    • ✗ “Would you like fries a drink or an apple pie”
  • Conversational flow: Use short, standalone phrases

    • ✓ “One moment. I’m searching for that information.”
    • ✗ Long combined sentences without pauses

Special Formatting

TechniqueExample
Hyphens for additional pausesYour total is $45.82 - please pull forward.
Clear step boundariesPlease arrive early. Bring your insurance card - and medications.

Context Adaptation

  • Professional tone: “We’ve processed your refund according to company policy.”
  • Casual tone: “Good news! We’ve processed your refund - money should be back soon.”

Common Pitfalls

  • ❌ Missing punctuation
  • ❌ Run-on sentences
  • ❌ Inconsistent formatting
  • ❌ Unexplained abbreviations
  • ❌ Overusing emphasis (!!!, ALL CAPS)
  • ❌ No space before ? after URLs/emails
  • ❌ Insufficient pauses for complex information

Testing Your Text

  1. Read your text aloud naturally
  2. Mark where you naturally pause
  3. Add punctuation to match these pauses
  4. Test variations with Aura-2 to find the most natural output

Remember: Natural text input produces natural speech output. The formatting choices you make directly impact how Aura-2 interprets and vocalizes your content.


Built with