May 4, 2026

Aura-2 Voice Controls — Speed and Pronunciation

Aura-2 TTS voices now support runtime speed and pronunciation controls in English and Spanish, available on both batch and streaming WebSocket endpoints. Together, these controls let developers tune pacing and correct pronunciation so that generated speech sounds more natural for their use case.

Speed control adjusts the speaking rate of generated audio while maintaining natural prosody, with a supported range of 0.7x–1.5x. For Spanish voices, the recommended range is 0.9x–1.5x; values below 0.9x may introduce disfluencies.
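As a rough illustration of the ranges above, a client could check a speed value before sending a request. This is a minimal sketch, not part of any SDK: the function name `check_speed`, the constants, and the use of `"es"` as the Spanish language key are all assumptions made for this example.

```python
# Ranges are taken from this release note; every name here is hypothetical.
SUPPORTED_RANGE = (0.7, 1.5)      # supported speed multipliers
RECOMMENDED_MIN = {"es": 0.9}     # assumed language key for Spanish voices


def check_speed(speed: float, language: str = "en") -> list[str]:
    """Raise if speed is unsupported; return advisory notes otherwise."""
    low, high = SUPPORTED_RANGE
    if not low <= speed <= high:
        raise ValueError(
            f"speed {speed} is outside the supported range {low}-{high}"
        )

    notes = []
    recommended_min = RECOMMENDED_MIN.get(language)
    if recommended_min is not None and speed < recommended_min:
        # Supported, but below the recommended floor for this language.
        notes.append(
            f"speed {speed} is below the recommended minimum "
            f"{recommended_min} for '{language}'; disfluencies are possible"
        )
    return notes
```

With this sketch, `check_speed(0.8, "es")` returns one advisory note rather than failing, since 0.8x is still inside the supported range.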

Pronunciation control overrides the default pronunciation of specific words using inline IPA notation. For example: The patient was prescribed {"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"}.
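One way to produce text in the inline format shown above is to substitute known words from a lookup table before sending the text for synthesis. The sketch below assumes the inline object is written exactly as in the example; the helper name `apply_overrides` and the dictionary of overrides are hypothetical, and matching is case-sensitive on whole words.

```python
import json
import re


def apply_overrides(text: str, overrides: dict[str, str]) -> str:
    """Replace each whole-word match with an inline pronunciation object.

    `overrides` maps a word to its IPA transcription (hypothetical helper).
    """
    if not overrides:
        return text

    # Longest words first so a longer entry wins over a shorter prefix.
    words = sorted(overrides, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")

    def to_inline(match: re.Match) -> str:
        word = match.group(0)
        # ensure_ascii=False keeps IPA characters readable instead of \u escapes.
        return json.dumps(
            {"word": word, "pronounce": overrides[word]}, ensure_ascii=False
        )

    # A single pass avoids re-matching inside objects that were just inserted.
    return pattern.sub(to_inline, text)


sentence = apply_overrides(
    "The patient was prescribed dupilumab.",
    {"dupilumab": "duːˈpɪljuːmæb"},
)
```

Here `sentence` reproduces the example from this note. Whether the service tolerates different whitespace or key order inside the object is not stated here, so the sketch keeps the layout of the documented example.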

See Voice Controls for details.

Voice Agent

These controls are also supported when using Aura-2 inside the Voice Agent API: set speed once at the session level, and have the LLM emit pronunciation overrides inline. See Voice Agent TTS Controls for the recommended setup.

Self-Hosted

These controls are powered by an updated Aura-2 voice-pack model. Self-hosted deployments using a voice-pack from before the April 2026 release will return 400 Bad Request on requests including speed or pronounce parameters. See the April 2026 self-hosted release notes for voice-pack version requirements and upgrade instructions.
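Until a self-hosted deployment is upgraded, one possible stopgap is to strip inline overrides from the text (and omit the speed parameter) before sending requests to the older voice-pack. The sketch below covers only the text side and assumes overrides are written exactly as in the example above; `strip_overrides` is a hypothetical helper, not part of any SDK.

```python
import json
import re

# Candidate spans: any brace-delimited run without nested braces.
_BRACED = re.compile(r"\{[^{}]*\}")


def strip_overrides(text: str) -> str:
    """Replace inline pronunciation objects with their plain word."""

    def to_plain(match: re.Match) -> str:
        raw = match.group(0)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            return raw  # not JSON, leave the text untouched
        if (
            isinstance(obj, dict)
            and set(obj) == {"word", "pronounce"}
            and isinstance(obj["word"], str)
        ):
            return obj["word"]
        return raw  # some other JSON object, leave it alone

    return _BRACED.sub(to_plain, text)


plain = strip_overrides(
    'The patient was prescribed {"word": "dupilumab", "pronounce": "duːˈpɪljuːmæb"}.'
)
```

The result falls back to the voice's default pronunciation, so this trades accuracy for compatibility and is no substitute for the upgrade described in the self-hosted release notes.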