Deepgram is proud to announce the release of Aura-2, our text-to-speech model purpose-built for real-time enterprise use cases.
Performance
- Sub-200ms time-to-first-byte (TTFB) latency for real-time conversational interactions
- 0.111x Real-Time Factor (RTF), synthesizing one second of audio in just over 100 milliseconds
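To make the RTF figure concrete, here is a minimal sketch of the arithmetic, assuming the usual definition of Real-Time Factor as synthesis time divided by the duration of the audio produced. The function name `synthesis_time_ms` is illustrative and not part of any Deepgram SDK.

```python
def synthesis_time_ms(audio_seconds: float, rtf: float = 0.111) -> float:
    """Estimate synthesis time in milliseconds for a clip of the given length.

    Assumes RTF = synthesis_time / audio_duration, so
    synthesis_time = audio_duration * RTF.
    """
    return audio_seconds * rtf * 1000.0


if __name__ == "__main__":
    # One second of audio at an RTF of 0.111 takes roughly 111 ms to synthesize.
    print(round(synthesis_time_ms(1.0)))
    # A 10-second clip scales linearly under this model.
    print(round(synthesis_time_ms(10.0)))
```

This estimate covers synthesis throughput only. Time-to-first-byte is a separate measurement and is not derived from RTF.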
Voice Quality & Features
- Enterprise-optimized voice catalog with 40+ distinct voices, each designed for specific business contexts
- Tuned for professional and transactional interactions with appropriate tone, pacing, and emphasis
- Superior pronunciation accuracy for domain-specific content:
  - Currency and numerals
  - Dates and timestamps in varied formats
  - Email addresses, passwords, and URLs
  - Complex addresses and location references
- Industry-leading voice clarity rated higher than competitors in customer service scenarios
Availability
- Aura-2 is available now via REST and WebSocket APIs
- Currently available for use through our hosted offering
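As a rough illustration of the REST path, the sketch below builds a text-to-speech request using only the Python standard library. The endpoint URL, the `Token` authorization scheme, the JSON body shape, and the model name `aura-2-thalia-en` are assumptions that do not appear in this announcement, so confirm each of them in the Developer Documentation before relying on this. The helper names `build_speak_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Assumed values: verify the endpoint and model name in the Developer Documentation.
SPEAK_URL = "https://api.deepgram.com/v1/speak"
DEFAULT_MODEL = "aura-2-thalia-en"


def build_speak_request(text: str, api_key: str,
                        model: str = DEFAULT_MODEL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for synthesized speech."""
    url = f"{SPEAK_URL}?model={model}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def synthesize(text: str, api_key: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speak_request(text, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Separating request construction from sending keeps the network call in one place, which makes the request easy to inspect or log before any audio is generated. For streaming use cases, the WebSocket API would be the more natural fit, and it is not covered by this sketch.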
For detailed information about Aura-2, please refer to our Developer Documentation.