Deepgram is proud to announce the release of Aura-2, our text-to-speech model purpose-built for real-time enterprise use cases.

Performance

  • Sub-200ms time-to-first-byte (TTFB) latency for real-time conversational interactions
  • Real-Time Factor (RTF) of 0.111, meaning one second of audio is synthesized in roughly 111 milliseconds
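As a quick illustration of what these figures mean, here is a minimal sketch of the RTF arithmetic. It assumes the usual definition of RTF as wall-clock synthesis time divided by the duration of the audio produced, so any value below 1.0 is faster than real time. The helper names are hypothetical and are not part of any Deepgram SDK.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time_ms(audio_seconds: float, rtf: float) -> float:
    """Expected wall-clock synthesis time, in milliseconds, for audio of a given length."""
    return audio_seconds * rtf * 1000.0


def is_faster_than_real_time(rtf: float) -> bool:
    """An RTF below 1.0 means audio is produced faster than it plays back."""
    return rtf < 1.0


if __name__ == "__main__":
    rtf = 0.111
    # One second of audio at RTF 0.111 takes about 111 ms to synthesize.
    print(f"1 s of audio:  ~{synthesis_time_ms(1.0, rtf):.0f} ms")
    # A 30-second response takes about 3.3 s to synthesize end to end.
    print(f"30 s of audio: ~{synthesis_time_ms(30.0, rtf):.0f} ms")
```

Note that RTF describes total throughput, while TTFB describes how quickly the first audio arrives; in a streaming conversation, playback can begin after the first bytes arrive, well before the full clip has been synthesized.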

Voice Quality & Features

  • Enterprise-optimized voice catalog with 40+ distinct voices, each designed for specific business contexts

  • Tuned for professional and transactional interactions with appropriate tone, pacing, and emphasis

  • Superior pronunciation accuracy for domain-specific content:

    • Currency and numerals
    • Dates and timestamps in varied formats
    • Email addresses, passwords, and URLs
    • Complex addresses and location references

  • Industry-leading voice clarity rated higher than competitors in customer service scenarios

Availability

  • Aura-2 is available now via REST and WebSocket APIs
  • Currently available through our hosted offering
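For orientation, here is a minimal sketch of a REST request using only the Python standard library. The endpoint path (`/v1/speak`), the `model` query parameter, the `Authorization: Token …` header scheme, the JSON `{"text": …}` body, and the model name `aura-2-thalia-en` are all assumptions for illustration; confirm each against the Developer Documentation before relying on them. The WebSocket flow is not shown.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: endpoint and model name are illustrative; check the Developer Documentation.
SPEAK_URL = "https://api.deepgram.com/v1/speak"
ASSUMED_MODEL = "aura-2-thalia-en"


def build_speak_request(text: str, api_key: str, model: str = ASSUMED_MODEL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the hosted API to synthesize `text`."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{SPEAK_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # ASSUMPTION: token-style auth header
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, api_key: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to disk as they arrive."""
    req = build_speak_request(text, api_key)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        while True:
            chunk = resp.read(4096)  # read the response body in small chunks
            if not chunk:
                break
            out.write(chunk)
```

Splitting request construction from sending keeps the request shape easy to inspect and test offline; writing the body in chunks means audio can be handled as soon as the first bytes arrive rather than after the whole clip is buffered.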

For detailed information about Aura-2, please refer to our Developer Documentation.