Deepgram is proud to announce the release of Nova-3, our most advanced speech-to-text model to date. Key improvements include:

Performance Improvements

  • 54.3% reduction in word error rate (WER) for streaming audio compared to competitors (Nova-3 median WER: 6.84%)
  • 47.4% reduction in WER for batch processing compared to competitors (Nova-3 median WER: 5.26%)
  • Maintains industry-leading inference speed, with latency comparable to Nova-2
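
To make the WER figures concrete, here is a minimal sketch of the arithmetic behind a relative WER reduction. It assumes the percentages above are relative reductions measured against a single baseline median, which this note does not state explicitly; the function names are illustrative, and the baseline values it computes are implied by that assumption rather than published numbers.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.5 = half the errors)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, reduction: float) -> float:
    """Baseline WER that yields new_wer after the given relative reduction."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return new_wer / (1 - reduction)


# Figures quoted in this note (WER in percent, reductions as fractions).
# ASSUMPTION: reductions are relative, not absolute percentage points.
streaming_baseline = implied_baseline(6.84, 0.543)
batch_baseline = implied_baseline(5.26, 0.474)

print(f"Implied streaming baseline WER: {streaming_baseline:.2f}%")
print(f"Implied batch baseline WER: {batch_baseline:.2f}%")
```

Under that reading, the batch figure corresponds to a baseline of roughly 10% WER and the streaming figure to roughly 15%.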

New Features

  • Self-serve customization through Keyterm Prompting

    • Instantly adapt up to 100 domain-specific terms without model retraining
    • Improved recognition of specialized vocabulary and technical terminology
  • Enhanced capabilities for challenging audio conditions:

    • Improved handling of background noise and overlapping speech
    • Better numeric recognition
    • Real-time redaction for up to 50 entities
    • Greater word-level timestamp precision
    • Improved English formatting and paragraph structuring
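
As a rough illustration of Keyterm Prompting, the sketch below prepares a term list and encodes it as query parameters. The 100-term limit comes from this note; the `keyterm` parameter name, the repeated-parameter encoding, and the `keyterm_query` helper are assumptions made for illustration, so check the Developer Documentation for the exact request format.

```python
from urllib.parse import urlencode

MAX_KEYTERMS = 100  # limit stated in this release note


def keyterm_query(terms, limit=MAX_KEYTERMS):
    """Build a query string for model=nova-3 plus a cleaned list of key terms.

    Hypothetical helper: trims whitespace, drops empty entries, and removes
    case-insensitive duplicates while preserving the original order.
    """
    cleaned = []
    seen = set()
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue
        seen.add(key)
        cleaned.append(term)

    if len(cleaned) > limit:
        raise ValueError(f"{len(cleaned)} key terms given, limit is {limit}")

    # ASSUMPTION: each term is sent as its own `keyterm` query parameter.
    params = [("model", "nova-3")] + [("keyterm", t) for t in cleaned]
    return urlencode(params)


query = keyterm_query(["tretinoin", "Tretinoin ", "", "azelaic acid"])
print(query)
```

Validating the list on the client keeps an oversized term list from reaching the API in the first place.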

Availability

Nova-3 English is now available through our API. Access details:

  • Use model=nova-3 in your API calls
  • Currently available for hosted use only
  • Supports both pre-recorded and real-time streaming transcription
  • Multilingual support and self-hosted deployments will be available in subsequent releases
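
For a pre-recorded file, selecting the model could look like the sketch below. Only `model=nova-3` is taken from this note; the endpoint URL, the `Token` authorization scheme, the JSON body shape, and the `build_request` helper are assumptions for illustration and should be checked against the Developer Documentation. The example only constructs the request and does not send it.

```python
import json
import urllib.request
from urllib.parse import urlencode

# ASSUMPTION: endpoint for pre-recorded transcription; not stated in this note.
LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"


def build_request(api_key: str, audio_url: str, model: str = "nova-3"):
    """Build (but do not send) a POST request asking for transcription of audio_url."""
    query = urlencode({"model": model})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{LISTEN_ENDPOINT}?{query}",
        data=body,
        method="POST",
        headers={
            # ASSUMPTION: token-style authorization header.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials and audio location, for illustration only.
req = build_request("YOUR_API_KEY", "https://example.com/audio.wav")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without credentials or network access.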

For detailed information about Nova-3, please refer to our Developer Documentation.