Deepgram is proud to announce the release of Nova-3, our most advanced speech-to-text model to date. Key improvements include:
Performance Improvements
- 54.3% reduction in word error rate (WER) for streaming audio compared to competitors (6.84% median WER)
- 47.4% reduction in WER for batch processing (5.26% median WER)
- Maintains industry-leading inference speed, with latency comparable to Nova-2
New Features
- Self-serve customization through Keyterm Prompting:
  - Instantly adapt up to 100 domain-specific terms without model retraining
  - Improved recognition of specialized vocabulary and technical terminology
- Enhanced capabilities for challenging audio conditions:
  - Improved handling of background noise and overlapping speech
  - Better numeric recognition
  - Real-time redaction for up to 50 entities
  - Greater word-level timestamp precision
  - Improved English formatting and paragraph structuring
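As a rough illustration of Keyterm Prompting, the sketch below builds a request URL that carries a list of domain-specific terms and enforces the 100-term limit mentioned above. The endpoint `https://api.deepgram.com/v1/listen` and the repeated `keyterm` query parameter are assumptions not stated in this announcement, and `build_keyterm_url` is a hypothetical helper; check the Developer Documentation for the exact parameter format.

```python
from urllib.parse import urlencode

# Assumptions: the endpoint and the `keyterm` parameter name are not given in
# this announcement; confirm both in the Developer Documentation.
LISTEN_URL = "https://api.deepgram.com/v1/listen"
MAX_KEYTERMS = 100  # limit stated in the announcement


def build_keyterm_url(keyterms, base_url=LISTEN_URL):
    """Return a request URL with one `keyterm` query parameter per term."""
    # Drop empty entries and surrounding whitespace before counting.
    terms = [t.strip() for t in keyterms if t.strip()]
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(
            f"at most {MAX_KEYTERMS} keyterms are supported, got {len(terms)}"
        )
    params = [("model", "nova-3")] + [("keyterm", t) for t in terms]
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_keyterm_url(["tretinoin", "word error rate"]))
```

Because the terms travel with each request, the vocabulary can change from call to call, which is what "without model retraining" implies in practice.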
Availability
Nova-3 English is now available through our API. To access:
- Use model=nova-3 in your API calls
- Available for hosted use
- Supports both pre-recorded and real-time streaming transcription
- Multilingual and self-hosted deployments will be available in subsequent releases
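To make the `model=nova-3` setting concrete, here is a minimal sketch of a pre-recorded transcription request built with the Python standard library. The endpoint URL, the `Token` authorization scheme, and the JSON body with a `url` field are assumptions not stated in this announcement; `build_request` is a hypothetical helper and the audio URL is a placeholder.

```python
import json
import urllib.request

# Assumption: hosted endpoint for pre-recorded audio; only `model=nova-3`
# comes from the announcement itself.
API_URL = "https://api.deepgram.com/v1/listen?model=nova-3"


def build_request(audio_url, api_key):
    """Build (but do not send) a POST request asking for a Nova-3 transcript."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real audio URL and API key.
    req = build_request("https://example.com/audio.wav", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req)
```

The sketch stops short of sending the request so it can be run without credentials; real-time streaming uses a separate connection flow that is not shown here.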
For detailed information about Nova-3, please refer to our Developer Documentation.