Container Images (release 250626)
- quay.io/deepgram/self-hosted-api:release-250626
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-api:1.150.2
    - quay.io/deepgram/onprem-api:release-250626
    - quay.io/deepgram/onprem-api:1.150.2
- quay.io/deepgram/self-hosted-engine:release-250626
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-engine:3.91.0
    - quay.io/deepgram/onprem-engine:release-250626
    - quay.io/deepgram/onprem-engine:3.91.0
  - Minimum required NVIDIA driver version: >=550.163.01
- quay.io/deepgram/self-hosted-license-proxy:release-250626
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-license-proxy:1.8.0
    - quay.io/deepgram/onprem-license-proxy:release-250626
    - quay.io/deepgram/onprem-license-proxy:1.8.0
- quay.io/deepgram/self-hosted-billing:release-250626
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-billing:1.11.2
    - quay.io/deepgram/onprem-billing:release-250626
    - quay.io/deepgram/onprem-billing:1.11.2
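The engine image lists a minimum NVIDIA driver version, so a deployment script may want to check the host before pulling it. The following is a minimal sketch, not part of any Deepgram tooling: the function names are hypothetical, and the installed version is assumed to be a dotted numeric string such as the one printed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Minimum driver version listed for the engine image in this release.
MIN_DRIVER = "550.163.01"


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string into integers so comparison is numeric."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_supported(installed: str, minimum: str = MIN_DRIVER) -> bool:
    """Return True when the installed driver meets or exceeds the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "550.90.07" as newer than "550.163.01".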
This Release Contains The Following Changes
- Improves smart formatting for emails, alphanumerics, quantities, and percentages.
- Expands language support for profanity filtering in German, Swiss German, Polish, Portuguese, Spanish, and Swedish.
- Resolves an edge case when handling certain corrupt audio and now returns an HTTP 400 error code.
- Keeps our software up-to-date.
Reminder: The Deepgram image repositories have been updated to reflect our “self-hosted” naming. Images should now be pulled from the self-hosted-* Quay repositories. For the next three months, both onprem-* and self-hosted-* image repositories will receive identical image updates monthly, and we will announce image tags in the self-hosted repositories. Subsequently, we will only publish new images to self-hosted-* repos, deprecating onprem-* repository variants.
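If your deployment manifests still reference the onprem-* repositories, the rename can be applied mechanically. The sketch below is one hedged way to do that; the helper name is hypothetical, and it assumes image references follow the quay.io/deepgram/ naming shown in the container image lists.

```python
OLD_PREFIX = "quay.io/deepgram/onprem-"
NEW_PREFIX = "quay.io/deepgram/self-hosted-"


def to_self_hosted(image: str) -> str:
    """Rewrite an onprem-* image reference to its self-hosted-* equivalent.

    References that do not use the old prefix are returned unchanged.
    """
    if image.startswith(OLD_PREFIX):
        return NEW_PREFIX + image[len(OLD_PREFIX):]
    return image
```

The tag (for example release-250626) is preserved, since both repository families receive identical images during the transition period.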
Deepgram is excited to announce Aura‑2 (text to speech) Spanish voices, empowering real-time voice applications with a new, high-fidelity Spanish option for enterprise use.
Voice Quality & Features
- Launching with 10 distinct Spanish voices, each tuned for specific business contexts.
- Spanish voices are optimized for pacing, intonation, and emphasis suitable for professional interactions—from customer support to healthcare use cases.
- Superior pronunciation accuracy for domain-specific content:
  - Currency (e.g., “€”, “pesos”)
  - Dates/timestamps in various formats
  - Acronyms and alphanumeric IDs
  - Email addresses, passwords, and URLs
  - Spanish-language proper nouns
Availability
- Aura-2 Spanish (es) is available now via REST and WebSocket APIs
- Currently available for use through our hosted offering, with self-hosted support coming soon
For detailed information, please refer to our Developer Documentation.
Profanity Filtering Gets Expanded Language Support
Profanity filtering now supports 6 additional languages beyond English, giving you content moderation capabilities across your global user base. Available on monolingual models for:
Newly Supported:
- German (de)
- Swiss German (de-CH)
- Polish (pl)
- Portuguese (pt, pt-BR, pt-PT)
- Spanish (es, es-419)
- Swedish (sv, sv-SE)
Existing English Support: en, en-US, en-AU, en-GB, en-NZ, en-IN
This expansion lets you deploy consistent content policies across international markets without building custom filtering logic.
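As an illustration, a client could validate the language code against the list above before requesting filtering. This is a sketch under assumptions: the helper name is hypothetical, and the `profanity_filter` query parameter name is taken from the Deepgram API as we understand it rather than from this changelog entry.

```python
from urllib.parse import urlencode

# Language codes listed in this entry (new languages plus existing English).
PROFANITY_FILTER_LANGUAGES = {
    "de", "de-CH", "pl", "pt", "pt-BR", "pt-PT", "es", "es-419", "sv", "sv-SE",
    "en", "en-US", "en-AU", "en-GB", "en-NZ", "en-IN",
}


def profanity_filter_query(model: str, language: str) -> str:
    """Build a query string that enables profanity filtering.

    Raises ValueError for language codes not listed in this changelog entry.
    """
    if language not in PROFANITY_FILTER_LANGUAGES:
        raise ValueError(f"profanity filtering is not listed for {language!r}")
    return urlencode(
        {"model": model, "language": language, "profanity_filter": "true"}
    )
```

The model is left to the caller because support applies to monolingual models only; which model you pass depends on your deployment.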
Smart Formatting Improvements
We’ve resolved several high-impact formatting edge cases that were causing transcription accuracy issues in production environments:
Improved Entity Formatting via Smart Format
Email Transcription Improvements
- Fixed: 'o' characters in email addresses now transcribe correctly instead of converting to '0'
- Fixed: edge case email mentions that were being dropped entirely in specific batch processing scenarios
Certain formerly numeric-only sequences have been updated to correctly preserve all alphanumeric characters:
- Before (some entities): "my account number is a b c d zero nine" → "my account number is 09"
- After (some entities): "my account number is a b c d zero nine" → "my account number is ABCD09"
Quantity modifiers (‘single’, ‘double’, ‘triple’ + standalone character or number) are better handled via Smart Format:
- Before (some entities): "double 2" → "2"
- After (some entities): "double 2" → "22"
Special cases of ‘hundred’ or ‘a hundred’ now supported via Smart Format:
- Before (some entities): "hundred percent" → "%"
- After (some entities): "hundred percent" → "100%"
This update has gone out to all hosted streaming transcription, and will be applied to our next self-hosted release later this month.
Container Images (release 250610)
- quay.io/deepgram/self-hosted-api:release-250610
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-api:1.146.1
    - quay.io/deepgram/onprem-api:release-250610
    - quay.io/deepgram/onprem-api:1.146.1
- quay.io/deepgram/self-hosted-engine:release-250610
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-engine:3.89.2
    - quay.io/deepgram/onprem-engine:release-250610
    - quay.io/deepgram/onprem-engine:3.89.2
  - Minimum required NVIDIA driver version: >=550.163.01
- quay.io/deepgram/self-hosted-license-proxy:release-250610
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-license-proxy:1.8.0
    - quay.io/deepgram/onprem-license-proxy:release-250610
    - quay.io/deepgram/onprem-license-proxy:1.8.0
- quay.io/deepgram/self-hosted-billing:release-250610
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-billing:1.11.2
    - quay.io/deepgram/onprem-billing:release-250610
    - quay.io/deepgram/onprem-billing:1.11.2
This Release Contains The Following Changes
- Adds full support for Voice Agent v1 API.
- Addresses an issue with the engine_active_requests metric for streaming STT auto-scaling.
- Resolves an issue with year formatting for smart_format.
- Keeps our software up-to-date.
Reminder: The Deepgram image repositories have been updated to reflect our “self-hosted” naming. Images should now be pulled from the self-hosted-* Quay repositories. For the next four months, both onprem-* and self-hosted-* image repositories will receive identical image updates monthly, and we will announce image tags in the self-hosted repositories. Subsequently, we will only publish new images to self-hosted-* repos, deprecating onprem-* repository variants.
We’ve just released a significant upgrade to Nova-3 Medical Streaming, bringing substantial improvements in accuracy for real-time medical transcription use cases. This update focuses specifically on our streaming model, delivering better performance across key transcription metrics.
Performance Improvements
- **11% relative reduction in Overall WER** compared to Nova-3 general streaming model
- **30% relative reduction in Overall WER** compared to Nova-2 Medical streaming model
- **2.7x improvement in Keyword Recall Rate (KRR)** compared to Nova-3 general streaming model
- Maintains industry-leading inference speed with ultra-low latency for real-time healthcare applications
Availability
The updated Nova-3 Medical streaming model is now available through our API. To access:
- Use model=nova-3-medical in your streaming API calls
- Available for hosted use
- Self-hosted deployments will be available in subsequent releases
- English only
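For reference, here is a hedged sketch of how the model parameter might be attached to a streaming connection URL. The endpoint `wss://api.deepgram.com/v1/listen` is our assumption about the hosted streaming endpoint and is not stated in this entry; the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed hosted streaming endpoint; not stated in this changelog entry.
STREAMING_ENDPOINT = "wss://api.deepgram.com/v1/listen"


def medical_streaming_url(**extra_params: str) -> str:
    """Return a streaming URL that selects the Nova-3 Medical model.

    Extra keyword arguments are appended as additional query parameters.
    """
    params = {"model": "nova-3-medical", **extra_params}
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"
```

The resulting URL would be passed to whichever WebSocket client you use, along with your usual authentication header.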
For details on the original Nova-3 Medical release (including batch capabilities), check out the original changelog: Introducing Nova-3 Medical. For detailed information about Nova-3 Medical, please refer to our Developer Documentation.
Deepgram is proud to announce the release of Aura-2, our text-to-speech model purpose-built for real-time enterprise use cases.
Performance
- Sub-200ms time-to-first-byte (TTFB) latency for real-time conversational interactions
- 0.111x Real-Time Factor (RTF), synthesizing one second of audio in just over 100 milliseconds
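To make the Real-Time Factor figure concrete: RTF is processing time divided by audio duration, so synthesis time scales linearly with the length of the audio. The short sketch below only restates that arithmetic; the function name is hypothetical, and the 0.111 default is the figure quoted above.

```python
def synthesis_time_ms(audio_seconds: float, rtf: float = 0.111) -> float:
    """Estimate synthesis time in milliseconds for a given audio duration.

    RTF = processing time / audio duration, so time = duration * RTF.
    """
    return audio_seconds * rtf * 1000.0
```

With the quoted RTF, one second of audio works out to roughly 111 ms of synthesis time, which matches the "just over 100 milliseconds" figure.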
Voice Quality & Features
- Enterprise-optimized voice catalog with 40+ distinct voices, each designed for specific business contexts
- Tuned for professional and transactional interactions with appropriate tone, pacing, and emphasis
- Superior pronunciation accuracy for domain-specific content:
  - Currency and numerals
  - Dates and timestamps in varied formats
  - Email addresses, passwords, and URLs
  - Complex addresses and location references
- Industry-leading voice clarity rated higher than competitors in customer service scenarios
Availability
- Aura-2 is available now via REST and WebSocket APIs
- Currently available for use through our hosted offering
For detailed information about Aura-2, please refer to our Developer Documentation.
Container Images (release 250505)
- quay.io/deepgram/self-hosted-api:release-250505
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-api:1.142.1
    - quay.io/deepgram/onprem-api:release-250505
    - quay.io/deepgram/onprem-api:1.142.1
- quay.io/deepgram/self-hosted-engine:release-250505
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-engine:3.89.0
    - quay.io/deepgram/onprem-engine:release-250505
    - quay.io/deepgram/onprem-engine:3.89.0
  - Minimum required NVIDIA driver version: >=550.163.01
- quay.io/deepgram/self-hosted-license-proxy:release-250505
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-license-proxy:1.8.0
    - quay.io/deepgram/onprem-license-proxy:release-250505
    - quay.io/deepgram/onprem-license-proxy:1.8.0
- quay.io/deepgram/self-hosted-billing:release-250505
  - Equivalent image to:
    - quay.io/deepgram/self-hosted-billing:1.11.2
    - quay.io/deepgram/onprem-billing:release-250505
    - quay.io/deepgram/onprem-billing:1.11.2
This Release Contains The Following Changes
- Extends numeral formatting for supported languages when using detect_language=true.
- Improves formatting of dates.
- Resolves an issue with Whisper functionality.
- Keeps our software up-to-date.
Reminder: The Deepgram image repositories have been updated to reflect our “self-hosted” naming. Images should now be pulled from the self-hosted-* Quay repositories. For the next five months, both onprem-* and self-hosted-* image repositories will receive identical image updates monthly, and we will announce image tags in the self-hosted repositories. Subsequently, we will only publish new images to self-hosted-* repos, deprecating onprem-* repository variants.
We’ve made some improvements to Smart Formatting, enhancing both streaming finalization behavior and entity recognition performance.
Streaming Finalization Improvements
Previously, when formatting entities like dates or credit card numbers, our models would sometimes wait for additional words before finalizing the transcript—particularly if the entity seemed incomplete. For example, when someone said “nineteen seventy…” Deepgram might pause, expecting a possible follow-up like “nine” or other additional speech before finalizing the complete year.
Now, instead of potentially waiting indefinitely for additional words, our system will finalize the transcript after 3 seconds of silence, and attempt to format the entity based on the available audio. This helps ensure transcripts are returned faster and more reliably—without sacrificing too much formatting precision.
Want more control over the finalization behavior? You have two options:
- Implement logic to send a Finalize message earlier than the 3-second threshold. See our Finalize documentation for details.
- Set no_delay=true to override formatting and force immediate finalization. NOTE: This will result in skipping formatting altogether in many cases.
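The first option can be sketched as a small client-side timer that emits a Finalize payload once silence exceeds a threshold shorter than 3 seconds. This is a minimal sketch under assumptions: the `EarlyFinalizer` class is hypothetical, how your client detects speech (for example, a local voice activity detector) is up to you, and the `{"type": "Finalize"}` JSON shape reflects our understanding of the streaming API rather than anything stated in this entry.

```python
import json
import time
from typing import Optional

# Assumed control message shape for the streaming WebSocket.
FINALIZE_MESSAGE = json.dumps({"type": "Finalize"})


class EarlyFinalizer:
    """Decides when to send Finalize after a stretch of client-detected silence."""

    def __init__(self, silence_threshold_s: float = 1.0) -> None:
        if not 0 < silence_threshold_s < 3.0:
            raise ValueError("threshold must be between 0 and the 3-second timeout")
        self.silence_threshold_s = silence_threshold_s
        self._last_speech: Optional[float] = None
        self._sent = False

    def on_speech(self, now: Optional[float] = None) -> None:
        """Record that speech was just detected and re-arm the timer."""
        self._last_speech = time.monotonic() if now is None else now
        self._sent = False

    def poll(self, now: Optional[float] = None) -> Optional[str]:
        """Return the Finalize payload once per silence period, otherwise None."""
        now = time.monotonic() if now is None else now
        if self._last_speech is None or self._sent:
            return None
        if now - self._last_speech >= self.silence_threshold_s:
            self._sent = True
            return FINALIZE_MESSAGE
        return None
```

In a real client, you would call `on_speech()` whenever audio containing speech is sent and `poll()` on a short interval, writing any returned payload to the open WebSocket as a text frame.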
Enhanced Entity Formatting
In addition to the timeout improvements, we’ve refined our Named Entity Recognition model for Smart Formatting to better identify and format:
- Date variations
- Alphanumerics (order numbers, membership IDs, prescription numbers, etc.)
- Currencies
- Payment and card information
- SSNs
- Time zones
This update is automatically applied to all streaming transcription using Smart Formatting, and is included in our Self-Hosted March 2025 Release (250331). For more details, check out our Smart Formatting documentation.
Deepgram is proud to announce the general availability of Nova-3 Multilingual, the first model of its kind able to code-switch in real time across 10 different languages. This capability lets global operations process multilingual conversations instantly with a single model, an industry first for speech recognition.
Multilingual Support
- Real-time multilingual speech recognition with a truly unified speech recognition system
- Supports code-switching between 10 languages:
  - English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch
- Seamlessly handles natural language transitions without relying on explicit routing or language-specific mechanisms
- Maintains high transcription accuracy across languages while adapting to natural language transitions
- Developed through a multi-stage training process combining synthetic code-switched data at massive scale with carefully curated real-world datasets
Use Cases
Nova-3 Multilingual represents a significant breakthrough for applications in:
- Global customer support
- Emergency response (e.g., 911 calls)
- Multilingual meetings
- Retail interactions
- Healthcare settings
In high-stakes scenarios like emergency response, Nova-3 can fluidly process interactions where callers switch between languages (e.g., Spanish and English) in real time, ensuring dispatchers receive accurate, immediate transcriptions without missing critical details.
Availability
- Now available through our API
- Use model=nova-3&language=multi in your API calls
- Supports both pre-recorded and real-time streaming transcription
- Available for hosted and self-hosted use
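As a hedged illustration of a pre-recorded call with these parameters, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The endpoint URL, the `Authorization: Token ...` header, and the `{"url": ...}` JSON body are assumptions based on our understanding of the hosted API, and the function name is hypothetical.

```python
import json
import urllib.request

# Assumed hosted endpoint; the query parameters are the ones named above.
LISTEN_URL = "https://api.deepgram.com/v1/listen?model=nova-3&language=multi"


def build_multilingual_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build a POST request asking for multilingual transcription of remote audio."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        LISTEN_URL,
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; self-hosted deployments would substitute their own host for the assumed endpoint.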
For detailed information about Nova-3 Multilingual, please refer to our Developer Documentation.
Deepgram is excited to announce expanded language support for numerals through our numerals and smart_format parameters, providing more comprehensive coverage for converting written numbers to numerical format across additional languages.
Expanded Language Support
- New languages added to Numerals support:
  - Danish: da, da-DK
  - Dutch: nl
  - French: fr, fr-CA
  - German: de
  - German (Switzerland): de-CH
  - Italian: it
  - Norwegian: no
  - Polish: pl
  - Portuguese: pt, pt-BR, pt-PT
  - Spanish: es, es-419
  - Swedish: sv, sv-SE
Previously supported language:
- English: en, en-US, en-AU, en-GB, en-NZ, en-IN
Feature Integration
- All newly supported languages are fully integrated with our Smart Format feature
- When using smart_format=true, numerals will automatically be applied for all supported languages
- Individual control remains available through the dedicated numerals=true parameter
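The interaction between the two parameters can be modeled on the client side as follows. This sketch only restates the rules described above so that a caller can predict whether numeral conversion should be expected; the function name is hypothetical and the API itself remains the source of truth.

```python
# Language codes listed in this entry (new languages plus existing English).
NUMERALS_LANGUAGES = {
    "da", "da-DK", "nl", "fr", "fr-CA", "de", "de-CH", "it", "no", "pl",
    "pt", "pt-BR", "pt-PT", "es", "es-419", "sv", "sv-SE",
    "en", "en-US", "en-AU", "en-GB", "en-NZ", "en-IN",
}


def numerals_expected(
    language: str, smart_format: bool = False, numerals: bool = False
) -> bool:
    """Return True when numeral conversion should apply to a request.

    Either smart_format=true or numerals=true enables it, but only for
    languages listed as supported.
    """
    return language in NUMERALS_LANGUAGES and (smart_format or numerals)
```

For example, a request with `smart_format=true` and `language=fr-CA` would be expected to return numerals, while the same request for an unlisted language would not.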
Availability
The expanded Numerals support is now available through our API for use with all Deepgram speech-to-text models.
- Available for hosted and self-hosted usage.
- Compatible with both pre-recorded and real-time streaming transcription
For detailed information about our expanded Numerals or Smart Formatting support, please refer to our Developer Documentation.