On-Premises Release 221007
Deepgram released a new version of its on-premises solution.
On-Premises Release 221007: Docker Hub Images
- deepgram/onprem-api:1.70.0
- deepgram/onprem-engine:3.37.2
  - Minimum required NVIDIA driver version: >=450.80.02
- deepgram/onprem-license-proxy:1.2.1
- deepgram/onprem-billing:1.4.0
- deepgram/onprem-metrics-server:2.0.0
Changes
- Deepgram On-premises deployments now support the following Understanding features (with the accompanying Understanding model deployed on-prem and the requisite configuration changes):
  - Summarization enables users to generate meaningful summaries from their audio data automatically. It provides a segment-level summary breakdown with start- and end-character positions, which customers can use to identify the start and end timestamps for each summarized section.
    summarize=true&punctuate=true
    - This requires the addition of the following section to the api.toml file:
      [features]
      summarization = true
  - Language Detection enables users to identify the dominant language of an audio file and transcribe the output in the detected language. It does this by taking an initial sampling of the audio file.
    detect_language=true&punctuate=true
    - This requires the addition of the following section to the engine.toml file:
      [features]
      language_detection = true
  - When you use these Understanding features, please note that the punctuate=true parameter is required as part of the ASR request. If you do not explicitly include this parameter, the system includes it implicitly.
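The query strings shown for the Understanding features can be assembled programmatically. The following is a minimal sketch, not an official client: the host and port in `BASE_URL` are hypothetical placeholders for your own deployment, the `/listen` path under `/v1` is an assumption, and `build_listen_url` is an illustrative helper name. It always adds `punctuate=true` when either feature is requested, matching the requirement described above.

```python
from urllib.parse import urlencode

# Hypothetical on-prem API address (assumption); replace with your deployment's host and port.
BASE_URL = "http://localhost:8080/v1/listen"


def build_listen_url(summarize: bool = False, detect_language: bool = False) -> str:
    """Build a request URL that enables the chosen Understanding features."""
    params = {}
    if summarize:
        params["summarize"] = "true"
    if detect_language:
        params["detect_language"] = "true"
    if params:
        # Both Understanding features require punctuation, so send it explicitly.
        params["punctuate"] = "true"
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


print(build_listen_url(summarize=True))
print(build_listen_url(detect_language=True))
```

Sending `punctuate=true` explicitly is optional, since the system adds it implicitly, but it keeps the request self-describing.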
- Deepgram On-premises deployments now support Deepgram Cloud's /v1 endpoint schema.
  - New on-prem configurations will default to using the /v1 endpoint schema.
  - Legacy on-prem configurations may continue to use the /v2 endpoint schema, although it is now deprecated.
- On startup, Engine will automatically enable half-precision floating-point format if it is supported by the NVIDIA GPU.
  - If you encounter issues with the ASR output, we recommend turning this feature off as a troubleshooting step. This can be done by adding the state parameter to the engine.toml file:
    [half_precision]
    state = "disabled" # or "enabled" or "auto" (the default)
- Engine will now return an explicit error message when the model_manager search path is misconfigured:
  Error: failure: Configuration contains inaccessible model search path(s)
- Time duration values can now be specified in configuration files using a human-readable format such as "1h" to represent 1 hour, "2m" to represent 2 minutes, "60s" to represent 60 seconds, etc.
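To illustrate how the human-readable duration values map to seconds, here is a small sketch that handles only the three unit suffixes shown in the examples above (`h`, `m`, `s`). It is not Deepgram's parser, and the real configuration loader may accept other units or combined forms; `parse_duration` is an illustrative name.

```python
import re

# Seconds per unit, limited to the suffixes shown in the release notes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_DURATION_RE = re.compile(r"(\d+)([smh])")


def parse_duration(text: str) -> int:
    """Convert a value such as "1h", "2m", or "60s" to a number of seconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


for sample in ("1h", "2m", "60s"):
    print(sample, "=", parse_duration(sample), "seconds")
```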
- The streaming connection timeout between API and Engine is now configurable via the streaming_timeout parameter in the api.toml file.
  [[driver_pool.standard]]
  ...
  streaming_conn_timeout = "60s"
Fixes
- Resolves an issue where WebSocket callbacks were improperly shut down, which prevented the WebSocket Close frame from being issued in compliance with RFC 6455 and may have resulted in partial transcription data loss in the WebSocket callback.