In this guide, you will learn how to obtain, configure, and run isolated instances of Deepgram products.
To learn about running a coordinated set of Deepgram services using Docker Compose or Docker Swarm, see Deployment.
Deepgram makes all of its products available through Docker Hub. To download the latest products:
1. Log in to your Docker Hub account from one of your servers.
2. Run the following command:

```bash
$ docker pull [PRODUCT-IMAGE]
```
Choose the appropriate Deepgram product to learn how to configure it.
To configure Speech Engine, you will need:
```toml
# Keep in mind that all paths are in-container paths and do not need to exist
# on the host machine.

# Configure license validation.
[license]
# Your license key
key = "36c0e479-2ab9-471e-a217-5dd809a236bc"

# Enable ancillary models
# To disable any of these features, just remove or comment out the respective
# feature section.
[features]

[[features.punctuator]]
weights = "/engine/punctuator.dg"
# specify the version as "beta" if using a v2 punctuator; otherwise no need to specify version
version = "beta"

[[features.diarizer]]
weights = "/engine/diarizer.dg"

[features.g2p]
path = "/engine/g2p.dg"

# Configure the server to listen for requests from the API.
[server]
# The base URL (prefix) for requests from the API.
base_url = "/v2"
# The IP address to listen on. Since this is likely running in a Docker
# container, you will probably want to listen on all interfaces.
host = "0.0.0.0"
# The port to listen on
port = 8080

# Speech models. Each model will have its own section. You can specify
# multiple models.
[[products]]
# The name of the model (if no model is specified, the API will try to find
# the "general" model).
name = "general"
# Version string.
generation = "alpha"
version = "v1"
# Path to the weights on disk.
path = "/engine/general.dg"
```
Assuming your model files are in a folder named /path/to/engine and your configuration file is at /path/to/config.toml, launch Speech Engine by running:
```bash
$ docker run \
    -d \
    --runtime=nvidia \
    -v "/path/to/engine:/engine:ro" \
    -v "/path/to/config.toml:/config.toml:ro" \
    -p 127.0.0.1:8080:8080 \
    deepgram/onprem-engine:latest \
    -v serve /config.toml
```
To configure Speech API, you will need a configuration file, which is written in TOML to promote easy human editing.
```toml
# Keep in mind that all paths are in-container paths and do not need to exist
# on the host machine.

# Configure license validation.
[license]
# Your license key.
key = "36c0e479-2ab9-471e-a217-5dd809a236bc"

# Configure how the API will listen for your requests
[server]
# The base URL (prefix) for requests to the API.
base_url = "/v2"
# The IP address to listen on. Since this is likely running in a Docker
# container, you will probably want to listen on all interfaces.
host = "0.0.0.0"
# The port to listen on
port = 8080

# How long to wait for a connection to a callback URL (in seconds)
callback_conn_timeout = 1
# How long to wait for a response to a callback URL (in seconds)
callback_timeout = 10

# How long to wait for a connection to a fetch URL (in seconds)
fetch_conn_timeout = 1
# How long to wait for a response to a fetch URL (in seconds)
fetch_timeout = 60

# Configure the DNS resolver, overriding the system default.
# Typically not needed, although we document it here for completeness.
# [resolver]
# # List of nameservers to use to resolve DNS queries.
# nameservers = ["127.0.0.11 53 udp"]
# # Override the TTL in the DNS response (in seconds).
# max_ttl = 10

# Configure the backend pool of speech engines (generically referred to as
# "drivers" here). There are two pools: "standard" and "failover". The API will
# load-balance among drivers in the standard pool; if a standard driver fails,
# the next one will be tried. If all drivers in the standard pool fail, then
# the API will load-balance among drivers in the failover pool; if a failover
# driver fails, the next one will be tried.
#
# Each driver URL will have its hostname resolved to an IP address. If a domain
# name resolves to multiple IP addresses, the API will load-balance across each
# IP address.
#
# This behavior is provided for convenience, and in a production environment
# other tools can be used, such as HAProxy.

# A new Speech Engine ("driver") in the "standard" pool.
[[driver_pool.standard]]
# Host to connect to. Here, we use "tasks.engine", which is the Docker Swarm
# method for resolving the IP addresses of all "engine" services. If you are
# using Docker Compose, then this should just be "engine" instead of
# "tasks.engine". If you rename the "engine" service in the Docker Compose
# file, then change it accordingly here. Additionally, the port and prefix
# should match those defined in the Engine configuration file.
# NOTE: This must be HTTPS.
url = "https://tasks.engine:8080/v2"

# How long to wait for a connection to be established (in seconds).
conn_timeout = 5
# Once a connection is established, how many seconds to wait for a response.
timeout = 400
# Factor to increase the timeout by for each additional retry (for
# exponential backoff).
timeout_backoff = 1.2

# If you fail to get a valid response (timeout or unexpected error), then
# how many attempts should be made in total, including the initial attempt?
# This is applied *per IP address* that the domain name in the URL resolves
# to. If your domain resolves to multiple IPs, then "1" may be sufficient.
retries_per_ip = 1
# Before attempting a retry, sleep for this long (in seconds)
retry_sleep = 2
# Factor to increase the retry sleep by for each additional retry (for
# exponential backoff).
retry_backoff = 1.6

# Maximum response to deserialize from Driver (in bytes)
max_response_size = 1073741824

# Additional speech engines ("drivers") can be defined here, either in the
# standard pool using [[driver_pool.standard]], or in the failover pool by
# using [[driver_pool.failover]].
```
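To get a feel for how `timeout`, `timeout_backoff`, `retry_sleep`, and `retry_backoff` interact, the sketch below computes a per-attempt schedule for a single IP address. It assumes plain exponential backoff, where each additional retry multiplies the previous value by the configured factor. The configuration comments describe the factors but do not spell out the exact formula, so treat this as an illustration rather than the API's actual implementation. The function name `retry_schedule` is hypothetical.

```python
def retry_schedule(attempts: int, timeout: float, timeout_backoff: float,
                   retry_sleep: float, retry_backoff: float) -> list[tuple[float, float]]:
    """Return (sleep_before, response_timeout) in seconds for each attempt.

    Assumption: values grow geometrically with each additional retry.
    `attempts` plays the role of retries_per_ip (total attempts,
    including the initial one).
    """
    schedule = []
    for attempt in range(attempts):
        # No sleep before the initial attempt; retries sleep with backoff.
        sleep_before = 0.0 if attempt == 0 else retry_sleep * retry_backoff ** (attempt - 1)
        response_timeout = timeout * timeout_backoff ** attempt
        schedule.append((sleep_before, response_timeout))
    return schedule


if __name__ == "__main__":
    # Values from the sample configuration, but with three attempts
    # instead of the sample's retries_per_ip = 1.
    for sleep_before, response_timeout in retry_schedule(3, 400, 1.2, 2, 1.6):
        print(f"sleep {sleep_before:.1f}s, then wait up to {response_timeout:.1f}s")
```

Under these assumptions, three attempts with the sample values would wait up to roughly 400, 480, and 576 seconds, sleeping about 2 and 3.2 seconds before the second and third attempts. With the sample's `retries_per_ip = 1`, only the first entry applies.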
You can configure the metrics server using a configuration file, which is written in TOML to promote easy human editing, or via environment variables.
The metrics server accepts the path to the configuration file as a command line argument:
```bash
$ docker run \
    -d \
    -v /path/to/config.toml:/config.toml:ro \
    -p 8000:8000 \
    deepgram/metrics-server:latest /config.toml
```
```toml
# config.toml
server_address = "0.0.0.0:8000"
```
When running a metrics server, you must configure the speech engine and speech API to send metrics to it. To do so, add the following to the configuration files for both the speech engine and speech API:
```toml
# speech-engine/config.toml
# -- AND --
# speech-api/config.toml
[metrics]
url = "http://tasks.metrics-server:8000"
```
Hotpepper requires paths to four resources:
```toml
# The container path to the Hotpepper database. (Note: Naming conventions
# may mention Dashscript, a previous version of Hotpepper; they are the same tool.)
db = "/db/dashscript.db"

# A directory for storing input datasets (collections of audio files to
# transcribe).
# Path to the directory containing input datasets. New datasets are
# created by adding subdirectories to this folder and placing audio data
# there. This directory should be structured like so:
#   /datasets/
#     |_ dataset1/
#         |_ audio1.mp3
#         |_ audio2.mp3
#         ...
datasets = "/datasets"

# A directory for outputting packaged datasets.
packaged_dataset_location = "/packaged"

[server]
port = 80

# For customers who have ASR enabled, the following configuration will place
# a button "Get ASR" on the L1 transcription page to pre-populate the
# transcript field with ASR.
# Ensure that the endpoint points to your on-premise API instance!
[asr]
endpoint = "http://tasks.api:8080/v2/listen?punctuate=true" # Eg. docker swarm
# endpoint = "http://api:8080/v2/listen?punctuate=true" # Eg. docker compose
```
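The configuration comments note that a new dataset is created by adding a subdirectory to the datasets folder and placing audio files in it. A minimal sketch of that step follows. It assumes you run it against the host directory that is mounted at /datasets in the container. The function name `add_dataset` is hypothetical.

```python
import shutil
from pathlib import Path


def add_dataset(datasets_root: str, name: str, audio_files: list[str]) -> Path:
    """Create datasets_root/name and copy the given audio files into it.

    Raises FileExistsError if a dataset with that name already exists,
    so an existing dataset is never silently modified.
    """
    dataset_dir = Path(datasets_root) / name
    dataset_dir.mkdir(parents=True, exist_ok=False)
    for src in audio_files:
        # Keep the original file name inside the dataset directory.
        shutil.copy2(src, dataset_dir / Path(src).name)
    return dataset_dir
```

For example, `add_dataset("/path/to/datasets", "dataset1", ["audio1.mp3", "audio2.mp3"])` would produce the layout shown in the configuration comments, where `/path/to/datasets` stands in for whichever host folder you mount.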
Hotpepper can be configured to allow users labeling at level L1 to submit assigned files to an on-premise Deepgram Speech Engine for automatic speech recognition (ASR) and transcription. When ASR is used, the Hotpepper server sends the assigned audio file to the configured Speech API endpoint, parses a transcript from the results, and automatically populates the Transcript textarea of the labeling view with the returned transcript. In our experience, users value this feature highly when labeling.
In the config.toml file, notice the following lines, which enable ASR:
```toml
# Ensure that the endpoint points to your Speech API instance
[asr]
# docker swarm
endpoint = "http://tasks.api:8080/v2/listen?punctuate=true"
# docker compose
# endpoint = "http://api:8080/v2/listen?punctuate=true"
```
When Hotpepper is properly configured for ASR, the File Details area of the Hotpepper labeling view will include a Get ASR button. To learn more, see Hotpepper User Guide: Labeling Data.
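The ASR round trip described above (send the audio file to the configured endpoint, then parse a transcript from the result) can be sketched as follows. This is not Hotpepper's implementation. It assumes the endpoint accepts the raw audio bytes in a POST body and returns JSON with the transcript at `results.channels[0].alternatives[0].transcript`. The `audio/mpeg` content type and both function names are likewise assumptions, and your deployment may require additional headers.

```python
import json
import urllib.request


def extract_transcript(response: dict) -> str:
    """Pull the first transcript out of a response, or "" if none is present.

    Assumed shape: results.channels[0].alternatives[0].transcript
    """
    channels = response.get("results", {}).get("channels", [])
    if not channels or not channels[0].get("alternatives"):
        return ""
    return channels[0]["alternatives"][0].get("transcript", "")


def request_asr(endpoint: str, audio_path: str,
                content_type: str = "audio/mpeg") -> str:
    """POST an audio file to the Speech API endpoint and return its transcript."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    request = urllib.request.Request(
        endpoint,
        data=audio,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_transcript(json.load(resp))
```

With the Docker Compose endpoint from the configuration, a call might look like `request_asr("http://api:8080/v2/listen?punctuate=true", "audio1.mp3")`.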