Deploy Deepgram Products

Last updated 06/18/2021

Configure Deployment

Deployment of on-premise Deepgram products is typically done using a Docker service orchestration technology (Docker Compose or Docker Swarm).

Docker Swarm

To deploy the Speech API and Speech Engine services using Docker Swarm, you can use the following sample Compose file.

Remove services that you will not be using or for which you do not have a license. Otherwise, they will fail to start.

version: "3.7"

services:

  # The speech API service.
  api:
    image: deepgram/onprem-api:latest

    deploy:
      mode: replicated
      replicas: 1

    # Here we expose the API port to the host machine (or, in the case of
    # Docker Swarm, this will expose it to all nodes in the swarm via the
    # ingress overlay network). The container port (right-hand side) must match
    # the port that the API service is listening on (from its configuration
    # file).
    ports:
      - "8080:8080"

    volumes:
      # The API configuration file needs to be accessible. For Docker Swarm,
      # you will need this path to be accessible to all nodes in the swarm; we
      # recommend using a distributed filesystem in this case.
      - "/path/on/host/api.toml:/api.toml:ro"

    # Invoke the API server, passing it the path (inside the container) to its
    # configuration file.
    command: -v serve /api.toml

  # The speech engine service.
  engine:
    image: deepgram/onprem-engine:latest

    deploy:
      mode: replicated

      # Feel free to increase the number of replicas to match your GPU
      # resources.
      replicas: 1

      # Assuming you have correctly configured Docker and nvidia-docker to
      # advertise GPU resources, you can request that each Engine instance
      # use a specific number of GPUs (here, 2).
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: gpu
                value: 2

    # These configuration files and models need to be accessible. For Docker
    # Swarm, you will need these paths to be accessible to all nodes in the
    # swarm; we recommend using a distributed filesystem in this case.
    #
    # There is no reason the configuration file and models can't be combined
    # into a single volume mount.
    volumes:
      # The Engine configuration file needs to be accessible. 
      - "/path/on/host/engine.toml:/engine.toml:ro"

      # The models will also need to be available. The in-container paths must
      # match the paths specified in the Engine configuration file.
      - "/path/on/host/models:/engine:ro"

    # Invoke the Engine service, passing it the path (inside the container) to
    # its configuration file.
    command: -v serve /engine.toml

  # The metrics service.
  metrics-server:
    image: deepgram/metrics-server:latest

    deploy:
      mode: replicated
      replicas: 1

    volumes:
      # The configuration file needs to be accessible.
      - "/path/on/host/metrics-server.toml:/metrics-server.toml:ro"

    # Invoke the metrics server, passing it the path (inside the container) to
    # its configuration file.
    command: /metrics-server.toml

  # Hotpepper, the on-premise human transcription/labeling tool for creating training
  # sets. (Note: Naming conventions may mention Dashscript, a previous version 
  # of Hotpepper; they are the same tool.)
  dashscript:
    image: deepgram/onprem-dashscript:latest

    deploy:
      mode: replicated
      replicas: 1

    # Here we expose the service port to the host machine. The container port
    # (right-hand side) must match the port that the service is listening
    # on (from its configuration file).
    ports:
      - "8081:80"

    # An admin "super-user" must be present in the database to execute user and dataset
    # management actions through the app.  Setting these environment variables will
    # ensure that the app starts with at least one.
    environment:
      DASHSCRIPT_ADMIN_USER: "admin"
      DASHSCRIPT_ADMIN_PASSWORD: "admin_pass"

    volumes:
      # The container paths (right side) should align with those used in the Hotpepper
      # config file.
      
      # Path to the Hotpepper database directory.  The name of the database will be
      # pulled from the Hotpepper config file.
      - "/path/to/database:/db"

      # Path to the directory containing input datasets. New datasets are
      # created by adding subdirectories to this folder and placing audio data
      # there.  This directory should be structured like so:
      # /path/to/input/data/
      #   |_ dataset1/
      #      |_ audio1.mp3
      #      |_ audio2.mp3
      #   ...
      - "/path/to/input/data:/datasets"

      # Path to the directory where Hotpepper will save its finalized, packaged datasets.
      - "/path/to/output/data:/packaged"

      # Path to the Hotpepper config file.
      - "/path/to/config/file.toml:/config.toml:ro"

    # Invoke Hotpepper, giving it the path to the config file (in the
    # container).
    command: /config.toml serve

Docker Compose

To deploy the Speech API and Speech Engine services using Docker Compose, you can use the following sample Compose file.

Remove services that you will not be using or for which you do not have a license. Otherwise, they will fail to start.

version: "2.4"

services:

  # The speech API service.
  api:
    image: deepgram/onprem-api:latest

    # Here we expose the API port to the host machine. The container port
    # (right-hand side) must match the port that the API service is listening
    # on (from its configuration file).
    ports:
      - "8080:8080"

    volumes:
      # The API configuration file needs to be accessible; this should point to
      # the file on the host machine.
      - "/path/on/host/api.toml:/api.toml:ro"

    # Invoke the API server, passing it the path (inside the container) to its
    # configuration file.
    command: -v serve /api.toml

  # The speech engine service.
  engine:
    image: deepgram/onprem-engine:latest

    # Use the NVIDIA runtime so that the Engine container can access the host
    # GPUs (this key requires a v2 Compose file; see the note below).
    runtime: nvidia

    # These configuration files and models need to be accessible; these paths
    # should point to files/directories on the host machine.
    #
    # There is no reason the configuration file and models can't be combined
    # into a single volume mount.
    volumes:
      # The Engine configuration file needs to be accessible. 
      - "/path/on/host/engine.toml:/engine.toml:ro"

      # The models will also need to be available. The in-container paths must
      # match the paths specified in the Engine configuration file.
      - "/path/on/host/models:/engine:ro"

    # Invoke the Engine service, passing it the path (inside the container) to
    # its configuration file.
    command: -v serve /engine.toml

  # The metrics service.
  metrics-server:
    image: deepgram/metrics-server:latest

    volumes:
      # The configuration file needs to be accessible.
      - "/path/on/host/metrics-server.toml:/metrics-server.toml:ro"

    # When run with no command line arguments, the metrics server may
    # be configured using environment variables.
    environment:
      SERVER_ADDRESS: "0.0.0.0:8000"

  # Hotpepper, the on-premise human transcription/labeling tool for creating training
  # sets. (Note: Naming conventions may mention Dashscript, a previous version 
  # of Hotpepper; they are the same tool.)
  dashscript:
    image: deepgram/onprem-dashscript:latest

    # Here we expose the service port to the host machine. The container port
    # (right-hand side) must match the port that the service is listening
    # on (from its configuration file).
    ports:
      - "8081:80"

    # An admin "super-user" must be present in the database to execute user and dataset
    # management actions through the app.  Setting these environment variables will
    # ensure that the app starts with at least one.
    environment:
      DASHSCRIPT_ADMIN_USER: "admin"
      DASHSCRIPT_ADMIN_PASSWORD: "admin_pass"

    volumes:
      # The container paths (right side) should align with those used in the Hotpepper
      # config file.
      
      # Path to the Hotpepper database directory.  The name of the database will be
      # pulled from the Hotpepper config file.
      - "/path/to/database:/db"

      # Path to the directory containing input datasets. New datasets are
      # created by adding subdirectories to this folder and placing audio data
      # there.  This directory should be structured like so:
      # /path/to/input/data/
      #   |_ dataset1/
      #      |_ audio1.mp3
      #      |_ audio2.mp3
      #   ...
      - "/path/to/input/data:/datasets"

      # Path to the directory where Hotpepper will save its finalized, packaged datasets.
      - "/path/to/output/data:/packaged"

      # Path to the Hotpepper config file.
      - "/path/to/config/file.toml:/config.toml:ro"

    # Invoke Hotpepper, giving it the path to the config file (in the
    # container).
    command: /config.toml serve

The fundamental difference between these two files is the version of the Compose file: Docker Swarm requires v3, but the `runtime` key (used by Docker Compose) is only available in v2. If you are a Docker Compose user who wants to use v3 Compose files, you may do so by changing the default runtime on the Docker daemon.
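
For example, one common way to do this (a sketch; confirm the exact keys against the NVIDIA Container Toolkit documentation for your version) is to set `default-runtime` in /etc/docker/daemon.json:

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

After saving the file, restart the Docker daemon (for example, with sudo systemctl restart docker); you can then drop the `runtime` key and use a v3 Compose file with Docker Compose.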

Start the Deployment

Before starting the deployment, make sure:

  • you have configured Docker as appropriate for your environment (Compose or Swarm), including CUDA for GPU access (a few quick checks are sketched below).
  • you are logged into Docker Hub.
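
As a quick sanity check of both prerequisites (a sketch; adjust for your environment):

$ nvidia-smi                      # confirm the NVIDIA driver sees your GPUs
$ docker info | grep -i runtime   # "nvidia" should appear among the listed runtimes
$ docker login                    # authenticate to Docker Hub if you have not already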

If you are using Docker Compose, and assuming the Compose file (provided in the previous section) is saved to /path/to/compose.yml, run:

$ docker-compose -f /path/to/compose.yml pull
$ docker-compose -f /path/to/compose.yml -p deepgram up -d

If you are using Docker Swarm, make sure you are performing all steps from a manager node.

Assuming the Compose file (provided in the previous section) is saved to /path/to/compose.yml, run:

$ docker stack deploy -c /path/to/compose.yml --with-registry-auth deepgram
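
Once the stack is deployed, you can confirm that the services were created and check their replica counts with a standard Swarm command:

$ docker stack services deepgram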

Monitor the Deployment

Although generally you should refer to Docker's documentation to learn how to query the state of the Docker ecosystem, some useful commands are as follows:

$ docker-compose ps                   # Docker Compose: service status
$ docker ps                           # running containers on this node
$ docker stats                        # live per-container resource usage
$ docker stack ps deepgram            # Docker Swarm: tasks in the stack
$ docker service ps deepgram_api      # Docker Swarm: API service tasks
$ docker service ps deepgram_engine   # Docker Swarm: Engine service tasks

Test the Deployment

To ensure that everything is working, you can send audio data through the pipeline end-to-end. In this example, we assume that your API endpoint is exposed at localhost on port 8080; adjust these values according to your network topology.

If you have local audio data available, run:

$ curl -X POST -T /path/to/audio.wav -H "Expect:" http://localhost:8080/v2/listen

If you permit outgoing connections to the public internet, you can also test on remotely-hosted media:

$ curl -H 'Content-type: application/json' -X POST -d '{"url": "https://deepgram.com/examples/interview_speech-analytics.wav"}' -H "Expect:" http://localhost:8080/v2/listen

If you are receiving transcripts, congratulations! If not, add a verbose flag (-v) to each curl command, and see if the status code and response body can help you identify the error.
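
For example, to rerun the local-file test with verbose output:

$ curl -v -X POST -T /path/to/audio.wav -H "Expect:" http://localhost:8080/v2/listen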

If you are still having problems, here are some common tests:

  • Are the Docker containers running? Check with docker ps, docker-compose ps, or docker service ps ....
  • Are there obvious problems in the logs? Check with docker logs, docker-compose logs, or docker service logs ....
  • Can you reach the API port? Test with nc HOST PORT, telnet HOST PORT, or netstat -tunap.
  • Is the Docker network functioning correctly, including domain name resolution? This is most easily tested by attaching a test container to the appropriate Docker network and querying DNS, as described below.

First, determine which Docker network is appropriate (docker network ls) and ensure it is attachable. If it is not, consult the Docker documentation for details on how to configure your Docker networks. Spawn an interactive container (docker run --rm -it --network=NETWORK alpine). Install a DNS client (apk add drill) and use it to query DNS (drill HOST). Ensure that the domain name server you are querying (see the drill documentation for details) matches the resolver (if any) in your API configuration.
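
Concretely, that walkthrough looks something like the following sketch, where NETWORK and HOST are placeholders for your attachable Docker network and the hostname you want to resolve:

$ docker network ls                              # find the appropriate network
$ docker run --rm -it --network=NETWORK alpine   # spawn an interactive test container
/ # apk add drill                                # install a DNS client
/ # drill HOST                                   # query DNS for the hostname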
Finally, if you publish the Speech Engine's port, check whether you can connect to it, using the same nc or telnet approach as for the API port.