Configure Hardware & Software

Last updated 06/18/2021

The following are sample recommended hardware layouts. These different configurations can overlap on a physical machine: a system prepared for training can also be used for inference and labeling.

Deepgram only supports NVIDIA GPUs.
Inference (performing speech analysis on trained speech models)

  • 1 K80 or better NVIDIA GPU with at least 8 GB GPU RAM
  • 4 CPU cores
  • 32 GB RAM
  • 32 GB storage

A system of this sort will typically provide a 50-100x real-time speedup. For higher throughput, please consult a Deepgram sales representative for a customized hardware recommendation.

An Amazon EC2 p2.xlarge instance works well for a baseline deployment. A p3.2xlarge is a very cost-effective way to achieve substantially higher throughput.

Training (creating new custom models)

  • 1 K80 or better NVIDIA GPU per training process, with at least 8 GB GPU RAM (16 GB recommended)
  • 32 CPU cores
  • 32 GB RAM minimum (64+ GB recommended; actual memory required depends on total training set size)
  • 32 GB storage minimum (128+ GB preferred; more may be required as multiple models are trained)

The recommended GPU is at least one V100 (the p3.2xlarge instance on Amazon EC2).

Sizes do not include training data storage requirements. Training a model requires free storage of approximately 3x the training data size.

Labeling (creating new training datasets)

  • 2 CPU cores
  • 4 GB RAM
  • 32 GB storage

Typically, an Amazon EC2 instance like the a1.large (or, for small transcription workloads, an a1.medium) will suffice.

Sizes do not include training data storage requirements.

Deepgram's software is intended to be run using Docker, a containerization technology used to isolate, secure, and deploy applications across many different host environments and operating systems. Containers are more lightweight than virtual machines, with many of the same isolation benefits.

Although Docker can be run from many different host operating systems, we recommend using Ubuntu 18.04 LTS or a similar Linux distribution, as we have tested our products most extensively in these OSes.

For the best scaling experience, or for high availability deployments, we also recommend a multi-host orchestration solution, such as Docker Swarm (recommended) or Kubernetes.

Docker

Ensure that Docker Engine version 18.06 or later is installed on all hosts. Make sure your user is in the docker group, so that it has sufficient permissions to communicate with the Docker daemon (system service).
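
For example, on most Linux distributions you can add the current user to the docker group with usermod (log out and back in for the change to take effect):

$ sudo usermod -aG docker $USER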

To test that Docker is installed properly, run:

$ docker --version

If you will be using Docker Swarm, follow Docker's official guide to ensure that swarm mode is enabled on all hosts, that an appropriate number of master nodes have been configured, and that there are no networking issues.
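
If swarm mode is not yet enabled, a minimal setup initializes the first manager and then joins each additional host. Replace <MANAGER-IP> with the manager's address; docker swarm init prints the exact join command (including the token) to run on the other nodes:

$ docker swarm init --advertise-addr <MANAGER-IP>

Then, on each additional host:

$ docker swarm join --token <TOKEN> <MANAGER-IP>:2377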

To test your Swarm configuration, run the following from a master node:

$ docker node ls

If you will not be using Docker Swarm, ensure that Docker Compose is installed.

To test the installation, run:

$ docker-compose version

Once you are satisfied that Docker is installed and configured correctly, cache your credentials locally by logging in to Docker Hub using the Docker credentials you created earlier:

$ docker login

Logging in caches your credentials locally, so these commands should not need to be executed again unless you log out (docker logout).

Please be aware of the security implications of caching credentials: by default, docker login stores them base64-encoded in ~/.docker/config.json. Consider a Docker credential helper if this is a concern.

Firewall

Docker typically takes care of networking concerns, so as long as your firewall is configured to allow Docker (and Docker Swarm, where appropriate), you should have no special concerns.

Deepgram server containers typically listen on port 8080 inside the container.

If you use online licensing (the most common form of licensing for on-premise products), you'll need to permit outbound HTTPS network traffic to license.deepgram.com on port 443.

If you use Docker Hub (recommended), you'll need to allow outbound traffic to Docker's servers on port 443.
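
As a sketch, assuming the ufw firewall on Ubuntu, rules like the following cover the outbound traffic described above plus Docker Swarm's standard ports (2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node communication, and 4789/udp for overlay network traffic):

$ sudo ufw allow out 443/tcp
$ sudo ufw allow 2377/tcp
$ sudo ufw allow 7946/tcp
$ sudo ufw allow 7946/udp
$ sudo ufw allow 4789/udp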

CUDA

Only applicable if you will be using GPU acceleration.

Support for CUDA 11 is not yet available but is coming soon.

CUDA is NVIDIA's library for interacting with its GPUs. Host machines will need the latest NVIDIA drivers installed; drivers are available on NVIDIA's Driver Download site.

To test that the drivers are properly installed, run:

$ nvidia-smi
For automated/headless installations (very common), you can retrieve the driver download URL from the NVIDIA driver download page:
  1. Select the correct product and a corresponding CUDA toolkit greater than 9.2.
  2. Click Search. The page should display the correct driver and a Download button.
  3. Click Download.
  4. On the following page, right-click Agree & Download, then copy the link to save the download URL to your clipboard.
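
With the URL on your clipboard, a headless installation might look like the following sketch; <DRIVER-URL> and the .run filename are placeholders that depend on the driver version you selected, and --silent runs NVIDIA's installer non-interactively:

$ wget <DRIVER-URL>
$ sudo sh NVIDIA-Linux-x86_64-<VERSION>.run --silent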

CUDA support is made available to Docker containers using nvidia-docker, NVIDIA's custom runtime for Docker. To properly install nvidia-docker, please follow the nvidia-docker installation guide.
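
To check that the runtime works end to end, you can run nvidia-smi inside a container. This is a sketch assuming a CUDA 10.x base image is available (consistent with the CUDA 11 note above):

$ docker run --rm --runtime=nvidia nvidia/cuda:10.2-base nvidia-smi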

If you're using Docker Swarm, you must complete some additional configuration steps:

  1. Because the current Docker Compose file format and Swarm engine do not allow selecting the nvidia runtime per container, the currently accepted workaround is to change the default runtime for all containers.

    For non-Deepgram containers and containers that do not require GPU acceleration, your system and its performance should not be affected.

    To change the default runtime, add --default-runtime=nvidia to the dockerd invocation in the systemd service file that manages dockerd (e.g., for Ubuntu 18.04, /etc/systemd/system/docker.service).

    Typical best-practice for systemd is to create an override file, such as /etc/systemd/system/docker.service.d/override.conf, rather than editing the primary docker.service file directly.
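
    As a sketch, assuming Docker's stock ExecStart line on Ubuntu 18.04, such an override file might contain the following (the empty ExecStart= clears the original value before redefining it):

    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --default-runtime=nvidia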

  2. You must tell Docker that GPU resources exist. Again, this involves modifying the Docker Daemon system service definition. First, on each host, enumerate the GPU UUIDs:

    nvidia-smi -a | grep UUID | awk '{print substr($4,0,12)}'
    

    Then edit each host's service definition for dockerd to add --node-generic-resource gpu=UUID for each UUID returned by the previous command to the dockerd invocation. If you have multiple GPUs, include this switch multiple times.
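
    For example, if the previous command printed the (hypothetical) UUID prefixes GPU-1a2b3c4d and GPU-5e6f7a8b, the dockerd invocation would gain:

    --node-generic-resource gpu=GPU-1a2b3c4d --node-generic-resource gpu=GPU-5e6f7a8b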

  3. You must tell the nvidia-docker runtime to advertise GPU resources to the swarm. To do this, edit the /etc/nvidia-container-runtime/config.toml file and ensure that this line exists near the top of the file:

    swarm-resource = "DOCKER_RESOURCE_GPU"
    
  4. After performing any of the above steps, restart the Docker daemon.
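
    On systemd-based hosts (e.g., Ubuntu 18.04), reload the unit files and restart the service:

    $ sudo systemctl daemon-reload
    $ sudo systemctl restart docker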

If you will be using Docker Compose, you may still change the Docker default runtime as described above. This has the advantage of allowing you to use version 3 Docker Compose files, which are also compatible with Docker Swarm.
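
As an illustration only, a minimal version 3 Compose file for a GPU-accelerated Deepgram container might look like the following sketch. The service name and host path are hypothetical, the image is the deepgram/onprem image from the prerequisites below, and GPU access comes from the nvidia default runtime configured above:

  version: "3.7"
  services:
    deepgram:
      image: deepgram/onprem
      ports:
        - "8080:8080"              # Deepgram containers listen on 8080 (see Firewall)
      volumes:
        - /path/to/models:/models  # hypothetical host path for model artifacts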

Distributed Filesystem

Primarily for Docker Swarm users.

If you will be operating Deepgram's products in a distributed environment (e.g., Docker Swarm), then you must ensure that required data artifacts and configuration files are available to each Docker container. In non-Swarm environments, this is often solved by Docker volumes, but in a distributed environment, you will have no control over where each container runs, so the artifacts and configuration files must be available on all hosts.

To do this, you can copy the files to all hosts (e.g., with rsync), but a more reliable and convenient solution is to use a distributed file system, such as NFS or GlusterFS.
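
As a sketch, assuming an existing NFS server exporting a (hypothetical) /exports/deepgram directory, each host could mount the shared artifacts like this:

$ sudo apt install nfs-common
$ sudo mkdir -p /srv/deepgram
$ sudo mount -t nfs <NFS-SERVER>:/exports/deepgram /srv/deepgram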

DGTools

Before installing DGTools, you should have the following:

  • Unix operating system
  • NVIDIA GPU(s). To learn more about recommended hardware for training, see Recommended Hardware.
  • CUDA support (nvidia-docker)
  • Access to deepgram/onprem Docker image