Metrics Guide

Monitoring system metrics is an important part of maintaining a healthy Deepgram on-prem deployment. Metrics can also inform decisions about scaling and performance. To this end, Deepgram services publish a variety of metrics on exposed endpoints that you can query to determine system health.

Deepgram API

For on-prem deployments, the Deepgram API container images expose an endpoint /v1/status, on port 8080 by default. Querying this endpoint will yield three pieces of information:

  1. If a successful response is received, the API is alive and listening to messages
  2. The response body gives a backward-looking indication of system_health
  3. The response body indicates how many requests this API instance is processing

In addition, a readiness check can be performed by querying /v1/status/engine.
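Both checks can be scripted, for example for a cron job or a container health probe. The following is a minimal sketch that uses only the Python standard library. It rests on three assumptions:

  1. The API is reachable at localhost:8080.
  2. Any 2xx response counts as success.
  3. The /v1/status body is JSON. The sketch returns it without relying on specific field names.

The helper names get_json, is_success, and check_api are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: the API is published on localhost:8080, the default in this guide.
API_BASE = "http://localhost:8080"


def get_json(url, timeout=5.0):
    """Return (status_code, body); (None, None) if the server could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            raw = resp.read().decode("utf-8")
            try:
                return resp.status, json.loads(raw)
            except json.JSONDecodeError:
                return resp.status, raw  # Not JSON: keep the raw text.
    except urllib.error.HTTPError as err:
        # The server answered, but with a non-2xx status.
        return err.code, None
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None, None


def is_success(code):
    return code is not None and 200 <= code < 300


def check_api(base=API_BASE):
    """Hypothetical helper: liveness via /v1/status, readiness via /v1/status/engine."""
    live_code, live_body = get_json(base + "/v1/status")
    ready_code, _ = get_json(base + "/v1/status/engine")
    return {
        "alive": is_success(live_code),
        "status_body": live_body,  # Expected to carry system_health and the request count.
        "ready": is_success(ready_code),
    }
```

Calling check_api() from the host returns a dictionary with alive, ready, and status_body keys. Inspect status_body for the system_health value and the active request count that your API version reports.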

Deepgram Engine

The Deepgram Engine container image can publish an extensive set of system metrics; however, this behavior is not enabled by default. Accessing these metrics requires changing the Engine configuration files, as well as how the container engine (e.g. Docker) launches and runs the Engine service.

Choose a host port, HOST_PORT, where external queries can be made, and a container port, CONTAINER_PORT, where the Engine publishes its metrics internally. These can be the same port number, because they bind on different networks (the host network versus the container network).

Port Collision

"Port collision" can occur when you try to bind to the same port from two different services. Since we are binding to both a container port and a host port, we have to be aware of this on two different networks.

When selecting a host port, do not use the same port that is used by any other Deepgram service, or any other service running on the host machine. In the default Deepgram docker-compose.yml file, the API often uses port 8080, and the License Proxy often uses ports 8443 and 8089. A common default value for the Engine HOST_PORT is 9991.

When selecting a container port, do not select port 8080, as this is used on the container network to communicate between the Engine and the API. A common default value for the Engine CONTAINER_PORT is 9991.
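Before editing any files, you can check candidate ports for collisions. The sketch below is a hypothetical pre-flight check written in Python, not a Deepgram tool. It first rejects the default Deepgram ports named above. It then tries to bind the host port to see whether another process already holds it. A successful bind only shows that the port is free at that moment, so services that are currently stopped will not be detected.

```python
import socket

# Host ports this guide says the default docker-compose.yml gives the API and License Proxy.
RESERVED_HOST_PORTS = {8080, 8443, 8089}
# Container port the Engine and API use to talk to each other.
RESERVED_CONTAINER_PORTS = {8080}


def host_port_in_use(port, host="0.0.0.0"):
    """Return True if the TCP port cannot be bound (usually because another process holds it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


def check_engine_ports(host_port, container_port):
    """Hypothetical pre-flight check; returns a list of problems (empty if none found)."""
    problems = []
    if host_port in RESERVED_HOST_PORTS:
        problems.append(f"host port {host_port} is used by another Deepgram service")
    elif host_port_in_use(host_port):
        problems.append(f"host port {host_port} is already bound on this machine")
    if container_port in RESERVED_CONTAINER_PORTS:
        problems.append(f"container port {container_port} is reserved for Engine/API traffic")
    return problems
```

For example, check_engine_ports(9991, 9991) returns an empty list when nothing on the host holds port 9991.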

Within your docker-compose.yml file you must publish the internal container port to the external host port, as shown below. See Published Ports in the official Docker documentation for more details.

services:
  engine:
    # other definitions
    ports:
      - "HOST_PORT:CONTAINER_PORT"

To modify the Engine configuration, edit your engine.toml file to specify the container port to publish metrics to:

# To support metrics we need to expose an Engine endpoint
[metrics_server]
  host = "0.0.0.0"
  port = CONTAINER_PORT

Replace the HOST_PORT and CONTAINER_PORT placeholders in the snippets above with the ports you chose. CONTAINER_PORT must have the same value in docker-compose.yml and engine.toml.
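CONTAINER_PORT must appear identically in both files, so it can help to generate both fragments from a single pair of values. The Python below is a hypothetical helper. It only produces the text of the docker-compose.yml and engine.toml snippets above with the placeholders filled in. It does not read or edit your files.

```python
from typing import Tuple


def render_engine_port_config(host_port: int, container_port: int) -> Tuple[str, str]:
    """Hypothetical helper: fill both placeholders from one pair of values."""
    for name, value in (("host_port", host_port), ("container_port", container_port)):
        if not 1 <= value <= 65535:
            raise ValueError(f"{name} must be between 1 and 65535, got {value}")
    # docker-compose.yml: publish CONTAINER_PORT on HOST_PORT.
    compose_fragment = (
        "services:\n"
        "  engine:\n"
        "    ports:\n"
        f'      - "{host_port}:{container_port}"\n'
    )
    # engine.toml: the Engine serves metrics on CONTAINER_PORT inside the container.
    toml_fragment = (
        "[metrics_server]\n"
        '  host = "0.0.0.0"\n'
        f"  port = {container_port}\n"
    )
    return compose_fragment, toml_fragment


if __name__ == "__main__":
    compose, toml = render_engine_port_config(9991, 9991)
    print(compose)
    print(toml)
```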

Metrics can now be queried from the host machine at http://localhost:HOST_PORT/metrics.

Available Metrics

When the containers start, only a limited set of metrics is available. The complete set becomes available after the first request is made.
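To tell which phase an Engine is in, you can fetch the endpoint and compare the published metric families against the complete set listed under Complete Metrics. The Python below is a minimal sketch. It assumes the default host port 9991. It also assumes that each family is announced by a "# TYPE" line, as in the samples in this guide.

```python
import urllib.request

# Assumption: the default host port from this guide; replace with your HOST_PORT.
METRICS_URL = "http://localhost:9991/metrics"

# Families listed under "Complete Metrics" in this guide.
COMPLETE_FAMILIES = {
    "engine_active_requests",
    "engine_batch_response_time_seconds",
    "engine_estimated_stream_capacity",
    "engine_requests_total",
}


def fetch_metrics_text(url=METRICS_URL, timeout=5.0):
    """Download the raw Prometheus exposition text from the Engine."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_families(text):
    """Collect family names from '# TYPE <name> <type>' lines."""
    families = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            families.add(parts[2])
    return families


def has_complete_metrics(text):
    """True once every family from the complete set is being published."""
    return COMPLETE_FAMILIES <= metric_families(text)
```

For example, has_complete_metrics(fetch_metrics_text()) returns False on a freshly started Engine that has not served a request yet.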

Initial Metrics

The value of engine_estimated_stream_capacity starts low and rises as you open more streams. When engine_estimated_stream_capacity stops increasing, you have reached the GPU's stream capacity.

# HELP engine_estimated_stream_capacity The number of streams the node believes it can serve with acceptable latency.
# TYPE engine_estimated_stream_capacity gauge
engine_estimated_stream_capacity <integer>
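One way to detect that plateau works as follows:

  1. Record the gauge after each new stream.
  2. Stop once the last few readings are no higher than the reading just before them.

The function below is a hypothetical Python sketch of that rule. The default window of 3 readings is an arbitrary assumption meant to smooth over noisy readings. It is not a Deepgram recommendation.

```python
from typing import Sequence


def capacity_plateaued(readings: Sequence[int], window: int = 3) -> bool:
    """Return True if none of the last `window` readings exceeds the reading before them.

    `readings` are successive values of engine_estimated_stream_capacity collected
    while adding streams; `window` is a hypothetical tuning knob, not an Engine setting.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if len(readings) < window + 1:
        return False  # Not enough history to decide yet.
    baseline = readings[-window - 1]
    return max(readings[-window:]) <= baseline
```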

Complete Metrics

# HELP engine_active_requests Number of active ASR requests
# TYPE engine_active_requests gauge
engine_active_requests{kind="batch"} <integer>
engine_active_requests{kind="stream"} <integer>
# HELP engine_batch_response_time_seconds Time to process a batch request.
# TYPE engine_batch_response_time_seconds histogram
engine_batch_response_time_seconds_bucket{le="1"} <integer>
engine_batch_response_time_seconds_bucket{le="2.5"} <integer>
engine_batch_response_time_seconds_bucket{le="5"} <integer>
engine_batch_response_time_seconds_bucket{le="10"} <integer>
engine_batch_response_time_seconds_bucket{le="30"} <integer>
engine_batch_response_time_seconds_bucket{le="60"} <integer>
engine_batch_response_time_seconds_bucket{le="+Inf"} <integer>
engine_batch_response_time_seconds_sum <float>
engine_batch_response_time_seconds_count <integer>
# HELP engine_estimated_stream_capacity The number of streams the node believes it can serve with acceptable latency.
# TYPE engine_estimated_stream_capacity gauge
engine_estimated_stream_capacity <integer>
# HELP engine_requests_total Number of ASR requests.
# TYPE engine_requests_total counter
engine_requests_total{kind="batch",response_status="1xx"} <integer>
engine_requests_total{kind="batch",response_status="2xx"} <integer>
engine_requests_total{kind="batch",response_status="3xx"} <integer>
engine_requests_total{kind="batch",response_status="4xx"} <integer>
engine_requests_total{kind="batch",response_status="5xx"} <integer>
engine_requests_total{kind="stream",response_status="1xx"} <integer>
engine_requests_total{kind="stream",response_status="2xx"} <integer>
engine_requests_total{kind="stream",response_status="3xx"} <integer>
engine_requests_total{kind="stream",response_status="4xx"} <integer>
engine_requests_total{kind="stream",response_status="5xx"} <integer>
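Several useful figures can be derived from these series:

  • Mean batch response time: engine_batch_response_time_seconds_sum divided by engine_batch_response_time_seconds_count.
  • Server-error ratio for each kind: its 5xx count divided by its total request count.

The Python sketch below computes both from the exposition text with a deliberately minimal parser. That parser ignores timestamps and escaped quotes in label values. For production use, a Prometheus client library or the Prometheus server itself is the safer choice.

```python
import re
from collections import defaultdict

# Matches e.g. engine_requests_total{kind="batch",response_status="5xx"} 3
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, dict(LABEL_RE.findall(labels or "")), float(value)))
    return samples


def summarize(text):
    """Mean batch latency and per-kind 5xx ratio since the Engine started."""
    totals = defaultdict(float)  # kind -> all requests
    errors = defaultdict(float)  # kind -> 5xx requests
    batch_sum = batch_count = 0.0
    for name, labels, value in parse_samples(text):
        if name == "engine_requests_total":
            kind = labels.get("kind", "unknown")
            totals[kind] += value
            if labels.get("response_status") == "5xx":
                errors[kind] += value
        elif name == "engine_batch_response_time_seconds_sum":
            batch_sum = value
        elif name == "engine_batch_response_time_seconds_count":
            batch_count = value
    return {
        "mean_batch_seconds": batch_sum / batch_count if batch_count else None,
        "error_ratio": {kind: errors[kind] / totals[kind] for kind in totals if totals[kind]},
    }
```

These counters accumulate from container start, so the results are lifetime averages. To see recent behavior, compare two scrapes; Prometheus's rate() function is built for this kind of calculation.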

Deepgram License Proxy

For on-prem deployments, the Deepgram License Proxy container images expose an endpoint /v1/status, on port 8089 by default. Querying this endpoint returns statistics indicating whether the License Proxy can communicate with the Deepgram license server.

Summary

To access metrics for the API, Engine, and License Proxy containers, run the following curl commands on the same machine as the containers. The ports below are the defaults; check your configuration files in case the port mapping has been changed.

  • API: curl "http://localhost:8080/v1/status" and curl "http://localhost:8080/v1/status/engine" (readiness check)
  • Engine: curl "http://localhost:9991/metrics"
  • License Proxy: curl "http://localhost:8089/v1/status"
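The same requests can also be made from a script. This minimal Python sketch probes the default URLs listed above and reports each HTTP status code, or None when nothing answers. The ENDPOINTS mapping and the probe and sweep helpers are hypothetical; edit ENDPOINTS if your port mapping differs.

```python
import urllib.error
import urllib.request

# Default endpoints from this guide; adjust the ports if your mapping differs.
ENDPOINTS = {
    "API (liveness)": "http://localhost:8080/v1/status",
    "API (readiness)": "http://localhost:8080/v1/status/engine",
    "Engine metrics": "http://localhost:9991/metrics",
    "License Proxy": "http://localhost:8089/v1/status",
}


def probe(url, timeout=5.0):
    """Return the HTTP status code, or None if the endpoint could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # The service answered with a non-2xx status.
    except (urllib.error.URLError, OSError):
        return None  # Connection refused, timeout, and similar.


def sweep(endpoints=ENDPOINTS):
    """Probe every endpoint and map its label to the resulting status code."""
    return {name: probe(url) for name, url in endpoints.items()}
```

For example, sweep() returns a dictionary mapping each endpoint label to its status code, which you can print or feed into an alerting check.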

What’s Next

You may want to set up tooling for ingesting and monitoring system metrics, for example, by following our Prometheus guide.