Metrics Guide
Monitoring system metrics is an important part of maintaining a healthy Deepgram on-prem deployment. Metrics can also inform decisions about scaling and performance. To this end, Deepgram services publish a variety of metrics on exposed endpoints that you can query to determine system health.
Deepgram API
For on-prem deployments, the Deepgram API container images expose an endpoint `/v1/status`, on port 8080 by default. Querying this endpoint will yield three pieces of information:
- If a successful response is received, the API is alive and listening to messages
- The response body gives a backward-looking indication of `system_health`
- The response body indicates how many requests this API instance is processing
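As a quick check, you can query the endpoint with `curl`. This is a minimal sketch that assumes the API is reachable on `localhost` at the default port 8080; adjust the host and port for your deployment:

```sh
# Query the API status endpoint; a successful (2xx) response means the
# API is alive. The JSON body includes the backward-looking system_health
# indicator and this instance's current request load.
curl http://localhost:8080/v1/status
```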
Deepgram Engine
The Deepgram Engine container images are capable of publishing an extensive set of system metrics; however, this behavior is not enabled by default. Accessing these metrics requires changing the Engine configuration files, as well as changing how the container engine (e.g. `docker`) launches and runs the Engine service.
Choose a host port `HOST_PORT` where external queries can be made, and choose a container port `CONTAINER_PORT` where Engine can internally publish its metrics. These can be the same port number, since they bind to different networks (the host network versus the container network).
Port Collision
Port collision occurs when two services attempt to bind to the same port on the same network. Because we are binding both a container port and a host port, we need to avoid collisions on two different networks.
When selecting a host port, do not use a port that is already in use by another Deepgram service or by any other service running on the host machine. In the default Deepgram `docker-compose.yml` file, the API often uses port `8080`, and the License Proxy often uses ports `8443` and `8089`. A common default value for the Engine `HOST_PORT` is `9991`. One way to spot ports that are already taken is shown in the sketch after this callout.
When selecting a container port, do not select port `8080`, as this port is used on the container network for communication between the Engine and the API. A common default value for the Engine `CONTAINER_PORT` is `9991`.
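Before settling on a `HOST_PORT`, you can list the ports already in use. This is a minimal sketch assuming a Linux host with `iproute2` and the Docker CLI available:

```sh
# List TCP ports currently listening on the host network
ss -ltn

# Show which host ports each running container has published
docker ps --format '{{.Names}}\t{{.Ports}}'
```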
Within your `docker-compose.yml` file, you must publish the internal container port to the external host port, as shown below. See Published Ports in the official Docker documentation for more details.
```yaml
services:
  engine:
    # other definitions
    ports:
      - "HOST_PORT:CONTAINER_PORT"
```
To modify the Engine configuration, edit your `engine.toml` file to specify the container port on which metrics are published:
```toml
# To support metrics we need to expose an Engine endpoint
[metrics_server]
host = "0.0.0.0"
port = CONTAINER_PORT
```
Make sure to replace the placeholders `HOST_PORT` and `CONTAINER_PORT` in both of the above snippets.
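For example, using the common default of `9991` for both ports mentioned above, the two snippets would read:

```yaml
services:
  engine:
    # other definitions
    ports:
      - "9991:9991"
```

```toml
[metrics_server]
host = "0.0.0.0"
port = 9991
```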
Metrics may now be queried from the on-prem instance on the local host at `:HOST_PORT/metrics`.
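For example, assuming the common default host port `9991`, you could fetch the metrics with `curl`:

```sh
# Fetch all Engine metrics in Prometheus exposition format
curl http://localhost:9991/metrics

# Or filter for a single metric, such as the estimated stream capacity
curl -s http://localhost:9991/metrics | grep engine_estimated_stream_capacity
```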
Available Metrics
Upon startup of the containers, only a limited set of metrics is available; the complete set becomes available after the first request is processed.
Initial Metrics
```
# HELP engine_estimated_stream_capacity The number of streams the node believes it can serve with acceptable latency.
# TYPE engine_estimated_stream_capacity gauge
engine_estimated_stream_capacity <integer>
```
Complete Metrics
```
# HELP engine_active_requests Number of active ASR requests
# TYPE engine_active_requests gauge
engine_active_requests{kind="batch"} <integer>
engine_active_requests{kind="stream"} <integer>

# HELP engine_batch_response_time_seconds Time to process a batch request.
# TYPE engine_batch_response_time_seconds histogram
engine_batch_response_time_seconds_bucket{le="1"} <integer>
engine_batch_response_time_seconds_bucket{le="2.5"} <integer>
engine_batch_response_time_seconds_bucket{le="5"} <integer>
engine_batch_response_time_seconds_bucket{le="10"} <integer>
engine_batch_response_time_seconds_bucket{le="30"} <integer>
engine_batch_response_time_seconds_bucket{le="60"} <integer>
engine_batch_response_time_seconds_bucket{le="+Inf"} <integer>
engine_batch_response_time_seconds_sum <float>
engine_batch_response_time_seconds_count <integer>

# HELP engine_estimated_stream_capacity The number of streams the node believes it can serve with acceptable latency.
# TYPE engine_estimated_stream_capacity gauge
engine_estimated_stream_capacity <integer>

# HELP engine_requests_total Number of ASR requests.
# TYPE engine_requests_total counter
engine_requests_total{kind="batch",response_status="1xx"} <integer>
engine_requests_total{kind="batch",response_status="2xx"} <integer>
engine_requests_total{kind="batch",response_status="3xx"} <integer>
engine_requests_total{kind="batch",response_status="4xx"} <integer>
engine_requests_total{kind="batch",response_status="5xx"} <integer>
engine_requests_total{kind="stream",response_status="1xx"} <integer>
engine_requests_total{kind="stream",response_status="2xx"} <integer>
engine_requests_total{kind="stream",response_status="3xx"} <integer>
engine_requests_total{kind="stream",response_status="4xx"} <integer>
engine_requests_total{kind="stream",response_status="5xx"} <integer>
```
Deepgram License Proxy
For on-prem deployments, the Deepgram License Proxy container images expose an endpoint `/v1/status`, on port 8089 by default. Querying this endpoint will provide statistics indicating whether the License Proxy is able to communicate with the Deepgram license server.
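As with the API, you can check this endpoint with `curl`, assuming the License Proxy is reachable on `localhost` at the default port:

```sh
# Query the License Proxy status endpoint
curl http://localhost:8089/v1/status
```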
You may want to set up tooling for ingesting and monitoring system metrics, for example, with our Prometheus guide.