Metrics Guide
Monitoring system metrics is an important part of maintaining a healthy Deepgram self-hosted deployment. Metrics can also aid in decision making around scaling and performance concerns. To this end, Deepgram services publish a variety of metrics on exposed endpoints that you can query to determine system health.
Each section also contains details on implementing image-specific liveness and readiness probes. These should be implemented when using a container orchestration tool that supports health probes. For example, the deepgram-self-hosted Helm chart includes these checks by default.
Deepgram API
For self-hosted deployments, the Deepgram API container images expose an endpoint, /v1/status, on port 8080 by default. Querying this endpoint will yield three pieces of information:
- If a successful response is received, the API is alive and listening to messages
- The response body gives a backward-looking indication of system_health
- The response body indicates how many requests this API instance is processing
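As a rough illustration of reading these three pieces of information, the sketch below fetches /v1/status and summarizes the body. It assumes the response is a JSON object; only the system_health field is named in this guide, so the request counters are picked up generically as any integer-valued field rather than by name. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_api_status(port: int = 8080, timeout: float = 2.0) -> dict:
    """GET /v1/status from a local API container and decode the body.

    Assumes the body is JSON; a successful return also means the API is alive.
    """
    with urlopen(f"http://localhost:{port}/v1/status", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_status(body: dict) -> str:
    """Render system_health plus any integer counters found in the body."""
    health = body.get("system_health", "unknown")
    # Counter field names are not documented here, so collect every integer
    # field (excluding booleans) instead of assuming specific names.
    counters = {
        key: value
        for key, value in body.items()
        if isinstance(value, int) and not isinstance(value, bool)
    }
    rendered = ", ".join(f"{key}={value}" for key, value in sorted(counters.items()))
    return f"system_health={health}; {rendered}"
```

Calling summarize_status(fetch_api_status()) on the host running the container would print a one-line summary; adjust the port if your mapping differs from the default.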
Liveness Probe
Use a TCP check on the open port (port 8080 by default).
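Orchestration tools normally provide a built-in TCP probe, so the following is only a sketch of what such a check does: it attempts a TCP connection and reports whether it succeeded. The function name and timeout are illustrative choices, not part of any Deepgram tooling.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the API container with default settings, this would be called as tcp_port_open("localhost", 8080).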
Readiness Probe
Query the /v1/status/engine endpoint and check whether it is in a "Connected" state with at least one Engine container.
Make sure to replace the PORT placeholder with the port your container is listening on (port 8080 by default).
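The readiness check can be sketched as a small function that inspects the body returned by /v1/status/engine. This is a hedged example: it assumes the body is a JSON object and that the connection state lives under a key named engine_connection_status, which is an assumption about the response shape and should be verified against your deployment. It checks only the "Connected" state, not the number of connected Engine containers.

```python
import json

# Assumed key name; confirm it against a real /v1/status/engine response.
ASSUMED_STATUS_KEY = "engine_connection_status"


def engine_ready(raw_body: str, status_key: str = ASSUMED_STATUS_KEY) -> bool:
    """Return True when the status body reports a "Connected" state."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        # A non-JSON body is treated as "not ready".
        return False
    return isinstance(body, dict) and body.get(status_key) == "Connected"
```

The raw body can be obtained with any HTTP client pointed at http://localhost:PORT/v1/status/engine; a probe would then succeed only when engine_ready returns True.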
Model Metadata
Make a GET request to list all models the Engine has loaded, along with their metadata. For broader information, see the Model Metadata endpoint documentation. When pointed at a self-hosted deployment, this endpoint is helpful to confirm that models present in your models directory are being loaded by the container as expected.
Deepgram Engine
The Deepgram Engine container image publishes an extensive set of system metrics. These metrics are served on a different endpoint and port from the main service.
Docker/Podman
Choose a host port HOST_PORT where external queries can be made, and choose a container port CONTAINER_PORT where Engine can internally publish its metrics. These can be the same port number, since they bind to different networks (the host network versus the container network).
Port Collision
“Port collision” can occur when you try to bind to the same port from two different services. Since we are binding to both a container port and a host port, we have to be aware of this on two different networks.
When selecting a host port, do not use the same port that is used by any other Deepgram service, or any other service running on the host machine. In the default Deepgram docker-compose.yml file, the API often uses port 8080, and the License Proxy often uses ports 8443 and 8089. A common default value for the Engine HOST_PORT is 9991.
When selecting a container port, do not select port 8080, as this is used on the container network to communicate between the Engine and the API. A common default value for the Engine CONTAINER_PORT is 9991.
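The port-selection rules above can be expressed as a small pre-flight check. This is a sketch only: the reserved host ports are the defaults mentioned in this guide, and the function name and the other_host_ports parameter (ports you know are taken by other services on the host) are hypothetical.

```python
# Host ports this guide says the default docker-compose.yml often uses.
COMMON_DEEPGRAM_HOST_PORTS = {8080: "API", 8443: "License Proxy", 8089: "License Proxy"}
# Container port used for Engine-to-API communication.
ENGINE_API_CONTAINER_PORT = 8080


def check_engine_metrics_ports(host_port, container_port, other_host_ports=()):
    """Return a list of human-readable problems; an empty list means no collision found."""
    problems = []
    if host_port in COMMON_DEEPGRAM_HOST_PORTS:
        owner = COMMON_DEEPGRAM_HOST_PORTS[host_port]
        problems.append(f"host port {host_port} is commonly used by the {owner}")
    elif host_port in other_host_ports:
        problems.append(f"host port {host_port} is already used by another service")
    if container_port == ENGINE_API_CONTAINER_PORT:
        problems.append(
            f"container port {container_port} is used for Engine-to-API communication"
        )
    return problems
```

With the common default of 9991 for both values, check_engine_metrics_ports(9991, 9991) reports no problems.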
Within your docker-compose.yml file you must publish the internal container port to the external host port, as shown below. See Published Ports in the official Docker documentation for more details.
To modify the Engine configuration, edit your engine.toml file to specify the container port to publish metrics to:
Make sure to replace the placeholders HOST_PORT and CONTAINER_PORT in both of the above snippets.
Metrics may now be queried from the self-hosted instance on the local host at http://localhost:HOST_PORT/metrics.
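To inspect the metrics programmatically, the endpoint can be fetched and parsed. The sketch below assumes the endpoint returns the Prometheus text exposition format (consistent with the Prometheus scraping mentioned elsewhere in this guide) and handles only the common line shapes: comment lines, and sample lines with an optional label set. It is not a full parser, and the function names are hypothetical.

```python
from typing import Dict
from urllib.request import urlopen


def fetch_metrics_text(host_port: int = 9991, timeout: float = 2.0) -> str:
    """GET /metrics from the Engine's published host port."""
    with urlopen(f"http://localhost:{host_port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (metric name plus label set, if any) to its value."""
    samples: Dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        closing_brace = line.rfind("}")
        if closing_brace != -1:
            # Labeled sample: everything up to the closing brace names the series.
            series = line[: closing_brace + 1]
            rest = line[closing_brace + 1:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            # The first token after the series is the value; a timestamp may follow.
            samples[series] = float(rest[0])
    return samples
```

Calling parse_metrics(fetch_metrics_text()) returns a dictionary you can filter by metric name, for example to read engine_estimated_stream_capacity.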
Kubernetes
The Engine metrics endpoint is exposed by default by the deepgram-self-hosted Helm chart via a NodePort Service. This service’s name is the .engine.namePrefix configuration value with -metrics appended. By default, this service will be named deepgram-engine-metrics. See the engine.metricsServer.* configuration values for more options.
When scaling.auto.enabled is set to true, the Engine metrics endpoint will be automatically scraped by Prometheus and used for auto-scaling.
Available Metrics
Upon startup of the containers, only a limited set of metrics is available. After the first request is made, the complete set of metrics becomes available.
Initial Metrics
The engine_estimated_stream_capacity value starts low and increases as you open more streams, until you reach the GPU capacity. When engine_estimated_stream_capacity stops increasing, you have reached the GPU stream capacity.
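One way to detect that plateau from a series of readings is sketched below. The definition used here is an assumption made for illustration: the value is considered to have stopped increasing when none of the most recent `window` readings exceeds the highest reading seen before them. The function name and default window are hypothetical.

```python
def capacity_has_plateaued(readings, window: int = 3) -> bool:
    """Return True when the last `window` readings set no new maximum.

    `readings` is a chronological list of engine_estimated_stream_capacity values.
    """
    if len(readings) <= window:
        # Not enough history to compare recent readings against earlier ones.
        return False
    prior_peak = max(readings[:-window])
    recent_peak = max(readings[-window:])
    return recent_peak <= prior_peak
```

A larger window makes the check more conservative, which may help if new streams are opened in bursts.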
Complete Metrics
Streaming latency metrics
Together, the engine_stream_latency_bucket metrics may be used to plot a histogram of streaming latency in Grafana, using the PromQL queries below. This set of metrics tracks the latency of each new stream. With this data, distribution and trends in streaming latency may be monitored for each deployment.
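To show how bucket metrics translate into a latency figure, here is a sketch that estimates a quantile from cumulative histogram buckets using linear interpolation, the same general approach as the PromQL histogram_quantile function. It assumes the buckets follow the standard Prometheus convention: each bucket's count is cumulative and keyed by its upper bound (the le label), ending with a +Inf bucket. The function name and input shape are illustrative.

```python
import math


def estimate_quantile(q: float, buckets) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    `buckets` is an iterable of (upper_bound, cumulative_count) pairs,
    where the final upper bound is expected to be math.inf.
    """
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in ordered:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # No finite upper edge (or an empty bucket): report the lower edge.
                return lower_bound
            # Interpolate linearly within the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

The accuracy of the estimate depends on how finely the bucket boundaries divide the latency range, so treat the result as an approximation.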
Liveness Probe
Use a TCP check on the open port (port 8080 by default).
Readiness Probe
Use a TCP check on the open port (port 8080 by default).
Deepgram License Proxy
For self-hosted deployments, the Deepgram License Proxy container images expose an endpoint, /v1/status, on port 8080 by default. Querying this endpoint will indicate whether the License Proxy is able to communicate with the Deepgram license server.
Liveness Probe
Use a TCP check on the open status port (port 8080 by default).
Readiness Probe
Query the /v1/status endpoint and check the connection state.
Make sure to replace the PORT placeholder with the port your container is listening on (port 8080 by default).
Summary
To access metrics for the API, Engine, and License Proxy containers, run the following curl requests from the same machine the containers are running on. The ports in the commands below are the default port numbers; check your configuration files to see if the port mapping was changed.
- API: curl "http://localhost:8080/v1/status" and curl "http://localhost:8080/v1/status/engine"
- Engine: curl "http://localhost:9991/metrics"
- License Proxy: curl "http://localhost:8080/v1/status"
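The same requests can be scripted. The sketch below builds the URLs listed above from the default ports and records the HTTP status code for each, or None when the endpoint cannot be reached. The function names and dictionary keys are hypothetical; pass different port numbers if your port mapping was changed.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def default_endpoints(api_port=8080, engine_metrics_port=9991, license_proxy_port=8080):
    """Build the endpoint URLs from this guide, using its default ports."""
    return {
        "api_status": f"http://localhost:{api_port}/v1/status",
        "api_engine_status": f"http://localhost:{api_port}/v1/status/engine",
        "engine_metrics": f"http://localhost:{engine_metrics_port}/metrics",
        "license_proxy_status": f"http://localhost:{license_proxy_port}/v1/status",
    }


def poll(endpoints, timeout: float = 2.0):
    """Return {name: HTTP status code, or None if the endpoint is unreachable}."""
    results = {}
    for name, url in endpoints.items():
        try:
            with urlopen(url, timeout=timeout) as resp:
                results[name] = resp.status
        except HTTPError as err:
            # The server answered, but with an error status.
            results[name] = err.code
        except OSError:
            # Connection refused, timeout, or name resolution failure.
            results[name] = None
    return results
```

Running poll(default_endpoints()) on the host gives a quick overview of which services are answering.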
What’s Next
You may want to set up tooling for ingesting and monitoring system metrics, for example, with our Prometheus guide.