Deploy Deepgram on Modal
Deploy Deepgram as a Modal app for serverless, GPU-powered hosting of Speech-to-Text, Text-to-Speech, and Flux.
Modal is a serverless infrastructure platform that makes it easy to serve GPU-powered workloads in the cloud. This guide walks through deploying Deepgram as a Modal app.
Once you deploy Deepgram on Modal, clients can use the standard Deepgram REST and WebSocket APIs. Modal will handle autoscaling, load balancing, storing configurations and model weights, observability, and more. For information on Modal’s features and SDK, check out their documentation.
Prerequisites
To deploy Deepgram on Modal, you’ll first need a self-hosted contract with Deepgram.
Once you have self-hosted access in the Deepgram Console, you can generate an API key and Distribution Credentials, both of which are used during deployment. The Distribution Credentials authenticate you to the Deepgram container image registry via the environment variables specified below.
Additionally, Deepgram will need to generate unique model file links for your contracted Deepgram project ID. Ask your Account Executive for access to the Deepgram model weights files for the products you need to deploy, such as Nova-3, Aura-2, and Flux. Be sure to specify which languages you need for each product, and whether you intend to use the HTTP or WebSocket streaming APIs.
Deployment Structure
- All Deepgram components (Engine, API, License Proxy) run in a single Modal container and communicate over `localhost`.
- The API is exposed publicly using Modal’s `web_server` decorator and routed to containers via a low-latency, regional proxy.
- Model weights and Deepgram TOML configs live on Modal Volumes: a fast, remote data store.
The reference repository (modal-deepgram-hosting) ships a single deployment module that can be configured to serve STT, TTS, or Flux.
Quickstart: Speech-to-Text
Get an STT deployment up and running:
1. Create a Modal account, install the Modal CLI, and authenticate.
2. Create the `deepgram` Modal Secret with your `DEEPGRAM_API_KEY`, `REGISTRY_USERNAME`, and `REGISTRY_PASSWORD`. You can use the CLI or the Secrets tab of your Modal workspace dashboard.
3. Clone the reference repository.
4. Save your Deepgram STT model download links (provided by Deepgram) to `./model-links.txt`.
5. Set the label for this configuration using the `DEPLOY_LABEL` environment variable.
6. Download Deepgram configs and model weights to Modal Volumes. This command downloads the most recent configs from the Deepgram self-hosted-resources repo and patches them to communicate over `localhost` using the correct ports.
7. Deploy the Modal app.
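The model-links file is consumed one URL per line. As a minimal sketch of the parsing the download step might perform (skipping blank lines and `#` comments is an illustrative convention here, not the repo’s documented format):

```python
from pathlib import Path


def read_model_links(path: str = "model-links.txt") -> list[str]:
    """Return the signed model download URLs from the links file.

    Assumes one URL per line; blank lines and '#' comments are
    skipped (an illustrative convention, not the repo's spec).
    """
    urls = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

Keeping the parsed list around is also handy later, when verifying that the deployed API reports the same set of models.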
Testing the deployment
Once your Modal app is deployed and a container is running, run a quick set of checks against the public URL to confirm the API is up, models are loaded, and inference works end to end. The Modal endpoint speaks the standard Deepgram REST and WebSocket APIs, so any Deepgram client or SDK works against it.
Locate your deployment URL
The base URL for the Deepgram endpoint is:
- printed when you run `modal deploy`.
- visible in the Modal dashboard for the Deepgram Function.
Health checks
GET /v1/status
Confirms the API is up and the Engine is reachable.
A successful response returns 200 with a JSON body indicating both the API and Engine are healthy.
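A quick way to exercise this endpoint with only Python’s standard library (the base URL below is a placeholder for your deployment; the `opener` parameter is injectable purely so the request logic can be checked without a live deployment):

```python
import json
from urllib.request import Request, urlopen


def check_status(base_url: str, opener=urlopen) -> dict:
    """GET /v1/status on the deployment and return the parsed JSON body."""
    req = Request(base_url.rstrip("/") + "/v1/status", method="GET")
    with opener(req) as resp:  # opener is injectable for offline testing
        return json.load(resp)


# Example (placeholder URL -- substitute the one printed by `modal deploy`):
# check_status("https://example--deepgram.modal.run")
```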
GET /v1/models
Lists the models loaded from `/models/{label}/`. Use this to confirm `prepare_resources` populated the right model files.
The response is a JSON array of model metadata. The set of names should match what you put in your model-links file.
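To diff the response against your links file, a small helper like the following can be used (the `name` field on each entry is an assumption about the metadata shape, not a documented guarantee):

```python
def missing_models(models_response: list[dict], expected_names: list[str]) -> list[str]:
    """Return expected model names absent from the GET /v1/models response.

    `models_response` is the parsed JSON array; each entry is assumed
    (hypothetically) to carry a "name" field.
    """
    loaded = {m.get("name") for m in models_response}
    return sorted(n for n in set(expected_names) if n not in loaded)
```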
Inference
Upload a local WAV file:
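For instance, with only the standard library (the base URL is a placeholder for your Modal deployment; `/v1/listen` and the `model` query parameter are the standard Deepgram STT REST interface, and `opener` is injectable so the request construction can be checked offline):

```python
import json
from urllib.request import Request, urlopen


def transcribe_wav(base_url: str, wav_path: str, opener=urlopen) -> dict:
    """POST a local WAV file to the deployment's /v1/listen endpoint."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    req = Request(
        base_url.rstrip("/") + "/v1/listen?model=nova-3",
        data=audio,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )
    with opener(req) as resp:  # opener is injectable for offline testing
        return json.load(resp)
```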
Or pass a URL payload — Modal containers have outbound network access, so URL ingestion works without extra configuration:
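A sketch of the URL-payload variant, again using only the standard library (same placeholder base URL and injectable `opener` as above; the audio URL is whatever publicly reachable file you want transcribed):

```python
import json
from urllib.request import Request, urlopen


def transcribe_url(base_url: str, audio_url: str, opener=urlopen) -> dict:
    """POST a JSON {"url": ...} payload to /v1/listen for remote-file ingestion."""
    req = Request(
        base_url.rstrip("/") + "/v1/listen?model=nova-3",
        data=json.dumps({"url": audio_url}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:  # opener is injectable for offline testing
        return json.load(resp)
```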
To use a Deepgram SDK, point the client at the Modal URL by overriding the base URL. See the SDK quickstarts on developers.deepgram.com.
Streaming Inference
The reference repository ships a working WebSocket client at test/load_test_websocket_modal.py. It streams a WAV file at real-time pace and is the fastest way to validate streaming end to end.
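Independent of the bundled client, the streaming endpoint is just the deployment’s REST base URL with a WebSocket scheme. A small helper for deriving it (the query parameters shown follow the standard Deepgram streaming API; the base URL is a placeholder):

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def streaming_url(base_url: str, **params) -> str:
    """Derive the wss:// streaming endpoint (/v1/listen) from the
    deployment's https:// base URL.

    Query parameters (model, encoding, sample_rate, ...) follow the
    standard Deepgram streaming API.
    """
    parts = urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, "/v1/listen", urlencode(params), ""))
```

Point any Deepgram-compatible WebSocket client at the resulting URL, then stream audio frames at real-time pace as the bundled test client does.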