Configure Modal Resources

Configure hardware, autoscaling, and region selection for a Deepgram Modal deployment.

With Modal, hardware resources and autoscaling configuration are specified alongside your application code. Update the parameters in this section by editing the values in app.py and redeploying.

When you clone the repo, the values are configured for an STT deployment in us-west.

# modal_deepgram/app.py

@app.cls(
    image=engine_base_image.env({"DEPLOY_LABEL": DEPLOY_LABEL}),
    volumes={
        MODELS_PATH: models_vol,
        CACHE_PATH: cache_vol,
    },
    gpu="L4",
    secrets=[modal.Secret.from_name("deepgram")],
    timeout=30 * MINUTES,
    cpu=4,
    memory=32 * 1024,  # MB
    min_containers=1,
    region="us-west",
)
@modal.concurrent(target_inputs=64)
@modal.experimental.http_server(port=API_PORT, proxy_regions=["us-west"])
class DeepgramServer(DeepgramServerBase):
    ...

Configure hardware

For Deepgram’s hardware minimums, see Deployment Environments → Engine.

For Modal’s GPU options, see Modal: GPU.

Configure autoscaling

Modal automatically scales the number of Deepgram containers up and down based on per-container concurrency.

See Modal's Scaling Out guide and Input Concurrency guide for the available parameters and what they do. Note that not all available parameters are surfaced in app.py.

Deepgram recommends keeping at least one container active so that lulls in traffic don’t lead to queuing or 503s when scaling back up from zero. In Modal, set min_containers=1.

Web endpoints served with http_server accept only a value for target_inputs, not max_inputs. Set target_inputs slightly below the active request limit in your engine.toml file (see Auto-Scaling: Enforcing Limits).
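
As a rough illustration of "slightly below the active request limit", the sketch below derives a target_inputs value from a limit you pass in by hand. The helper name target_inputs_for, the headroom_pct parameter, and the 10% default are all hypothetical and are not part of app.py or of Modal's API. The limit of 70 is an example value only, not a Deepgram recommendation.

```python
def target_inputs_for(active_request_limit: int, headroom_pct: int = 10) -> int:
    """Return a target_inputs value slightly below the engine's active request limit.

    Hypothetical helper: active_request_limit is whatever limit you have set in
    engine.toml, copied here by hand. Nothing is read from the file.
    """
    if active_request_limit < 1:
        raise ValueError("active_request_limit must be at least 1")
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in the range 0-99")
    # Reserve at least one request slot of headroom (integer arithmetic only).
    reserved = max(1, active_request_limit * headroom_pct // 100)
    # Never go below 1, so a container can always accept at least one input.
    return max(1, active_request_limit - reserved)


# Example value only: with a limit of 70 and 10% headroom, 70 - 7 = 63.
TARGET_INPUTS = target_inputs_for(70)
```

The result would then be passed as @modal.concurrent(target_inputs=TARGET_INPUTS) in place of the hard-coded 64. How much headroom is appropriate depends on your traffic pattern.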

Configure regions

To optimize network latency, you will likely want to set PROXY_REGION and SERVER_REGION, then route traffic from clients in those regions to that deployment.

  • PROXY_REGION specifies the location of the Modal proxy that routes requests to containers. It can take one of four values: us-east, us-west, eu-west, ap-south.
  • SERVER_REGION specifies which region(s) the server containers can reside in. See the Modal Region Selection doc for more information.
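
A minimal sketch of how the two settings might be defined and checked before deploying. The constant names come from the text above. The check_proxy_region helper, the VALID_PROXY_REGIONS tuple, and the way the values feed into the decorators are assumptions, and the actual app.py may be organized differently.

```python
# The four proxy regions listed above.
VALID_PROXY_REGIONS = ("us-east", "us-west", "eu-west", "ap-south")


def check_proxy_region(region: str) -> str:
    """Hypothetical guard: fail fast on a proxy region outside the listed values."""
    if region not in VALID_PROXY_REGIONS:
        raise ValueError(
            f"Unsupported proxy region {region!r}; "
            f"expected one of {', '.join(VALID_PROXY_REGIONS)}"
        )
    return region


# Keep the proxy and the server containers in the same region as your clients.
PROXY_REGION = check_proxy_region("us-west")
SERVER_REGION = "us-west"  # see Modal's Region Selection doc for accepted values

# Assumed wiring into the decorators shown earlier:
#   @app.cls(..., region=SERVER_REGION)
#   @modal.experimental.http_server(port=API_PORT, proxy_regions=[PROXY_REGION])
```

SERVER_REGION is deliberately left unvalidated here, because the set of server regions Modal accepts is broader than the four proxy regions.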