With Modal, hardware resources and autoscaling configuration are specified alongside your application code. To update the parameters described in this section, edit the values in app.py and redeploy.
When you clone the repo, the values are configured for an STT deployment in us-west.
For Deepgram’s hardware minimums, see Deployment Environments → Engine.
For Modal’s GPU options, see Modal: GPU.
Modal automatically scales the number of Deepgram containers up and down based on per-container concurrency.
See Modal's Scaling Out guide and Input Concurrency guide for the available parameters and what they do. Note that not all available parameters are surfaced in app.py.
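As a rough mental model of concurrency-based scaling, the sketch below estimates how many containers a given load would need. It is a back-of-the-envelope approximation, not Modal's actual autoscaling algorithm. The function name `estimate_containers` and its parameters are hypothetical and do not appear in app.py.

```python
import math


def estimate_containers(
    concurrent_requests: int,
    per_container_target: int,
    min_containers: int = 1,
) -> int:
    """Roughly estimate the container count for a given request load.

    Simplified illustration only: divide the in-flight requests by the
    per-container concurrency target, round up, and never go below the
    configured minimum.
    """
    if per_container_target < 1:
        raise ValueError("per_container_target must be at least 1")
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests cannot be negative")
    needed = math.ceil(concurrent_requests / per_container_target)
    return max(min_containers, needed)
```

For example, under this model 25 concurrent requests with a per-container target of 10 would call for 3 containers, and an idle deployment would stay at the minimum of 1.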
min_containers = 1.
http_server only accepts a value for target_inputs, not max_inputs. This number should be set slightly below the active request limit in your engine.toml file (see Auto-Scaling: Enforcing Limits).
To optimize network latency, you will likely want to set PROXY_REGION and SERVER_REGION and route traffic from clients in those regions to that deployment.
PROXY_REGION specifies the location of the Modal proxy that routes requests to containers. It can take one of four values: us-east, us-west, eu-west, ap-south.
SERVER_REGION specifies which region(s) the server containers can reside in. See the Modal Region Selection doc for more information.
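The guidance above on target_inputs and PROXY_REGION can be sketched as a small pre-deploy sanity check. This is a minimal illustration, not part of the repo: the helper names, the 10% headroom, and the example limit of 20 are all assumptions. Use the actual active request limit from your own engine.toml.

```python
# The four PROXY_REGION values listed in this section.
VALID_PROXY_REGIONS = {"us-east", "us-west", "eu-west", "ap-south"}


def target_inputs_below_limit(active_request_limit: int, headroom_percent: int = 10) -> int:
    """Return a target_inputs value slightly below the engine's request limit.

    headroom_percent is an assumed safety margin (hypothetical default of 10%).
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    if active_request_limit < 2:
        raise ValueError("active_request_limit must be at least 2")
    if not 0 < headroom_percent < 100:
        raise ValueError("headroom_percent must be between 1 and 99")
    target = active_request_limit * (100 - headroom_percent) // 100
    # Keep the result strictly below the limit and never lower than 1.
    return max(1, min(target, active_request_limit - 1))


def check_proxy_region(region: str) -> str:
    """Return the region unchanged if it is one of the four documented values."""
    if region not in VALID_PROXY_REGIONS:
        allowed = ", ".join(sorted(VALID_PROXY_REGIONS))
        raise ValueError(f"PROXY_REGION must be one of: {allowed}")
    return region


if __name__ == "__main__":
    # Hypothetical engine limit of 20 active requests.
    print(target_inputs_below_limit(20))
    print(check_proxy_region("us-west"))
```

SERVER_REGION is left unvalidated here because its accepted values are defined in Modal's region documentation rather than in this section.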