
Configure Modal Resources

Configure hardware, autoscaling, and region selection for a Deepgram Modal deployment.

With Modal, hardware resources and autoscaling configuration are specified alongside your application code. To update the parameters in this section, edit the values in app.py and redeploy.

When you clone the repo, the values are configured for an STT deployment in us-west.

    # modal_deepgram/app.py

    @app.cls(
        image=engine_base_image.env({"DEPLOY_LABEL": DEPLOY_LABEL}),
        volumes={
            MODELS_PATH: models_vol,
            CACHE_PATH: cache_vol,
        },
        gpu="L4",
        secrets=[modal.Secret.from_name("deepgram")],
        timeout=30 * MINUTES,
        cpu=4,
        memory=32 * 1024,  # MB
        min_containers=1,
        region="us-west",
    )
    @modal.concurrent(target_inputs=64)
    @modal.experimental.http_server(port=API_PORT, proxy_regions=["us-west"])
    class DeepgramServer(DeepgramServerBase):
        ...

Configure hardware

For Deepgram’s hardware minimums, see Deployment Environments → Engine.

For Modal’s GPU options, see Modal: GPU.

Configure autoscaling

Modal automatically scales the number of Deepgram containers up and down based on per-container concurrency.

See Modal’s Scaling Out guide and Input Concurrency guide for the available parameters and what each one does. Note that not all available parameters are surfaced in app.py.
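As a rough illustration of the parameters covered in Modal's Scaling Out guide, the sketch below collects the autoscaler settings in one dictionary. The keyword names are Modal's autoscaler arguments; every value except min_containers is a placeholder chosen for illustration, and the idea of passing the dictionary to @app.cls with ** is an assumption about how you might structure app.py, not how the repo is written.

```python
# Illustrative autoscaler settings. Only min_containers=1 comes from the
# sample app.py; the other values are hypothetical placeholders.
AUTOSCALER_SETTINGS = {
    "min_containers": 1,      # keep one container warm at all times
    "max_containers": 4,      # hypothetical upper bound on containers
    "buffer_containers": 0,   # hypothetical: no extra idle containers under load
    "scaledown_window": 300,  # hypothetical: seconds idle before scaling down
}


def check_autoscaler_settings(settings: dict) -> dict:
    """Reject obviously inconsistent values before they reach a deploy."""
    if settings["min_containers"] < 0:
        raise ValueError("min_containers must not be negative")
    if settings["max_containers"] < settings["min_containers"]:
        raise ValueError("max_containers must be >= min_containers")
    return settings


check_autoscaler_settings(AUTOSCALER_SETTINGS)

# Possible usage inside app.py (not executed here):
#   @app.cls(..., **AUTOSCALER_SETTINGS)
```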

Deepgram recommends keeping at least one container active so that lulls in traffic don’t lead to queuing or 503s when scaling back up from zero. In Modal, set min_containers=1.
Web endpoints served with http_server accept a value for target_inputs only, not max_inputs. Set target_inputs slightly below the active request limit in your engine.toml file (see Auto-Scaling: Enforcing Limits).
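One way to keep target_inputs "slightly below" the Engine limit is to derive it from that limit instead of hard-coding both numbers. The helper below is a minimal sketch: the function name, the 10% default headroom, and the example limit of 72 are all assumptions made for illustration, so pick the headroom that suits your own traffic.

```python
def suggested_target_inputs(active_request_limit: int, headroom_percent: int = 10) -> int:
    """Return a target_inputs value strictly below the Engine's active request limit.

    active_request_limit: the limit you configured in engine.toml.
    headroom_percent: hypothetical safety margin, as a whole-number percentage.
    """
    if active_request_limit < 2:
        raise ValueError("limit must be at least 2 to leave any headroom")
    if not 0 < headroom_percent < 100:
        raise ValueError("headroom_percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    target = active_request_limit * (100 - headroom_percent) // 100
    # Stay at least 1 and always strictly below the limit.
    return max(1, min(target, active_request_limit - 1))


# Example with a hypothetical Engine limit of 72 active requests.
TARGET_INPUTS = suggested_target_inputs(72)
```

The result can then be passed to the decorator shown in app.py, for example @modal.concurrent(target_inputs=TARGET_INPUTS).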

Configure regions

To optimize network latency, you will likely want to set PROXY_REGION and SERVER_REGION and route traffic from clients in those regions to that deployment.

  • PROXY_REGION specifies the location of the Modal proxy that routes requests to containers. It can take one of four values: us-east, us-west, eu-west, ap-south.
  • SERVER_REGION specifies which region(s) the server containers can reside in. See the Modal Region Selection doc for more information.
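Because PROXY_REGION accepts only the four values listed above, a small check can catch a typo before you deploy. This is a sketch under assumptions: the helper name check_proxy_region is hypothetical, and the comments about where the two constants are consumed are inferred from the sample app.py (region= and proxy_regions=), not confirmed against the repo.

```python
# The four proxy regions listed in this guide.
ALLOWED_PROXY_REGIONS = ("us-east", "us-west", "eu-west", "ap-south")


def check_proxy_region(proxy_region: str) -> str:
    """Return proxy_region unchanged, or raise if it is not a supported value."""
    if proxy_region not in ALLOWED_PROXY_REGIONS:
        allowed = ", ".join(ALLOWED_PROXY_REGIONS)
        raise ValueError(f"PROXY_REGION must be one of: {allowed}; got {proxy_region!r}")
    return proxy_region


# Defaults matching the us-west STT deployment described earlier.
PROXY_REGION = check_proxy_region("us-west")
SERVER_REGION = "us-west"  # see Modal's Region Selection doc for valid values

# Presumed usage in app.py (not executed here):
#   @app.cls(..., region=SERVER_REGION)
#   @modal.experimental.http_server(port=API_PORT, proxy_regions=[PROXY_REGION])
```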