Requesting SageMaker Quota
AWS enforces default service quotas on the number of SageMaker endpoint instances you can run per account per region. Before you can deploy Deepgram on Amazon SageMaker, you may need to request a quota increase for the GPU-accelerated instance types that Deepgram requires.
Instance types used by Deepgram
Deepgram SageMaker deployments use the following instance types. Each maps to a separate service quota.
Quota values represent the maximum number of instances of that type you can run simultaneously across all SageMaker endpoints in a single AWS region. A quota of 0 means you cannot deploy that instance type until you request an increase.
Check your current quota
Before requesting an increase, check the quota you already have.
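As a minimal sketch of a programmatic check, the helper below uses the Service Quotas API through boto3 (the AWS SDK for Python) to look up the endpoint quota for one instance type. It assumes the quota name follows the pattern "ml.g5.2xlarge for endpoint usage" mentioned elsewhere on this page. The function name and the idea of passing the client in as an argument are illustrative choices, not part of any Deepgram tooling.

```python
# Assumption: endpoint quotas are named "<instance type> for endpoint usage".
ENDPOINT_QUOTA_SUFFIX = " for endpoint usage"


def find_endpoint_quota(client, instance_type):
    """Return (quota_code, value) for the instance type's endpoint quota.

    `client` is a Service Quotas client, for example
    boto3.client("service-quotas", region_name="us-east-1").
    Returns None when no quota with the expected name is found.
    """
    wanted = f"{instance_type}{ENDPOINT_QUOTA_SUFFIX}"
    # SageMaker exposes many quotas, so walk every page of results.
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            if quota["QuotaName"] == wanted:
                return quota["QuotaCode"], quota["Value"]
    return None
```

To use it, create the client for the region you plan to deploy in and call find_endpoint_quota(client, "ml.g5.2xlarge"). A returned value of 0 means you need an increase before deploying that instance type. Because quotas are per region, repeat the check with a client for each region.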
Request a quota increase
If your current quota is 0 or too low for your deployment, submit a quota increase request.
AWS Management Console
Open the quota detail page
In the Service Quotas console for Amazon SageMaker, search for the instance type (for example, g5.2xlarge) and select the quota named ml.g5.2xlarge for endpoint usage.
Enter the new quota value
In the Increase quota value field, enter the number of instances you need. For example, enter 4 if you plan to run up to four ml.g5.2xlarge endpoint instances in this region.
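The same request can be submitted programmatically. The following is a sketch that assumes boto3 and the Service Quotas `request_service_quota_increase` operation. The function name is hypothetical, and the quota code must be the real code for the "ml.g5.2xlarge for endpoint usage" quota in your account, which you can read from the quota detail page. No code is hardcoded here because it is not given on this page.

```python
def request_endpoint_quota_increase(client, quota_code, desired_value):
    """Submit a SageMaker quota increase and return (request_id, status).

    `client` is a Service Quotas client, for example
    boto3.client("service-quotas", region_name="us-east-1").
    `quota_code` is the code of the endpoint-usage quota to raise.
    `desired_value` is the new total quota, not the amount to add.
    """
    if desired_value <= 0:
        raise ValueError("desired_value must be a positive number of instances")
    response = client.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=quota_code,
        DesiredValue=float(desired_value),  # the API expects a floating-point value
    )
    requested = response["RequestedQuota"]
    return requested["Id"], requested["Status"]
```

For the example above, you would pass 4 as desired_value. The returned status reflects only that the request was submitted. Approval happens separately, so check the request status before deploying.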
Choosing the right quota value
The quota value determines how many instances of that type you can run simultaneously. Consider the following when choosing a value:
- Number of Deepgram products: each product (for example, English STT, Spanish STT, TTS) runs as a separate SageMaker endpoint, and every endpoint counts against the quota for its instance type.
- Auto-scaling: if you configure auto-scaling, set the quota to at least the sum of the maximum instance counts of all endpoints that use that instance type.
- Multi-region deployments: quotas are per region. Request increases in every region where you plan to deploy.
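The considerations above reduce to simple arithmetic: for each instance type, add up the maximum instance count of every endpoint in the region. The sketch below illustrates this. The endpoint names and instance counts are hypothetical planning figures, not Deepgram recommendations.

```python
from collections import defaultdict


def required_quota(endpoints):
    """Return the quota needed per instance type for one region.

    Each endpoint is a dict with "instance_type" and "max_instances"
    (the auto-scaling maximum, or the fixed count without auto-scaling).
    """
    totals = defaultdict(int)
    for endpoint in endpoints:
        totals[endpoint["instance_type"]] += endpoint["max_instances"]
    return dict(totals)


# Hypothetical plan for a single region: three products, one endpoint each.
plan = [
    {"name": "english-stt", "instance_type": "ml.g5.2xlarge", "max_instances": 2},
    {"name": "spanish-stt", "instance_type": "ml.g5.2xlarge", "max_instances": 1},
    {"name": "tts", "instance_type": "ml.g5.2xlarge", "max_instances": 1},
]

print(required_quota(plan))  # {'ml.g5.2xlarge': 4}
```

Because quotas are per region, build one such plan for each region and request the resulting value there.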
Request only as many instances as you plan to use. The quota itself is only a limit, but every SageMaker endpoint instance you run under it incurs charges for as long as it remains active.