Requesting SageMaker Quota

Request additional AWS account quota for SageMaker endpoint instances so you can deploy Deepgram models on ml.g5.2xlarge, ml.g6.2xlarge, and ml.g6e.2xlarge instances.

AWS enforces default service quotas on the number of SageMaker endpoint instances you can run per account per region. Before you can deploy Deepgram on Amazon SageMaker, you may need to request a quota increase for the GPU-accelerated instance types that Deepgram requires.

Instance types used by Deepgram

Deepgram SageMaker deployments use the following instance types. Each maps to a separate service quota.

Instance type | GPU | Quota name
ml.g5.2xlarge | NVIDIA A10G | ml.g5.2xlarge for endpoint usage
ml.g6.2xlarge | NVIDIA L4 | ml.g6.2xlarge for endpoint usage
ml.g6e.2xlarge | NVIDIA L40S | ml.g6e.2xlarge for endpoint usage

Quota values represent the maximum number of instances of that type you can run simultaneously across all SageMaker endpoints in a single AWS region. A quota of 0 means you cannot deploy that instance type until you request an increase.

Check your current quota

Before requesting an increase, check the quota you already have.

1. Open Service Quotas

Sign in to the AWS Management Console and navigate to Service Quotas.

2. Select Amazon SageMaker

In the left-hand menu, select AWS services, then search for and select Amazon SageMaker.

3. Search for the instance quota

In the search bar, enter the instance type you want to check (for example, g5.2xlarge). Find the quota with a name like ml.g5.2xlarge for endpoint usage. The applied quota value shown for it is your current limit in the selected region.
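
The same check can be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. It calls the Service Quotas API (`list_service_quotas` with service code `sagemaker`) and matches quotas by the names listed in the instance type table. The helper names `make_quota_client` and `find_endpoint_quotas` are illustrative and are not part of any Deepgram or AWS tooling.

```python
ENDPOINT_QUOTA_SUFFIX = "for endpoint usage"


def make_quota_client(region_name):
    """Build a Service Quotas client for one region (quotas are per region)."""
    # Imported lazily so find_endpoint_quotas can be tried with a stub client.
    import boto3  # assumption: boto3 is installed and credentials are set up

    return boto3.client("service-quotas", region_name=region_name)


def find_endpoint_quotas(client, instance_types):
    """Return {instance_type: {"code": ..., "value": ...}} for endpoint quotas.

    Instance types with no matching quota name are omitted from the result.
    """
    # Map the full quota name back to its instance type for an exact match.
    wanted = {f"{t} {ENDPOINT_QUOTA_SUFFIX}": t for t in instance_types}
    found = {}
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            instance_type = wanted.get(quota["QuotaName"])
            if instance_type is not None:
                found[instance_type] = {
                    "code": quota["QuotaCode"],
                    "value": quota["Value"],
                }
    return found
```

For example, `find_endpoint_quotas(make_quota_client("us-east-1"), ["ml.g5.2xlarge"])` would report the quota for that instance type in us-east-1, where the region is only an example. A value of 0 means an increase is needed before deploying.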

Request a quota increase

If your current quota is 0 or too low for your deployment, submit a quota increase request.

1. Open the quota detail page

In the Service Quotas console for Amazon SageMaker, search for the instance type (for example, g5.2xlarge) and select the quota named ml.g5.2xlarge for endpoint usage.

2. Request an increase

Select Request increase at account level.

3. Enter the new quota value

In the Increase quota value field, enter the number of instances you need. For example, enter 4 if you plan to run up to four ml.g5.2xlarge endpoint instances in this region.

4. Submit the request

Select Request. AWS reviews most SageMaker quota requests within a few hours, though some may take several business days.

5. Repeat for each instance type

If you need quota for additional instance types (ml.g6.2xlarge, ml.g6e.2xlarge), repeat these steps for each.
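
If you prefer to submit the request programmatically, the sketch below shows one possible approach using the Service Quotas API call `request_service_quota_increase`. It assumes boto3 is installed, credentials are configured, and you have already looked up the quota code for the instance type in your account. The function names and the `quota_code` parameter are illustrative.

```python
def make_quota_client(region_name):
    """Build a Service Quotas client for the region that needs more quota."""
    # Imported lazily so the request helper can be tried with a stub client.
    import boto3  # assumption: boto3 is installed and credentials are set up

    return boto3.client("service-quotas", region_name=region_name)


def request_endpoint_quota_increase(client, quota_code, desired_instances):
    """Submit an account-level increase request for one SageMaker quota.

    quota_code is the code of the "<instance type> for endpoint usage" quota,
    looked up beforehand; it is not hard-coded here.
    """
    if desired_instances <= 0:
        raise ValueError("desired_instances must be a positive number")
    response = client.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=quota_code,
        DesiredValue=float(desired_instances),  # new total, not an increment
    )
    requested = response["RequestedQuota"]
    # Keep the request ID: it identifies the request if you contact AWS Support.
    return {"request_id": requested["Id"], "status": requested["Status"]}
```

Calling `request_endpoint_quota_increase(client, quota_code, 4)` corresponds to entering 4 in the console. Run it once per instance type and per region, each with a client created for that region.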

Choosing the right quota value

The quota value determines how many instances of that type you can run simultaneously. Consider the following when choosing a value:

  • Number of Deepgram products — each product (for example, English STT, Spanish STT, TTS) runs as a separate SageMaker endpoint.
  • Auto-scaling — if you configure auto-scaling, set the quota high enough to accommodate the maximum instance count across all endpoints.
  • Multi-region deployments — quotas are per region. Request increases in every region where you plan to deploy.

Do not request more instances than you plan to use. Running SageMaker endpoint instances incurs charges for as long as they remain active.
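
As a rough illustration of the sizing considerations above, the following sketch sums the maximum instance count of every planned endpoint, grouped by region and instance type. The `EndpointPlan` structure, the endpoint names, and the counts in the usage note are hypothetical and are not Deepgram sizing guidance.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointPlan:
    """One planned SageMaker endpoint (one Deepgram product in one region)."""

    name: str
    region: str
    instance_type: str
    max_instances: int  # auto-scaling maximum, or the fixed count without it


def required_quotas(plans):
    """Return {(region, instance_type): instances} needed to cover all plans."""
    totals = defaultdict(int)
    for plan in plans:
        # Quotas are per region and per instance type, so group on both.
        totals[(plan.region, plan.instance_type)] += plan.max_instances
    return dict(totals)
```

For instance, two endpoints on ml.g5.2xlarge in the same region with auto-scaling maximums of 2 and 1 would need a quota of at least 3 for that instance type in that region. An endpoint in a second region counts against that region's quota only.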

Troubleshooting

Symptom | Cause | Resolution
ResourceLimitExceeded when creating an endpoint | The instance quota for the selected type is 0 or fully consumed | Request a quota increase for that instance type
Quota request remains PENDING for several days | AWS is reviewing the request | Open a support case in the AWS Support Center referencing the quota request ID
Quota increase approved but endpoint still fails | Quota was increased in a different region | Verify the region in the AWS Management Console matches the region where you are creating the endpoint
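
To investigate the last two rows, you can look up a request by its ID. The sketch below is one possible way to do that with the Service Quotas API call `get_requested_service_quota_change`, assuming boto3 and configured credentials. It also reads the region from the `QuotaArn` field when the response includes one; treating the fourth ARN segment as the region is an assumption based on the standard ARN layout. The function names are illustrative.

```python
def make_quota_client(region_name):
    """Build a Service Quotas client for the region where you filed the request."""
    # Imported lazily so the helpers below can be tried with a stub client.
    import boto3  # assumption: boto3 is installed and credentials are set up

    return boto3.client("service-quotas", region_name=region_name)


def region_from_arn(arn):
    """Return the region segment of an ARN, or None if it cannot be read."""
    # Standard layout: arn:partition:service:region:account-id:resource
    parts = (arn or "").split(":")
    return parts[3] if len(parts) > 3 and parts[3] else None


def quota_request_summary(client, request_id):
    """Fetch the status and target region of a quota increase request."""
    response = client.get_requested_service_quota_change(RequestId=request_id)
    requested = response["RequestedQuota"]
    return {
        "status": requested["Status"],  # for example PENDING or APPROVED
        "quota_name": requested.get("QuotaName"),
        "desired_value": requested.get("DesiredValue"),
        "region": region_from_arn(requested.get("QuotaArn")),
    }
```

If the summary shows an approved request whose region differs from the region of the failing endpoint, the increase was applied elsewhere and a new request is needed in the endpoint's region.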