Working With Concurrency Rate Limits

Overview

Deepgram’s APIs offer powerful capabilities for speech-to-text and text-to-speech as well as audio intelligence and text intelligence. To ensure all users receive consistent and predictable service, and to prevent accidental misuse, Deepgram enforces concurrency and rate limits on API usage. This guide explains what these limits are, how to work within them, and how to get the most throughput from your account without exceeding them.

API concurrency rate limits protect your applications and services built on Deepgram from abuse or failure.

Working with Concurrency Rate Limits

API concurrency limits are associated with your Deepgram project, not with your API key. They define the maximum number of simultaneous API requests allowed at any given time. Understanding these limits is essential for building high-throughput applications and avoiding request throttling.
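Because the limit belongs to the project, creating more API keys in the same project does not add capacity: every key draws from the same pool. A minimal Python sketch of that idea keeps one asyncio.Semaphore per project and returns the same semaphore for every key in that project. The key-to-project mapping (KEY_TO_PROJECT), the limit values (PROJECT_LIMITS), and limiter_for_key are hypothetical names for illustration; substitute your own configuration and your plan’s actual limits.

```python
import asyncio

# Hypothetical mapping from API key to the Deepgram project that owns it.
# Fill this from your own configuration.
KEY_TO_PROJECT = {
    "key-a": "project-1",
    "key-b": "project-1",
    "key-c": "project-2",
}

# Hypothetical per-project concurrency limits (use your plan's real values).
PROJECT_LIMITS = {"project-1": 5, "project-2": 3}

_limiters: dict = {}


def limiter_for_key(api_key: str) -> asyncio.Semaphore:
    """Return the semaphore shared by every key in the same project."""
    project = KEY_TO_PROJECT[api_key]
    if project not in _limiters:
        # One semaphore per project, sized to that project's limit.
        _limiters[project] = asyncio.Semaphore(PROJECT_LIMITS[project])
    return _limiters[project]
```

Wrapping each request in "async with limiter_for_key(api_key):" means requests sent with key-a and key-b are counted against the same project limit, matching how the limit is enforced.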

For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.

Managing Concurrency

When using the Deepgram Speech-to-Text API, you might need to process multiple audio files simultaneously.

In this scenario, you must ensure your application doesn’t exceed the concurrency limit. A queue system helps: pending requests wait in the queue, and no more than the allowed number of requests are sent to the Deepgram API at any one time.
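One way to build such a queue is a fixed pool of workers pulling from an asyncio.Queue: with MAX_CONCURRENT workers, at most that many requests can be in flight. This sketch assumes Python 3.9 or later; transcribe is a hypothetical placeholder that only sleeps instead of calling Deepgram, and MAX_CONCURRENT is an assumed value you would set to your project’s limit. The stats dictionary records the current and peak number of in-flight requests, which also serves as a simple usage monitor.

```python
import asyncio

MAX_CONCURRENT = 3  # assumption: set to your project's concurrency limit

# Simple monitoring: current and peak number of requests in flight.
stats = {"in_flight": 0, "peak": 0}


async def transcribe(path: str) -> str:
    """Hypothetical stand-in for a real Deepgram request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"transcript of {path}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker sends one request at a time, so the number of
    # workers caps the number of concurrent requests.
    while True:
        path = await queue.get()
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            results[path] = await transcribe(path)
        except Exception as exc:  # record failures instead of killing the worker
            results[path] = exc
        finally:
            stats["in_flight"] -= 1
            queue.task_done()


async def transcribe_all(paths: list) -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    for path in paths:
        queue.put_nowait(path)
    results: dict = {}
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(MAX_CONCURRENT)
    ]
    await queue.join()  # wait until every queued file has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Storing exceptions in the results, rather than letting them escape the worker, keeps one bad file from shrinking the pool and leaving the remaining queue unprocessed.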

Considering Concurrency Rate Limits

Concurrency rate limits define the maximum number of API requests that can be in progress at the same time. These limits help maintain service quality and prevent abuse. Deepgram doesn’t restrict the number of requests you can send in a given time span, only the number of concurrent requests you can make.
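Because only concurrency is capped, the throughput you can sustain depends on how long each request takes. Little’s law, a general queueing result rather than anything specific to Deepgram, gives a rough ceiling: with at most C requests in flight and an average request duration W, throughput λ is at most about C/W. The values of C and W here are illustrative only, not Deepgram’s published limits.

```latex
\lambda_{\max} \approx \frac{C}{W}
\qquad \text{e.g. } C = 50,\; W = 2\,\text{s}
\;\Rightarrow\; \lambda_{\max} \approx \frac{50}{2\,\text{s}} = 25\ \text{requests/s}
```

If requests finish faster, the same concurrency limit allows more requests per second.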


Handling Rate Limits

To avoid hitting rate limits, consider the following strategies:

Rate Limiting Middleware: Implement middleware in your application to throttle requests so that the number in flight never exceeds your project’s concurrency limit.
Exponential Backoff: When a request fails because you hit a limit, retry it after a delay, and increase that delay exponentially with each consecutive failure (for example, 1 s, then 2 s, then 4 s). This mitigates the impact of transient errors and reduces the likelihood of overwhelming the system with retries.
Monitor Usage: Regularly monitor your API usage to anticipate and adjust to usage patterns.
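The exponential backoff strategy can be sketched as a small retry helper. This is a minimal sketch, assuming your HTTP client raises a hypothetical RateLimitError when the API returns 429; with_backoff and its parameters are illustrative names, not part of any Deepgram SDK. It also adds "full jitter" (a random delay up to the exponential cap), a common refinement of plain exponential backoff that keeps many clients from retrying in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception your HTTP client raises on a 429 response."""


def with_backoff(
    send: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on RateLimitError, wait and retry with exponential backoff."""
    attempt = 0
    while True:
        try:
            return send()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # out of retries: surface the error to the caller
            # The delay cap doubles per failure: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time up to the cap so clients
            # retrying at once do not all hit the API at the same moment.
            sleep(random.uniform(0, cap))
            attempt += 1
```

The sleep parameter defaults to time.sleep; passing a different function makes the helper easy to test without actually waiting.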

Other Considerations

What Happens if You Hit Your Concurrency Rate Limits?

If you exceed your concurrency limit, the API returns a 429: Too Many Requests error. This error indicates that your project has more requests in progress at the same time than its limit allows. To learn more, see the Deepgram Error Documentation.
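To react to this error, a client needs to tell a 429 apart from other failures. The following standard-library sketch assumes the pre-recorded endpoint https://api.deepgram.com/v1/listen with a JSON body containing a hosted audio "url" and an "Authorization: Token" header; check these against the API reference before relying on them. ConcurrencyLimitError, build_request, send, and the opener parameter (used only to make the function testable) are hypothetical names for illustration.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable

# Assumed endpoint for pre-recorded transcription; verify in the API reference.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


class ConcurrencyLimitError(Exception):
    """Raised when the API responds with 429 Too Many Requests."""


def build_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build a POST request asking the API to transcribe a hosted audio file."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        LISTEN_URL,
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(
    request: urllib.request.Request,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> dict:
    """Send the request; turn a 429 into ConcurrencyLimitError so callers can retry."""
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 429:
            raise ConcurrencyLimitError(
                "project has too many concurrent requests"
            ) from err
        raise  # other HTTP errors are not concurrency problems
```

Callers can catch ConcurrencyLimitError and retry after a delay, while other HTTP errors still propagate unchanged.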

Can You Increase Your Concurrency Rate Limits?

Users on Pay As You Go and Growth plans cannot have their limits increased.
New and existing Enterprise customers can request an increase by discussing their needs with the Deepgram Sales Team.

Getting Help

If you encounter issues with concurrency or rate limits, Deepgram offers several support options:

Community Support: Join the Deepgram Discord community or participate in GitHub Discussions for assistance.
Enterprise Support: Enterprise plan subscribers can reach out to the support team via email or Slack for dedicated support.
