Amazon SageMaker | Deepgram's Docs

Amazon SageMaker is a managed cloud platform from Amazon Web Services (AWS) that enables deployment of Deepgram as a managed, container-based service. The endpoint is air-gapped and runs on compute inside your own AWS VPC. Once you deploy Deepgram as a SageMaker Model Endpoint, you can run inference against the service using the Amazon SageMaker AI Software Development Kit (SDK).
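Here is a rough illustration of invoking the endpoint. It uses boto3's sagemaker-runtime client (invoke_endpoint) rather than a Deepgram SDK. Several details are assumptions made for illustration: the endpoint name, the audio/wav content type, the JSON response handling, and passing Deepgram query parameters through SageMaker's CustomAttributes field. Check the deployment guide for the request format your model package actually expects.

```python
"""Minimal sketch: send one pre-recorded audio file to a Deepgram SageMaker endpoint.

Assumptions (not confirmed by Deepgram's docs): the endpoint name, the
"audio/wav" content type, and carrying Deepgram options in CustomAttributes.
"""
import json
from typing import Any, Optional


def transcribe_file(
    endpoint_name: str,
    audio: bytes,
    content_type: str = "audio/wav",
    custom_attributes: Optional[str] = None,  # hypothetical, e.g. "model=nova-3"
    client: Any = None,
) -> Any:
    """Invoke a real-time SageMaker endpoint and decode its JSON response."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("sagemaker-runtime")

    request = {
        "EndpointName": endpoint_name,
        "ContentType": content_type,
        "Accept": "application/json",
        "Body": audio,
    }
    if custom_attributes is not None:
        request["CustomAttributes"] = custom_attributes

    response = client.invoke_endpoint(**request)
    # "Body" is a botocore StreamingBody; read() returns the raw response bytes.
    return json.loads(response["Body"].read())
```

The optional client argument lets you reuse a boto3 client you have already configured, for example with a specific region or a VPC interface endpoint. It also makes the function easy to test with a stub.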

The Deepgram SDKs can also target a SageMaker Endpoint through the SageMaker transport, so you can keep the same client-side request and response patterns whether you call the Deepgram-hosted API or your own SageMaker deployment.

Benefits and Tradeoffs

Deepgram on SageMaker is the fastest path to running Deepgram inside your own AWS account. Compared to self-hosting Deepgram on Docker or Kubernetes, SageMaker trades some flexibility for a managed endpoint that AWS operates on your behalf.

When SageMaker is the right fit

Ease of deployment. A ready-to-use endpoint can be created in minutes from the AWS Console or with infrastructure-as-code. There are no container images to mirror, no GPU drivers to install, and no Helm charts to maintain.
Lower management overhead. AWS manages the underlying instances, host OS, container runtime, and model package distribution. You do not need a dedicated platform team to keep the service patched and healthy.
Compliance for regulated workloads. Deepgram runs entirely inside your AWS account and VPC. Audio never leaves your environment, and you inherit the compliance posture of SageMaker AI (HIPAA-eligible, SOC, ISO, PCI, FedRAMP, and others). This makes SageMaker a strong fit for regulated industries that need a private deployment without operating their own Kubernetes platform.
Native integration with AWS services. SageMaker Endpoints integrate out of the box with Amazon CloudWatch (logs and metrics), AWS IAM (authentication and authorization), Amazon VPC (network isolation), AWS PrivateLink, AWS KMS, AWS CloudTrail (audit), and SageMaker auto-scaling. You get production-grade observability and access controls without building them yourself.
AWS Marketplace billing. Deepgram license charges flow through your existing AWS bill, simplifying procurement for teams that already buy through AWS.
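AWS IAM controls access to the endpoint, so a common pattern is to give callers a policy that can invoke only the Deepgram endpoint. The sketch below builds such a policy as a Python dict. The region, account ID, and endpoint name are placeholders. Bidirectional streaming may require an additional IAM action that is not shown here; check the SageMaker actions list in the AWS Service Authorization Reference.

```python
import json


def invoke_only_policy(region: str, account_id: str, endpoint_name: str) -> dict:
    """Build an IAM policy that only allows invoking a single SageMaker endpoint."""
    # SageMaker uses the lowercase endpoint name in endpoint ARNs.
    endpoint_arn = (
        f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name.lower()}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeDeepgramEndpoint",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:InvokeEndpoint",       # real-time requests
                    "sagemaker:InvokeEndpointAsync",  # asynchronous requests
                ],
                "Resource": endpoint_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account, and endpoint name.
    print(json.dumps(invoke_only_policy("us-east-1", "123456789012", "deepgram-nova"), indent=2))
```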

While SageMaker covers most production scenarios, AWS imposes a small number of platform-specific constraints. For example, callback URLs and external file URL ingestion are not supported. Review the full list in the Limitations section of the deployment guide before choosing SageMaker.
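A client that also targets the Deepgram-hosted API may want to catch these two constraints before it sends a request. The hypothetical helper below flags a callback query parameter and a JSON body of the form {"url": ...}. The parameter names follow Deepgram's hosted API. The helper itself and its return format are illustrative only.

```python
import json
from typing import List, Mapping


def sagemaker_incompatibilities(
    params: Mapping[str, str], content_type: str, body: bytes
) -> List[str]:
    """Return reasons a Deepgram request cannot be sent to a SageMaker endpoint (hypothetical helper)."""
    problems = []
    # Callback URLs are not supported; results must come back on the invocation itself.
    if "callback" in params:
        problems.append("callback URLs are not supported on SageMaker")
    # A JSON body such as {"url": "https://..."} asks the server to fetch remote audio,
    # which the isolated container cannot do; send the audio bytes instead.
    if content_type.split(";")[0].strip().lower() == "application/json":
        try:
            payload = json.loads(body)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            payload = None
        if isinstance(payload, dict) and "url" in payload:
            problems.append("external file URLs are not supported; send raw audio bytes")
    return problems
```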

When Docker or Kubernetes may be a better fit

You need to run Deepgram outside AWS or on bare metal.
You require features that the SageMaker isolation model does not currently support, such as user-defined callback URLs or JSON payloads that reference audio in cloud storage.
You want to run the Deepgram Voice Agent. SageMaker Endpoints cannot invoke Large Language Model (LLM) services, which the Voice Agent requires, so the Voice Agent cannot run inside SageMaker.
You need streaming connections that stay open for longer than 30 minutes. SageMaker Real-Time Inference supports up to 30 minutes of connection time per bidirectional streaming connection.
You need to send more than 25 MB of input data per non-streaming invocation to a real-time endpoint, or more than 1 GB to an asynchronous endpoint. SageMaker caps real-time endpoint payloads at 25 MB. Asynchronous endpoints accept larger files, up to 1 GB, with near real-time latency, and they can scale to zero when there are no requests to process.
You need fine-grained control over the container runtime, networking, or process supervision beyond what SageMaker exposes.
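The size and duration limits above can be expressed as a simple routing check. This sketch reads "MB" and "GB" as decimal units (10**6 and 10**9 bytes). That is an assumption, but it is the conservative one, because the decimal values are smaller than the binary ones. The function names and return labels are hypothetical.

```python
# Limits quoted in this section, using the conservative decimal reading of MB and GB.
REALTIME_MAX_BYTES = 25 * 1000 * 1000      # real-time endpoint payload cap
ASYNC_MAX_BYTES = 1000 * 1000 * 1000       # asynchronous endpoint payload cap
STREAM_MAX_SECONDS = 30 * 60               # max bidirectional streaming connection


def choose_endpoint_type(payload_bytes: int) -> str:
    """Pick where a non-streaming request of this size can run."""
    if payload_bytes <= REALTIME_MAX_BYTES:
        return "real-time"
    if payload_bytes <= ASYNC_MAX_BYTES:
        return "async"
    # Too large for either SageMaker endpoint type.
    return "self-hosted"


def stream_fits(expected_seconds: float) -> bool:
    """True if a streaming session can stay on a single SageMaker connection."""
    return expected_seconds <= STREAM_MAX_SECONDS
```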

Deployment options

Most customers can stand up a ready-to-use endpoint in minutes through one of two paths:

AWS Console. Subscribe to a Deepgram product on the AWS Marketplace and click through the SageMaker Console to create the endpoint. See Deploy Deepgram on Amazon SageMaker for the step-by-step walkthrough.
Infrastructure-as-Code. Deploy the same model package using Terraform for repeatable, version-controlled rollouts. See Deploy with Terraform.
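Underneath, both paths make the same three SageMaker API calls. They create a model from the Marketplace model package, create an endpoint configuration, and then create the endpoint. The boto3 sketch below mirrors those steps for illustration only. It is not Deepgram's documented procedure, and the instance type, variant name, and resource naming are assumptions. Use the Console or Terraform guides for supported settings.

```python
from typing import Any


def deploy_model_package(
    sm: Any,  # a boto3 "sagemaker" client, e.g. boto3.client("sagemaker")
    name: str,
    model_package_arn: str,
    execution_role_arn: str,
    instance_type: str = "ml.g5.2xlarge",  # hypothetical; use a type the listing supports
) -> str:
    """Create model, endpoint config, and endpoint from a model package; return the endpoint ARN."""
    sm.create_model(
        ModelName=name,
        ExecutionRoleArn=execution_role_arn,
        PrimaryContainer={"ModelPackageName": model_package_arn},
        # Marketplace model packages run with network isolation (no outbound access).
        EnableNetworkIsolation=True,
    )
    sm.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }
        ],
    )
    # Endpoint creation is asynchronous; this returns before the endpoint is InService.
    response = sm.create_endpoint(EndpointName=name, EndpointConfigName=name)
    return response["EndpointArn"]
```

After the function returns, you can block until the endpoint is ready with boto3's waiter: sm.get_waiter("endpoint_in_service").wait(EndpointName=name).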

Pricing

Deepgram on SageMaker is billed per request, at the same rates shown on deepgram.com/pricing. The AWS pricing page may list a single dimension, inference.count.m.i.c (Inference Pricing), at $0.001 per request. Treat this as a billing unit rather than a flat per-request price. When a request costs more than $0.001, Deepgram automatically emits a charge for multiple units for that single request.
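The unit arithmetic can be sketched as follows. The sketch assumes the request cost is rounded up to a whole number of $0.001 units, but that rounding rule is not stated here. The per-minute rate in the example is hypothetical and is not a Deepgram price.

```python
from decimal import Decimal, ROUND_CEILING

UNIT_PRICE = Decimal("0.001")  # listed price of one inference.count.m.i.c unit


def billing_units(request_cost: Decimal) -> int:
    """Number of $0.001 units charged for one request (rounding up is an assumption)."""
    if request_cost <= 0:
        return 0
    units = (request_cost / UNIT_PRICE).to_integral_value(rounding=ROUND_CEILING)
    return int(units)


# Hypothetical example: a 7.5-minute file at a made-up rate of $0.005 per minute
# costs $0.0375, which this sketch bills as 38 units.
HYPOTHETICAL_RATE_PER_MINUTE = Decimal("0.005")
example_units = billing_units(HYPOTHETICAL_RATE_PER_MINUTE * Decimal("7.5"))
```

Decimal is used instead of float so that amounts like $0.0375 divide exactly and the round-up step never picks up an extra unit from binary rounding error.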

Private offers

For larger or longer-term deployments, AWS Marketplace Private Offers are available with negotiated unit economics and committed-use terms. Contact your AWS account team or Deepgram representative to start a Private Offer.

Try before you buy

A 14-day free trial is available with unlimited product usage and zero Deepgram license charges during the trial window. Each trial is available once per AWS account per product. Contact a Deepgram representative if you need additional time for testing.

Infrastructure charges

Infrastructure charges are set by AWS and billed separately from Deepgram license charges. Public pricing for SageMaker Real-Time Inference is available at aws.amazon.com/sagemaker/ai/pricing. Self-service savings may be available on 1-year or 3-year committed usage by purchasing a Machine Learning Savings Plan from AWS. For more information or to discuss additional discounts, contact your AWS sales representative.