Deploy with Terraform
This guide provides a complete Terraform configuration for deploying Deepgram on Amazon SageMaker. The configuration creates an IAM execution role, a SageMaker Model from your AWS Marketplace subscription, an Endpoint Configuration, and a live Endpoint. An optional module adds auto-scaling. The same configuration can deploy either a real-time endpoint (the default) or an asynchronous endpoint that processes large pre-recorded files from S3 and can scale to zero. To deploy an asynchronous endpoint, set enable_async_inference = true.
Before running Terraform, you must subscribe to a Deepgram product on the AWS Marketplace and note the Model Package ARN. See Subscribe to Deepgram Products for instructions.
Prerequisites
- Terraform 1.5 or later
- AWS credentials configured for the target account (via environment variables, shared credentials file, or an IAM role)
- An active AWS Marketplace subscription to a Deepgram SageMaker product
- The Model Package ARN for the subscribed product (found in the SageMaker console under Marketplace Model Packages → AWS Marketplace Subscriptions)
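Project layout
The sections below create the following files in a single working directory:
- variables.tf: input variables
- main.tf: provider, IAM role, and SageMaker resources
- outputs.tf: endpoint details printed after terraform apply
- terraform.tfvars: your values for the input variables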
Variables
Create variables.tf with the input variables the configuration needs. The only required value is the Model Package ARN from your Marketplace subscription.
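The following is a minimal sketch of variables.tf, not the guide's exact file. model_package_arn, enable_async_inference, async_s3_output_path, autoscaling_min_capacity, autoscaling_target_value, deepgram_engine_env, and deepgram_api_env are named in this guide. aws_region, endpoint_name, instance_type, initial_instance_count, enable_autoscaling, autoscaling_max_capacity, and async_s3_failure_path are hypothetical names added for illustration, and every default shown is an assumption rather than a recommendation.

```hcl
# variables.tf (sketch). Variables marked HYPOTHETICAL are illustrative names
# that do not appear in this guide; all defaults are assumptions.

variable "model_package_arn" {
  description = "Model Package ARN from your AWS Marketplace subscription."
  type        = string

  # Catch obviously wrong values at plan time.
  validation {
    condition     = can(regex("^arn:aws[a-z-]*:sagemaker:", var.model_package_arn))
    error_message = "model_package_arn must be a SageMaker model package ARN."
  }
}

# HYPOTHETICAL; use a Region where your Marketplace subscription is available.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

# HYPOTHETICAL
variable "endpoint_name" {
  type    = string
  default = "deepgram-sagemaker"
}

# HYPOTHETICAL; confirm the instance type against Deployment Environments.
variable "instance_type" {
  type    = string
  default = "ml.g5.2xlarge"
}

# HYPOTHETICAL
variable "initial_instance_count" {
  type    = number
  default = 1
}

# HYPOTHETICAL toggle for the optional auto-scaling resources.
variable "enable_autoscaling" {
  type    = bool
  default = false
}

variable "autoscaling_min_capacity" {
  description = "Minimum instance count; 0 (scale-to-zero) is valid only in async mode."
  type        = number
  default     = 1
}

# HYPOTHETICAL
variable "autoscaling_max_capacity" {
  type    = number
  default = 2
}

variable "autoscaling_target_value" {
  description = "Real-time: concurrent requests per instance. Async: ApproximateBacklogSizePerInstance."
  type        = number
  default     = null
}

variable "enable_async_inference" {
  type    = bool
  default = false
}

variable "async_s3_output_path" {
  description = "s3://bucket/prefix for async results; required when enable_async_inference = true."
  type        = string
  default     = null
}

# HYPOTHETICAL optional failure location for async requests.
variable "async_s3_failure_path" {
  type    = string
  default = null
}

variable "deepgram_engine_env" {
  description = "Engine overrides: map of suffix (for example 01) to TOML expression."
  type        = map(string)
  default     = {}
}

variable "deepgram_api_env" {
  description = "API overrides: map of suffix (for example 01) to TOML expression."
  type        = map(string)
  default     = {}
}
```

Only model_package_arn lacks a default here, which matches the requirement that it is the one value you must supply.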
In async mode, autoscaling_target_value is interpreted as the target ApproximateBacklogSizePerInstance (queued requests per instance), and autoscaling_min_capacity may be set to 0 to enable scale-to-zero. In real-time mode, autoscaling_target_value is the target number of concurrent requests per instance, and autoscaling_min_capacity must be at least 1.
Main configuration
Create main.tf with the provider, IAM role, and SageMaker resources. The configuration uses the Model Package ARN from your AWS Marketplace subscription to create the model without referencing a container image directly.
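A minimal sketch of main.tf follows. It assumes the variables declared in variables.tf include aws_region, instance_type, initial_instance_count, and endpoint_name (hypothetical names), and the resource labels ("deepgram", "sagemaker_execution") are hypothetical too. Attaching the AWS managed AmazonSageMakerFullAccess policy is a simplification for the sketch; a production role should be scoped down. Container environment overrides are omitted here and covered under Environment variable overrides.

```hcl
# main.tf (sketch). Resource labels are HYPOTHETICAL.

terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # assumption: a recent 5.x provider
    }
  }
}

provider "aws" {
  region = var.aws_region
}

# Trust policy that lets SageMaker assume the execution role.
data "aws_iam_policy_document" "sagemaker_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["sagemaker.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sagemaker_execution" {
  name_prefix        = "deepgram-sagemaker-"
  assume_role_policy = data.aws_iam_policy_document.sagemaker_assume_role.json
}

# Simplification: broad AWS managed policy. Scope this down for production.
resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}

# The model points at the Marketplace model package, not a container image.
resource "aws_sagemaker_model" "deepgram" {
  execution_role_arn = aws_iam_role.sagemaker_execution.arn

  primary_container {
    model_package_name = var.model_package_arn
  }
}

resource "aws_sagemaker_endpoint_configuration" "deepgram" {
  production_variants {
    variant_name           = "AllTraffic"
    model_name             = aws_sagemaker_model.deepgram.name
    instance_type          = var.instance_type
    initial_instance_count = var.initial_instance_count
  }

  # Emitted only in async mode; the endpoint then accepts only async invocations.
  dynamic "async_inference_config" {
    for_each = var.enable_async_inference ? [1] : []
    content {
      output_config {
        s3_output_path  = var.async_s3_output_path
        s3_failure_path = var.async_s3_failure_path
      }
    }
  }

  # A new configuration is created before the old one is removed.
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_sagemaker_endpoint" "deepgram" {
  name                 = var.endpoint_name
  endpoint_config_name = aws_sagemaker_endpoint_configuration.deepgram.name
}
```

The model and endpoint configuration omit name, so Terraform assigns unique names; this avoids name collisions when a configuration change forces a replacement.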
Outputs
Create outputs.tf to surface the endpoint details after terraform apply completes.
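A sketch of outputs.tf, assuming the hypothetical resource labels aws_sagemaker_endpoint.deepgram, aws_sagemaker_endpoint_configuration.deepgram, aws_sagemaker_model.deepgram, and aws_iam_role.sagemaker_execution; rename them to match your main.tf.

```hcl
# outputs.tf (sketch). Resource labels are HYPOTHETICAL.

output "endpoint_name" {
  description = "Endpoint name to use when invoking the model."
  value       = aws_sagemaker_endpoint.deepgram.name
}

output "endpoint_arn" {
  value = aws_sagemaker_endpoint.deepgram.arn
}

output "endpoint_config_name" {
  value = aws_sagemaker_endpoint_configuration.deepgram.name
}

output "model_name" {
  value = aws_sagemaker_model.deepgram.name
}

output "execution_role_arn" {
  value = aws_iam_role.sagemaker_execution.arn
}
```

After apply, terraform output -raw endpoint_name prints the endpoint name on its own, which is convenient for scripts and test clients.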
Example variable values
Create a terraform.tfvars file with your specific values. Replace the model_package_arn with the ARN from your AWS Marketplace subscription.
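An example terraform.tfvars for a real-time deployment is sketched below. The ARN is a placeholder, and every name other than model_package_arn is a hypothetical variable that must match whatever your variables.tf declares. The Region and instance type are assumptions to replace with values valid for your subscription.

```hcl
# terraform.tfvars (sketch). Replace every placeholder with your own values.
model_package_arn = "arn:aws:sagemaker:<region>:<account-id>:model-package/<package-name>"

# HYPOTHETICAL variable names; values are assumptions.
aws_region    = "us-east-1"     # a Region where your subscription is available
endpoint_name = "deepgram-sagemaker"
instance_type = "ml.g5.2xlarge" # confirm against Deployment Environments
```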
Do not commit terraform.tfvars to version control if it contains sensitive values. Add it to .gitignore or use environment variables instead.
Deploy
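Run terraform init to initialize the working directory and download the AWS provider, then run terraform plan to preview the resources Terraform will create.
Verify that the plan shows the expected resources: an IAM role, a SageMaker Model, an Endpoint Configuration, and an Endpoint, plus any auto-scaling or async resources you enabled. Then run terraform apply and confirm the prompt to create them.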
Validate the endpoint
After the endpoint reaches InService, run a test inference to confirm it returns results. See Validate a Deepgram SageMaker Endpoint for the full testing guide using the dg-sagemaker test clients.
Customize the deployment
Instance types
Choose an instance type based on the Deepgram product you are deploying. GPU-accelerated instances are required.
For a full list of compatible instances, see the Deployment Environments hardware specifications.
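If your variables.tf declares the instance type as an input variable (named instance_type here, a hypothetical name), a validation block can reject non-GPU instance types at plan time. The family allow-list below is a hypothetical example, not Deepgram's supported list; replace it with the families listed in the Deployment Environments hardware specifications for your product.

```hcl
variable "instance_type" {
  description = "GPU instance type for the SageMaker production variant."
  type        = string

  validation {
    # HYPOTHETICAL allow-list of GPU families; replace with the supported ones.
    condition     = can(regex("^ml\\.(g4dn|g5|g6|p4d)\\.", var.instance_type))
    error_message = "instance_type must be a GPU instance type supported by your Deepgram product."
  }
}
```

Failing at terraform plan is cheaper than waiting for endpoint creation to fail on an unsupported instance.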
Environment variable overrides
Pass Deepgram configuration overrides through the deepgram_engine_env and deepgram_api_env variables. Each map key becomes the suffix of the corresponding container environment variable name (for example, "01", "02"), and each value is a TOML expression. See Configure Amazon SageMaker Deployments for the full reference.
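One way to turn the two maps into container environment variables is sketched below. The prefixes are hypothetical placeholders because the exact variable-name prefixes are defined in Configure Amazon SageMaker Deployments, not here; the resource label aws_sagemaker_model.deepgram is also hypothetical.

```hcl
locals {
  # HYPOTHETICAL placeholders: replace with the documented prefixes.
  engine_env_prefix = "<ENGINE_ENV_PREFIX>"
  api_env_prefix    = "<API_ENV_PREFIX>"

  # {"01" = "<toml>"} becomes {"<ENGINE_ENV_PREFIX>01" = "<toml>"}.
  container_environment = merge(
    { for suffix, toml in var.deepgram_engine_env : "${local.engine_env_prefix}${suffix}" => toml },
    { for suffix, toml in var.deepgram_api_env : "${local.api_env_prefix}${suffix}" => toml },
  )
}

resource "aws_sagemaker_model" "deepgram" {
  # ... other arguments unchanged ...

  primary_container {
    model_package_name = var.model_package_arn
    environment        = local.container_environment
  }
}
```

Keeping the suffixes as map keys lets you add or remove individual overrides in terraform.tfvars without touching main.tf.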
VPC configuration
To deploy the endpoint inside a VPC, add a vpc_config block to the aws_sagemaker_model resource:
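A sketch of the vpc_config block follows. subnet_ids and security_group_ids are hypothetical variable names, and the resource label is assumed to match main.tf.

```hcl
resource "aws_sagemaker_model" "deepgram" {
  # ... existing arguments (execution_role_arn, primary_container, ...) ...

  vpc_config {
    subnets            = var.subnet_ids         # HYPOTHETICAL: private subnet IDs
    security_group_ids = var.security_group_ids # HYPOTHETICAL: security group IDs
  }
}

variable "subnet_ids" {
  type = list(string)
}

variable "security_group_ids" {
  type = list(string)
}
```

The execution role also needs permission to create elastic network interfaces in those subnets (ec2:CreateNetworkInterface and related actions), and an async endpoint additionally needs a network path to S3, such as an S3 gateway VPC endpoint.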
Asynchronous endpoints
By default this configuration deploys a real-time endpoint for streaming and synchronous transcription. To deploy an asynchronous endpoint instead, set enable_async_inference = true and provide an async_s3_output_path. Asynchronous endpoints suit large pre-recorded files (up to 1 GB), queued processing, and scale-to-zero.
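The terraform.tfvars additions for async mode might look like the sketch below. The bucket and prefix are placeholders, enable_autoscaling and autoscaling_max_capacity are hypothetical variable names, and the numbers are illustrative rather than tuned recommendations.

```hcl
# terraform.tfvars additions for async mode (sketch).
enable_async_inference = true
async_s3_output_path   = "s3://<your-bucket>/deepgram/async-output/"

# Optional scale-to-zero; values are illustrative.
enable_autoscaling       = true # HYPOTHETICAL variable name
autoscaling_min_capacity = 0
autoscaling_max_capacity = 2    # HYPOTHETICAL variable name
autoscaling_target_value = 5    # target queued requests per instance
```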
Asynchronous inference is a distinct endpoint mode: an async endpoint accepts only asynchronous invocations (InvokeEndpointAsync with S3 input/output) and cannot serve streaming or synchronous requests. Switching enable_async_inference replaces the endpoint configuration and endpoint.
When async is enabled, the configuration also:
- grants the execution role s3:GetObject, s3:PutObject, and s3:ListBucket on the output and failure buckets;
- switches the autoscaling target metric to ApproximateBacklogSizePerInstance and allows autoscaling_min_capacity = 0 for scale-to-zero;
- adds a scale-from-zero policy so the endpoint wakes on the first queued request instead of waiting for the backlog to exceed the target value.
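The S3 grant in the first item can be sketched as an inline role policy. This assumes the execution role is labeled aws_iam_role.sagemaker_execution and that async_s3_failure_path is an optional variable; both names are hypothetical. Bucket names are parsed from the s3:// paths.

```hcl
locals {
  # Non-null async S3 paths (output and optional failure location).
  async_s3_paths = [
    for p in [var.async_s3_output_path, var.async_s3_failure_path] : p if p != null
  ]

  # "s3://bucket/prefix" -> "bucket"
  async_bucket_names = distinct([
    for p in local.async_s3_paths : regex("^s3://([^/]+)", p)[0]
  ])
}

data "aws_iam_policy_document" "async_s3" {
  count = var.enable_async_inference ? 1 : 0

  # Object-level access in the output and failure buckets.
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = [for b in local.async_bucket_names : "arn:aws:s3:::${b}/*"]
  }

  # Bucket-level listing.
  statement {
    actions   = ["s3:ListBucket"]
    resources = [for b in local.async_bucket_names : "arn:aws:s3:::${b}"]
  }
}

resource "aws_iam_role_policy" "async_s3" {
  count  = var.enable_async_inference ? 1 : 0
  name   = "deepgram-async-s3"
  role   = aws_iam_role.sagemaker_execution.id
  policy = data.aws_iam_policy_document.async_s3[0].json
}
```

If your input files live in a different bucket, the role would also need s3:GetObject on that bucket.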
For invocation details, see Deploy Deepgram on Amazon SageMaker. For autoscaling details, see Auto-Scaling Asynchronous Endpoints.
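The two autoscaling changes in the list above can be sketched with Application Auto Scaling and a CloudWatch alarm, following the pattern AWS documents for asynchronous endpoints: target tracking on ApproximateBacklogSizePerInstance, plus a step policy triggered by HasBacklogWithoutCapacity. This assumes the endpoint is labeled aws_sagemaker_endpoint.deepgram with a variant named AllTraffic and uses the hypothetical enable_autoscaling and autoscaling_max_capacity variables; cooldowns and alarm periods are illustrative.

```hcl
locals {
  # Create these resources only for async endpoints with auto-scaling enabled.
  async_scaling = var.enable_async_inference && var.enable_autoscaling ? 1 : 0
}

resource "aws_appautoscaling_target" "async" {
  count              = local.async_scaling
  service_namespace  = "sagemaker"
  resource_id        = "endpoint/${aws_sagemaker_endpoint.deepgram.name}/variant/AllTraffic"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  min_capacity       = var.autoscaling_min_capacity # 0 enables scale-to-zero
  max_capacity       = var.autoscaling_max_capacity
}

# Target tracking on queued requests per instance.
resource "aws_appautoscaling_policy" "backlog" {
  count              = local.async_scaling
  name               = "deepgram-backlog-per-instance"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.async[0].service_namespace
  resource_id        = aws_appautoscaling_target.async[0].resource_id
  scalable_dimension = aws_appautoscaling_target.async[0].scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value       = var.autoscaling_target_value
    scale_in_cooldown  = 300
    scale_out_cooldown = 300

    customized_metric_specification {
      metric_name = "ApproximateBacklogSizePerInstance"
      namespace   = "AWS/SageMaker"
      statistic   = "Average"
      dimensions {
        name  = "EndpointName"
        value = aws_sagemaker_endpoint.deepgram.name
      }
    }
  }
}

# Adds one instance when requests are queued but no instances are running.
resource "aws_appautoscaling_policy" "scale_from_zero" {
  count              = local.async_scaling
  name               = "deepgram-scale-from-zero"
  policy_type        = "StepScaling"
  service_namespace  = aws_appautoscaling_target.async[0].service_namespace
  resource_id        = aws_appautoscaling_target.async[0].resource_id
  scalable_dimension = aws_appautoscaling_target.async[0].scalable_dimension

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 300
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_lower_bound = 0
      scaling_adjustment          = 1
    }
  }
}

# Triggers the step policy from the HasBacklogWithoutCapacity metric.
resource "aws_cloudwatch_metric_alarm" "has_backlog_without_capacity" {
  count               = local.async_scaling
  alarm_name          = "deepgram-has-backlog-without-capacity"
  namespace           = "AWS/SageMaker"
  metric_name         = "HasBacklogWithoutCapacity"
  dimensions          = { EndpointName = aws_sagemaker_endpoint.deepgram.name }
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 2
  datapoints_to_alarm = 2
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "missing"
  alarm_actions       = [aws_appautoscaling_policy.scale_from_zero[0].arn]
}
```

Target tracking alone cannot wake an endpoint at zero instances, because the per-instance backlog metric is undefined with no instances; the separate alarm on HasBacklogWithoutCapacity covers that case.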
Tear down
To delete all resources created by this configuration, run terraform destroy from the same working directory and confirm the prompt.
This removes the SageMaker Endpoint, Endpoint Configuration, Model, auto-scaling resources (if enabled), and the IAM execution role. You are no longer billed for SageMaker compute after the endpoint is deleted. Your AWS Marketplace subscription remains active.