Deploy with Terraform | Deepgram's Docs

This guide provides a complete Terraform configuration for deploying Deepgram on Amazon SageMaker. The configuration creates an IAM execution role, a SageMaker Model from your AWS Marketplace subscription, an Endpoint Configuration, and a live Endpoint. Optional resources, enabled with the enable_autoscaling variable, add auto-scaling based on the ConcurrentRequestsPerModel metric.

Before running Terraform, you must subscribe to a Deepgram product on the AWS Marketplace and note the Model Package ARN. See Subscribe to Deepgram Products for instructions.

Prerequisites

Terraform 1.5 or later
AWS credentials configured for the target account (via environment variables, shared credentials file, or an IAM role)
An active AWS Marketplace subscription to a Deepgram SageMaker product
The Model Package ARN for the subscribed product (found in the SageMaker console under Marketplace Model Packages → AWS Marketplace Subscriptions)

Project layout

deepgram-sagemaker-terraform/
├── main.tf           # Provider and resource definitions
├── variables.tf      # Input variables
├── outputs.tf        # Endpoint, model, and execution role outputs
└── terraform.tfvars  # Your variable values (do not commit secrets)

Variables

Create variables.tf with the input variables the configuration needs. The only required value is the Model Package ARN from your Marketplace subscription.

variables.tf

variable "aws_region" {
  description = "AWS region where the SageMaker Endpoint will be deployed."
  type        = string
  default     = "us-east-1"
}

variable "model_package_arn" {
  description = "ARN of the Deepgram Model Package from AWS Marketplace."
  type        = string
}

variable "model_name" {
  description = "Name for the SageMaker Model resource."
  type        = string
  default     = "deepgram-stt"
}

variable "endpoint_name" {
  description = "Name for the SageMaker Endpoint."
  type        = string
  default     = "deepgram-stt-endpoint"
}

variable "instance_type" {
  description = "SageMaker instance type for the endpoint."
  type        = string
  default     = "ml.g5.2xlarge"
}

variable "initial_instance_count" {
  description = "Number of instances to launch at endpoint creation."
  type        = number
  default     = 1
}

variable "variant_name" {
  description = "Name of the production variant."
  type        = string
  default     = "AllTraffic"
}

variable "deepgram_engine_env" {
  description = "Map of DEEPGRAM_ENGINE_* environment variables for TOML overrides."
  type        = map(string)
  default     = {}
}

variable "deepgram_api_env" {
  description = "Map of DEEPGRAM_API_* environment variables for TOML overrides."
  type        = map(string)
  default     = {}
}

variable "enable_autoscaling" {
  description = "Enable auto-scaling for the endpoint."
  type        = bool
  default     = false
}

variable "autoscaling_min_capacity" {
  description = "Minimum instance count for auto-scaling."
  type        = number
  default     = 1
}

variable "autoscaling_max_capacity" {
  description = "Maximum instance count for auto-scaling."
  type        = number
  default     = 4
}

variable "autoscaling_target_value" {
  description = "Target concurrent requests per instance for the scaling policy."
  type        = number
  default     = 5.0
}
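Because model_package_arn has no default, a mistyped value would otherwise surface only when SageMaker rejects it during apply. A validation block can catch obvious mistakes at plan time. The following is a minimal sketch intended to replace the model_package_arn declaration above. The regular expression is an assumption about the general shape of a model package ARN (partition, region, 12-digit account ID, model-package/ prefix), not a rule published by Deepgram or AWS, so loosen it if your ARN is rejected.

```hcl
variable "model_package_arn" {
  description = "ARN of the Deepgram Model Package from AWS Marketplace."
  type        = string

  validation {
    # Assumed ARN shape: arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name...>
    condition = can(regex(
      "^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:[0-9]{12}:model-package/.+$",
      var.model_package_arn
    ))
    error_message = "model_package_arn must look like arn:aws:sagemaker:<region>:<account-id>:model-package/<name>."
  }
}
```

With this in place, terraform plan fails immediately with the error message above when the value is not shaped like a model package ARN.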

Main configuration

Create main.tf with the provider, IAM role, and SageMaker resources. The configuration uses the Model Package ARN from your AWS Marketplace subscription to create the model without referencing a container image directly.

main.tf

###############################################################################
# Provider
###############################################################################

terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

###############################################################################
# IAM Role - SageMaker Execution
###############################################################################

data "aws_iam_policy_document" "sagemaker_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["sagemaker.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sagemaker_execution" {
  name               = "${var.model_name}-execution-role"
  assume_role_policy = data.aws_iam_policy_document.sagemaker_assume_role.json
}

resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}

###############################################################################
# Merge Deepgram environment variables
###############################################################################

locals {
  deepgram_env = merge(
    { for k, v in var.deepgram_engine_env : "DEEPGRAM_ENGINE_${k}" => v },
    { for k, v in var.deepgram_api_env : "DEEPGRAM_API_${k}" => v },
  )
}

###############################################################################
# SageMaker Model - from AWS Marketplace Model Package
###############################################################################

resource "aws_sagemaker_model" "deepgram" {
  name                     = var.model_name
  execution_role_arn       = aws_iam_role.sagemaker_execution.arn
  enable_network_isolation = true

  primary_container {
    model_package_name = var.model_package_arn
    environment        = local.deepgram_env
  }
}

###############################################################################
# SageMaker Endpoint Configuration
###############################################################################

resource "aws_sagemaker_endpoint_configuration" "deepgram" {
  name = "${var.endpoint_name}-config"

  production_variants {
    variant_name           = var.variant_name
    model_name             = aws_sagemaker_model.deepgram.name
    initial_instance_count = var.initial_instance_count
    instance_type          = var.instance_type
  }
}

###############################################################################
# SageMaker Endpoint
###############################################################################

resource "aws_sagemaker_endpoint" "deepgram" {
  name                 = var.endpoint_name
  endpoint_config_name = aws_sagemaker_endpoint_configuration.deepgram.name
}

###############################################################################
# Auto-Scaling (optional)
###############################################################################

resource "aws_appautoscaling_target" "sagemaker" {
  count = var.enable_autoscaling ? 1 : 0

  max_capacity       = var.autoscaling_max_capacity
  min_capacity       = var.autoscaling_min_capacity
  resource_id        = "endpoint/${aws_sagemaker_endpoint.deepgram.name}/variant/${var.variant_name}"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

resource "aws_appautoscaling_policy" "sagemaker" {
  count = var.enable_autoscaling ? 1 : 0

  name               = "${var.endpoint_name}-concurrency-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.sagemaker[0].resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker[0].scalable_dimension
  service_namespace  = aws_appautoscaling_target.sagemaker[0].service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = var.autoscaling_target_value

    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantConcurrentRequestsPerModelHighResolution"
    }

    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
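The auto-scaling inputs are independent numbers, so nothing in the configuration above stops a minimum capacity larger than the maximum from reaching the AWS API. One way to flag this earlier is a check block, which Terraform supports from version 1.5 (the minimum this guide already requires). The sketch below is an optional addition to main.tf; the check name autoscaling_bounds is hypothetical.

```hcl
# Optional sanity checks for the auto-scaling inputs.
# The block name "autoscaling_bounds" is illustrative only.
check "autoscaling_bounds" {
  assert {
    condition     = var.autoscaling_min_capacity <= var.autoscaling_max_capacity
    error_message = "autoscaling_min_capacity must not exceed autoscaling_max_capacity."
  }

  assert {
    condition     = var.autoscaling_target_value > 0
    error_message = "autoscaling_target_value must be greater than zero."
  }
}
```

A failed check is reported as a warning during plan and apply rather than as an error, so it highlights the problem without blocking the run.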

Outputs

Create outputs.tf to surface the endpoint details after terraform apply completes.

outputs.tf

output "endpoint_name" {
  description = "Name of the deployed SageMaker Endpoint."
  value       = aws_sagemaker_endpoint.deepgram.name
}

output "endpoint_arn" {
  description = "ARN of the deployed SageMaker Endpoint."
  value       = aws_sagemaker_endpoint.deepgram.arn
}

output "model_name" {
  description = "Name of the SageMaker Model."
  value       = aws_sagemaker_model.deepgram.name
}

output "execution_role_arn" {
  description = "ARN of the IAM execution role."
  value       = aws_iam_role.sagemaker_execution.arn
}
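The verification command in the Deploy section reads the region with terraform output -raw aws_region, but outputs.tf as listed does not define that output. One optional addition, sketched here, is to expose the aws_region input variable as an output so the command resolves the region you actually deployed to.

```hcl
# Optional: expose the deployment region so CLI commands can read it
# with `terraform output -raw aws_region`.
output "aws_region" {
  description = "AWS region the SageMaker Endpoint was deployed to."
  value       = var.aws_region
}
```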

Example variable values

Create a terraform.tfvars file with your specific values. Replace the model_package_arn with the ARN from your AWS Marketplace subscription.

terraform.tfvars

aws_region        = "us-east-1"
model_package_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/deepgram-stt-nova-3/1"
model_name        = "deepgram-streaming-stt"
endpoint_name     = "my-deepgram-stt"
instance_type     = "ml.g5.2xlarge"

# Optional: Deepgram configuration overrides
deepgram_engine_env = {
  "01" = "max_active_requests=120"
}
deepgram_api_env = {
  "01" = "features.listen_v2=true"
}

# Optional: Enable auto-scaling
enable_autoscaling       = true
autoscaling_min_capacity = 1
autoscaling_max_capacity = 4
autoscaling_target_value = 5.0

Do not commit terraform.tfvars to version control if it contains sensitive values. Add it to .gitignore or use environment variables instead.

Deploy

Initialize the Terraform working directory

$ terraform init

Preview the resources Terraform will create

$ terraform plan

Verify the plan shows the expected resources: an IAM role and its policy attachment, a SageMaker Model, an Endpoint Configuration, and an Endpoint, plus an auto-scaling target and policy if enable_autoscaling is true.

Apply the configuration

$ terraform apply

Terraform creates the resources and waits for the SageMaker Endpoint to reach InService status. This typically takes several minutes.

Verify the endpoint

Confirm the endpoint is running:

$ aws sagemaker describe-endpoint \
>   --endpoint-name $(terraform output -raw endpoint_name) \
>   --region $(terraform output -raw aws_region 2>/dev/null || echo "us-east-1") \
>   --query "EndpointStatus"

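The output should be "InService". The command reads the region from an aws_region output and falls back to us-east-1 when that output is not defined, so pass --region explicitly if you deployed to another region and have not added the output.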

Validate the endpoint

After the endpoint reaches InService, run a test inference to confirm it returns results. See Validate a Deepgram SageMaker Endpoint for the full testing guide using the dg-sagemaker test clients.

Customize the deployment

Instance types

Choose an instance type based on the Deepgram product you are deploying. GPU-accelerated instances are required.

Product	Recommended instance type	Notes
Speech-to-Text (Nova-3, Flux)	`ml.g5.2xlarge`	Single NVIDIA A10G GPU with 24 GB GPU memory; 32 GB system RAM
Text-to-Speech (Aura)	`ml.g5.12xlarge`	4 NVIDIA A10G GPUs (TTS requires 2+ GPUs)

For a full list of compatible instances, see the Deployment Environments hardware specifications.
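As an illustration of the table above, a terraform.tfvars for a Text-to-Speech deployment might look like the following sketch. The model and endpoint names are hypothetical, and the ARN is a placeholder that you would replace with the Model Package ARN of your own Aura subscription.

```hcl
# Sketch of terraform.tfvars for a Text-to-Speech (Aura) deployment.
# All names below are illustrative; the ARN is a placeholder, not a real value.
model_package_arn = "<model package ARN from your Aura subscription>"
model_name        = "deepgram-tts"
endpoint_name     = "deepgram-tts-endpoint"

# TTS requires 2+ GPUs, so use the multi-GPU instance from the table.
instance_type = "ml.g5.12xlarge"
```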

Environment variable overrides

Pass Deepgram configuration overrides through the deepgram_engine_env and deepgram_api_env variables. Each map key becomes the suffix of the environment variable name (for example, the key "01" in deepgram_engine_env produces DEEPGRAM_ENGINE_01), and the value is the TOML expression. See Configure Amazon SageMaker Deployments for the full reference.

deepgram_engine_env = {
  "01" = "flux.max_streams=25"
  "02" = "chunking.streaming.step=0.5"
}
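To confirm how the maps are expanded before they reach the container, you could temporarily expose the merged local.deepgram_env value as an output. This is a debugging sketch rather than part of the reference configuration; the output name deepgram_env is hypothetical, and the comment shows what the locals block in main.tf produces for the engine map above when deepgram_api_env is left empty.

```hcl
# Optional debugging output (hypothetical name) showing the environment
# variables that main.tf passes to the Deepgram container.
output "deepgram_env" {
  description = "Merged DEEPGRAM_ENGINE_* and DEEPGRAM_API_* environment variables."
  value       = local.deepgram_env
}

# With the deepgram_engine_env map shown above and an empty deepgram_api_env,
# local.deepgram_env evaluates to:
#   {
#     DEEPGRAM_ENGINE_01 = "flux.max_streams=25"
#     DEEPGRAM_ENGINE_02 = "chunking.streaming.step=0.5"
#   }
```

Remove the output once you have verified the values, especially if any override contains information you would not want printed in CI logs.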

VPC configuration

To deploy the endpoint inside a VPC, add a vpc_config block to the aws_sagemaker_model resource:

resource "aws_sagemaker_model" "deepgram" {
  name                     = var.model_name
  execution_role_arn       = aws_iam_role.sagemaker_execution.arn
  enable_network_isolation = true

  primary_container {
    model_package_name = var.model_package_arn
    environment        = local.deepgram_env
  }

  vpc_config {
    subnets            = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
    security_group_ids = ["sg-0123456789abcdef0"]
  }
}
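If you would rather not hard-code subnet and security group IDs, one possible variation is to pass them as variables and emit the vpc_config block only when subnets are supplied. In this sketch the variable names vpc_subnet_ids and vpc_security_group_ids are assumptions, not part of the configuration above, and the resource block replaces the existing aws_sagemaker_model definition rather than being added alongside it.

```hcl
# Hypothetical input variables; add these to variables.tf.
variable "vpc_subnet_ids" {
  description = "Subnet IDs for the model's VPC configuration. Leave empty to skip VPC deployment."
  type        = list(string)
  default     = []
}

variable "vpc_security_group_ids" {
  description = "Security group IDs for the model's VPC configuration."
  type        = list(string)
  default     = []
}

# Replaces the aws_sagemaker_model resource in main.tf.
resource "aws_sagemaker_model" "deepgram" {
  name                     = var.model_name
  execution_role_arn       = aws_iam_role.sagemaker_execution.arn
  enable_network_isolation = true

  primary_container {
    model_package_name = var.model_package_arn
    environment        = local.deepgram_env
  }

  # Emit vpc_config only when at least one subnet ID is provided.
  dynamic "vpc_config" {
    for_each = length(var.vpc_subnet_ids) > 0 ? [1] : []

    content {
      subnets            = var.vpc_subnet_ids
      security_group_ids = var.vpc_security_group_ids
    }
  }
}
```

With both lists left at their empty defaults, the resource is equivalent to the non-VPC model in main.tf, so the same configuration can serve both deployment styles.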

Tear down

To delete all resources created by this configuration:

$ terraform destroy

This removes the SageMaker Endpoint, Endpoint Configuration, Model, auto-scaling resources (if enabled), and the IAM execution role. You are no longer billed for SageMaker compute after the endpoint is deleted. Your AWS Marketplace subscription remains active.
