Deploy with Terraform

Use Terraform to deploy Deepgram on Amazon SageMaker from an AWS Marketplace Model Package subscription.

This guide provides a complete Terraform configuration for deploying Deepgram on Amazon SageMaker. The configuration creates an IAM execution role, a SageMaker Model from your AWS Marketplace subscription, an Endpoint Configuration, and a live Endpoint. An optional module adds auto-scaling based on the ConcurrentRequestsPerModel metric.

Before running Terraform, you must subscribe to a Deepgram product on the AWS Marketplace and note the Model Package ARN. See Subscribe to Deepgram Products for instructions.

Prerequisites

  • Terraform 1.5 or later
  • AWS credentials configured for the target account (via environment variables, shared credentials file, or an IAM role)
  • An active AWS Marketplace subscription to a Deepgram SageMaker product
  • The Model Package ARN for the subscribed product (found in the SageMaker console under Marketplace Model Packages > AWS Marketplace Subscriptions)

Project layout

deepgram-sagemaker-terraform/
├── main.tf # Provider and resource definitions
├── variables.tf # Input variables
├── outputs.tf # Endpoint name, ARN, and status outputs
└── terraform.tfvars # Your variable values (do not commit secrets)

Variables

Create variables.tf with the input variables the configuration needs. The only required value is the Model Package ARN from your Marketplace subscription.

variables.tf
variable "aws_region" {
  description = "AWS region where the SageMaker Endpoint will be deployed."
  type        = string
  default     = "us-east-1"
}

variable "model_package_arn" {
  description = "ARN of the Deepgram Model Package from AWS Marketplace."
  type        = string
}

variable "model_name" {
  description = "Name for the SageMaker Model resource."
  type        = string
  default     = "deepgram-stt"
}

variable "endpoint_name" {
  description = "Name for the SageMaker Endpoint."
  type        = string
  default     = "deepgram-stt-endpoint"
}

variable "instance_type" {
  description = "SageMaker instance type for the endpoint."
  type        = string
  default     = "ml.g5.2xlarge"
}

variable "initial_instance_count" {
  description = "Number of instances to launch at endpoint creation."
  type        = number
  default     = 1
}

variable "variant_name" {
  description = "Name of the production variant."
  type        = string
  default     = "AllTraffic"
}

variable "deepgram_engine_env" {
  description = "Map of DEEPGRAM_ENGINE_* environment variables for TOML overrides."
  type        = map(string)
  default     = {}
}

variable "deepgram_api_env" {
  description = "Map of DEEPGRAM_API_* environment variables for TOML overrides."
  type        = map(string)
  default     = {}
}

variable "enable_autoscaling" {
  description = "Enable auto-scaling for the endpoint."
  type        = bool
  default     = false
}

variable "autoscaling_min_capacity" {
  description = "Minimum instance count for auto-scaling."
  type        = number
  default     = 1
}

variable "autoscaling_max_capacity" {
  description = "Maximum instance count for auto-scaling."
  type        = number
  default     = 4
}

variable "autoscaling_target_value" {
  description = "Target concurrent requests per instance for the scaling policy."
  type        = number
  default     = 5.0
}

Main configuration

Create main.tf with the provider, IAM role, and SageMaker resources. The configuration uses the Model Package ARN from your AWS Marketplace subscription to create the model without referencing a container image directly.

main.tf
###############################################################################
# Provider
###############################################################################

terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

###############################################################################
# IAM Role — SageMaker Execution
###############################################################################

data "aws_iam_policy_document" "sagemaker_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["sagemaker.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sagemaker_execution" {
  name               = "${var.model_name}-execution-role"
  assume_role_policy = data.aws_iam_policy_document.sagemaker_assume_role.json
}

resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}

###############################################################################
# Merge Deepgram environment variables
###############################################################################

locals {
  deepgram_env = merge(
    { for k, v in var.deepgram_engine_env : "DEEPGRAM_ENGINE_${k}" => v },
    { for k, v in var.deepgram_api_env : "DEEPGRAM_API_${k}" => v },
  )
}

###############################################################################
# SageMaker Model — from AWS Marketplace Model Package
###############################################################################

resource "aws_sagemaker_model" "deepgram" {
  name                     = var.model_name
  execution_role_arn       = aws_iam_role.sagemaker_execution.arn
  enable_network_isolation = true

  primary_container {
    model_package_name = var.model_package_arn
    environment        = local.deepgram_env
  }
}

###############################################################################
# SageMaker Endpoint Configuration
###############################################################################

resource "aws_sagemaker_endpoint_configuration" "deepgram" {
  name = "${var.endpoint_name}-config"

  production_variants {
    variant_name           = var.variant_name
    model_name             = aws_sagemaker_model.deepgram.name
    initial_instance_count = var.initial_instance_count
    instance_type          = var.instance_type
  }
}

###############################################################################
# SageMaker Endpoint
###############################################################################

resource "aws_sagemaker_endpoint" "deepgram" {
  name                 = var.endpoint_name
  endpoint_config_name = aws_sagemaker_endpoint_configuration.deepgram.name
}

###############################################################################
# Auto-Scaling (optional)
###############################################################################

resource "aws_appautoscaling_target" "sagemaker" {
  count = var.enable_autoscaling ? 1 : 0

  max_capacity       = var.autoscaling_max_capacity
  min_capacity       = var.autoscaling_min_capacity
  resource_id        = "endpoint/${aws_sagemaker_endpoint.deepgram.name}/variant/${var.variant_name}"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

resource "aws_appautoscaling_policy" "sagemaker" {
  count = var.enable_autoscaling ? 1 : 0

  name               = "${var.endpoint_name}-concurrency-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.sagemaker[0].resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker[0].scalable_dimension
  service_namespace  = aws_appautoscaling_target.sagemaker[0].service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = var.autoscaling_target_value

    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantConcurrentRequestsPerModelHighResolution"
    }

    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

Outputs

Create outputs.tf to surface the endpoint details after terraform apply completes.

outputs.tf
output "endpoint_name" {
  description = "Name of the deployed SageMaker Endpoint."
  value       = aws_sagemaker_endpoint.deepgram.name
}

output "endpoint_arn" {
  description = "ARN of the deployed SageMaker Endpoint."
  value       = aws_sagemaker_endpoint.deepgram.arn
}

output "model_name" {
  description = "Name of the SageMaker Model."
  value       = aws_sagemaker_model.deepgram.name
}

output "execution_role_arn" {
  description = "ARN of the IAM execution role."
  value       = aws_iam_role.sagemaker_execution.arn
}

output "aws_region" {
  description = "AWS region where the endpoint is deployed."
  value       = var.aws_region
}

Example variable values

Create a terraform.tfvars file with your specific values. Replace the model_package_arn with the ARN from your AWS Marketplace subscription.

terraform.tfvars
aws_region        = "us-east-1"
model_package_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/deepgram-stt-nova-3/1"
model_name        = "deepgram-streaming-stt"
endpoint_name     = "my-deepgram-stt"
instance_type     = "ml.g5.2xlarge"

# Optional: Deepgram configuration overrides
deepgram_engine_env = {
  "01" = "max_active_requests=120"
}
deepgram_api_env = {
  "01" = "features.listen_v2=true"
}

# Optional: Enable auto-scaling
enable_autoscaling       = true
autoscaling_min_capacity = 1
autoscaling_max_capacity = 4
autoscaling_target_value = 5.0

Do not commit terraform.tfvars to version control if it contains sensitive values. Add it to .gitignore or use environment variables instead.
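As an alternative, Terraform reads any `TF_VAR_<name>` environment variable into the input variable of the same name. A minimal sketch, using the same placeholder ARN as the terraform.tfvars example above:

```shell
# Supply the Marketplace ARN via the environment instead of terraform.tfvars.
# Terraform maps TF_VAR_<name> to the input variable of the same name.
export TF_VAR_model_package_arn="arn:aws:sagemaker:us-east-1:123456789012:model-package/deepgram-stt-nova-3/1"

# Then run `terraform plan` / `terraform apply` as usual; the value is
# picked up automatically and never written to disk.
```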

Deploy

1. Initialize the Terraform working directory:

$ terraform init

2. Preview the resources Terraform will create:

$ terraform plan

Verify the plan shows the expected resources: an IAM role, a SageMaker Model, an Endpoint Configuration, and an Endpoint.

3. Apply the configuration:

$ terraform apply

Terraform creates the resources and waits for the SageMaker Endpoint to reach InService status. This typically takes several minutes.

4. Verify the endpoint. Confirm it is running:

$ aws sagemaker describe-endpoint \
    --endpoint-name "$(terraform output -raw endpoint_name)" \
    --region "$(terraform output -raw aws_region 2>/dev/null || echo us-east-1)" \
    --query "EndpointStatus"

The output should be "InService".

Validate the endpoint

After the endpoint reaches InService, run a test inference to confirm it returns results. See Validate a Deepgram SageMaker Endpoint for the full testing guide using the dg-sagemaker test clients.

Customize the deployment

Instance types

Choose an instance type based on the Deepgram product you are deploying. GPU-accelerated instances are required.

Product                       | Recommended instance type | Notes
Speech-to-Text (Nova-3, Flux) | ml.g5.2xlarge             | 1 NVIDIA A10G GPU (24 GB GPU memory), 32 GiB system RAM
Text-to-Speech (Aura)         | ml.g5.12xlarge            | 4 NVIDIA A10G GPUs (TTS requires 2+ GPUs)

For a full list of compatible instances, see the Deployment Environments hardware specifications.

Environment variable overrides

Pass Deepgram configuration overrides through the deepgram_engine_env and deepgram_api_env variables. Each map key becomes the suffix (for example, "01", "02"), and the value is the TOML expression. See Configure Amazon SageMaker Deployments for the full reference.

deepgram_engine_env = {
  "01" = "flux.max_streams=25"
  "02" = "chunking.streaming.step=0.5"
}
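For illustration, this Python sketch mirrors what the locals merge block in main.tf does with these two maps: each map key becomes a numbered suffix on the DEEPGRAM_ENGINE_ or DEEPGRAM_API_ prefix (the function name here is ours, not part of the configuration):

```python
# Sketch of the merge performed by the locals block in main.tf:
# map keys become suffixes on the DEEPGRAM_ENGINE_ / DEEPGRAM_API_ prefixes.
def merged_deepgram_env(engine_env: dict, api_env: dict) -> dict:
    env = {f"DEEPGRAM_ENGINE_{k}": v for k, v in engine_env.items()}
    env.update({f"DEEPGRAM_API_{k}": v for k, v in api_env.items()})
    return env

result = merged_deepgram_env(
    {"01": "flux.max_streams=25", "02": "chunking.streaming.step=0.5"},
    {"01": "features.listen_v2=true"},
)
print(result)
# The container receives DEEPGRAM_ENGINE_01, DEEPGRAM_ENGINE_02, and DEEPGRAM_API_01.
```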

VPC configuration

To deploy the endpoint inside a VPC, add a vpc_config block to the aws_sagemaker_model resource:

VPC configuration
resource "aws_sagemaker_model" "deepgram" {
  name                     = var.model_name
  execution_role_arn       = aws_iam_role.sagemaker_execution.arn
  enable_network_isolation = true

  primary_container {
    model_package_name = var.model_package_arn
    environment        = local.deepgram_env
  }

  vpc_config {
    subnets            = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
    security_group_ids = ["sg-0123456789abcdef0"]
  }
}

Tear down

To delete all resources created by this configuration:

$ terraform destroy

This removes the SageMaker Endpoint, Endpoint Configuration, Model, auto-scaling resources (if enabled), and the IAM execution role. You are no longer billed for SageMaker compute after the endpoint is deleted. Your AWS Marketplace subscription remains active.