Deploy Deepgram on Amazon SageMaker

Deepgram can be deployed into your own Amazon Virtual Private Cloud (VPC) environment using Amazon SageMaker AI. Simply subscribe to the Deepgram product in the AWS Marketplace and then deploy a SageMaker Endpoint, using our pre-made SageMaker Model Package.

Amazon SageMaker is a managed cloud platform from Amazon Web Services (AWS) that enables deployment of Deepgram as a managed, container-based service. Once you deploy Deepgram as a SageMaker Model Endpoint, you can run inference against the service using the Amazon SageMaker AI Software Development Kit (SDK).

Supported Products

The following Deepgram products are supported on the SageMaker AI platform.

Each transcription language model is published as a separate product listing. Please subscribe to and deploy SageMaker Endpoints for each language model that you wish to utilize. Your application code will need to route to your SageMaker Endpoint for the language model you wish to run inference against.
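
Since each language model sits behind its own Endpoint, application code typically keeps a small routing table. Here is a minimal sketch; the endpoint names below are hypothetical placeholders, not names Deepgram publishes — substitute the names you chose when deploying each Model Package.

```typescript
// Sketch: route each request to the SageMaker Endpoint that serves its
// language model. The endpoint names are hypothetical placeholders.
const endpointsByLanguage: Record<string, string> = {
  en: "deepgram-stt-english",
  es: "deepgram-stt-spanish",
  multi: "deepgram-stt-multilingual",
};

function endpointForLanguage(language: string): string {
  const endpointName = endpointsByLanguage[language];
  if (!endpointName) {
    throw new Error(`No SageMaker Endpoint deployed for language: ${language}`);
  }
  return endpointName;
}
```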

Limitations

When using Deepgram services in Amazon SageMaker, please be aware of the following limitations.

  • Deepgram cannot call Large Language Model (LLM) services
  • Deepgram cannot invoke user-defined callback URLs

Prerequisites

  • An AWS account
  • AWS IAM permissions to SageMaker and Marketplace

Subscribe to Deepgram Products

Before you can deploy Deepgram on Amazon SageMaker AI, you’ll need to subscribe to the product in the AWS Marketplace. Keep in mind that you are not billed for the product until you deploy an Amazon SageMaker AI Endpoint resource.

1. Log in to the AWS Management Console for the account you’d like to deploy in.
2. Search for and navigate to the AWS Marketplace console.
3. Click on the Discover Products link on the left.
4. Under Search AWS Marketplace products, search for “Deepgram”.
5. Under Refine Results, Delivery Methods, check the SageMaker Model box.
6. Click on the Deepgram product you’re interested in deploying (e.g. Deepgram Voice AI - English Speech-to-Text (STT)).
7. Click on the View Purchase Options button.
8. Ensure the Offer Type of Public Offer is selected (if required).
9. Scroll down and click the Subscribe button.

Create AWS IAM Role for SageMaker Execution

Follow the AWS documentation to create an AWS Identity & Access Management (IAM) role that will be used to run SageMaker Model Endpoints. You only need to create a single SageMaker execution role, and can reuse this IAM Role to deploy multiple SageMaker Endpoints.
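
For reference, the role’s trust policy must allow the SageMaker service to assume it. A typical trust policy document looks like the following; see the AWS documentation for the permissions policy to attach.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```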

Deploy Deepgram Model Package for SageMaker AI

Once you’ve subscribed to the Deepgram product on AWS Marketplace, you can deploy a SageMaker AI Endpoint. The SageMaker “Endpoint” resource represents the compute instance that runs the Deepgram Voice AI services. It will take several minutes to deploy a SageMaker Endpoint, once you initiate the resource creation.

1. In the AWS Management Console, navigate to the SageMaker AI console.
2. In the left-hand menu, under the AWS Marketplace Resources heading, select Marketplace Model Packages.
3. Click on the AWS Marketplace Subscriptions tab.
4. Select the product link, and then select the radio button next to the product version in the list.
5. Select Actions ➡️ Create Endpoint.
6. Provide a name for the model (e.g. deepgram-streaming-stt).
7. Under IAM Role, select the SageMaker execution role that you created.
8. Click the Next button.
9. Provide an Endpoint Name, such as my-deepgram-streaming-stt.
10. Under Variants ➡️ Production, scroll all the way to the right, and click Edit.
11. If desired, select Choose Other Instance Type and select the instance type you want to deploy to (e.g. g5.2xlarge), then click Save.
12. Click the Create Endpoint Configuration button.
13. Click the Submit button to create the SageMaker AI Endpoint.

After following these steps, you should see a new Endpoint in your AWS account. If you don’t see the Endpoint, ensure that you have selected the correct AWS region in the AWS Management Console. It may take several minutes for the Endpoint to change to status InService. Once the Endpoint status has changed to InService, you can monitor the Amazon CloudWatch Logs for the Endpoint to ensure normal operation of the Deepgram services.

Inference

Speech-to-Text (STT) Streaming

Once you’ve deployed the Deepgram services as a SageMaker Endpoint, you can run streaming inference against the Endpoint using a supported AWS Software Development Kit (SDK). This capability requires the InvokeEndpointWithBidirectionalStream API in the Amazon SageMaker AI service.

The Deepgram WebSocket payloads themselves do not change in SageMaker; however, they are wrapped in an additional data structure required by the Amazon SageMaker API. For example, to send an audio payload (an array of bytes), a KeepAlive message, or a CloseStream message to the Deepgram STT streaming API, you would use the following payloads.

SageMaker Bidirectional Payload Examples

// Deepgram STT streaming on Amazon SageMaker audio chunk
PayloadPart: {
  Bytes: [0, 255, 128, 64, 250, 5],
  DataType: "BINARY",
},

// Deepgram STT streaming on Amazon SageMaker "KeepAlive" message
PayloadPart: {
  Bytes: new TextEncoder().encode(JSON.stringify({
    type: "KeepAlive",
  })),
  DataType: "UTF8",
},

// Deepgram STT streaming on Amazon SageMaker "CloseStream" message
PayloadPart: {
  Bytes: new TextEncoder().encode(JSON.stringify({
    type: "CloseStream",
  })),
  DataType: "UTF8",
},

Let’s walk through the code to run inference against the Deepgram Speech-to-Text (STT) streaming transcription endpoint in SageMaker AI. First, import the necessary items from the Amazon SageMaker AI runtime API.

TypeScript

import {
  SageMakerRuntimeClient,
  RequestEventStream,
  InvokeEndpointWithBidirectionalStreamCommand,
  InvokeEndpointWithBidirectionalStreamCommandInput,
  InvokeEndpointWithBidirectionalStreamCommandOutput,
} from '@aws-sdk/client-sagemaker-runtime-http2';

Next, construct the input arguments for the SageMaker AI bidirectional streaming API.

TypeScript

const invokeParams: InvokeEndpointWithBidirectionalStreamCommandInput = {
  EndpointName: `YOUR_SAGEMAKER_ENDPOINT_NAME`, // Specify the SageMaker Endpoint name you've deployed
  ModelInvocationPath: 'v1/listen', // The Deepgram STT WebSocket streaming API URL path
  ModelQueryString: 'model=nova-3&diarize=false&language=multi', // Specify the Deepgram STT query string parameters you'd like to use
  Body: createWebSocketStream(inputFilePath), // async generator function that returns WebSocket messages, as AsyncIterable<RequestEventStream>
};

Next, create the SageMaker AI streaming client (JavaScript SDK) and command, and invoke it.

TypeScript

const smClient = new SageMakerRuntimeClient({
  // Specify the correct AWS region for your SageMaker Endpoint resource
  region: 'us-east-2',
  // This is the Amazon SageMaker API Endpoint URL, not your SageMaker Endpoint resource
  endpoint: 'https://runtime.sagemaker.us-east-2.amazonaws.com:8443'
});
const command = new InvokeEndpointWithBidirectionalStreamCommand(invokeParams);
const response: InvokeEndpointWithBidirectionalStreamCommandOutput = await smClient.send(command);

After invoking the streaming connection, process each response message received from the streaming API.

TypeScript

for await (const chunk of response.Body) {
  if (chunk.PayloadPart && chunk.PayloadPart.Bytes) {
    const chunkData = new TextDecoder().decode(chunk.PayloadPart.Bytes);
    console.log('Bidirectional chunk data:', chunkData);
  }
}

You’ll also need to implement the createWebSocketStream function: an async generator that continuously yields WebSocket client messages to the Deepgram API. This function could read a live audio stream from a microphone, using the @mastra/node-audio package, or stream a local WAV file in chunks.

TypeScript

async function* createWebSocketStream(inputFilePath?: string) {
  // Read the microphone, or the WAV file at inputFilePath, in chunks and continuously yield
  while (true) {
    yield {
      PayloadPart: {
        // The raw audio bytes should be provided as a Uint8Array
        Bytes: new Uint8Array([0, 1, 2, 3, 4, 5]),
        DataType: "BINARY",
      },
    };
  }
}

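For the WAV-file variant, a minimal sketch might read the file with a Node.js read stream. The chunk size and the trailing CloseStream message here are illustrative choices, not requirements.

```typescript
// Sketch: stream a local WAV file to the endpoint in fixed-size chunks,
// as an alternative to a live microphone. Chunk size is illustrative.
import { createReadStream } from "fs";

async function* wavFileStream(filePath: string, chunkSize: number = 8192) {
  const stream = createReadStream(filePath, { highWaterMark: chunkSize });
  for await (const chunk of stream) {
    yield {
      PayloadPart: {
        Bytes: new Uint8Array(chunk as Buffer),
        DataType: "BINARY",
      },
    };
  }
  // Tell Deepgram the audio is finished so it can flush final results
  yield {
    PayloadPart: {
      Bytes: new TextEncoder().encode(JSON.stringify({ type: "CloseStream" })),
      DataType: "UTF8",
    },
  };
}
```
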
Here is a complete example:

TypeScript Complete Example

// Sample script to capture microphone input and stream it to an Amazon SageMaker
// bidirectional streaming endpoint running Deepgram Voice AI transcription models.
import {
  SageMakerRuntimeClient,
  InvokeEndpointWithBidirectionalStreamCommand,
  InvokeEndpointWithBidirectionalStreamCommandOutput,
  RequestEventStream,
  InvokeEndpointWithBidirectionalStreamCommandInput
} from '@aws-sdk/client-sagemaker-runtime-http2';
import { getMicrophoneStream } from '@mastra/node-audio';
import { Readable } from 'stream';

// Configuration interface
interface Config {
  endpointName: string;
}

// Script Configuration

// CHANGE ME: Specify the region where your SageMaker Endpoint is deployed.
const region: string = "YOUR_AWS_REGION";
// CHANGE ME: This must correspond to the AWS region where your Endpoint is deployed.
const bidiEndpoint: string = `https://runtime.sagemaker.${region}.amazonaws.com:8443`;
// The internal WebSocket API route you want to access, used by Deepgram specifically.
const modelInvocationPath = 'v1/listen';
// CHANGE ME: Update this to the model parameters you want. Preview only supports nova-3, entity detection, and diarization.
const modelQueryString = 'model=nova-3&language=multi';

// CHANGE ME: Update this value to the name of the SageMaker Endpoint you deployed
const config: Config = {
  endpointName: `YOUR_ENDPOINT_NAME`,
};

const sagemakerRuntimeClientForBidi = new SageMakerRuntimeClient({
  region: region,
  endpoint: bidiEndpoint
});

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms)).then(() => console.log("Sleeping completed!"));

// Generator function that yields audio chunks from the local microphone
async function* audioStream(chunkSize: number = 1024 * 128): AsyncIterable<RequestEventStream> {
  const KEEPALIVE_INTERVAL = 3000; // 3 seconds
  let lastKeepaliveTime = Date.now();

  let microphone = getMicrophoneStream();

  let streamActive = true;
  // Periodically log keepalive timing while the stream is active
  // (the actual KeepAlive messages are yielded below, after the audio ends)
  const keepaliveInterval = setInterval(() => {
    if (streamActive && Date.now() - lastKeepaliveTime >= KEEPALIVE_INTERVAL) {
      console.log('Sending keepalive message...');
      lastKeepaliveTime = Date.now();
    }
  }, KEEPALIVE_INTERVAL);

  try {
    for await (const chunk of microphone) {
      yield {
        PayloadPart: {
          Bytes: new Uint8Array(chunk),
          DataType: "BINARY",
        },
      };
    }
    console.log('Audio streaming complete. Continuing to send keepalive messages...');
    const keepaliveEndTime = Date.now() + 120000; // Keep alive for 2 minutes

    while (Date.now() < keepaliveEndTime) {
      const now = Date.now();
      if (now - lastKeepaliveTime >= KEEPALIVE_INTERVAL) {
        const timestamp = new Date(now).toISOString();
        console.log(`Sending post-stream keepalive message at ${timestamp}...`);
        yield {
          PayloadPart: {
            Bytes: new TextEncoder().encode(JSON.stringify({
              type: "KeepAlive",
            })),
            DataType: "UTF8",
          },
        };
        lastKeepaliveTime = now;
      }
      // Small sleep to prevent a tight loop
      await sleep(3000);
    }

    // Close the connection after receiving all the response messages
    yield {
      PayloadPart: {
        Bytes: new TextEncoder().encode(JSON.stringify({
          type: "CloseStream",
        })),
        DataType: "UTF8",
      },
    };

    console.log('Keepalive period completed.');
  } finally {
    streamActive = false;
    clearInterval(keepaliveInterval);
  }
}

// Invokes the Amazon SageMaker bidirectional stream API and processes response payloads
async function invokeEndpointWithBidirectionalStream(): Promise<InvokeEndpointWithBidirectionalStreamCommandOutput> {
  console.log('Invoking endpoint with bidirectional stream...');

  const invokeParams: InvokeEndpointWithBidirectionalStreamCommandInput = {
    EndpointName: config.endpointName,
    Body: audioStream(), // as AsyncIterable<RequestEventStream>
    ModelInvocationPath: modelInvocationPath,
    ModelQueryString: modelQueryString,
  };

  try {
    console.log('Using custom bidi endpoint:', bidiEndpoint);
    const command = new InvokeEndpointWithBidirectionalStreamCommand(invokeParams);
    console.log('Sending bidirectional stream request...');
    const response: InvokeEndpointWithBidirectionalStreamCommandOutput = await sagemakerRuntimeClientForBidi.send(command);

    console.log('Bidirectional stream response received. Processing...');
    console.log('Response metadata:', response.$metadata);

    if (response.Body) {
      let chunkCount = 0;
      const timeout = setTimeout(() => {
        console.log('Timeout waiting for bidirectional stream chunks');
      }, 20000); // 20 second timeout

      try {
        // Read responses from the bidirectional stream
        for await (const chunk of response.Body) {
          chunkCount++;
          console.log(`Processing bidirectional chunk ${chunkCount}:`, Object.keys(chunk));

          if (chunk.PayloadPart && chunk.PayloadPart.Bytes) {
            const chunkData = new TextDecoder().decode(chunk.PayloadPart.Bytes);
            console.log('Bidirectional chunk data:', chunkData);
          }

          if (chunk.InternalStreamFailure) {
            console.error('Bidirectional internal stream failure:', chunk.InternalStreamFailure);
            break;
          }

          if (chunk.ModelStreamError) {
            console.error('Bidirectional model stream error:', chunk.ModelStreamError);
            break;
          }
        }
        clearTimeout(timeout);
        console.log(`Processed ${chunkCount} bidirectional chunks total`);
      } catch (streamError) {
        clearTimeout(timeout);
        console.error('Error processing bidirectional stream:', streamError);
        throw streamError;
      }
    } else {
      console.log('No bidirectional response body received');
    }

    console.log('Bidirectional endpoint invocation completed successfully');
    return response;
  } catch (error: any) {
    console.error('Error invoking endpoint with bidirectional stream:', error);
    console.error('Error details:', {
      name: error.name,
      message: error.message,
      statusCode: error.$metadata?.httpStatusCode
    });
    throw error;
  }
}

// Main execution function
async function main(): Promise<void> {
  try {
    console.log('Starting SageMaker streaming inference...');

    await invokeEndpointWithBidirectionalStream();

    console.log('All operations completed successfully!');

  } catch (error) {
    console.error('Streaming inference failed:', error);
    throw error;
  }
}

declare const require: any;
declare const module: any;
declare const process: any;

if (typeof require !== 'undefined' && require.main === module) {
  main().catch(error => {
    console.error('Script execution failed:', error);
    if (typeof process !== 'undefined') {
      process.exit(1);
    }
  });
}

export {
  invokeEndpointWithBidirectionalStream,
  config,
  bidiEndpoint
};

Troubleshooting

If you’re experiencing any issues with your Deepgram deployment on Amazon SageMaker AI, you can obtain the Deepgram container logs from the Amazon CloudWatch service. If you open the SageMaker AI Endpoint resource details, there will be a link to open the Amazon CloudWatch Log Group for that endpoint. Within the CloudWatch Log Group, there should be a Log Stream that contains the Deepgram logs for all components. You can use the Amazon CloudWatch Logs Live Tail feature to watch logs in near-real-time while you are sending requests to the Deepgram API, via the SageMaker AI APIs.

To use the CloudWatch Logs Live Tail feature locally, from the AWS CLI tool, you can use the following command.

aws logs tail --follow /aws/sagemaker/Endpoints/YOUR_SAGEMAKER_ENDPOINT_NAME --region YOUR_AWS_REGION

Checklist

If you experience any issues using Deepgram services running on the Amazon SageMaker AI platform, please review this checklist before contacting Deepgram support.

  • Ensure that your application’s AWS IAM User or IAM Role has permission to call the InvokeEndpointWithBidirectionalStream SageMaker AI action.
  • Ensure your application is targeting the correct AWS account and region, where your SageMaker Endpoint exists.
  • Ensure the Deepgram product you’ve deployed (e.g. streaming Speech-to-Text) from the AWS Marketplace corresponds to the Deepgram API you’re calling.
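
For the first checklist item, a minimal IAM policy statement might look like the following; the Resource ARN placeholders must be replaced with your own region, account ID, and endpoint name.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpointWithBidirectionalStream",
      "Resource": "arn:aws:sagemaker:YOUR_AWS_REGION:YOUR_ACCOUNT_ID:endpoint/your-endpoint-name"
    }
  ]
}
```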