Deploy Deepgram on Amazon SageMaker

Amazon SageMaker is a managed cloud platform from Amazon Web Services (AWS) that enables deployment of Deepgram as a managed, container-based service. Once you deploy Deepgram as a SageMaker Model Endpoint, you can run inference against the service using the Amazon SageMaker AI Software Development Kit (SDK).

Supported Products

The following Deepgram products are supported on the SageMaker AI platform.

Speech-to-Text (STT) transcription, streaming (WebSocket)

Each transcription language model is published as a separate product listing. Please subscribe to and deploy SageMaker Endpoints for each language model that you wish to utilize. Your application code will need to route to your SageMaker Endpoint for the language model you wish to run inference against.
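The routing described above can be sketched as a simple lookup from a language code to an endpoint name. This is only an illustration: the endpoint names and language codes below are hypothetical placeholders, and you would substitute the names you chose when deploying each Endpoint.

```typescript
// Hypothetical endpoint names: replace these with the names you chose at deployment time.
const endpointsByLanguage = new Map<string, string>([
    ["en", "my-deepgram-stt-english"],
    ["es", "my-deepgram-stt-spanish"],
]);

// Returns the SageMaker Endpoint name to invoke for a given language code.
function endpointForLanguage(language: string): string {
    const endpointName = endpointsByLanguage.get(language.toLowerCase());
    if (endpointName === undefined) {
        throw new Error(`No SageMaker Endpoint is deployed for language "${language}"`);
    }
    return endpointName;
}
```

The returned name would then be passed as the EndpointName of your inference request.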

Limitations

When using Deepgram services in Amazon SageMaker, please be aware of the following limitations.

Deepgram cannot call Large Language Model (LLM) services
Deepgram cannot invoke user-defined callback URLs

Prerequisites

An AWS account
AWS IAM permissions to SageMaker and Marketplace
- IAM Policy: AWSMarketplaceManageSubscriptions
- IAM Policy: AmazonSageMakerFullAccess

Subscribe to Deepgram Products

Before you can deploy Deepgram on Amazon SageMaker AI, you’ll need to subscribe to the product in the AWS Marketplace. Keep in mind that you are not billed for the product until you deploy an Amazon SageMaker AI Endpoint resource.

Search for and navigate to the AWS Marketplace console

Click on the Discover Products link on the left

Under Search AWS Marketplace products, search for “Deepgram”

Under Refine Results, Delivery Methods, check the SageMaker Model box

Click on the Deepgram product you’re interested in deploying (e.g., Deepgram Voice AI- English Speech-to-Text (STT))

Click on the View Purchase Options button

Ensure the Offer Type of Public Offer is selected (if required)

Scroll down and click the Subscribe button

Create AWS IAM Role for SageMaker Execution

Follow the AWS documentation to create an AWS Identity & Access Management (IAM) role that will be used to run SageMaker Model Endpoints. You only need to create a single SageMaker execution role, and can reuse this IAM Role to deploy multiple SageMaker Endpoints.

Deploy Deepgram Model Package for SageMaker AI

Once you’ve subscribed to the Deepgram product on AWS Marketplace, you can deploy a SageMaker AI Endpoint. The SageMaker “Endpoint” resource represents the compute instance that runs the Deepgram Voice AI services. Deployment takes several minutes once you initiate the resource creation.

In the AWS Management Console, navigate to the SageMaker AI console

In the left-hand menu, under the AWS Marketplace Resources heading, select Marketplace Model Packages

Click on the AWS Marketplace Subscriptions tab

Select the product link, and then select the radio button next to the product version in the list

Select Actions ➡️ Create Endpoint

Provide a name for the model (e.g., deepgram-streaming-stt)

Under IAM Role, select the SageMaker execution role that you created

Click the Next button

Provide an Endpoint Name, such as my-deepgram-streaming-stt

Under Variants ➡️ Production, scroll all the way to the right, and click Edit

If desired, select Choose Other Instance Type and select the instance type you want to deploy to (e.g., g5.2xlarge), then click Save

Click the Create Endpoint Configuration button

Click the Submit button, to create the SageMaker AI Endpoint

After following these steps, you should see a new Endpoint in your AWS account. If you don’t see the Endpoint, ensure that you have selected the correct AWS region in the AWS Management Console. It may take several minutes for the Endpoint to change to status InService. Once the Endpoint status has changed to InService, you can monitor the Amazon CloudWatch Logs for the Endpoint to ensure normal operation of the Deepgram services.
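If you would rather wait for the InService status from a script than watch the console, a polling loop along the following lines is one option. This is a sketch under assumptions: the function names, polling interval, and attempt limit are illustrative, and getStatus is a callback you supply. In practice it could wrap the SageMaker DescribeEndpoint API and return its EndpointStatus value.

```typescript
type NextStep = "ready" | "wait" | "failed";

// Maps a SageMaker EndpointStatus value to what a polling loop should do next.
function nextStepForStatus(status: string): NextStep {
    if (status === "InService") return "ready";
    const transitional = ["Creating", "Updating", "SystemUpdating", "RollingBack"];
    return transitional.includes(status) ? "wait" : "failed";
}

// Polls until the endpoint is InService. getStatus is injected so that this
// sketch does not depend on any particular SDK client.
async function waitForInService(
    getStatus: () => Promise<string>,
    intervalMs: number = 15000,
    maxAttempts: number = 80,
): Promise<void> {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const status = await getStatus();
        const step = nextStepForStatus(status);
        if (step === "ready") return;
        if (step === "failed") throw new Error(`Endpoint entered status ${status}`);
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error("Timed out waiting for the endpoint to reach InService");
}
```

Statuses such as Failed or OutOfService stop the loop immediately, since waiting longer will not bring the Endpoint into service.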

Inference

Speech-to-Text (STT) Streaming

Once you’ve deployed the Deepgram services as a SageMaker Endpoint, you can run streaming inference against the Endpoint using a supported AWS Software Development Kit (SDK). This capability requires the InvokeEndpointWithBidirectionalStream API in the Amazon SageMaker AI service.

The Deepgram WebSocket payloads do not change in SageMaker; however, they are wrapped in an additional data structure required by the Amazon SageMaker API. For example, to send an audio payload (an array of bytes), a KeepAlive message, or a CloseStream message to the Deepgram STT streaming API, you would use the following payloads.

SageMaker Bidirectional Payload Examples

// Deepgram STT streaming on Amazon SageMaker audio chunk
PayloadPart: {
    Bytes: [0, 255, 128, 64, 250, 5],
    DataType: "BINARY",
},

// Deepgram STT streaming on Amazon SageMaker "KeepAlive" message
PayloadPart: {
    Bytes: new TextEncoder().encode(JSON.stringify({
        type: "KeepAlive",
    })),
    DataType: "UTF8",
},

// Deepgram STT streaming on Amazon SageMaker "CloseStream" message
PayloadPart: {
    Bytes: new TextEncoder().encode(JSON.stringify({
        type: "CloseStream",
    })),
    DataType: "UTF8",
},
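
Because every message uses the same wrapper, it can be convenient to build the payloads with small helper functions. The following is a minimal sketch. The PayloadPartEvent type and the helper names are illustrative rather than part of any SDK, and the type only mirrors the structure shown above.

```typescript
// Minimal local type mirroring the PayloadPart structure shown above.
interface PayloadPartEvent {
    PayloadPart: { Bytes: Uint8Array; DataType: "BINARY" | "UTF8" };
}

// Wraps raw audio bytes as a binary payload.
function audioMessage(audio: Uint8Array): PayloadPartEvent {
    return { PayloadPart: { Bytes: audio, DataType: "BINARY" } };
}

// Wraps a Deepgram control message (KeepAlive or CloseStream) as UTF-8 encoded JSON.
function controlMessage(type: "KeepAlive" | "CloseStream"): PayloadPartEvent {
    const json = JSON.stringify({ type });
    return { PayloadPart: { Bytes: new TextEncoder().encode(json), DataType: "UTF8" } };
}
```

For example, controlMessage("CloseStream") produces the same payload as the third example above.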

Let’s walk through the code to run inference against the Deepgram Speech-to-Text (STT) streaming transcription endpoint in SageMaker AI. First, import the necessary items from the Amazon SageMaker AI runtime API.

TypeScript

import {
    SageMakerRuntimeClient,
    RequestEventStream,
    InvokeEndpointWithBidirectionalStreamCommand,
    InvokeEndpointWithBidirectionalStreamCommandInput,
    InvokeEndpointWithBidirectionalStreamCommandOutput,
} from '@aws-sdk/client-sagemaker-runtime-http2';

Next, construct the input arguments for the SageMaker AI bidirectional streaming API.

TypeScript

const invokeParams: InvokeEndpointWithBidirectionalStreamCommandInput = {
    EndpointName: `YOUR_SAGEMAKER_ENDPOINT_NAME`,                   // Specify the SageMaker Endpoint name you've deployed
    ModelInvocationPath: 'v1/listen',                               // The Deepgram STT WebSocket streaming API URL path
    ModelQueryString: 'model=nova-3&diarize=false&language=multi',  // Specify the Deepgram STT query string parameters you'd like to use
    Body: createWebSocketStream(inputFilePath),                     // async generator function that returns WebSocket messages, as AsyncIterable<RequestEventStream>
};
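
If the query string is assembled from application settings, building it with URLSearchParams avoids hand-written encoding mistakes. This is a small sketch, and buildModelQueryString is an illustrative helper name rather than part of the SDK.

```typescript
// Builds the ModelQueryString value from a plain object of Deepgram parameters.
function buildModelQueryString(params: Record<string, string | number | boolean>): string {
    const search = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
        search.set(key, String(value));
    }
    return search.toString();
}

const exampleQueryString = buildModelQueryString({
    model: "nova-3",
    diarize: false,
    language: "multi",
});
// exampleQueryString is "model=nova-3&diarize=false&language=multi"
```

The result can be assigned directly to ModelQueryString in the request parameters.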

Next, create the SageMaker AI streaming client (JavaScript SDK) and command, and invoke it.

TypeScript

const smClient = new SageMakerRuntimeClient({
    // Specify the correct AWS region for your SageMaker Endpoint resource
    region: 'us-east-2',
    // This is the Amazon SageMaker API Endpoint URL, not your SageMaker Endpoint resource
    endpoint: 'https://runtime.sagemaker.us-east-2.amazonaws.com:8443'
});
const command = new InvokeEndpointWithBidirectionalStreamCommand(invokeParams);
const response: InvokeEndpointWithBidirectionalStreamCommandOutput = await smClient.send(command);

After invoking the streaming connection, process each response message received from the streaming API.

TypeScript

for await (const chunk of response.Body) {
    if (chunk.PayloadPart && chunk.PayloadPart.Bytes) {
        const chunkData = new TextDecoder().decode(chunk.PayloadPart.Bytes);
        console.log('Bidirectional chunk data:', chunkData);
    }
}
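
Each decoded chunk is a JSON message from Deepgram, so you will usually want to pull the transcript text out of it rather than log the raw string. The sketch below assumes the message follows Deepgram's standard streaming "Results" shape (type, is_final, and channel.alternatives[0].transcript) and that each chunk holds one complete JSON message. parseTranscript and TranscriptResult are illustrative names.

```typescript
interface TranscriptResult {
    transcript: string;
    isFinal: boolean;
}

// Extracts the transcript from one response payload, or returns null for
// messages that carry no transcript text (for example metadata messages).
function parseTranscript(bytes: Uint8Array): TranscriptResult | null {
    const message = JSON.parse(new TextDecoder().decode(bytes));
    if (message.type !== "Results") return null;
    const transcript = message.channel?.alternatives?.[0]?.transcript;
    if (typeof transcript !== "string" || transcript.length === 0) return null;
    return { transcript, isFinal: message.is_final === true };
}
```

Inside the loop above, you could call parseTranscript(chunk.PayloadPart.Bytes) and print only the results where isFinal is true.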

You’ll also need to implement the createWebSocketStream function, an async generator that yields WebSocket client messages to the Deepgram API. This function could read a live audio stream from a microphone, using the @mastra/node-audio package, or stream a local WAV file in chunks.

TypeScript

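import { createReadStream } from 'node:fs';

async function* createWebSocketStream(inputFilePath: string): AsyncIterable<RequestEventStream> {
  // Stream the local audio file in chunks; a microphone stream could be consumed the same way
  for await (const chunk of createReadStream(inputFilePath)) {
    yield {
      PayloadPart: {
          // The raw audio bytes should be provided as a Uint8Array
          Bytes: new Uint8Array(chunk),
          DataType: "BINARY",
      },
    };
  }
  // Tell Deepgram that no more audio will follow
  yield {
    PayloadPart: {
        Bytes: new TextEncoder().encode(JSON.stringify({ type: "CloseStream" })),
        DataType: "UTF8",
    },
  };
}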
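
When streaming audio in chunks, it helps to pick the chunk size from a target duration instead of an arbitrary byte count. The sketch below assumes uncompressed linear PCM audio. The function names are illustrative, and the 16 kHz, 16-bit, mono figures in the usage note are only an example.

```typescript
// Number of bytes in durationMs of uncompressed linear PCM audio.
function pcmChunkSizeBytes(
    sampleRateHz: number,
    bitsPerSample: number,
    channels: number,
    durationMs: number,
): number {
    const bytesPerSecond = sampleRateHz * (bitsPerSample / 8) * channels;
    return Math.floor((bytesPerSecond * durationMs) / 1000);
}

// Splits an audio buffer into consecutive chunks of at most chunkSize bytes.
function* splitIntoChunks(audio: Uint8Array, chunkSize: number): Generator<Uint8Array> {
    for (let offset = 0; offset < audio.length; offset += chunkSize) {
        yield audio.subarray(offset, offset + chunkSize);
    }
}
```

For 16 kHz, 16-bit, mono audio, pcmChunkSizeBytes(16000, 16, 1, 100) returns 3200, so each 3200-byte chunk carries 100 milliseconds of audio.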

Here is a complete example:

TypeScript Complete Example

// Sample script to capture microphone input and stream to Amazon SageMaker bidirectional
// streaming endpoint with Deepgram transcription Voice AI models.
import {
    SageMakerRuntimeClient,
    InvokeEndpointWithBidirectionalStreamCommand,
    InvokeEndpointWithBidirectionalStreamCommandOutput,
    RequestEventStream,
    InvokeEndpointWithBidirectionalStreamCommandInput
} from '@aws-sdk/client-sagemaker-runtime-http2';
import { getMicrophoneStream } from '@mastra/node-audio';

// Configuration interface
interface Config {
    endpointName: string;
}

// Script Configuration

// CHANGE ME: Specify the region where your SageMaker Endpoint is deployed.
const region: string = "YOUR_AWS_REGION";
// This must correspond to the AWS region where your Endpoint is deployed; it is derived from the region above.
const bidiEndpoint: string = `https://runtime.sagemaker.${region}.amazonaws.com:8443`;
// The internal WebSocket API route you want to access, used by Deepgram specifically.
const modelInvocationPath = 'v1/listen';
// CHANGE ME: Update this to the model parameters you want. Preview only supports nova-3, entity detection, and diarization.
const modelQueryString = 'model=nova-3&language=multi';

// CHANGE ME: Update this value to the name of the SageMaker Endpoint you deploy
const config: Config = {
    endpointName: `YOUR_ENDPOINT_NAME`,
};

const sagemakerRuntimeClientForBidi = new SageMakerRuntimeClient({
    region: region,
    endpoint: bidiEndpoint
});

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms)).then(() => console.log("Sleeping completed!"));

// Generator function that yields audio chunks from the local microphone
async function* audioStream(chunkSize: number = 1024*128) : AsyncIterable<RequestEventStream> {
    const KEEPALIVE_INTERVAL = 3000; // 3 seconds
    let lastKeepaliveTime = Date.now();

    const microphone = getMicrophoneStream();

    let streamActive = true;
    // Log a periodic heartbeat while the microphone is streaming. No KeepAlive message
    // is sent here, because the audio chunks themselves keep the connection active.
    const keepaliveInterval = setInterval(() => {
        if (streamActive && Date.now() - lastKeepaliveTime >= KEEPALIVE_INTERVAL) {
            console.log('Microphone stream still active...');
            lastKeepaliveTime = Date.now();
        }
    }, KEEPALIVE_INTERVAL);

    try {
        for await (const chunk of microphone) {
            yield {
                PayloadPart: {
                    Bytes: new Uint8Array(chunk),
                    DataType: "BINARY",
                },
            };
        }
        console.log('Audio streaming complete. Continuing to send keepalive messages...');
        // Stop the heartbeat so it no longer resets lastKeepaliveTime during the keepalive phase
        streamActive = false;
        clearInterval(keepaliveInterval);
        const keepaliveEndTime = Date.now() + 120000; // Keep alive for 120 seconds

        while (Date.now() < keepaliveEndTime) {
            const now = Date.now();
            if (now - lastKeepaliveTime >= KEEPALIVE_INTERVAL) {
                const timestamp = new Date(now).toISOString();
                console.log(`Sending post-stream keepalive message at ${timestamp}...`);
                yield {
                    PayloadPart: {
                        Bytes: new TextEncoder().encode(JSON.stringify({
                            type: "KeepAlive",
                        })),
                        DataType: "UTF8",
                    },
                };
                lastKeepaliveTime = now;
            }
            // Sleep between checks to prevent a tight loop
            await sleep(3000);
        }

        // Close the connection after receiving all the response messages
        yield {
            PayloadPart: {
                Bytes: new TextEncoder().encode(JSON.stringify({
                    type: "CloseStream",
                })),
                DataType: "UTF8",
            },
        };

        console.log('Keepalive period completed.');
    } finally {
        streamActive = false;
        clearInterval(keepaliveInterval);
    }
}

// Invokes the Amazon SageMaker bidirectional stream API and processes response payloads
async function invokeEndpointWithBidirectionalStream(): Promise<InvokeEndpointWithBidirectionalStreamCommandOutput> {
    console.log('Invoking endpoint with bidirectional stream...');

    const invokeParams: InvokeEndpointWithBidirectionalStreamCommandInput = {
        EndpointName: config.endpointName,
        Body: audioStream(), // as AsyncIterable<RequestEventStream>
        ModelInvocationPath: modelInvocationPath,
        ModelQueryString: modelQueryString,
    };

    try {
        console.log('Using custom bidi endpoint:', bidiEndpoint);
        const command = new InvokeEndpointWithBidirectionalStreamCommand(invokeParams);
        console.log('Sending bidirectional stream request...');
        const response: InvokeEndpointWithBidirectionalStreamCommandOutput = await sagemakerRuntimeClientForBidi.send(command);

        console.log('Bidirectional stream response received. Processing...');
        console.log('Response metadata:', response.$metadata);

        if (response.Body) {
            let chunkCount = 0;
            const timeout = setTimeout(() => {
                console.log('Timeout waiting for bidirectional stream chunks');
            }, 20000); // 20 second timeout

            try {
                // Read responses from the bidirectional stream
                for await (const chunk of response.Body) {
                    chunkCount++;
                    console.log(`Processing bidirectional chunk ${chunkCount}:`, Object.keys(chunk));

                    if (chunk.PayloadPart && chunk.PayloadPart.Bytes) {
                        const chunkData = new TextDecoder().decode(chunk.PayloadPart.Bytes);
                        console.log('Bidirectional chunk data:', chunkData);
                        // console.log('Bidirectional chunk:', chunk.PayloadPart);
                    }

                    if (chunk.InternalStreamFailure) {
                        console.error('Bidirectional internal stream failure:', chunk.InternalStreamFailure);
                        break;
                    }

                    if (chunk.ModelStreamError) {
                        console.error('Bidirectional model stream error:', chunk.ModelStreamError);
                        break;
                    }
                }
                clearTimeout(timeout);
                console.log(`Processed ${chunkCount} bidirectional chunks total`);
            } catch (streamError) {
                clearTimeout(timeout);
                console.error('Error processing bidirectional stream:', streamError);
                throw streamError;
            }
        } else {
            console.log('No bidirectional response body received');
        }

        console.log('Bidirectional endpoint invocation completed successfully');
        return response;
    } catch (error: any) {
        console.error('Error invoking endpoint with bidirectional stream:', error);
        console.error('Error details:', {
            name: error.name,
            message: error.message,
            statusCode: error.$metadata?.httpStatusCode
        });
        throw error;
    }
}

// Main execution function
async function main(): Promise<void> {
    try {
        console.log('Starting SageMaker deployment process...');

        await invokeEndpointWithBidirectionalStream();

        console.log('All operations completed successfully!');

    } catch (error) {
        console.error('Deployment process failed:', error);
        throw error;
    }
}

declare const require: any;
declare const module: any;
declare const process: any;

if (typeof require !== 'undefined' && require.main === module) {
    main().catch(error => {
        console.error('Script execution failed:', error);
        if (typeof process !== 'undefined') {
            process.exit(1);
        }
    });
}

export {
    invokeEndpointWithBidirectionalStream,
    config,
    bidiEndpoint
};

Troubleshooting

If you’re experiencing any issues with your Deepgram deployment on Amazon SageMaker AI, you can obtain the Deepgram container logs from the Amazon CloudWatch service. If you open the SageMaker AI Endpoint resource details, there will be a link to open the Amazon CloudWatch Log Group for that endpoint. Within the CloudWatch Log Group, there should be a Log Stream that contains the Deepgram logs for all components. You can use the Amazon CloudWatch Logs Live Tail feature to watch logs in near-real-time while you are sending requests to the Deepgram API, via the SageMaker AI APIs.

To follow the same logs from your local terminal, you can use the following AWS CLI command.

aws logs tail --follow /aws/sagemaker/Endpoints/YOUR_SAGEMAKER_ENDPOINT_NAME --region YOUR_AWS_REGION

Checklist

If you experience any issues using Deepgram services running on the Amazon SageMaker AI platform, please review this checklist before contacting Deepgram support.

Ensure that your application’s AWS IAM User or IAM Role has permission to call the InvokeEndpointWithBidirectionalStream SageMaker AI action.
Ensure your application is targeting the correct AWS account and region, where your SageMaker Endpoint exists.
Ensure the Deepgram product you’ve deployed (e.g., streaming Speech-to-Text), from the AWS Marketplace, corresponds to the Deepgram API you’re calling.
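
For the first checklist item, an IAM policy granting the streaming permission could look roughly like the object built below. Treat this as a sketch under assumptions: the action name is assumed to follow the usual sagemaker:ApiName convention, buildInvokePolicy is an illustrative helper, and the region, account ID, and endpoint name are placeholders. Verify the action name against the AWS documentation before relying on it.

```typescript
interface PolicyStatement {
    Effect: "Allow" | "Deny";
    Action: string[];
    Resource: string;
}

interface PolicyDocument {
    Version: string;
    Statement: PolicyStatement[];
}

// Builds an IAM policy document scoped to a single SageMaker Endpoint.
function buildInvokePolicy(region: string, accountId: string, endpointName: string): PolicyDocument {
    return {
        Version: "2012-10-17",
        Statement: [
            {
                Effect: "Allow",
                // Assumed action name, derived from the API name in the checklist above.
                Action: ["sagemaker:InvokeEndpointWithBidirectionalStream"],
                Resource: `arn:aws:sagemaker:${region}:${accountId}:endpoint/${endpointName}`,
            },
        ],
    };
}

// Placeholder values: substitute your own region, account ID, and endpoint name.
const examplePolicy = buildInvokePolicy("us-east-2", "123456789012", "my-deepgram-streaming-stt");
console.log(JSON.stringify(examplePolicy, null, 2));
```

The printed JSON can be attached to the IAM User or IAM Role that your application uses.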
