Deploy Deepgram on Amazon SageMaker
Deepgram can be deployed into your own Amazon Virtual Private Cloud (VPC) environment using Amazon SageMaker AI. Subscribe to the Deepgram product in the AWS Marketplace, then deploy a SageMaker Endpoint using our pre-made SageMaker Model Package.
Amazon SageMaker is a managed cloud platform from Amazon Web Services (AWS) that enables deployment of Deepgram as a managed, container-based service. Once you deploy Deepgram as a SageMaker Model Endpoint, you can run inference against the service using the Amazon SageMaker AI Software Development Kit (SDK).
Supported Products
The following Deepgram products are supported on the SageMaker AI platform.
Each transcription language model is published as a separate product listing. Subscribe to and deploy a SageMaker Endpoint for each language model you wish to use. Your application code will need to route requests to the SageMaker Endpoint for the language model you want to run inference against.
Limitations
When using Deepgram services in Amazon SageMaker, please be aware of the following limitations.
- Deepgram cannot call Large Language Model (LLM) services
- Deepgram cannot invoke user-defined callback URLs
Prerequisites
- An AWS account
- AWS IAM permissions for SageMaker and AWS Marketplace, such as:
  - IAM Policy: AWSMarketplaceManageSubscriptions
  - IAM Policy: AmazonSageMakerFullAccess
Subscribe to Deepgram Products
Before you can deploy Deepgram on Amazon SageMaker AI, you’ll need to subscribe to the product in the AWS Marketplace. Keep in mind that you are not billed for the product until you deploy an Amazon SageMaker AI Endpoint resource.
1. Log in to the AWS Management Console for the account you'd like to deploy in.
2. Search for and navigate to the AWS Marketplace console.
3. Find the Deepgram product listing you wish to deploy and subscribe to it.
Create AWS IAM Role for SageMaker Execution
Follow the AWS documentation to create an AWS Identity & Access Management (IAM) role that will be used to run SageMaker Model Endpoints. You only need to create a single SageMaker execution role, and can reuse this IAM Role to deploy multiple SageMaker Endpoints.
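For reference, a SageMaker execution role uses a trust policy like the following so that the SageMaker service can assume it. After creating the role, attach the AmazonSageMakerFullAccess managed policy (or a narrower equivalent).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```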
Deploy Deepgram Model Package for SageMaker AI
Once you’ve subscribed to the Deepgram product on AWS Marketplace, you can deploy a SageMaker AI Endpoint. The SageMaker “Endpoint” resource represents the compute instance that runs the Deepgram Voice AI services. After you initiate resource creation, it will take several minutes for the SageMaker Endpoint to deploy.
1. In the AWS Management Console, navigate to the SageMaker AI console.
2. In the left-hand menu, under the AWS Marketplace Resources heading, select Marketplace Model Packages.
3. Select the Deepgram model package you subscribed to, and begin deploying it as an Endpoint.
4. Under IAM Role, select the SageMaker execution role that you created.
After following these steps, you should see a new Endpoint in your AWS account.
If you don’t see the Endpoint, ensure that you have selected the correct AWS region in the AWS Management Console.
It may take several minutes for the Endpoint status to change to InService.
Once the Endpoint status has changed to InService, you can monitor the Amazon CloudWatch Logs for the Endpoint to ensure normal operation of the Deepgram services.
Inference
Speech-to-Text (STT) Streaming
Once you’ve deployed the Deepgram services as a SageMaker Endpoint, you can run streaming inference against the Endpoint using a supported AWS Software Development Kit (SDK).
This capability requires the InvokeEndpointWithBidirectionalStream API in the Amazon SageMaker AI service.
The Deepgram WebSocket payloads do not change in SageMaker; however, they are wrapped in an additional data structure required by the Amazon SageMaker API. For example, to send audio bytes, a WebSocket KeepAlive message, or a CloseStream message to the Deepgram STT streaming API, you would use payloads like the following.
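The envelope below mirrors the PayloadPart/Bytes event shape used by SageMaker's response-streaming API; treat it as an illustrative assumption and confirm the exact bidirectional event shape in the AWS documentation. The inner KeepAlive and CloseStream messages are Deepgram's standard WebSocket control messages.

```json
{ "PayloadPart": { "Bytes": "<binary audio bytes>" } }
{ "PayloadPart": { "Bytes": "{\"type\": \"KeepAlive\"}" } }
{ "PayloadPart": { "Bytes": "{\"type\": \"CloseStream\"}" } }
```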
Let’s walk through the code to run inference against the Deepgram Speech-to-Text (STT) streaming transcription endpoint in SageMaker AI. First, import the necessary items from the Amazon SageMaker AI runtime API.
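A minimal sketch of the imports, assuming the bidirectional streaming command follows AWS SDK for JavaScript v3 naming conventions; verify the exact export names against your installed version of @aws-sdk/client-sagemaker-runtime.

```typescript
// The command class name is an assumption based on AWS SDK v3 conventions; verify before use.
import {
  SageMakerRuntimeClient,
  InvokeEndpointWithBidirectionalStreamCommand,
} from "@aws-sdk/client-sagemaker-runtime";
```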
Next, construct the input arguments for the SageMaker AI bidirectional streaming API.
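For example, assuming a hypothetical Endpoint name of deepgram-stt-streaming-endpoint:

```typescript
const input = {
  // The name of the SageMaker Endpoint you deployed from the Deepgram Model Package.
  EndpointName: "deepgram-stt-streaming-endpoint",
  // An async iterable that yields the wrapped Deepgram WebSocket messages (defined below).
  Body: createWebSocketStream(),
};
```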
Next, create the SageMaker AI streaming client (JavaScript SDK) and command, and invoke it.
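For example, with the same assumed command class as above:

```typescript
const client = new SageMakerRuntimeClient({ region: "us-east-1" }); // use your Endpoint's region
const command = new InvokeEndpointWithBidirectionalStreamCommand(input);
const response = await client.send(command);
```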
After invoking the streaming connection, process each response message received from the streaming API.
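A sketch, assuming each response event carries a PayloadPart whose Bytes are a standard Deepgram WebSocket response message:

```typescript
for await (const event of response.Body ?? []) {
  if (event.PayloadPart?.Bytes) {
    // Each payload is a regular Deepgram streaming response, e.g. a Results message.
    const message = JSON.parse(new TextDecoder().decode(event.PayloadPart.Bytes));
    const transcript = message.channel?.alternatives?.[0]?.transcript;
    if (transcript) console.log(transcript);
  }
}
```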
You’ll also need to implement the createWebSocketStream function: an async generator that continuously yields WebSocket client messages to the Deepgram API. This function could read a live audio stream from a microphone (for example, using the @mastra/node-audio package) or stream a local WAV file in chunks, as in the sketch below.
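A minimal sketch that streams a local WAV file in chunks and then signals CloseStream; the file name, chunk size, and the PayloadPart envelope are illustrative assumptions.

```typescript
import { createReadStream } from "node:fs";

async function* createWebSocketStream() {
  // Stream the file in small chunks so transcription results arrive incrementally.
  const audio = createReadStream("audio.wav", { highWaterMark: 8000 });
  for await (const chunk of audio) {
    yield { PayloadPart: { Bytes: chunk as Buffer } };
  }
  // Tell Deepgram the audio stream is finished.
  const close = new TextEncoder().encode(JSON.stringify({ type: "CloseStream" }));
  yield { PayloadPart: { Bytes: close } };
}
```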
Here is a complete example combining the steps above. The SDK command name, Endpoint name, and event envelope are the same assumptions noted earlier; verify them against your SDK version before running.
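```typescript
import { createReadStream } from "node:fs";
// Hypothetical export name, following AWS SDK v3 conventions; verify before use.
import {
  SageMakerRuntimeClient,
  InvokeEndpointWithBidirectionalStreamCommand,
} from "@aws-sdk/client-sagemaker-runtime";

const ENDPOINT_NAME = "deepgram-stt-streaming-endpoint"; // your Endpoint's name
const REGION = "us-east-1"; // your Endpoint's region

// Yields wrapped Deepgram WebSocket client messages: audio chunks, then CloseStream.
async function* createWebSocketStream() {
  const audio = createReadStream("audio.wav", { highWaterMark: 8000 });
  for await (const chunk of audio) {
    yield { PayloadPart: { Bytes: chunk as Buffer } };
  }
  yield {
    PayloadPart: { Bytes: new TextEncoder().encode(JSON.stringify({ type: "CloseStream" })) },
  };
}

async function main() {
  const client = new SageMakerRuntimeClient({ region: REGION });
  const response = await client.send(
    new InvokeEndpointWithBidirectionalStreamCommand({
      EndpointName: ENDPOINT_NAME,
      Body: createWebSocketStream(),
    })
  );

  // Decode each response event and print interim/final transcripts.
  for await (const event of response.Body ?? []) {
    if (event.PayloadPart?.Bytes) {
      const message = JSON.parse(new TextDecoder().decode(event.PayloadPart.Bytes));
      const transcript = message.channel?.alternatives?.[0]?.transcript;
      if (transcript) console.log(transcript);
    }
  }
}

main().catch(console.error);
```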
Troubleshooting
If you’re experiencing any issues with your Deepgram deployment on Amazon SageMaker AI, you can obtain the Deepgram container logs from the Amazon CloudWatch service. If you open the SageMaker AI Endpoint resource details, there will be a link to open the Amazon CloudWatch Log Group for that endpoint. Within the CloudWatch Log Group, there should be a Log Stream that contains the Deepgram logs for all components. You can use the Amazon CloudWatch Logs Live Tail feature to watch logs in near-real-time while you are sending requests to the Deepgram API, via the SageMaker AI APIs.
To use the CloudWatch Logs Live Tail feature locally from the AWS CLI, you can use the following command.
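For example, with a placeholder log group ARN (substitute the one linked from your SageMaker Endpoint details page):

```bash
aws logs start-live-tail \
  --log-group-identifiers "arn:aws:logs:us-east-1:123456789012:log-group:/aws/sagemaker/Endpoints/deepgram-stt-streaming-endpoint"
```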
Checklist
If you experience any issues using Deepgram services running on the Amazon SageMaker AI platform, please review this checklist before contacting Deepgram support.
- Ensure that your application’s AWS IAM User or IAM Role has permission to call the InvokeEndpointWithBidirectionalStream SageMaker AI action.
- Ensure your application is targeting the correct AWS account and region, where your SageMaker Endpoint exists.
- Ensure the Deepgram product you’ve deployed from the AWS Marketplace (e.g. streaming Speech-to-Text) corresponds to the Deepgram API you’re calling.