Validate a Deepgram SageMaker Endpoint

Test a Deepgram SageMaker endpoint using the open-source client scripts in the dg-sagemaker repository, with one script per product and language.

Once your SageMaker endpoint reaches InService, run a client script against it to confirm that inference works end to end. Deepgram publishes a ready-made set of test clients in the dg-sagemaker repository. Each client wraps the SageMaker bidirectional streaming or batch (HTTP) invocation API with the payload shape that the target Deepgram model expects, so you can focus on verifying the endpoint rather than writing transport code.

The repository is organized by product and by language. Pick the script that matches the model you deployed (Flux, Nova-3, or Aura) and the language you want to work in (Python, Node.js, or Java). The scripts are load-testing clients first and functional smoke tests second — running any of them with a single connection is the fastest way to prove that your endpoint accepts traffic and returns results.

The scripts evolve independently. Flags, defaults, and input formats differ between products and between languages. Always read the README.md inside the subdirectory you intend to run before invoking a script.

High-level process

1. Install the required tooling

Install the language runtime and package manager for the script you plan to run. See Required tooling.

2. Configure AWS credentials and region

Configure AWS credentials locally and ensure your shell targets the region where the SageMaker endpoint is deployed. See AWS credentials and region.

3. Clone the dg-sagemaker repository

$ git clone https://github.com/deepgram-devs/dg-sagemaker.git
$ cd dg-sagemaker

4. Choose the script that matches your product

Identify the subdirectory for your Deepgram product and language. See Pick the right script.

5. Run the script against your endpoint

Invoke the script with your endpoint name, AWS region, and any product-specific inputs (for example, a WAV file for speech-to-text or a text file for text-to-speech).

Prerequisites

  • A Deepgram SageMaker endpoint in InService status. See Deploy Deepgram on Amazon SageMaker.
  • AWS IAM permissions to invoke the endpoint and, optionally, to list endpoints in the target region:
    • sagemaker:InvokeEndpoint
    • sagemaker:InvokeEndpointWithResponseStream
    • sagemaker:InvokeEndpointWithBidirectionalStream
    • sagemaker:ListEndpoints and sagemaker:DescribeEndpoint (used by list-endpoints helpers)
  • git available on your local machine.
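The permissions listed above can be expressed as an IAM policy document. The following is a minimal sketch, not an official Deepgram policy: the function name `build_invoke_policy`, the `Sid` values, and the example account ID are illustrative assumptions. It scopes the invoke and describe actions to a single endpoint ARN and grants `sagemaker:ListEndpoints` on `*`, on the assumption that list actions cannot be restricted to one resource.

```python
import json


def build_invoke_policy(region: str, account_id: str, endpoint_name: str) -> dict:
    """Build an IAM policy dict covering the actions named in the prerequisites.

    Hypothetical helper for illustration; adjust the statements to your
    organization's policy conventions.
    """
    endpoint_arn = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Invocation variants plus DescribeEndpoint, scoped to one endpoint.
                "Sid": "InvokeDeepgramEndpoint",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:InvokeEndpoint",
                    "sagemaker:InvokeEndpointWithResponseStream",
                    "sagemaker:InvokeEndpointWithBidirectionalStream",
                    "sagemaker:DescribeEndpoint",
                ],
                "Resource": endpoint_arn,
            },
            {
                # Optional: only needed by the list-endpoints helpers.
                "Sid": "ListEndpointsForHelpers",
                "Effect": "Allow",
                "Action": "sagemaker:ListEndpoints",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account ID, and endpoint.
    policy = build_invoke_policy("us-east-1", "123456789012", "your-endpoint-name")
    print(json.dumps(policy, indent=2))
```

Drop the second statement if you do not plan to use the list-endpoints helpers.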

Pick the right script

Each Deepgram product has a dedicated script because the SageMaker payload shape and the protocol on top of it differ per model. Run the script that matches the model you deployed.

| Product | Language | Path in dg-sagemaker | Script |
|---|---|---|---|
| Flux (conversational STT) | Python | python-flux/ | flux_stress.py (file / microphone / list-endpoints) |
| Nova-3 (streaming STT) | Python | python-stt/ | stt_microphone_stress.py, stt_wav_stress.py (stream / batch) |
| Nova-3 (streaming STT) | Node.js | js-stt/ | stress-stt.ts (configured via stt.file.ts) |
| Nova-3 (streaming STT) | Java | java/stt/aws-sdk/, java/stt/deepgram-sdk/ | Gradle projects; see each project’s README |
| Aura (text-to-speech) | Python | python-tts/ | tts_stress.py |

Invocation flags, defaults, and required inputs vary per script. Consult the README.md in each subdirectory for the full command reference. For example, the Flux client uses the /v2/listen endpoint and a turn-based protocol, while the Nova-3 client uses /v1/listen with channel-based alternatives.
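If you wrap these clients in your own tooling, the product-to-subdirectory mapping can be kept as a small lookup. This sketch mirrors the paths listed above. The `Client` class and `find_clients` function are hypothetical names for illustration and are not part of the dg-sagemaker repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Client:
    product: str   # Deepgram product name
    language: str  # implementation language of the test client
    path: str      # subdirectory inside the dg-sagemaker checkout


# Entries copied from the product/language/path listing in this section.
CLIENTS = [
    Client("Flux", "Python", "python-flux/"),
    Client("Nova-3", "Python", "python-stt/"),
    Client("Nova-3", "Node.js", "js-stt/"),
    Client("Nova-3", "Java", "java/stt/aws-sdk/"),
    Client("Nova-3", "Java", "java/stt/deepgram-sdk/"),
    Client("Aura", "Python", "python-tts/"),
]


def find_clients(product: str, language: Optional[str] = None) -> list[Client]:
    """Return the clients for a product, optionally filtered by language."""
    return [
        c for c in CLIENTS
        if c.product.lower() == product.lower()
        and (language is None or c.language.lower() == language.lower())
    ]


if __name__ == "__main__":
    for client in find_clients("Nova-3"):
        print(f"{client.language}: read {client.path}README.md before running")
```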

Required tooling

Install only the tools you need for the script you plan to run.

Python (for Node.js and Java, see the README.md in the corresponding subdirectory):
  • Python 3.12+ (Python 3.14+ for python-stt)
  • uv package manager
  • PortAudio for microphone or audio playback (required for python-tts and for microphone modes of python-stt/python-flux):
    • macOS: brew install portaudio
    • Linux: sudo apt-get install portaudio19-dev

Install dependencies inside the script’s subdirectory:

$ cd python-stt   # or python-flux, python-tts
$ uv sync

AWS credentials and region

The scripts use the standard AWS credential chain — environment variables, shared credentials file, or an attached IAM role. Configure credentials with whichever mechanism you prefer:

$ aws configure

Or export them for the current shell:

$ export AWS_ACCESS_KEY_ID=...
$ export AWS_SECRET_ACCESS_KEY=...
$ export AWS_SESSION_TOKEN=...   # if using temporary credentials

Verify the identity the scripts will use:

$ aws sts get-caller-identity
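Before calling AWS at all, a quick local check can show which credential sources are present on the machine. The sketch below uses only the Python standard library and looks for the standard environment variables and the shared credentials file. The function name `detect_credential_sources` is a hypothetical helper. It only reports what is configured and does not prove that the credentials are valid; for that, run the identity check.

```python
import os
from pathlib import Path


def detect_credential_sources(env=None, home=None) -> list[str]:
    """List credential sources that appear to be configured locally.

    Presence check only; it cannot detect an attached IAM role and does not
    validate the credentials themselves.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    sources = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    if env.get("AWS_PROFILE"):
        sources.append(f"profile:{env['AWS_PROFILE']}")

    # AWS_SHARED_CREDENTIALS_FILE overrides the default ~/.aws/credentials path.
    cred_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE") or home / ".aws" / "credentials")
    if cred_file.is_file():
        sources.append("shared-credentials-file")
    return sources


if __name__ == "__main__":
    found = detect_credential_sources()
    print("Credential sources found:", ", ".join(found) if found else "none")
```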

Most scripts accept a --region flag and default to us-east-1 (or us-east-2 for python-tts). Make sure the region you pass matches the region where your SageMaker endpoint was deployed — an endpoint name is only resolvable within the region it was created in.

If you are not sure which endpoints exist in a region, use the Flux client’s helper subcommand to list them:

$ cd python-flux
$ uv run flux_stress.py list-endpoints --region us-east-1

Run the script

Once tooling and credentials are in place, invoke the script for your product. The example below shows the simplest single-connection invocation for the Flux Python client. The Nova-3 and Aura clients follow a similar pattern but with different flags, defaults, and input formats — refer to the dg-sagemaker repository and each subdirectory’s README.md for the full command reference.

$ cd python-flux

# Stream a WAV file at real-time pace
$ uv run flux_stress.py file your-endpoint-name \
    --file audio.wav \
    --region us-east-1

# Or stream live microphone audio
$ uv run flux_stress.py microphone your-endpoint-name --region us-east-1
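If you want to drive the Flux file test from a Python harness (for example, in a CI job), the same command can be assembled as an argument list and run with `subprocess`. This is a sketch under the assumption that the flags are exactly those in the command above; `flux_file_command` and `run_flux_file_test` are hypothetical wrappers, and the checkout location is a placeholder.

```python
import shlex
import subprocess


def flux_file_command(endpoint_name: str, wav_path: str, region: str = "us-east-1") -> list[str]:
    """Build the argv for the single-connection Flux file test shown above."""
    return [
        "uv", "run", "flux_stress.py", "file", endpoint_name,
        "--file", wav_path,
        "--region", region,
    ]


def run_flux_file_test(repo_dir: str, endpoint_name: str, wav_path: str, region: str = "us-east-1") -> int:
    """Run the client from the python-flux subdirectory and return its exit code."""
    cmd = flux_file_command(endpoint_name, wav_path, region)
    print("Running:", shlex.join(cmd))
    completed = subprocess.run(cmd, cwd=f"{repo_dir}/python-flux")
    return completed.returncode


if __name__ == "__main__":
    # Placeholder paths and names; substitute your own checkout and endpoint.
    raise SystemExit(run_flux_file_test("dg-sagemaker", "your-endpoint-name", "audio.wav"))
```

Passing the arguments as a list avoids shell quoting problems when the endpoint name or file path contains spaces.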

For best compatibility, use a 16-bit PCM WAV file. Convert other formats with:

$ ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 audio.wav
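To confirm that a file is 16-bit PCM before streaming it, you can inspect the WAV header with Python's standard `wave` module. This is a minimal sketch; `describe_wav` and `is_16bit_pcm` are illustrative helper names and are not part of the Deepgram clients.

```python
import wave


def describe_wav(source) -> dict:
    """Read basic format fields from a WAV file path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
        }


def is_16bit_pcm(source) -> bool:
    """Return True if the input parses as PCM WAV with 16-bit samples."""
    try:
        return describe_wav(source)["bits_per_sample"] == 16
    except (wave.Error, EOFError):
        # The wave module rejects non-RIFF input and non-PCM encodings.
        return False


if __name__ == "__main__":
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "audio.wav"
    print(describe_wav(path))
    print("16-bit PCM:", is_16bit_pcm(path))
```

A file produced by the ffmpeg command above should report 1 channel, 16000 Hz, and 16 bits per sample.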

Verify the result

A successful test produces:

  • Speech-to-text (Flux, Nova-3): transcript payloads streamed back to the client. Final transcripts and interim results (or Flux TurnInfo events) are logged to the console.
  • Text-to-speech (Aura): audible speech playback from the selected connection and a steady stream of synthesized audio chunks logged to the console.

If the script hangs, errors on connection, or fails to authenticate, see Troubleshooting.

Troubleshooting

  • ResourceNotFound or endpoint lookup failures: confirm the endpoint name and that your shell is targeting the correct region. Run aws sagemaker describe-endpoint --endpoint-name your-endpoint-name --region your-region to verify.
  • AccessDeniedException: confirm your IAM identity has sagemaker:InvokeEndpoint (and the bidirectional/streaming variants) for the endpoint’s ARN.
  • Endpoint not yet in service: wait until the endpoint status is InService. Deployment typically takes several minutes.
  • Audio format errors: convert the input to 16-bit PCM WAV rather than a compressed format, using the ffmpeg command shown above.
  • Server-side errors: check the Amazon CloudWatch Logs for the SageMaker endpoint to diagnose container-level issues.
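For the "not yet in service" case, a small polling loop can wait for the endpoint before you run a client. The sketch below takes a `describe` callable so that it can be exercised without AWS access. In practice you would likely pass a lambda around boto3's `describe_endpoint` (assuming boto3 is installed), whose response includes `EndpointStatus` and, on failure, `FailureReason`. The function name, timeout, and poll interval are illustrative assumptions.

```python
import time
from typing import Callable


def wait_until_in_service(
    describe: Callable[[], dict],
    timeout_s: float = 900.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `describe()` until EndpointStatus is InService.

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError if
    it does not become InService within `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while True:
        response = describe()
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = response.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint failed: {reason}")
        if clock() >= deadline:
            raise TimeoutError(f"Endpoint still {status} after {timeout_s:.0f}s")
        sleep(poll_s)


# Example wiring, assuming boto3 is installed and credentials are configured:
#
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   wait_until_in_service(
#       lambda: client.describe_endpoint(EndpointName="your-endpoint-name")
#   )
```

boto3 also provides a built-in `endpoint_in_service` waiter on the SageMaker client. The explicit loop is shown here only to make the status transitions visible.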