Update an Amazon SageMaker Endpoint

Roll out a newer Deepgram Model Package or a different model version on a SageMaker Endpoint that is already serving production traffic.

When you ship a newer Deepgram model version, switch language models, or apply a new environment variable configuration, you need to update an Amazon SageMaker Endpoint that is already serving production traffic. Amazon SageMaker AI handles this through the UpdateEndpoint API, which replaces the running fleet with a new fleet without taking the Endpoint offline.

Because Deepgram is distributed through the AWS Marketplace, SageMaker performs an all-at-once update: it provisions a new fleet, shifts 100% of the traffic to it in a single step, and terminates the old fleet. This is the only deployment strategy available to AWS Marketplace containers — see Other deployment guardrails below.

When to update an endpoint

Common reasons to update a Deepgram SageMaker Endpoint include:

  • Promoting a newer Deepgram Model Package version published to the AWS Marketplace.
  • Switching the underlying instance type to scale capacity or reduce cost.
  • Changing Deepgram environment variables to tune api.toml or engine.toml settings (for example, flux.max_streams or max_active_requests).
  • Migrating between Deepgram language models or product listings.

Each of these changes is applied by creating a new SageMaker Model and/or Endpoint Configuration and then calling UpdateEndpoint with the new Endpoint Configuration. If only the Endpoint Configuration changes — for example, a different instance type — you can reuse the existing Model.

How updates work for Deepgram

When you call UpdateEndpoint against a Deepgram Endpoint, SageMaker:

  1. Provisions a complete new fleet (the green fleet) using the new Endpoint Configuration, while the existing fleet (the blue fleet) continues to serve traffic.
  2. Once the green fleet is healthy, shifts all traffic from the blue fleet to the green fleet in a single step.
  3. Terminates the blue fleet.

Because the update is all-at-once, there is no incremental traffic shift, no baking period in which to evaluate the new fleet on a fraction of traffic before committing to it, and no automatic rollback based on CloudWatch alarms. The Endpoint stays in service for the entire procedure: clients continue to be served by the blue fleet until SageMaker performs the cutover.

Update an endpoint

  1. If the underlying Model needs to change (for example, a new Marketplace Model Package version, a different ECR image, or new environment variables), create a new SageMaker Model. See the AWS CLI and Boto3 examples in Configure Amazon SageMaker Deployments for how to create a Model. If the Model can be reused, skip this step.

  2. Create a new Endpoint Configuration that references the Model and uses the same variant name as the existing Endpoint Configuration. The variant name (typically AllTraffic) must match for SageMaker to update the Endpoint in place.

  3. Call UpdateEndpoint with the existing Endpoint name and the new Endpoint Configuration name.

AWS CLI
aws sagemaker update-endpoint \
  --endpoint-name my-deepgram-streaming-stt \
  --endpoint-config-name my-deepgram-streaming-stt-config-v2
Boto3
import boto3

sagemaker = boto3.client("sagemaker")

# Point the existing Endpoint at the new Endpoint Configuration.
sagemaker.update_endpoint(
    EndpointName="my-deepgram-streaming-stt",
    EndpointConfigName="my-deepgram-streaming-stt-config-v2",
)
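Step 2 above, creating a new Endpoint Configuration that keeps the existing variant name, can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names `build_updated_variants` and `create_updated_endpoint_config` are hypothetical, and the sketch assumes that copying the existing ProductionVariants and overriding only the Model name or instance type is what you want.

```python
import copy


def build_updated_variants(existing_variants, new_model_name=None, new_instance_type=None):
    """Return a copy of the existing ProductionVariants with selected fields replaced.

    VariantName is deliberately left untouched so the Endpoint can be updated in place.
    """
    updated = []
    for variant in existing_variants:
        new_variant = copy.deepcopy(variant)  # never mutate the caller's data
        if new_model_name is not None:
            new_variant["ModelName"] = new_model_name
        if new_instance_type is not None:
            new_variant["InstanceType"] = new_instance_type
        updated.append(new_variant)
    return updated


def create_updated_endpoint_config(client, existing_config_name, new_config_name,
                                   new_model_name=None, new_instance_type=None):
    """Create a new Endpoint Configuration derived from an existing one.

    `client` is expected to be a boto3 SageMaker client, i.e. boto3.client("sagemaker").
    """
    existing = client.describe_endpoint_config(EndpointConfigName=existing_config_name)
    variants = build_updated_variants(
        existing["ProductionVariants"], new_model_name, new_instance_type
    )
    # Assumption: only ProductionVariants are carried over in this sketch. Other
    # top-level settings on the existing configuration (for example KmsKeyId)
    # are not copied and would need to be passed explicitly.
    client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=variants,
    )
    return variants
```

For example, with `sagemaker = boto3.client("sagemaker")`, a call such as `create_updated_endpoint_config(sagemaker, "my-deepgram-streaming-stt-config-v1", "my-deepgram-streaming-stt-config-v2", new_model_name="my-deepgram-model-v2")` would produce the configuration used in the UpdateEndpoint call above. The `-config-v1` and `my-deepgram-model-v2` names are placeholders, not values from this guide.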

After you initiate the update, monitor progress in the SageMaker AI console under Endpoints > your endpoint, or by polling the DescribeEndpoint API. The Endpoint’s EndpointStatus transitions through Updating and back to InService once the cutover completes.
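Polling DescribeEndpoint can be sketched as a small loop like the one below. The function name `wait_for_endpoint_update`, the polling interval, and the timeout are illustrative assumptions; the status strings are the EndpointStatus values reported by DescribeEndpoint.

```python
import time

# Statuses that indicate SageMaker is still working on the Endpoint.
IN_PROGRESS_STATUSES = {"Creating", "Updating", "SystemUpdating", "RollingBack"}


def wait_for_endpoint_update(client, endpoint_name, poll_seconds=30,
                             timeout_seconds=3600, sleep=time.sleep):
    """Poll DescribeEndpoint until the Endpoint is InService again.

    `client` is expected to be a boto3 SageMaker client. Returns the final
    DescribeEndpoint response, raises RuntimeError if the Endpoint lands in a
    terminal non-InService status, and TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status not in IN_PROGRESS_STATUSES:
            reason = description.get("FailureReason", "no failure reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} entered status {status}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} is still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

After the function returns, comparing `description["EndpointConfigName"]` with the name of the new Endpoint Configuration is a simple way to confirm that the Endpoint is actually running the configuration you intended. Boto3 also provides a built-in `endpoint_in_service` waiter on the SageMaker client, which may be sufficient if you do not need custom handling.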

Considerations for Deepgram workloads

  • Variant name must match. When you create the new Endpoint Configuration, use the same variant name (typically AllTraffic) as the existing configuration. SageMaker requires this for in-place updates.
  • Streaming connections are long-lived. Deepgram streaming Speech-to-Text Endpoints hold bidirectional WebSocket connections for up to 30 minutes. Existing connections are not migrated to the green fleet — they continue on the original instances until they close. Plan updates to coincide with periods of lower streaming traffic if you want to minimize disruption.
  • GPU capacity. SageMaker runs the blue and green fleets in parallel from the moment it starts provisioning the green fleet until it terminates the blue fleet after the cutover. Confirm that your AWS account has sufficient GPU instance quota in the target region to accommodate the full additional fleet before starting the update.
  • Monitor the rollout. Because the Marketplace update path does not include automated alarm-based rollback, watch your application’s error tracking and the Amazon CloudWatch metrics for the Endpoint after the cutover (for example, ModelLatency and any application-level error metrics you emit). If you observe regressions, roll back manually by calling UpdateEndpoint again with the previous Endpoint Configuration.
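The manual rollback described in the last item can be sketched as follows. The helper names are hypothetical, and the sketch assumes you record the previous Endpoint Configuration name before updating and that the previous Endpoint Configuration and its Model have not been deleted.

```python
def update_endpoint_remembering_previous(client, endpoint_name, new_config_name):
    """Start an update and return the name of the configuration it replaces.

    `client` is expected to be a boto3 SageMaker client. Keep the returned
    name: it is what a manual rollback needs.
    """
    previous_config_name = client.describe_endpoint(
        EndpointName=endpoint_name
    )["EndpointConfigName"]
    client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )
    return previous_config_name


def roll_back_endpoint(client, endpoint_name, previous_config_name):
    """Roll back manually by calling UpdateEndpoint with the previous configuration.

    Assumption: the Endpoint has finished the first update and is InService
    again, and the previous Endpoint Configuration still exists.
    """
    client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=previous_config_name,
    )
```

A rollback is itself an all-at-once update, so the same considerations about GPU quota and long-lived streaming connections apply to it.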

Other deployment guardrails

Amazon SageMaker AI supports additional deployment guardrails — including blue/green deployments with canary and linear traffic shifting, and rolling deployments — that provide gradual traffic shifting, baking periods, and CloudWatch-based auto-rollback. These guardrails are not available for AWS Marketplace containers like Deepgram. Per the deployment guardrails exclusions, Endpoints that use Marketplace containers fall back to a blue/green deployment with all-at-once traffic shifting and no final baking period.

