When you ship a newer Deepgram model version, switch language models, or apply a new environment variable configuration, you need to update an Amazon SageMaker Endpoint that is already serving production traffic. Amazon SageMaker AI handles this through the UpdateEndpoint API, which replaces the running fleet with a new fleet without taking the Endpoint offline.
Because Deepgram is distributed through the AWS Marketplace, SageMaker performs an all-at-once update: it provisions a new fleet, shifts 100% of the traffic to it in a single step, and terminates the old fleet. This is the only deployment strategy available to AWS Marketplace containers — see Other deployment guardrails below.
Common reasons to update a Deepgram SageMaker Endpoint include:
Changes to api.toml or engine.toml settings (for example, flux.max_streams or max_active_requests).
Each of these changes is applied by creating a new SageMaker Model and/or Endpoint Configuration and then calling UpdateEndpoint with the new Endpoint Configuration. If only the Endpoint Configuration changes (for example, a different instance type), you can reuse the existing Model.
When you call UpdateEndpoint against a Deepgram Endpoint, SageMaker provisions a new (green) fleet from the new Endpoint Configuration, shifts 100% of traffic from the old (blue) fleet to the green fleet in a single step, and then terminates the blue fleet.
Because the update is all-at-once, there is no incremental traffic shift, no baking period to evaluate the new fleet on a fraction of traffic before commitment, and no automatic rollback based on CloudWatch alarms. The Endpoint stays in service for the entire procedure — clients continue to be served by the blue fleet until SageMaker performs the cutover.
If the underlying Model needs to change — for example, a new Marketplace Model Package version, a different ECR image, or new environment variables — create a new SageMaker Model. See the AWS CLI and Boto3 examples in Configure Amazon SageMaker Deployments for how to create a Model. If the Model can be reused, skip this step.
Create a new Endpoint Configuration that references the Model and uses the same variant name as the existing Endpoint Configuration. The variant name (typically AllTraffic) must match for SageMaker to update the Endpoint in place.
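One way to keep the variant name aligned is to copy it from the configuration the Endpoint is currently using. The following is a minimal Boto3 sketch, not Deepgram's reference procedure. It assumes `sm` is a Boto3 SageMaker client (for example, `boto3.client("sagemaker")`), that the Endpoint has a single instance-based production variant, and that the helper name `create_matching_endpoint_config` and all resource names are hypothetical.

```python
def create_matching_endpoint_config(sm, endpoint_name, new_config_name,
                                    model_name, instance_type=None):
    """Create an Endpoint Configuration that reuses the live variant name.

    `sm` is assumed to be a Boto3 SageMaker client.
    Assumes one instance-based production variant (illustrative only).
    """
    # Find the Endpoint Configuration the Endpoint currently uses.
    current_name = sm.describe_endpoint(
        EndpointName=endpoint_name)["EndpointConfigName"]
    current = sm.describe_endpoint_config(EndpointConfigName=current_name)
    old_variant = current["ProductionVariants"][0]

    # Keep VariantName unchanged; swap in the new Model and,
    # optionally, a different instance type.
    variant = {
        "VariantName": old_variant["VariantName"],
        "ModelName": model_name,
        "InstanceType": instance_type or old_variant["InstanceType"],
        "InitialInstanceCount": old_variant["InitialInstanceCount"],
    }
    sm.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=[variant],
    )
    return variant
```

This sketch copies only the four fields shown. Any other settings on the existing variant would need to be carried over explicitly.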
Call UpdateEndpoint with the existing Endpoint name and the new Endpoint Configuration name.
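As a rough illustration, the call can be wrapped so that it also records which Endpoint Configuration was live before the update. The sketch assumes `sm` is a Boto3 SageMaker client. The helper name `start_endpoint_update` is hypothetical.

```python
def start_endpoint_update(sm, endpoint_name, new_config_name):
    """Start an in-place update and return the previous config name.

    `sm` is assumed to be a Boto3 SageMaker client.
    """
    # Remember the configuration that is live right now.
    previous = sm.describe_endpoint(
        EndpointName=endpoint_name)["EndpointConfigName"]

    # Ask SageMaker to replace the fleet using the new configuration.
    sm.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )
    return previous
```

Keeping the returned name is useful because calling the same helper again with it as `new_config_name` points the Endpoint back at the earlier configuration.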
After you initiate the update, monitor progress in the SageMaker AI console under Endpoints > your endpoint, or by polling the DescribeEndpoint API. The Endpoint’s EndpointStatus transitions through Updating and back to InService once the cutover completes.
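Polling DescribeEndpoint could look roughly like the sketch below. It assumes `sm` is a Boto3 SageMaker client. The helper name `wait_for_in_service`, the polling interval, and the timeout are illustrative choices. As a precaution, the sketch also checks that the Endpoint reports the expected Endpoint Configuration once it is back in service, and treats a mismatch as a failed update. That check is an assumption of this example, not documented Deepgram guidance.

```python
import time

# Statuses during which the sketch keeps polling.
TRANSITIONAL = {"Creating", "Updating", "SystemUpdating", "RollingBack"}

def wait_for_in_service(sm, endpoint_name, expected_config_name,
                        poll_seconds=30, timeout_seconds=3600,
                        sleep=time.sleep):
    """Poll DescribeEndpoint until the Endpoint is InService again.

    `sm` is assumed to be a Boto3 SageMaker client.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = sm.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            # Confirm the new configuration is the one now in use.
            if desc["EndpointConfigName"] != expected_config_name:
                raise RuntimeError(
                    f"Endpoint is InService on {desc['EndpointConfigName']}, "
                    f"not {expected_config_name}: {desc.get('FailureReason')}")
            return desc
        if status not in TRANSITIONAL:
            raise RuntimeError(
                f"Unexpected status {status}: {desc.get('FailureReason')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{endpoint_name} still {status} after timeout")
        sleep(poll_seconds)
```

Boto3 also provides a built-in `endpoint_in_service` waiter, obtained with `sm.get_waiter("endpoint_in_service")`, which may be sufficient when the extra configuration check is not needed.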
Use the same variant name (typically AllTraffic) as the existing configuration. SageMaker requires this for in-place updates.
After the cutover, monitor CloudWatch metrics for the Endpoint (for example, ModelLatency and any application-level error metrics you emit). If you observe regressions, roll back manually by calling UpdateEndpoint again with the previous Endpoint Configuration.
Other deployment guardrails
Amazon SageMaker AI supports additional deployment guardrails, including blue/green deployments with canary and linear traffic shifting, and rolling deployments, that provide gradual traffic shifting, baking periods, and CloudWatch-based auto-rollback. These guardrails are not available for AWS Marketplace containers like Deepgram. Per the deployment guardrails exclusions, Endpoints that use Marketplace containers fall back to a blue/green deployment with all-at-once traffic shifting and no final baking period.