NVIDIA LLM provider now available
NVIDIA is now a supported LLM provider for the Voice Agent API. Two models are available in the Standard pricing tier:
- `llama-nemotron-super-49B` — Llama Nemotron Super 49B delivers high accuracy for multi-agentic reasoning.
- `nemotron-3-nano-30B-A3B` — Nemotron 3 Nano 30B A3B provides cost efficiency with high accuracy for targeted agentic tasks.
Set the provider type to nvidia in your agent configuration:
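A minimal sketch of the corresponding Settings fragment, built in Python so the shape is easy to inspect (the field layout follows the Voice Agent Settings schema; pick either of the two model names above):

```python
import json

# Sketch of the "think" section of a Voice Agent Settings message.
# Only the provider fields relevant to this change are shown.
settings_fragment = {
    "agent": {
        "think": {
            "provider": {
                "type": "nvidia",                     # select NVIDIA as the LLM provider
                "model": "llama-nemotron-super-49B",  # or "nemotron-3-nano-30B-A3B"
                # "endpoint" is omitted: it is optional for managed providers
            }
        }
    }
}

print(json.dumps(settings_fragment, indent=2))
```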
NVIDIA is a managed provider, so the endpoint field is optional. For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
Deepgram Self-Hosted April 2026 Release (260402)
Container Images (release 260402)
- `quay.io/deepgram/self-hosted-api:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-api:1.181.3`
- `quay.io/deepgram/self-hosted-engine:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-engine:3.114.5`
  - Minimum required NVIDIA driver version: `>=570.172.08`
- `quay.io/deepgram/self-hosted-license-proxy:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-license-proxy:1.10.1`
- `quay.io/deepgram/self-hosted-billing:release-260402`
  - Equivalent image to: `quay.io/deepgram/self-hosted-billing:1.13.0`
This Release Contains The Following Changes
- Certificate Endpoint Fix — Engine now responds to `/v1/certificates` in addition to `/certificates`, consistent with the other container images. See Certificate Status for details.
- Model Name Consistency — The `/v1/models` endpoint now returns a `canonical_name` field matching the model name used in `/v1/listen` requests.
- General Improvements — Keeps our software up-to-date.
New thought_signature field for Gemini function calling
The Voice Agent API now includes an optional thought_signature field in function call messages. Some Gemini models (3.0 and 3.1 families) require this as an additional function call identifier.
This field appears in two places:
- Settings message — in `agent.context.messages[].function_calls[]` when providing function call history
- FunctionCallRequest — in `functions[]` when the server requests a function call
Example
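A sketch of where the field sits in both message shapes (neighboring fields follow the Voice Agent API reference; the function name, arguments, and signature value are placeholders):

```python
import json

# Sketch: Settings fragment carrying function-call history with a
# thought_signature. Values here are placeholders, not real signatures.
settings_fragment = {
    "agent": {
        "context": {
            "messages": [
                {
                    "role": "assistant",
                    "function_calls": [
                        {
                            "name": "get_weather",  # hypothetical function
                            "arguments": "{\"city\": \"Paris\"}",
                            "thought_signature": "PLACEHOLDER_SIGNATURE",
                        }
                    ],
                }
            ]
        }
    }
}

# Sketch: the matching field in a FunctionCallRequest from the server.
function_call_request = {
    "type": "FunctionCallRequest",
    "functions": [
        {
            "name": "get_weather",
            "arguments": "{\"city\": \"Paris\"}",
            "thought_signature": "PLACEHOLDER_SIGNATURE",
        }
    ],
}

print(json.dumps(function_call_request, indent=2))
```

When replaying function-call history, echo the `thought_signature` exactly as you received it; Gemini expects the signature to be returned unmodified.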
The thought_signature field is optional and only relevant when using Google Gemini models. This change addresses the degraded function calling performance that some users experienced with the Gemini 3.0 and 3.1 model families.
For more details, see the Function Call Request documentation, the Voice Agent API Reference, or Gemini’s Thought Signatures Documentation.
New volume parameter for Cartesia TTS
The Voice Agent API now supports an optional agent.speak.provider.volume parameter when using Cartesia as the TTS provider. Valid values range from 0.5 to 2.0.
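A sketch of the speak provider fragment with the new parameter (field layout follows the Voice Agent Settings schema; other Cartesia fields such as model and voice selection are omitted here):

```python
import json

# Sketch: speak provider fragment enabling Cartesia with the new volume knob.
speak_fragment = {
    "agent": {
        "speak": {
            "provider": {
                "type": "cartesia",
                "volume": 1.5,  # valid range: 0.5 to 2.0
            }
        }
    }
}

# Guard against out-of-range values before sending the settings.
assert 0.5 <= speak_fragment["agent"]["speak"]["provider"]["volume"] <= 2.0
print(json.dumps(speak_fragment, indent=2))
```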
For more details, see Configure the Voice Agent or the Cartesia volume, speed, and emotion documentation.
Nova-3 Model Update
🌏 Nova-3 now supports the following new languages and language codes:
- Chinese (Mandarin, Simplified): `zh`, `zh-CN`, `zh-Hans`
- Chinese (Mandarin, Traditional): `zh-TW`, `zh-Hant`
Access these models by setting model="nova-3" and the relevant language code in your request.
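For example, a streaming request URL for Simplified Mandarin can be built like this (host and path per the standard Deepgram listen endpoint; adjust parameters to your setup):

```python
from urllib.parse import urlencode

# Build a /v1/listen query string selecting Nova-3 with a Mandarin language code.
params = {"model": "nova-3", "language": "zh-CN"}
url = "wss://api.deepgram.com/v1/listen?" + urlencode(params)
print(url)
```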
Learn more about Nova-3 and supported languages on the Models and Language Overview page.
TTS speed controls & updated LLM models
TTS speak speed (Early Access)
You can now control the speaking rate of Deepgram TTS in the Voice Agent API using the agent.speak.provider.speed parameter. This parameter accepts a float value between 0.7 and 1.5, with 1.0 as the default.
This feature is in Early Access and is only available for Deepgram TTS. For more details, see TTS voice controls. To request access, contact your Account Executive or reach out to sales@deepgram.com.
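A sketch of the speak provider fragment with the speed parameter set (field layout follows the Voice Agent Settings schema; the voice model shown is a placeholder):

```python
import json

# Sketch: Deepgram TTS speak provider with the Early Access speed parameter.
speak_fragment = {
    "agent": {
        "speak": {
            "provider": {
                "type": "deepgram",
                "model": "aura-2-thalia-en",  # placeholder voice; use your own
                "speed": 1.2,                 # float between 0.7 and 1.5 (default 1.0)
            }
        }
    }
}

# Guard against out-of-range values before sending the settings.
assert 0.7 <= speak_fragment["agent"]["speak"]["provider"]["speed"] <= 1.5
print(json.dumps(speak_fragment, indent=2))
```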
Updated LLM models
New OpenAI models — Two new models are now available in the Standard pricing tier:
- `gpt-5.4-nano`
- `gpt-5.4-mini`
Gemini 2.0 Flash deprecated — The gemini-2.0-flash model is now deprecated. We recommend migrating to gemini-2.5-flash or a newer Gemini model. See the Google models table for alternatives.
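Migration is a one-line change to the think provider's model field, sketched here assuming the `google` provider type from the Voice Agent Settings schema:

```python
# Sketch: moving the think provider off the deprecated Gemini model.
think_provider = {"type": "google", "model": "gemini-2.0-flash"}  # deprecated
think_provider["model"] = "gemini-2.5-flash"                      # recommended replacement
print(think_provider)
```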
For the full list of supported models and pricing tiers, see the Voice Agent LLM Models documentation.
Deepgram Self-Hosted March 2026 Release (260319)
Container Images (release 260319)
- `quay.io/deepgram/self-hosted-api:release-260319`
  - Equivalent image to: `quay.io/deepgram/self-hosted-api:1.180.1`
- `quay.io/deepgram/self-hosted-engine:release-260319`
  - Equivalent image to: `quay.io/deepgram/self-hosted-engine:3.114.4`
  - Minimum required NVIDIA driver version: `>=570.172.08`
- `quay.io/deepgram/self-hosted-license-proxy:release-260319`
  - Equivalent image to: `quay.io/deepgram/self-hosted-license-proxy:1.10.1`
- `quay.io/deepgram/self-hosted-billing:release-260319`
  - Equivalent image to: `quay.io/deepgram/self-hosted-billing:1.13.0`
This Release Contains The Following Changes
- Flux Regression Fix — Resolves Flux support regression from the 260305 release. See Deploy Flux Model (STT) for deployment details.
- Nova-3 Language Expansion — New models: Thai (`th`, `th-TH`), Chinese Cantonese Traditional (`zh-HK`). Improved models: Bengali (`bn`), Marathi (`mr`), Tamil (`ta`), Telugu (`te`). See the full announcement for details.
- Flux Status Metrics — Self-hosted status endpoint now includes Flux stream metrics. See Status Endpoint for details.
- Certificate Status Endpoint — New `/v1/certificates` endpoint on all container images returns beginning-of-support, end-of-support, and end-of-life dates. See Certificate Status for details.
- Log Formats — New configurable log output formats: Full, Compact, Pretty, Json. See Log Formats for configuration details.
- General Improvements — Keeps our software up-to-date.
Nova-3 Model Update
🌏 Nova-3 now supports the following new languages and language codes:
- Chinese (Cantonese, Traditional): `zh-HK`
- Thai: `th`, `th-TH`
🚀 Also releasing improved Nova-3 models for the following languages:
- Bengali (`bn`)
- Marathi (`mr`)
- Tamil (`ta`)
- Telugu (`te`)
Access these models by setting model="nova-3" and the relevant language code in your request.
Learn more about Nova-3 and supported languages on the Models and Language Overview page.
🤖 New LLM Models Support & Bug Fixes
We’ve added support for new LLM models in the Voice Agent API:
- OpenAI GPT-5.3 Instant (`gpt-5.3-chat-latest`)
- OpenAI GPT-5.4 (`gpt-5.4`)
- Google Gemini 3.1 Flash Lite (`gemini-3.1-flash-lite-preview`)
Example:
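A sketch of a think provider fragment selecting one of the newly supported models (the `open_ai` provider type value is assumed from the Voice Agent Settings schema):

```python
import json

# Sketch: think provider selecting one of the newly supported models.
settings_fragment = {
    "agent": {
        "think": {
            "provider": {
                "type": "open_ai",              # assumed provider type value
                "model": "gpt-5.3-chat-latest",
            }
        }
    }
}

print(json.dumps(settings_fragment, indent=2))
```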
For the full list of supported models and pricing tiers, visit our Voice Agent LLM Models documentation.
Fixes
- Resolves an issue where the GPT-5.2 Instant model used an incorrect model ID and pricing tier. The model now uses the correct ID (`gpt-5.2-chat-latest`) and is assigned to the Advanced tier.
Nova-3 Model Update
🎯 Nova-3 Swedish and Dutch Model Enhancements
We’ve released updated Nova-3 Swedish and Nova-3 Dutch models, offering improved accuracy for both streaming and batch transcription.
Access these models by setting model: "nova-3" and the relevant language code:
- Swedish (`sv`, `sv-SE`)
- Dutch (`nl`)
Learn more about Nova-3 on the Models and Language Overview page.
Reasoning mode for OpenAI thinking models
You can now control the reasoning effort of supported OpenAI reasoning models using the new reasoning_mode parameter in the think provider configuration. This parameter maps to OpenAI’s reasoning_effort and accepts low, medium, or high.
Example:
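A sketch of a think provider fragment with the new parameter (the `open_ai` provider type value is assumed from the Voice Agent Settings schema, and the model name is a placeholder for a reasoning-capable model):

```python
import json

# Sketch: think provider with the new reasoning_mode parameter,
# which maps to OpenAI's reasoning_effort.
settings_fragment = {
    "agent": {
        "think": {
            "provider": {
                "type": "open_ai",           # assumed provider type value
                "model": "gpt-5.2",          # placeholder reasoning-capable model
                "reasoning_mode": "medium",  # one of: low, medium, high
            }
        }
    }
}

print(json.dumps(settings_fragment, indent=2))
```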
For more details, visit the Configure the Voice Agent documentation.