LLM Models
Defines the LLM (Large Language Model) to be used with your Agent. The provider.type field specifies the format or protocol of the API.
For example:
- open_ai means the API follows OpenAI’s Chat Completions format.
- This option can be used with OpenAI, Azure OpenAI, or Amazon Bedrock, as long as the endpoint behaves like OpenAI’s Chat Completions API.
You can set your Voice Agent’s LLM model in the Settings message. See the docs for more information.
Supported LLM providers
You can query the following endpoint to check the supported models for each provider:
Example Payload
If you don’t specify agent.think.provider.type, the Voice Agent will use Deepgram’s default managed LLMs. For managed LLMs, supported model names are predefined in our configuration.
The agent.think.endpoint is optional or required based on the provider type:
- For open_ai, anthropic, google, and nvidia, the endpoint field is optional because Deepgram provides managed LLMs for these providers.
- For groq and aws_bedrock provider types, endpoint is required because Deepgram does not manage those LLMs.
- If an endpoint is provided, the url is required but headers are optional.
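The endpoint rules above can be sketched as a small pre-flight check. This is only an illustration and not part of any Deepgram SDK: the function name check_think_endpoint and its error strings are hypothetical, and the set of provider types is taken solely from the list above.

```python
# Provider types for which Deepgram does not manage the LLM (per the list above).
ENDPOINT_REQUIRED_PROVIDERS = {"groq", "aws_bedrock"}


def check_think_endpoint(think: dict) -> list:
    """Return a list of problems found in an agent.think object (empty if none)."""
    problems = []
    provider_type = think.get("provider", {}).get("type")
    endpoint = think.get("endpoint")

    # groq and aws_bedrock must always carry an endpoint.
    if provider_type in ENDPOINT_REQUIRED_PROVIDERS and endpoint is None:
        problems.append(f"endpoint is required for provider type {provider_type!r}")

    # Whenever an endpoint is present, url is mandatory; headers may be omitted.
    if endpoint is not None and not endpoint.get("url"):
        problems.append("endpoint.url is required when endpoint is provided")

    return problems
```

Running a check like this before sending the configuration can surface a missing endpoint or url locally rather than at connection time.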
When using aws_bedrock as the provider type, you must also provide AWS credentials in the agent.think.provider.credentials field. This should include:
- type: Either “iam” or “sts”
- region: AWS region (e.g., “us-east-2”)
- access_key_id: Your AWS access key ID
- secret_access_key: Your AWS secret access key
- session_token: Required only when type is “sts”
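As a rough sketch of the credential fields listed above, the helper below assembles the object for agent.think.provider.credentials. The function name bedrock_credentials is hypothetical, and omitting session_token for "iam" is a choice made for this example; the list above only states that it is required for "sts".

```python
def bedrock_credentials(cred_type, region, access_key_id, secret_access_key,
                        session_token=None):
    """Build the credentials object for the aws_bedrock provider type."""
    if cred_type not in ("iam", "sts"):
        raise ValueError('type must be "iam" or "sts"')
    if cred_type == "sts" and not session_token:
        raise ValueError('session_token is required when type is "sts"')

    creds = {
        "type": cred_type,
        "region": region,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
    }
    # Only temporary (STS) credentials carry a session token.
    if cred_type == "sts":
        creds["session_token"] = session_token
    return creds
```

In practice the key values would be read from the environment or a secrets manager rather than written into source code.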
Supported LLM models
OpenAI
Anthropic
Google
Example using Deepgram’s managed Google LLM
Example using a custom Google endpoint (BYO)
When using a custom endpoint, the model property is not supported.
The desired model is specified as part of the endpoint URL instead.
Use API keys from Google AI Studio for Gemini models. Keys from Vertex AI, Workspace Gemini, or Gemini Enterprise will not work with the Agent API.
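The difference between the managed and custom Google configurations can be sketched as follows. This example assumes the model property sits at provider.model; the function name google_think, the model name, and the URL used in the usage note are placeholders, not real values. Consult Google's documentation for the actual endpoint URL format.

```python
def google_think(model=None, endpoint_url=None, headers=None):
    """Build an agent.think object for the google provider type."""
    provider = {"type": "google"}

    if endpoint_url is None:
        # Managed LLM: the model (if given) is named in the provider object.
        if model is not None:
            provider["model"] = model  # assumption: model lives at provider.model
        return {"provider": provider}

    # Custom endpoint: the model is part of the URL, so reject a model property.
    if model is not None:
        raise ValueError("model is not supported with a custom endpoint; "
                         "specify it in the endpoint URL instead")
    endpoint = {"url": endpoint_url}
    if headers:
        endpoint["headers"] = headers
    return {"provider": provider, "endpoint": endpoint}
```

For example, google_think(endpoint_url="https://example.invalid/models/placeholder-model") returns a configuration with no model key, while passing both model and endpoint_url raises an error.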
NVIDIA
Example using Deepgram’s managed NVIDIA LLM
Groq
Example Payload
Passing a custom (BYO) LLM through a Cloud Provider
For Bring Your Own (BYO) LLMs, any model string provided is accepted without restriction.
Deepgram tests against major LLM providers including OpenAI, Anthropic, and Google. When bringing your own LLM, you have two options:
- Use an OpenAI-compatible LLM service or gateway. Set provider.type to open_ai and point the endpoint.url to your service. Any LLM endpoint that conforms to the OpenAI Chat Completions API format will work, including third-party LLM gateways.
- Use a custom endpoint from one of the supported major LLM providers. If you have your own contract or deployment with a supported provider (such as OpenAI, Anthropic, or Google), set the provider.type to match that provider and supply your own endpoint.url and endpoint.headers.
In both cases, configure the provider.type to one of the supported provider values and set the endpoint.url and endpoint.headers fields to the correct values for your provider or gateway.
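A minimal sketch of the first option, an OpenAI-compatible gateway, might look like the following. The function name byo_open_ai_think is hypothetical, the bearer-token header is an assumption about what a typical gateway expects, and placing the model at provider.model is likewise assumed.

```python
def byo_open_ai_think(url, api_key, model):
    """Build an agent.think object for an OpenAI-compatible gateway (BYO)."""
    return {
        # open_ai tells the agent to speak the Chat Completions format.
        "provider": {"type": "open_ai", "model": model},
        "endpoint": {
            "url": url,
            # Assumption: the gateway expects a bearer token; use whatever
            # headers your provider or gateway actually requires.
            "headers": {"authorization": f"Bearer {api_key}"},
        },
    }
```

Because BYO model strings are accepted without restriction, the model argument is passed through unchanged.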
Using multiple LLM providers
The think object accepts both a single provider and an array of providers. When you supply an array, the Voice Agent uses the providers as an ordered fallback chain: it sends each LLM request to the first provider in the list and automatically falls back to the next provider if the request fails.
How fallback works
- The agent sends the request to the first provider in the array.
- If that provider returns an error or times out, the agent sends a THINK_REQUEST_FAILED warning over the WebSocket and retries with the next provider.
- This continues through every provider in the array.
- If all providers fail, the agent sends a FAILED_TO_THINK error and the turn produces no LLM response.
The fallback is per-request — each new conversational turn starts again from the first provider. Provider order matters, so place your preferred provider first and your most reliable fallback last.
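The fallback behavior described above can be modeled with a short simulation. This is not Deepgram's implementation: the function think_with_fallback, the call_provider callback, and the (level, code) event tuples are all hypothetical, and the sketch assumes that every failed provider, including the last, produces a warning before the final error.

```python
def think_with_fallback(providers, call_provider):
    """Try each provider in order; return (response, events) for one request."""
    events = []
    for provider in providers:
        try:
            # The first provider that succeeds ends the chain for this request.
            return call_provider(provider), events
        except Exception:
            # An error or timeout produces a warning, then the next provider is tried.
            events.append(("warning", "THINK_REQUEST_FAILED"))
    # Every provider failed: the turn produces no LLM response.
    events.append(("error", "FAILED_TO_THINK"))
    return None, events
```

Each call to think_with_fallback iterates from the start of the list, which mirrors the per-request behavior: a failure on one turn does not change which provider is tried first on the next.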
Fallback providers do not need to use the same provider.type. You can mix providers (for example, open_ai primary with an anthropic fallback) to maximize availability across independent infrastructure.
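A mixed-provider fallback chain might be assembled like this. The sketch assumes each array element has the same shape as a single think object and that the model is set at provider.model; both model names are placeholders, not real model identifiers.

```python
import json

# Ordered fallback chain: preferred provider first, fallback last.
think = [
    {"provider": {"type": "open_ai", "model": "primary-model-placeholder"}},
    {"provider": {"type": "anthropic", "model": "fallback-model-placeholder"}},
]

# Fragment of the configuration that would nest under agent.think.
settings_fragment = {"agent": {"think": think}}

print(json.dumps(settings_fragment, indent=2))
```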