Voice Agent API Migration Guide

Guide for migrating from Voice Agent API Early Access to V1.


This guide helps developers migrate from the early access version of the Deepgram Voice Agent API to the official V1 release.

The Deepgram API Spec and Voice Agent API Reference have more details on the new Voice Agent API.

Endpoint Changes

Early Access                      V1
wss://agent.deepgram.com/agent    wss://agent.deepgram.com/v1/agent/converse

Message Type Changes

Removed Message Types

The following message types from early access have been removed in V1:

Message Type          Description
UpdateInstructions    Now handled through the more flexible Settings structure
FunctionCalling       Function calling status is now handled differently

New Message Types

V1 adds the following message types:

Message Type           Description
PromptUpdated          Confirmation that a prompt update has been applied
SpeakUpdated           Confirmation that a speak configuration update has been applied
Warning                Non-fatal errors or warnings
AgentThinking          Notification that the agent is thinking
UserStartedSpeaking    Notification that the user has started speaking
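
One way a client might route the new message types is a small dispatcher keyed on the type field. The sketch below is illustrative only: make_dispatcher, the handler bodies, and the events list are hypothetical names, and it assumes each server message arrives as a JSON text frame with a top-level type field, as in the examples in this guide.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


def make_dispatcher(handlers: Dict[str, Handler], fallback: Handler) -> Callable[[str], None]:
    """Build a function that routes a JSON text frame to a handler by its "type"."""
    def dispatch(raw: str) -> None:
        message = json.loads(raw)
        # Unknown or missing types go to the fallback instead of raising.
        handlers.get(message.get("type"), fallback)(message)
    return dispatch


events: List[str] = []  # stands in for real logging or UI updates

dispatch = make_dispatcher(
    {
        "PromptUpdated": lambda m: events.append("prompt update confirmed"),
        "SpeakUpdated": lambda m: events.append("speak update confirmed"),
        "Warning": lambda m: events.append("warning received"),
        "AgentThinking": lambda m: events.append("agent thinking"),
        "UserStartedSpeaking": lambda m: events.append("user started speaking"),
    },
    fallback=lambda m: events.append(f"unhandled: {m.get('type')}"),
)

dispatch('{"type": "PromptUpdated"}')
dispatch('{"type": "FunctionCalling"}')  # removed in V1, so it reaches the fallback
```

Routing unknown types to a fallback rather than raising keeps the client running if the server sends a message type the client does not yet handle.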

Welcome Message Changes

In the Welcome message, the session_id field has been renamed to request_id to better align with other products.

Early Access: Welcome

{
  "type": "Welcome",
  "session_id": "fc553ec9-5874-49ca-a47c-b670d525a4b1"
}

V1: Welcome

{
  "type": "Welcome",
  "request_id": "fc553ec9-5874-49ca-a47c-b670d525a4b1"
}
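
While both versions are still in use, a client could read the identifier with a fallback. This is a minimal sketch, assuming the Welcome message has already been parsed into a dict; welcome_request_id is a hypothetical helper, not part of any SDK.

```python
def welcome_request_id(message: dict) -> str:
    """Return the connection identifier from a parsed Welcome message."""
    if message.get("type") != "Welcome":
        raise ValueError(f"expected a Welcome message, got {message.get('type')!r}")
    # V1 uses request_id; session_id is the early access name, kept as a fallback.
    for field in ("request_id", "session_id"):
        if field in message:
            return message[field]
    raise KeyError("Welcome message has neither request_id nor session_id")
```

Once the migration is complete, the session_id fallback can be dropped.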

SettingsConfiguration Becomes Settings

The most significant change is to the configuration message:

Early Access: SettingsConfiguration

{
  "type": "SettingsConfiguration",
  "audio": {
    "input": { "encoding": "linear16", "sample_rate": 16000 },
    "output": { "encoding": "linear16", "sample_rate": 24000 }
  },
  "agent": {
    "instructions": "You are a helpful AI assistant. Keep responses concise.",
    "listen_model": "nova",
    "think_model": "gpt-4",
    "speak_model": "aura"
  }
}

V1: Settings

{
  "type": "Settings",
  "audio": {
    "input": { "encoding": "linear16", "sample_rate": 16000 },
    "output": { "encoding": "linear16", "sample_rate": 24000 }
  },
  "agent": {
    "listen": { "provider": { "model": "nova-3" } },
    "think": {
      "provider": { "model": "gpt-4o-mini" },
      "prompt": "You are a helpful AI assistant. Keep responses concise."
    },
    "speak": { "provider": { "model": "aura-2-andromeda-en" } }
  }
}

For the full list of options available in the new Settings message, see the Configure the Voice Agent guide.

Key differences:

  1. Message type changed from SettingsConfiguration to Settings
  2. Added fields: mip_opt_out and experimental
  3. Introduced provider-based structure for listen, think, and speak capabilities
  4. instructions field renamed to prompt in the think configuration
  5. Added container field to audio output configuration
  6. Added optional greeting field
  7. Added support for custom endpoints via the endpoint object for non-Deepgram providers
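
The structural part of this change can be sketched as a conversion function. The example below is a rough illustration, not an official tool: migrate_settings is a hypothetical name, it reproduces only the fields shown in the two examples above, and it asks the caller for V1 model identifiers because this guide gives no mapping from early access model names to V1 ones. The new optional fields (mip_opt_out, experimental, container, greeting, endpoint) are left unset.

```python
import copy


def migrate_settings(old: dict, listen_model: str, think_model: str, speak_model: str) -> dict:
    """Convert an early access SettingsConfiguration dict into a V1-style Settings dict."""
    if old.get("type") != "SettingsConfiguration":
        raise ValueError("expected a SettingsConfiguration message")
    agent = old.get("agent", {})
    return {
        "type": "Settings",
        # The audio block has the same shape in both examples, so it is copied as-is.
        "audio": copy.deepcopy(old.get("audio", {})),
        "agent": {
            "listen": {"provider": {"model": listen_model}},
            "think": {
                "provider": {"model": think_model},
                # instructions was renamed to prompt and moved under think.
                "prompt": agent.get("instructions", ""),
            },
            "speak": {"provider": {"model": speak_model}},
        },
    }


early_access = {
    "type": "SettingsConfiguration",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000},
    },
    "agent": {
        "instructions": "You are a helpful AI assistant. Keep responses concise.",
        "listen_model": "nova",
        "think_model": "gpt-4",
        "speak_model": "aura",
    },
}
v1_settings = migrate_settings(early_access, "nova-3", "gpt-4o-mini", "aura-2-andromeda-en")
```

Passing the model identifiers explicitly avoids guessing which V1 model corresponds to each early access name.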

UpdateSpeak Changes

The UpdateSpeak message has been restructured to use the provider pattern:

Early Access: UpdateSpeak

{
  "type": "UpdateSpeak",
  "model": "aura-asteria-en"
}

V1: UpdateSpeak

{
  "type": "UpdateSpeak",
  "speak": {
    "provider": {
      "type": "deepgram",
      "model": "aura-2-thalia-en"
    }
  }
}
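
A converter for this message might look like the following sketch. migrate_update_speak is a hypothetical helper; the default provider type of deepgram is taken from the V1 example above, and reusing the early access model name when no replacement is given is an assumption that should be checked against the models your account supports.

```python
from typing import Optional


def migrate_update_speak(old: dict, model: Optional[str] = None,
                         provider_type: str = "deepgram") -> dict:
    """Wrap an early access UpdateSpeak message in the V1 provider structure."""
    if old.get("type") != "UpdateSpeak":
        raise ValueError("expected an UpdateSpeak message")
    return {
        "type": "UpdateSpeak",
        "speak": {
            "provider": {
                "type": provider_type,
                # Assumption: fall back to the old model name if none is supplied.
                "model": model or old["model"],
            }
        },
    }
```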

InjectAgentMessage Changes

In the InjectAgentMessage message, the message field has been renamed to content:

Early Access: InjectAgentMessage

{
  "type": "InjectAgentMessage",
  "message": "I apologize, but I need to correct my previous statement..."
}

V1: InjectAgentMessage

{
  "type": "InjectAgentMessage",
  "content": "I apologize, but I need to correct my previous statement..."
}
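
The rename can be applied mechanically to existing payloads. A minimal sketch, with migrate_inject_agent_message as a hypothetical helper name:

```python
def migrate_inject_agent_message(old: dict) -> dict:
    """Return a copy of an InjectAgentMessage dict with message renamed to content."""
    if old.get("type") != "InjectAgentMessage":
        raise ValueError("expected an InjectAgentMessage message")
    new = {key: value for key, value in old.items() if key != "message"}
    if "message" in old:
        new["content"] = old["message"]
    return new
```

A payload that already uses content is returned unchanged, so the helper is safe to call on messages that have already been migrated.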

Function Calling Changes

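Both function calling messages have changed shape. FunctionCallRequest now carries a functions array in which each entry has an id, a name, arguments encoded as a JSON string, and a client_side flag. FunctionCallResponse renames function_call_id to id and output to content, and adds a name field: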

Early Access: FunctionCallRequest

{
  "type": "FunctionCallRequest",
  "function_name": "get_weather",
  "function_call_id": "fc_12345678-90ab-cdef-1234-567890abcdef",
  "input": {
    "location": "Fremont, CA 94539"
  }
}

V1: FunctionCallRequest

{
  "type": "FunctionCallRequest",
  "functions": [
    {
      "id": "fc_12345678-90ab-cdef-1234-567890abcdef",
      "name": "get_weather",
      "arguments": "{\"location\": \"Fremont, CA 94539\"}",
      "client_side": true
    }
  ]
}

Early Access: FunctionCallResponse

{
  "type": "FunctionCallResponse",
  "function_call_id": "fc_12345678-90ab-cdef-1234-567890abcdef",
  "output": "{\"location\": \"Fremont, CA 94539\", \"temperature_c\": 21, \"condition\": \"Sunny\", \"humidity\": 40, \"wind_kph\": 14}"
}

V1: FunctionCallResponse

{
  "type": "FunctionCallResponse",
  "id": "fc_12345678-90ab-cdef-1234-567890abcdef",
  "name": "get_weather",
  "content": "{\"location\": \"Fremont, CA 94539\", \"temperature_c\": 21, \"condition\": \"Sunny\", \"humidity\": 40, \"wind_kph\": 14}"
}
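
To make the difference in request shape concrete, the sketch below normalizes either format into the same list of calls. FunctionCall and parse_function_call_request are hypothetical names. Note two details visible in the examples above: V1 sends arguments as a JSON string that must be decoded, while early access sent input as an object. Treating every early access call as client-side is an assumption, since that format had no client_side flag.

```python
import json
from typing import List, NamedTuple


class FunctionCall(NamedTuple):
    id: str
    name: str
    arguments: dict
    client_side: bool


def parse_function_call_request(message: dict) -> List[FunctionCall]:
    """Normalize a FunctionCallRequest in either format into FunctionCall tuples."""
    if "functions" in message:  # V1: an array of calls, arguments JSON-encoded
        return [
            FunctionCall(f["id"], f["name"], json.loads(f["arguments"]), f["client_side"])
            for f in message["functions"]
        ]
    # Early access: a single call with an object-valued input.
    # Assumption: the client was always expected to execute it.
    return [FunctionCall(message["function_call_id"], message["function_name"],
                         message["input"], True)]
```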

Error Response Changes

The Error message now uses description instead of message and adds a code field:

Early Access: Error

{
  "type": "Error",
  "message": "Failed to process audio input: Invalid audio format"
}

V1: Error

{
  "type": "Error",
  "description": "Failed to process audio input: Invalid audio format",
  "code": "INVALID_AUDIO_FORMAT"
}
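
Error handling code that previously read message can be adapted along these lines. This is only a sketch: describe_error is a hypothetical helper, and the fallback to the early access message field is there for the transition period.

```python
def describe_error(message: dict) -> str:
    """Format a parsed Error message from either API version as one line of text."""
    # V1 uses description; early access used message.
    description = message.get("description", message.get("message", "unknown error"))
    code = message.get("code")  # only present in V1
    return f"[{code}] {description}" if code else description
```

Keeping the code separate from the description lets a client branch on the code while still logging the readable text.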

Function Call Handling in V1

Function calling in V1 draws an explicit line between functions the client executes and functions the server executes internally.

FunctionCallRequest

In V1, the FunctionCallRequest message includes a client_side flag that explicitly indicates where the function should be executed:

{
  "type": "FunctionCallRequest",
  "functions": [
    {
      "id": "fc_12345678-90ab-cdef-1234-567890abcdef",
      "name": "get_weather",
      "arguments": "{\"location\": \"Fremont, CA 94539\"}",
      "client_side": true
    }
  ]
}

When handling a FunctionCallRequest:

  1. Check the client_side flag in each function
  2. If client_side is true, your client code must:
    • Execute the specified function with the provided arguments
    • Send a FunctionCallResponse back to the server
  3. If client_side is false, no client action is needed as the server will handle it internally
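
The steps above can be sketched as a handler that returns the responses to send. Everything named here is illustrative: handle_function_call_request, the registry dict, and the get_weather stub are assumptions, and actually sending the returned messages over the WebSocket is left to the caller. The sketch also assumes each function returns a JSON-serializable value.

```python
import json
from typing import Callable, Dict, List


def handle_function_call_request(message: dict,
                                 registry: Dict[str, Callable[..., object]]) -> List[dict]:
    """Run every client-side call in a V1 FunctionCallRequest and build the responses."""
    responses = []
    for call in message.get("functions", []):
        if not call.get("client_side"):
            continue  # the server executes this one internally; nothing to do
        func = registry[call["name"]]  # raises KeyError for an unregistered function
        result = func(**json.loads(call["arguments"]))  # arguments is a JSON string
        responses.append({
            "type": "FunctionCallResponse",
            "id": call["id"],
            "name": call["name"],
            "content": json.dumps(result),
        })
    return responses


def get_weather(location: str) -> dict:
    # Stub standing in for a real weather lookup.
    return {"location": location, "temperature_c": 21, "condition": "Sunny"}


request = {
    "type": "FunctionCallRequest",
    "functions": [
        {"id": "fc_1", "name": "get_weather",
         "arguments": "{\"location\": \"Fremont, CA 94539\"}", "client_side": True},
        {"id": "fc_2", "name": "server_lookup", "arguments": "{}", "client_side": False},
    ],
}
responses = handle_function_call_request(request, {"get_weather": get_weather})
```

Returning the responses instead of sending them keeps the handler independent of any particular WebSocket library.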

FunctionCallResponse

The FunctionCallResponse message now includes the function name and uses clearer field names (id and content):

{
  "type": "FunctionCallResponse",
  "id": "fc_12345678-90ab-cdef-1234-567890abcdef",
  "name": "get_weather",
  "content": "{\"location\": \"Fremont, CA 94539\", \"temperature_c\": 21, \"condition\": \"Sunny\", \"humidity\": 40, \"wind_kph\": 14}"
}

Key points about FunctionCallResponse:

  1. It can be sent by either the client or the server depending on where the function was executed
  2. The id field links the response to the original request
  3. The name field identifies which function was called
  4. The content field contains the function result, often in JSON format
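
Because a response can arrive from the server as well as be sent by the client, it can help to track outstanding calls by id. The class below is a hypothetical sketch of that bookkeeping, assuming messages are already parsed into dicts with the V1 field names shown above.

```python
from typing import Dict


class PendingCalls:
    """Track function calls by id until a matching FunctionCallResponse is seen."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}  # call id -> function name

    def track(self, request: dict) -> None:
        """Record every call in a V1 FunctionCallRequest."""
        for call in request["functions"]:
            self._pending[call["id"]] = call["name"]

    def resolve(self, response: dict) -> str:
        """Match a FunctionCallResponse to its request and return its content."""
        name = self._pending.pop(response["id"])  # KeyError if the id is unknown
        if name != response["name"]:
            raise ValueError(f"response names {response['name']!r}, request named {name!r}")
        return response["content"]

    def outstanding(self) -> int:
        return len(self._pending)
```

A typical flow would call track when a FunctionCallRequest arrives and resolve when the corresponding FunctionCallResponse is sent or received.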

Implementation Tips

When migrating your function calling implementation:

  1. Update your client code to check the client_side flag
  2. Only respond to functions where client_side is true
  3. Use the id field to track which request you’re responding to
  4. Include both the function name and content in your response
  5. Expect the server to send you FunctionCallResponse messages for functions with client_side: false

Migration Checklist

  1. ✅ Update WebSocket endpoint URL
  2. ✅ Update configuration message format
    • Rename SettingsConfiguration to Settings
    • Add provider objects for listen, think, and speak
    • Change instructions to prompt
    • Use specific model identifiers
  3. ✅ Update function calling implementation
    • Adapt to new request/response formats
    • Implement client_side flag handling
  4. ✅ Handle error messages with new format
    • Use description instead of message
    • Process error codes
  5. ✅ Implement support for new message types
    • Handle PromptUpdated and SpeakUpdated confirmations
    • Process Warning messages
  6. ✅ Update the InjectAgentMessage format
    • Change message field to content
  7. ✅ Handle welcome messages with request_id instead of session_id

Common Migration Issues

  1. Configuration not accepted: Make sure you’ve updated to the provider-based structure for capabilities
  2. Function calls not working: Update both the request and response formats to match V1 specifications
  3. Error handling failures: Update error handling to use description instead of message
  4. Cannot inject messages: Use content instead of message in InjectAgentMessage
  5. Missing confirmation messages: Implement handlers for the new confirmation message types
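
Several of these issues come from early access field names left in client-sent messages. A quick check such as the following could flag them before a message is sent. It is a rough sketch: find_early_access_leftovers is a hypothetical helper, and it covers only the renames described in this guide.

```python
from typing import List


def find_early_access_leftovers(message: dict) -> List[str]:
    """List early access names still present in a client-sent message dict."""
    issues = []
    kind = message.get("type")
    if kind == "SettingsConfiguration":
        issues.append("rename type SettingsConfiguration to Settings")
    if kind == "UpdateInstructions":
        issues.append("UpdateInstructions was removed in V1")
    agent = message.get("agent", {})
    if "instructions" in agent:
        issues.append("move agent.instructions to agent.think.prompt")
    for capability in ("listen", "think", "speak"):
        if f"{capability}_model" in agent:
            issues.append(f"move agent.{capability}_model to agent.{capability}.provider.model")
    if kind == "UpdateSpeak" and "model" in message:
        issues.append("move model to speak.provider.model")
    if kind == "InjectAgentMessage" and "message" in message:
        issues.append("rename message to content")
    if kind == "FunctionCallResponse":
        if "function_call_id" in message:
            issues.append("rename function_call_id to id")
        if "output" in message:
            issues.append("rename output to content")
    return issues
```

An empty list means none of the known early access names were found; it does not prove the message is valid V1.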

New Capabilities in V1

  1. Multi-provider support: Configure different providers for listen, think, and speak capabilities
  2. Greeting messages: Set an initial greeting via the greeting field
  3. Improved error handling: Structured errors with codes for better diagnostics
  4. Client-side function execution: Control whether functions run client-side or server-side
  5. Warnings: Non-fatal issues are now reported via Warning messages