Voice Agent v1 API Migration

Guide for migrating from Voice Agent API v0 to v1.

This guide covers the changes required to migrate from Voice Agent v0 to v1.

The Deepgram API Spec and Voice Agent API Reference have more details on the new Voice Agent API.

WebSocket Connection

The WebSocket endpoint path has changed:

  • v0: wss://agent.deepgram.com/agent
  • v1: wss://agent.deepgram.com/v1/agent/converse
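
Clients that keep the endpoint in configuration can rewrite it in one place. The following is a minimal Python sketch, assuming the host is unchanged and only the path differs as listed above; the helper name `migrate_agent_url` is hypothetical and not part of any Deepgram SDK.

```python
from urllib.parse import urlsplit, urlunsplit

V0_PATH = "/agent"
V1_PATH = "/v1/agent/converse"


def migrate_agent_url(url: str) -> str:
    """Return the v1 endpoint for a v0 agent URL; leave any other URL untouched."""
    parts = urlsplit(url)
    # Only rewrite when the path is exactly the v0 path (ignoring a trailing slash).
    if parts.path.rstrip("/") == V0_PATH:
        parts = parts._replace(path=V1_PATH)
    return urlunsplit(parts)
```

Calling the helper on a URL that already points at the v1 path returns it unchanged, so it is safe to run on mixed configuration.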

Settings Configuration Changes

  • Object name changed from settings_configuration to settings
  • Under the agent object, the listen, think, and speak objects now have an additional provider object
  • Settings now contains a new experimental field to opt in to experimental Agent features
  • Settings now contains a new mip_opt_out field to opt out of the Deepgram Model Improvement Program

New Settings Example

For a detailed explanation of all the options available for Settings, refer to our documentation on how to Configure the Voice Agent.

JSON
{
  "type": "Settings",
  "mip_opt_out": false,
  "experimental": false,
  "audio": {
    "input": {
      "encoding": "linear16",
      "sample_rate": 24000
    },
    "output": {
      "encoding": "linear16",
      "sample_rate": 24000,
      "container": "none"
      // ... additional output options: bitrate
    }
  },
  "agent": {
    "language": "en",
    "listen": {
      "provider": {
        "type": "deepgram",
        "model": "nova-3"
        // ... additional provider options: keyterms (nova-3 'en' only)
      }
    },
    "think": {
      "provider": {
        "type": "open_ai",
        "model": "gpt-4o-mini",
        "temperature": 0.7
        // ... additional provider options: endpoint (for non-deepgram providers)
      }
      // ... additional think options: prompt, functions, endpoint
    },
    "speak": {
      "provider": {
        "type": "deepgram",
        "model": "aura-2-thalia-en"
        // ... additional provider options based on type:
        // - eleven_labs: model_id, language_code
        // - cartesia: model_id, voice (mode, id), language
        // - open_ai: voice
      }
      // ... additional speak options: endpoint (for non-deepgram providers)
    }
    // ... additional agent options: greeting
  }
  // ... additional top-level options: experimental
}
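
The `//` lines in the example above are annotations and would not parse as JSON, so it can be easier to build the message in code and serialize it. The following Python sketch mirrors the values from the example; the function name `build_settings` and its `prompt` parameter are hypothetical, and the optional fields named in the annotations are omitted.

```python
import json


def build_settings(prompt: str = "") -> str:
    """Serialize a v1 Settings message that mirrors the example values."""
    settings = {
        "type": "Settings",
        "mip_opt_out": False,
        "experimental": False,
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": 24000},
            "output": {
                "encoding": "linear16",
                "sample_rate": 24000,
                "container": "none",
            },
        },
        "agent": {
            "language": "en",
            # Each of listen, think, and speak wraps its model in a provider object.
            "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
            "think": {
                "provider": {
                    "type": "open_ai",
                    "model": "gpt-4o-mini",
                    "temperature": 0.7,
                },
            },
            "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        },
    }
    if prompt:
        # In v1 the LLM instructions live in agent.think.prompt.
        settings["agent"]["think"]["prompt"] = prompt
    return json.dumps(settings)
```

The returned string would be sent as the first text frame after the WebSocket connection opens.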

Audio Configuration Changes

Input Audio:

  • Object structure is now audio.input.option
  • audio.input is now required instead of nullable
  • Default values: encoding="linear16", sample_rate=16000

Output Audio:

  • Object structure is now audio.output.option
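
A client can enforce the new input rules before sending Settings. This is a minimal Python sketch, assuming the client always sends audio.input explicitly and fills in the documented defaults itself; `normalize_audio` and `INPUT_DEFAULTS` are hypothetical names.

```python
from copy import deepcopy

# Default values for audio.input as listed above.
INPUT_DEFAULTS = {"encoding": "linear16", "sample_rate": 16000}


def normalize_audio(audio: dict) -> dict:
    """Return a copy of the audio object with audio.input defaults filled in."""
    if audio.get("input") is None:
        # v1 no longer accepts a missing or null audio.input.
        raise ValueError("audio.input is required in v1 and may not be null")
    result = deepcopy(audio)
    # Explicit values win over defaults.
    result["input"] = {**INPUT_DEFAULTS, **result["input"]}
    return result
```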

Provider Configuration Changes

Speak Provider:

  • Object structure is now agent.speak.provider
  • For Deepgram TTS, only type and model are required
  • Other fields like model_id, voice, language, language_code are optional and only used for other providers

Think Provider:

  • Object structure is now agent.think.provider
  • For Deepgram LLM, only type and model are required
  • Other fields like endpoint, temperature, functions, prompt are optional and only used for other providers
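
The required fields for a Deepgram provider can be checked before the Settings message is sent. The sketch below is one possible Python check, not an official validator; `missing_deepgram_fields` is a hypothetical name, and requirements for other provider types are deliberately not checked because this guide does not list them.

```python
def missing_deepgram_fields(section: dict) -> list[str]:
    """List what a listen/think/speak section lacks for a Deepgram provider."""
    provider = section.get("provider")
    if not isinstance(provider, dict):
        # v1 expects the model settings to be nested under "provider".
        return ["provider"]
    if provider.get("type") != "deepgram":
        # Other providers have their own requirements; not validated here.
        return []
    return [key for key in ("type", "model") if key not in provider]
```

An empty list means the section carries everything this guide names as required for Deepgram.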

LLM Instructions Changes

Prompt / LLM Instructions Location:

  • v0: Instructions were set in agent.think.instructions
  • v1: All instructions are now set in agent.think.prompt

Prompt / LLM Instructions Message Types:

  • v0: UpdateInstructions used instructions field.
  • v1: UpdateInstructions now uses prompt field.
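
Both changes amount to renaming one key, so a single helper can cover the think object and the update message. This is a minimal Python sketch; `rename_instructions` is a hypothetical name, and the message type is kept exactly as this guide states it.

```python
def rename_instructions(obj: dict) -> dict:
    """Return a copy of obj with a top-level "instructions" key renamed to "prompt"."""
    renamed = dict(obj)
    if "instructions" in renamed:
        value = renamed.pop("instructions")
        # Keep an existing "prompt" if the object was already partly migrated.
        renamed.setdefault("prompt", value)
    return renamed
```

The same call works on a v0 agent.think object and on a v0 update message, because the field sits at the top level of each.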

New Agent Server Messages

  • PromptUpdated: Confirms prompt updates
  • SpeakUpdated: Confirms speak model changes
  • Warning: For non-fatal errors
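
A v0 message handler will need branches for these three types. The Python sketch below assumes server messages arrive as JSON text frames with a "type" field, as the Settings message has; the logger name and the returned labels are hypothetical, and no payload fields beyond "type" are assumed.

```python
import json
import logging

logger = logging.getLogger("voice_agent")


def handle_server_message(raw: str) -> str:
    """Route one JSON text frame by its "type" field and report the action taken."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind in ("PromptUpdated", "SpeakUpdated"):
        logger.info("agent confirmed update: %s", kind)
        return "confirmed"
    if kind == "Warning":
        # Non-fatal: log the whole payload and keep the session open.
        logger.warning("agent warning: %s", message)
        return "warned"
    return "unhandled"
```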

API Deprecations

Experimental Features

Server Messages Changes

Welcome

  • Confirms WebSocket connection success
  • v0 Welcome message included session_id
  • v1 Welcome message now includes request_id instead of session_id
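
Code that logs or correlates the connection identifier can read either field during the transition. This is a minimal Python sketch, assuming the Welcome message is a JSON object with a "type" field; `connection_id` is a hypothetical helper name.

```python
from typing import Optional


def connection_id(welcome: dict) -> Optional[str]:
    """Return request_id (v1), falling back to session_id (v0), from a Welcome message."""
    if welcome.get("type") != "Welcome":
        return None
    return welcome.get("request_id") or welcome.get("session_id")
```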

Function Calling Changes

For more detailed information about function calling, see our Function Calling Documentation.

Message Structure Changes:

Function Call Configuration:

  • Object structure is agent.think.functions in the Settings message
YAML
functions:
  type: array
  items:
    type: object
    properties:
      name:
        type: string
        description: Function name
      description:
        type: string
        description: Function description
      parameters:
        type: object
        description: Function parameters
      endpoint:
        type: object
        description: The Function endpoint to call. if not passed, function is called client-side
        properties:
          url:
            type: string
            description: Endpoint URL
          method:
            type: string
            description: HTTP method
          headers:
            type: object
            additionalProperties:
              type: string
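
An entry for agent.think.functions that follows this schema can be assembled as a plain dictionary. The Python sketch below is illustrative only: the helper `function_definition`, the `get_weather` function, and the example.com URL are all hypothetical.

```python
from typing import Optional


def function_definition(
    name: str,
    description: str,
    parameters: dict,
    endpoint: Optional[dict] = None,
) -> dict:
    """Build one agent.think.functions entry following the schema above."""
    definition = {
        "name": name,
        "description": description,
        "parameters": parameters,
    }
    if endpoint is not None:
        # Per the schema, omitting endpoint means the function is called client-side.
        definition["endpoint"] = endpoint
    return definition


# Hypothetical function with an endpoint the agent would call over HTTP.
weather = function_definition(
    "get_weather",
    "Look up the current weather for a city",
    {"type": "object", "properties": {"city": {"type": "string"}}},
    endpoint={
        "url": "https://example.com/weather",
        "method": "POST",
        "headers": {"authorization": "Bearer <token>"},
    },
)
```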

Function Call Execution:

  • Client-side functions:
    • Executed locally in the client application
    • No endpoint configuration required
    • Set client_side: true in function definition
  • Server-side functions:
    • Executed on the server
    • Requires endpoint configuration
    • Set client_side: false in function definition
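
The function schema states that a function without an endpoint is called client-side, so a client can work out which functions it must implement locally by checking for that key. This Python sketch relies only on that rule and does not read or set the client_side flag; `split_by_execution` is a hypothetical name.

```python
def split_by_execution(functions: list[dict]) -> tuple[list[str], list[str]]:
    """Return (client_side_names, server_side_names) based on endpoint presence."""
    client: list[str] = []
    server: list[str] = []
    for fn in functions:
        # A definition with an endpoint is executed on the server.
        target = server if "endpoint" in fn else client
        target.append(fn["name"])
    return client, server
```

The client-side list is the set of handlers the application has to provide itself.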

Migration Checklist

Use the following checklist to verify the migration:

  • WebSocket connection establishes successfully
  • Settings message is accepted without errors
  • Audio data is processed correctly
  • Voice agent responds appropriately
  • Function calls work as expected
  • Error handling works correctly
  • Warning messages are handled properly
  • Client-side and server-side functions work as expected
  • Audio input/output configuration is correct
  • Provider configurations are properly set up
  • All server messages are properly handled
  • Client messages are correctly formatted
  • Error and warning handling is implemented
  • Audio streaming works in both directions