Getting Started

An introduction to using Deepgram’s Voice Agent API to build interactive voice agents.

You will need to migrate to the new V1 Agent API to continue to use the Voice Agent API. Please refer to the Voice Agent API Migration Guide for more information.

In this guide, you’ll learn how to create a very basic voice agent using Deepgram’s Agent API. Visit the API Reference for more details on how to use the Agent API.

Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you choose to use a Deepgram SDK, configure your environment.

API Playground

Explore the Voice Agent API in our API Playground where you can quickly interact with the Agent API.


Build a Basic Voice Agent

1. Set up your environment

In the steps below, you’ll use the Terminal to:

  1. Create a new directory for your project
  2. Create a new file for your code
  3. Optionally create and activate a virtual environment for the project
  4. Export your Deepgram API key to the environment so you can use it in your code

mkdir deepgram-agent-demo
cd deepgram-agent-demo
touch main.py
python3 -m venv venv        # optional: isolate the project's dependencies
source venv/bin/activate    # optional: activate the virtual environment
export DEEPGRAM_API_KEY="your_Deepgram_API_key_here"

2. Install the Deepgram SDK

Deepgram has several SDKs that make it easier to build a Voice Agent. Follow the steps below to use one of our SDKs to make your first Deepgram Voice Agent request.

In your terminal, navigate to the location on your drive where you created your project above, and install the Deepgram SDK and any other dependencies.

pip install deepgram-sdk

3. Import dependencies and set up the main function

Next, import the necessary dependencies and set up your main application function.

# Import dependencies and set up the main function
import requests
import time
import os
import json
import threading

from deepgram import (
    DeepgramClient,
    DeepgramClientOptions,
    AgentWebSocketEvents,
    AgentKeepAlive,
)
from deepgram.clients.agent.v1.websocket.options import SettingsOptions

4. Initialize the Voice Agent

Now you can initialize the Voice Agent by reading your Deepgram API key from the environment, creating a Deepgram client with keepalives enabled, and establishing a WebSocket connection to the Agent API. The audio buffer, output file counter, and event handlers (including one that logs when the connection is successfully established) are set up in a later step.

This example code doesn’t use microphone audio input, since capturing audio from a microphone is more complex; instead, it streams a prerecorded audio file to the agent. Refer to our Voice Agent Starter Apps for complete examples that use microphone input.
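That said, if you want a feel for what live input looks like, here is a minimal, hypothetical sketch using the Python SDK's Microphone helper (which depends on the pyaudio package). It is not part of this tutorial's code: it assumes connection and options are configured as in the steps below, and that the agent's input sample rate matches the microphone capture rate.

# Hypothetical sketch: live microphone input instead of a prerecorded file.
# Assumes `connection` and `options` are set up as in the steps below and that
# pyaudio is installed (the SDK's Microphone helper depends on it).
from deepgram import Microphone

options.audio.input.sample_rate = 16000  # assumed microphone capture rate

if connection.start(options):
    mic = Microphone(connection.send)  # pushes raw linear16 chunks to the agent
    mic.start()
    input("Press Enter to stop...\n")
    mic.finish()
    connection.finish()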

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API key found")

        # Initialize Deepgram client
        config = DeepgramClientOptions(
            options={
                "keepalive": "true",
            },
        )
        deepgram = DeepgramClient(api_key, config)
        connection = deepgram.agent.websocket.v("1")
        print("Created WebSocket connection...")

        # The code in the following steps will go here

5. Configure the Agent

Next, you will set up a simplified version of the Settings message to configure your Agent’s behavior and the required settings options.

To learn more about all settings options available for an Agent, refer to the Configure the Voice Agent documentation.

        # Configure the Agent
        options = SettingsOptions()
        # Audio input configuration
        options.audio.input.encoding = "linear16"
        options.audio.input.sample_rate = 24000
        # Audio output configuration
        options.audio.output.encoding = "linear16"
        options.audio.output.sample_rate = 24000
        options.audio.output.container = "wav"
        # Agent configuration
        options.agent.language = "en"
        options.agent.listen.provider.type = "deepgram"
        options.agent.listen.model = "nova-3"
        options.agent.think.provider.type = "open_ai"
        options.agent.think.model = "gpt-4o-mini"
        options.agent.think.prompt = "You are a friendly AI assistant."
        options.agent.speak.provider.type = "deepgram"
        options.agent.speak.model = "aura-2-thalia-en"
        options.agent.greeting = "Hello! How can I help you today?"
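When the connection starts, these options are serialized into a single Settings JSON message. The snippet below is a rough, hedged sketch of that payload's shape, expressed as a Python dict; field placement can vary by SDK version, so see the Configure the Voice Agent documentation for the authoritative schema.

# Hedged sketch of the Settings payload the options above roughly map to
# (approximate; consult the Voice Agent settings docs for the exact schema).
settings_message = {
    "type": "Settings",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 24000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "wav"},
    },
    "agent": {
        "language": "en",
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
            "prompt": "You are a friendly AI assistant.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        "greeting": "Hello! How can I help you today?",
    },
}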

6. Send Keep Alive messages

Next you will send a keep-alive message every 5 seconds to maintain the WebSocket connection and prevent it from timing out during long audio processing. The SDK's AgentKeepAlive helper serializes to the JSON message {"type": "KeepAlive"}. Downloading the sample audio file and streaming it to the Agent in chunks happens in the next step.

        # Send Keep Alive messages
        def send_keep_alive():
            while True:
                time.sleep(5)
                print("Keep alive!")
                connection.send(str(AgentKeepAlive()))

        # Start keep-alive in a separate thread
        keep_alive_thread = threading.Thread(target=send_keep_alive, daemon=True)
        keep_alive_thread.start()

7. Set up Event Handlers and Other Functions

Next you will set up event handlers for the voice agent to manage the entire conversation lifecycle, from connection opening to closing. They collect audio chunks into a buffer, save each completed response as a WAV file, log conversation text, and handle interruptions and errors. After registering the handlers, the code starts the connection, streams the sample audio file spacewalk.wav to the agent in chunks, waits for processing to complete, and then closes the connection.

        # Setup Event Handlers
        audio_buffer = bytearray()
        file_counter = 0
        processing_complete = False

        def on_audio_data(self, data, **kwargs):
            nonlocal audio_buffer
            audio_buffer.extend(data)
            print(f"Received audio data from agent: {len(data)} bytes")
            print(f"Total buffer size: {len(audio_buffer)} bytes")
            print(f"Audio data format: {data[:16].hex()}...")

        def on_agent_audio_done(self, agent_audio_done, **kwargs):
            nonlocal audio_buffer, file_counter, processing_complete
            print("AgentAudioDone event received")
            print(f"Buffer size at completion: {len(audio_buffer)} bytes")
            print(f"Agent audio done: {agent_audio_done}")
            if len(audio_buffer) > 0:
                header = create_wav_header()
                # Patch the RIFF and data chunk sizes now that the length is known
                header[4:8] = (36 + len(audio_buffer)).to_bytes(4, 'little')
                header[40:44] = len(audio_buffer).to_bytes(4, 'little')
                with open(f"output-{file_counter}.wav", 'wb') as f:
                    f.write(header)
                    f.write(audio_buffer)
                print(f"Created output-{file_counter}.wav")
                audio_buffer = bytearray()
                file_counter += 1
            processing_complete = True

        def on_conversation_text(self, conversation_text, **kwargs):
            print(f"Conversation Text: {conversation_text}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"{json.dumps(conversation_text.__dict__)}\n")

        def on_welcome(self, welcome, **kwargs):
            print(f"Welcome message received: {welcome}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Welcome message: {welcome}\n")

        def on_settings_applied(self, settings_applied, **kwargs):
            print(f"Settings applied: {settings_applied}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Settings applied: {settings_applied}\n")

        def on_user_started_speaking(self, user_started_speaking, **kwargs):
            print(f"User Started Speaking: {user_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"User Started Speaking: {user_started_speaking}\n")

        def on_agent_thinking(self, agent_thinking, **kwargs):
            print(f"Agent Thinking: {agent_thinking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Thinking: {agent_thinking}\n")

        def on_agent_started_speaking(self, agent_started_speaking, **kwargs):
            nonlocal audio_buffer
            audio_buffer = bytearray()  # Reset buffer for new response
            print(f"Agent Started Speaking: {agent_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Started Speaking: {agent_started_speaking}\n")

        def on_close(self, close, **kwargs):
            print(f"Connection closed: {close}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Connection closed: {close}\n")

        def on_error(self, error, **kwargs):
            print(f"Error: {error}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Error: {error}\n")

        def on_unhandled(self, unhandled, **kwargs):
            print(f"Unhandled event: {unhandled}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Unhandled event: {unhandled}\n")

        # Register handlers
        connection.on(AgentWebSocketEvents.AudioData, on_audio_data)
        connection.on(AgentWebSocketEvents.AgentAudioDone, on_agent_audio_done)
        connection.on(AgentWebSocketEvents.ConversationText, on_conversation_text)
        connection.on(AgentWebSocketEvents.Welcome, on_welcome)
        connection.on(AgentWebSocketEvents.SettingsApplied, on_settings_applied)
        connection.on(AgentWebSocketEvents.UserStartedSpeaking, on_user_started_speaking)
        connection.on(AgentWebSocketEvents.AgentThinking, on_agent_thinking)
        connection.on(AgentWebSocketEvents.AgentStartedSpeaking, on_agent_started_speaking)
        connection.on(AgentWebSocketEvents.Close, on_close)
        connection.on(AgentWebSocketEvents.Error, on_error)
        connection.on(AgentWebSocketEvents.Unhandled, on_unhandled)
        print("Event handlers registered")

        # Start the connection
        print("Starting WebSocket connection...")
        if not connection.start(options):
            print("Failed to start connection")
            return
        print("WebSocket connection started successfully")

        # Stream audio
        print("Downloading and sending audio...")
        response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
        # Read (and skip) the 44-byte WAV header; iter_content continues after it
        header = response.raw.read(44)

        # Verify WAV header
        if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
            print("Invalid WAV header")
            return

        # Extract the source sample rate from the header (informational)
        sample_rate = int.from_bytes(header[24:28], 'little')
        print(f"Source sample rate: {sample_rate} Hz")

        chunk_size = 8192
        total_bytes_sent = 0
        chunk_count = 0
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                connection.send(chunk)
                total_bytes_sent += len(chunk)
                chunk_count += 1
                time.sleep(0.1)  # Small delay between chunks

        print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
        print("Waiting for agent response...")

        # Wait for processing
        print("Waiting for processing to complete...")
        start_time = time.time()
        timeout = 30  # 30 second timeout

        while not processing_complete and (time.time() - start_time) < timeout:
            time.sleep(1)
            print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

        if not processing_complete:
            print("Processing timed out after 30 seconds")
        else:
            print("Processing complete. Check output-*.wav and chatlog.txt for results.")

        # Cleanup
        connection.finish()
        print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1):
    """Create a 44-byte PCM WAV header with the specified parameters."""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = b'\x00\x00\x00\x00'  # File size (patched once the data length is known)
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = b'\x00\x00\x00\x00'  # Subchunk2Size (patched once the data length is known)

    return header

if __name__ == "__main__":
    main()

8. Run the Voice Agent

Now that you have your complete code, you can run the Voice Agent! If it works, you should see the agent’s audio response saved as output-0.wav (additional responses are numbered sequentially) and the conversation text in chatlog.txt. These files are saved in the same directory as your main application file.

python main.py

9. Putting it all together

Below is the final code for the Voice Agent you just built. If you saw any errors after running your Agent, compare the code below with the code you wrote in the steps above to find and fix them.

# Copyright 2025 Deepgram SDK contributors. All Rights Reserved.
# Use of this source code is governed by a MIT license that can be found in the LICENSE file.
# SPDX-License-Identifier: MIT

# Import dependencies and set up the main function
import requests
import time
import os
import json
import threading

from deepgram import (
    DeepgramClient,
    DeepgramClientOptions,
    AgentWebSocketEvents,
    AgentKeepAlive,
)
from deepgram.clients.agent.v1.websocket.options import SettingsOptions

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API key found")

        # Initialize Deepgram client
        config = DeepgramClientOptions(
            options={
                "keepalive": "true",
                # "speaker_playback": "true",
            },
        )
        deepgram = DeepgramClient(api_key, config)
        connection = deepgram.agent.websocket.v("1")
        print("Created WebSocket connection...")

        # Configure the Agent
        options = SettingsOptions()
        # Audio input configuration
        options.audio.input.encoding = "linear16"
        options.audio.input.sample_rate = 24000
        # Audio output configuration
        options.audio.output.encoding = "linear16"
        options.audio.output.sample_rate = 24000
        options.audio.output.container = "wav"
        # Agent configuration
        options.agent.language = "en"
        options.agent.listen.provider.type = "deepgram"
        options.agent.listen.model = "nova-3"
        options.agent.think.provider.type = "open_ai"
        options.agent.think.model = "gpt-4o-mini"
        options.agent.think.prompt = "You are a friendly AI assistant."
        options.agent.speak.provider.type = "deepgram"
        options.agent.speak.model = "aura-2-thalia-en"
        options.agent.greeting = "Hello! How can I help you today?"

        # Send Keep Alive messages
        def send_keep_alive():
            while True:
                time.sleep(5)
                print("Keep alive!")
                connection.send(str(AgentKeepAlive()))

        # Start keep-alive in a separate thread
        keep_alive_thread = threading.Thread(target=send_keep_alive, daemon=True)
        keep_alive_thread.start()

        # Setup Event Handlers
        audio_buffer = bytearray()
        file_counter = 0
        processing_complete = False

        def on_audio_data(self, data, **kwargs):
            nonlocal audio_buffer
            audio_buffer.extend(data)
            print(f"Received audio data from agent: {len(data)} bytes")
            print(f"Total buffer size: {len(audio_buffer)} bytes")
            print(f"Audio data format: {data[:16].hex()}...")

        def on_agent_audio_done(self, agent_audio_done, **kwargs):
            nonlocal audio_buffer, file_counter, processing_complete
            print("AgentAudioDone event received")
            print(f"Buffer size at completion: {len(audio_buffer)} bytes")
            print(f"Agent audio done: {agent_audio_done}")
            if len(audio_buffer) > 0:
                header = create_wav_header()
                # Patch the RIFF and data chunk sizes now that the length is known
                header[4:8] = (36 + len(audio_buffer)).to_bytes(4, 'little')
                header[40:44] = len(audio_buffer).to_bytes(4, 'little')
                with open(f"output-{file_counter}.wav", 'wb') as f:
                    f.write(header)
                    f.write(audio_buffer)
                print(f"Created output-{file_counter}.wav")
                audio_buffer = bytearray()
                file_counter += 1
            processing_complete = True

        def on_conversation_text(self, conversation_text, **kwargs):
            print(f"Conversation Text: {conversation_text}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"{json.dumps(conversation_text.__dict__)}\n")

        def on_welcome(self, welcome, **kwargs):
            print(f"Welcome message received: {welcome}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Welcome message: {welcome}\n")

        def on_settings_applied(self, settings_applied, **kwargs):
            print(f"Settings applied: {settings_applied}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Settings applied: {settings_applied}\n")

        def on_user_started_speaking(self, user_started_speaking, **kwargs):
            print(f"User Started Speaking: {user_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"User Started Speaking: {user_started_speaking}\n")

        def on_agent_thinking(self, agent_thinking, **kwargs):
            print(f"Agent Thinking: {agent_thinking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Thinking: {agent_thinking}\n")

        def on_agent_started_speaking(self, agent_started_speaking, **kwargs):
            nonlocal audio_buffer
            audio_buffer = bytearray()  # Reset buffer for new response
            print(f"Agent Started Speaking: {agent_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Started Speaking: {agent_started_speaking}\n")

        def on_close(self, close, **kwargs):
            print(f"Connection closed: {close}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Connection closed: {close}\n")

        def on_error(self, error, **kwargs):
            print(f"Error: {error}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Error: {error}\n")

        def on_unhandled(self, unhandled, **kwargs):
            print(f"Unhandled event: {unhandled}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Unhandled event: {unhandled}\n")

        # Register handlers
        connection.on(AgentWebSocketEvents.AudioData, on_audio_data)
        connection.on(AgentWebSocketEvents.AgentAudioDone, on_agent_audio_done)
        connection.on(AgentWebSocketEvents.ConversationText, on_conversation_text)
        connection.on(AgentWebSocketEvents.Welcome, on_welcome)
        connection.on(AgentWebSocketEvents.SettingsApplied, on_settings_applied)
        connection.on(AgentWebSocketEvents.UserStartedSpeaking, on_user_started_speaking)
        connection.on(AgentWebSocketEvents.AgentThinking, on_agent_thinking)
        connection.on(AgentWebSocketEvents.AgentStartedSpeaking, on_agent_started_speaking)
        connection.on(AgentWebSocketEvents.Close, on_close)
        connection.on(AgentWebSocketEvents.Error, on_error)
        connection.on(AgentWebSocketEvents.Unhandled, on_unhandled)
        print("Event handlers registered")

        # Start the connection
        print("Starting WebSocket connection...")
        if not connection.start(options):
            print("Failed to start connection")
            return
        print("WebSocket connection started successfully")

        # Stream audio
        print("Downloading and sending audio...")
        response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
        # Read (and skip) the 44-byte WAV header; iter_content continues after it
        header = response.raw.read(44)

        # Verify WAV header
        if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
            print("Invalid WAV header")
            return

        # Extract the source sample rate from the header (informational)
        sample_rate = int.from_bytes(header[24:28], 'little')
        print(f"Source sample rate: {sample_rate} Hz")

        chunk_size = 8192
        total_bytes_sent = 0
        chunk_count = 0
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                connection.send(chunk)
                total_bytes_sent += len(chunk)
                chunk_count += 1
                time.sleep(0.1)  # Small delay between chunks

        print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
        print("Waiting for agent response...")

        # Wait for processing
        print("Waiting for processing to complete...")
        start_time = time.time()
        timeout = 30  # 30 second timeout

        while not processing_complete and (time.time() - start_time) < timeout:
            time.sleep(1)
            print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

        if not processing_complete:
            print("Processing timed out after 30 seconds")
        else:
            print("Processing complete. Check output-*.wav and chatlog.txt for results.")

        # Cleanup
        connection.finish()
        print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1):
    """Create a 44-byte PCM WAV header with the specified parameters."""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = b'\x00\x00\x00\x00'  # File size (patched once the data length is known)
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = b'\x00\x00\x00\x00'  # Subchunk2Size (patched once the data length is known)

    return header

if __name__ == "__main__":
    main()

Implementation Examples

To better understand how to build a more complex Voice Agent, check out the following working code examples.

Use Case | Runtime / Language | Repo
Voice agent basic demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Demo
Voice agent medical assistant demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Medical Assistant Demo
Voice agent demo with Twilio | Python | Python Twilio > Voice Agent Demo
Voice agent demo with text input | Node, TypeScript, JavaScript | Deepgram Conversational AI Demo
Voice agent with Azure OpenAI Services | Python | Deepgram Voice Agent with OpenAI Azure
Voice agent with Function Calling using Python Flask | Python / Flask | Python Flask Agent Function Calling Demo

Rate Limits

For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.

Usage Tracking

Usage is calculated based on WebSocket connection time: 1 hour of WebSocket connection time equals 1 hour of API usage.
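As a quick illustration of this rule in Python (plain arithmetic, not a Deepgram API call):

# Illustration of the stated rule: usage equals WebSocket connection time.
connection_seconds = 90 * 60  # a connection held open for 90 minutes
usage_hours = connection_seconds / 3600
print(f"API usage: {usage_hours} hours")  # API usage: 1.5 hours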
