Getting Started

An introduction to using Deepgram's Voice Agent API to build interactive voice agents.

In this guide, you’ll learn how to create a very basic voice agent using Deepgram’s Agent API. Visit the API Reference for more details on how to use the Agent API.

To continue using the Voice Agent API, you will need to migrate to the new Voice Agent API V1. Please refer to the Voice Agent API Migration Guide for more information.

Build a Basic Voice Agent

Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you choose to use a Deepgram SDK.

1. Set up your environment

In the steps below, you’ll use the Terminal to:

  1. Create a new directory for your project
  2. Create a new file for your code
  3. Export your Deepgram API key to the environment so you can use it in your code
  4. Run any additional commands to initialize your project
```shell
mkdir deepgram-agent-demo
cd deepgram-agent-demo
touch index.py
export DEEPGRAM_API_KEY="your_Deepgram_API_key_here"
```

2. Install the Deepgram SDK

Deepgram has several SDKs that make it easier to build a Voice Agent. Follow the steps below to use one of our SDKs to make your first Deepgram Voice Agent request.

In your terminal, navigate to the location on your drive where you created your project above, and install the Deepgram SDK and any other dependencies.

```shell
pip install deepgram-sdk
```

3. Import dependencies and set up the main function

Next, import the necessary dependencies and set up your main application function.

```python
# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

# Import dependencies and set up the main function
import requests
import wave
import io
import time
import os
import json
import threading
from datetime import datetime

from deepgram import (
    DeepgramClient,
    AgentWebSocketEvents,
    AgentKeepAlive,
)
from deepgram.clients.agent.v1.websocket.options import SettingsOptions
```

4. Initialize the Voice Agent

Now you can initialize the Voice Agent. Read your API key from the environment, create a Deepgram client (with keep-alives enabled), and open a WebSocket connection to the Agent API. The code you write in the following steps will go inside the `try` block.

```python
# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API key found")

        # Initialize Deepgram client
        deepgram = DeepgramClient(
            api_key=api_key,
            config={
                "keepalive": "true",
            },
        )
        connection = deepgram.agent.v1.connect()
        print("Created WebSocket connection...")

        # The code in the following steps will go here
```

5. Configure the Agent

Next, set up a simplified version of the Settings message to configure your Agent’s behavior, providing the required settings options for your Agent.

To learn more about all settings options available for an Agent, refer to the Configure the Voice Agent documentation.

```python
        # Configure the Agent
        options = SettingsOptions()
        # Audio input configuration
        options.audio.input.encoding = "linear16"
        options.audio.input.sample_rate = 24000
        # Audio output configuration
        options.audio.output.encoding = "linear16"
        options.audio.output.sample_rate = 24000
        options.audio.output.container = "wav"
        # Agent configuration
        options.agent.language = "en"
        options.agent.listen.provider.type = "deepgram"
        options.agent.listen.provider.model = "nova-3"
        options.agent.think.provider.type = "open_ai"
        options.agent.think.provider.model = "gpt-4o-mini"
        options.agent.think.prompt = "You are a friendly AI assistant."
        options.agent.speak.provider.type = "deepgram"
        options.agent.speak.provider.model = "aura-2-thalia-en"
        options.agent.greeting = "Hello! How can I help you today?"
```
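For reference, the `SettingsOptions` object above serializes to a JSON `Settings` message sent over the WebSocket when the connection starts. The sketch below mirrors the option paths used above (`options.audio.input.encoding`, and so on); treat the exact payload shape as an approximation rather than the authoritative schema, which is defined in the Configure the Voice Agent documentation.

```python
import json

# Approximate JSON Settings payload corresponding to the options above.
# This is illustrative; the SDK builds and sends the real message for you.
settings = {
    "type": "Settings",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 24000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "wav"},
    },
    "agent": {
        "language": "en",
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
            "prompt": "You are a friendly AI assistant.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        "greeting": "Hello! How can I help you today?",
    },
}

print(json.dumps(settings, indent=2))
```

Seeing the flat option paths laid out as nested JSON can make it easier to debug a `SettingsApplied` (or error) response from the server.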

6. Send Keep Alive messages

Next, send a keep-alive message every 5 seconds to maintain the WebSocket connection. This prevents the connection from timing out during long audio processing. The keep-alive loop runs in a daemon thread so it doesn’t block the rest of the application.

```python
        # Send Keep Alive messages
        def send_keep_alive():
            while True:
                time.sleep(5)
                print("Keep alive!")
                connection.send(str(AgentKeepAlive()))

        # Start keep-alive in a separate thread
        keep_alive_thread = threading.Thread(target=send_keep_alive, daemon=True)
        keep_alive_thread.start()
```
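Under the hood, `str(AgentKeepAlive())` produces a small JSON control message. If you are ever working without the SDK helper, the equivalent raw message looks roughly like this (a sketch; the `keep_alive_message` helper is illustrative, not part of the SDK):

```python
import json

def keep_alive_message() -> str:
    # The Voice Agent keep-alive control message is a minimal JSON object;
    # the SDK's AgentKeepAlive serializes to the equivalent payload.
    return json.dumps({"type": "KeepAlive"})

print(keep_alive_message())
```

Because keep-alives are plain text frames, they are cheap to send and safe to interleave with binary audio chunks.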

7. Set Up Event Handlers and Other Functions

Next, set up event handlers to manage the entire conversation lifecycle, from connection open to close. The handlers collect audio chunks into a buffer and save each completed response as a WAV file, while also managing interruptions, logging the conversation to chatlog.txt, and reporting errors. After registering the handlers and starting the connection, the code fetches a sample audio file from https://dpgr.am/spacewalk.wav and streams it to the Agent in chunks as it is downloaded.

```python
        # Set up event handlers
        audio_buffer = bytearray()
        file_counter = 0
        processing_complete = False

        def on_audio_data(self, data, **kwargs):
            nonlocal audio_buffer
            audio_buffer.extend(data)
            print(f"Received audio data from agent: {len(data)} bytes")
            print(f"Total buffer size: {len(audio_buffer)} bytes")
            print(f"Audio data format: {data[:16].hex()}...")

        def on_agent_audio_done(self, agent_audio_done, **kwargs):
            nonlocal audio_buffer, file_counter, processing_complete
            print("AgentAudioDone event received")
            print(f"Buffer size at completion: {len(audio_buffer)} bytes")
            print(f"Agent audio done: {agent_audio_done}")
            if len(audio_buffer) > 0:
                with open(f"output-{file_counter}.wav", 'wb') as f:
                    f.write(create_wav_header())
                    f.write(audio_buffer)
                print(f"Created output-{file_counter}.wav")
                audio_buffer = bytearray()
                file_counter += 1
            processing_complete = True

        def on_conversation_text(self, conversation_text, **kwargs):
            print(f"Conversation Text: {conversation_text}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"{json.dumps(conversation_text.__dict__)}\n")

        def on_welcome(self, welcome, **kwargs):
            print(f"Welcome message received: {welcome}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Welcome message: {welcome}\n")

        def on_settings_applied(self, settings_applied, **kwargs):
            print(f"Settings applied: {settings_applied}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Settings applied: {settings_applied}\n")

        def on_user_started_speaking(self, user_started_speaking, **kwargs):
            print(f"User Started Speaking: {user_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"User Started Speaking: {user_started_speaking}\n")

        def on_agent_thinking(self, agent_thinking, **kwargs):
            print(f"Agent Thinking: {agent_thinking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Thinking: {agent_thinking}\n")

        def on_agent_started_speaking(self, agent_started_speaking, **kwargs):
            nonlocal audio_buffer
            audio_buffer = bytearray()  # Reset buffer for new response
            print(f"Agent Started Speaking: {agent_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Started Speaking: {agent_started_speaking}\n")

        def on_close(self, close, **kwargs):
            print(f"Connection closed: {close}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Connection closed: {close}\n")

        def on_error(self, error, **kwargs):
            print(f"Error: {error}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Error: {error}\n")

        def on_unhandled(self, unhandled, **kwargs):
            print(f"Unhandled event: {unhandled}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Unhandled event: {unhandled}\n")

        # Register handlers
        connection.on(AgentWebSocketEvents.AudioData, on_audio_data)
        connection.on(AgentWebSocketEvents.AgentAudioDone, on_agent_audio_done)
        connection.on(AgentWebSocketEvents.ConversationText, on_conversation_text)
        connection.on(AgentWebSocketEvents.Welcome, on_welcome)
        connection.on(AgentWebSocketEvents.SettingsApplied, on_settings_applied)
        connection.on(AgentWebSocketEvents.UserStartedSpeaking, on_user_started_speaking)
        connection.on(AgentWebSocketEvents.AgentThinking, on_agent_thinking)
        connection.on(AgentWebSocketEvents.AgentStartedSpeaking, on_agent_started_speaking)
        connection.on(AgentWebSocketEvents.Close, on_close)
        connection.on(AgentWebSocketEvents.Error, on_error)
        connection.on(AgentWebSocketEvents.Unhandled, on_unhandled)
        print("Event handlers registered")

        # Start the connection
        print("Starting WebSocket connection...")
        if not connection.start(options):
            print("Failed to start connection")
            return
        print("WebSocket connection started successfully")

        # Stream audio
        print("Downloading and sending audio...")
        response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
        # Skip the 44-byte WAV header so only raw PCM audio is sent
        header = response.raw.read(44)

        # Verify WAV header
        if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
            print("Invalid WAV header")
            return

        # Extract sample rate from header
        sample_rate = int.from_bytes(header[24:28], 'little')

        chunk_size = 8192
        total_bytes_sent = 0
        chunk_count = 0
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                connection.send(chunk)
                total_bytes_sent += len(chunk)
                chunk_count += 1
                time.sleep(0.1)  # Small delay between chunks

        print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
        print("Waiting for agent response...")

        # Wait for processing
        print("Waiting for processing to complete...")
        start_time = time.time()
        timeout = 30  # 30 second timeout

        while not processing_complete and (time.time() - start_time) < timeout:
            time.sleep(1)
            print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

        if not processing_complete:
            print("Processing timed out after 30 seconds")
        else:
            print("Processing complete. Check output-*.wav and chatlog.txt for results.")

        # Cleanup
        connection.finish()
        print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1):
    """Create a WAV header with the specified parameters"""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = b'\x00\x00\x00\x00'  # File size (to be updated later)
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = b'\x00\x00\x00\x00'  # Subchunk2Size (to be updated later)

    return header

if __name__ == "__main__":
    main()
```

8. Run the Voice Agent

Now that you have your complete code, you can run the Voice Agent! If it works, you should see the conversation text and audio in the files output-0.wav and chatlog.txt, saved in the same directory as your main application file.

```shell
python index.py
```
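One thing to note: `create_wav_header` leaves the RIFF and data size fields at zero (“to be updated later”), and the guide’s code never fills them in. Most audio players cope, but strict readers may report a zero-length file. If that bites you, a small helper (illustrative, not part of the guide’s code) can patch the two size fields in a saved output file:

```python
def fix_wav_sizes(wav_bytes: bytes) -> bytes:
    """Fill in the RIFF chunk size and data chunk size of a WAV file
    written with the guide's 44-byte create_wav_header()."""
    data = bytearray(wav_bytes)
    total = len(data)
    data[4:8] = (total - 8).to_bytes(4, 'little')     # RIFF chunk size
    data[40:44] = (total - 44).to_bytes(4, 'little')  # data chunk size
    return bytes(data)

# Example usage on a saved agent response:
# with open("output-0.wav", "rb") as f:
#     fixed = fix_wav_sizes(f.read())
# with open("output-0.wav", "wb") as f:
#     f.write(fixed)
```

The two offsets (4 and 40) match the byte layout written by `create_wav_header`, so this only applies to files with that canonical 44-byte header.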

9. Putting it all together

Below is the final code for the Voice Agent you just built. If you saw any errors when running your Agent, compare the code below with the code you wrote in the steps above to find and fix them.

```python
# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

# Copyright 2025 Deepgram SDK contributors. All Rights Reserved.
# Use of this source code is governed by a MIT license that can be found in the LICENSE file.
# SPDX-License-Identifier: MIT

# Import dependencies and set up the main function
import requests
import wave
import io
import time
import os
import json
import threading
from datetime import datetime

from deepgram import (
    DeepgramClient,
    AgentWebSocketEvents,
    AgentKeepAlive,
)
from deepgram.clients.agent.v1.websocket.options import SettingsOptions

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API key found")

        # Initialize Deepgram client
        deepgram = DeepgramClient(
            api_key=api_key,
            config={
                "keepalive": "true",
                # "speaker_playback": "true",
            },
        )
        connection = deepgram.agent.v1.connect()
        print("Created WebSocket connection...")

        # Configure the Agent
        options = SettingsOptions()
        # Audio input configuration
        options.audio.input.encoding = "linear16"
        options.audio.input.sample_rate = 24000
        # Audio output configuration
        options.audio.output.encoding = "linear16"
        options.audio.output.sample_rate = 24000
        options.audio.output.container = "wav"
        # Agent configuration
        options.agent.language = "en"
        options.agent.listen.provider.type = "deepgram"
        options.agent.listen.provider.model = "nova-3"
        options.agent.think.provider.type = "open_ai"
        options.agent.think.provider.model = "gpt-4o-mini"
        options.agent.think.prompt = "You are a friendly AI assistant."
        options.agent.speak.provider.type = "deepgram"
        options.agent.speak.provider.model = "aura-2-thalia-en"
        options.agent.greeting = "Hello! How can I help you today?"

        # Send Keep Alive messages
        def send_keep_alive():
            while True:
                time.sleep(5)
                print("Keep alive!")
                connection.send(str(AgentKeepAlive()))

        # Start keep-alive in a separate thread
        keep_alive_thread = threading.Thread(target=send_keep_alive, daemon=True)
        keep_alive_thread.start()

        # Set up event handlers
        audio_buffer = bytearray()
        file_counter = 0
        processing_complete = False

        def on_audio_data(self, data, **kwargs):
            nonlocal audio_buffer
            audio_buffer.extend(data)
            print(f"Received audio data from agent: {len(data)} bytes")
            print(f"Total buffer size: {len(audio_buffer)} bytes")
            print(f"Audio data format: {data[:16].hex()}...")

        def on_agent_audio_done(self, agent_audio_done, **kwargs):
            nonlocal audio_buffer, file_counter, processing_complete
            print("AgentAudioDone event received")
            print(f"Buffer size at completion: {len(audio_buffer)} bytes")
            print(f"Agent audio done: {agent_audio_done}")
            if len(audio_buffer) > 0:
                with open(f"output-{file_counter}.wav", 'wb') as f:
                    f.write(create_wav_header())
                    f.write(audio_buffer)
                print(f"Created output-{file_counter}.wav")
                audio_buffer = bytearray()
                file_counter += 1
            processing_complete = True

        def on_conversation_text(self, conversation_text, **kwargs):
            print(f"Conversation Text: {conversation_text}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"{json.dumps(conversation_text.__dict__)}\n")

        def on_welcome(self, welcome, **kwargs):
            print(f"Welcome message received: {welcome}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Welcome message: {welcome}\n")

        def on_settings_applied(self, settings_applied, **kwargs):
            print(f"Settings applied: {settings_applied}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Settings applied: {settings_applied}\n")

        def on_user_started_speaking(self, user_started_speaking, **kwargs):
            print(f"User Started Speaking: {user_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"User Started Speaking: {user_started_speaking}\n")

        def on_agent_thinking(self, agent_thinking, **kwargs):
            print(f"Agent Thinking: {agent_thinking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Thinking: {agent_thinking}\n")

        def on_agent_started_speaking(self, agent_started_speaking, **kwargs):
            nonlocal audio_buffer
            audio_buffer = bytearray()  # Reset buffer for new response
            print(f"Agent Started Speaking: {agent_started_speaking}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Agent Started Speaking: {agent_started_speaking}\n")

        def on_close(self, close, **kwargs):
            print(f"Connection closed: {close}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Connection closed: {close}\n")

        def on_error(self, error, **kwargs):
            print(f"Error: {error}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Error: {error}\n")

        def on_unhandled(self, unhandled, **kwargs):
            print(f"Unhandled event: {unhandled}")
            with open("chatlog.txt", 'a') as chatlog:
                chatlog.write(f"Unhandled event: {unhandled}\n")

        # Register handlers
        connection.on(AgentWebSocketEvents.AudioData, on_audio_data)
        connection.on(AgentWebSocketEvents.AgentAudioDone, on_agent_audio_done)
        connection.on(AgentWebSocketEvents.ConversationText, on_conversation_text)
        connection.on(AgentWebSocketEvents.Welcome, on_welcome)
        connection.on(AgentWebSocketEvents.SettingsApplied, on_settings_applied)
        connection.on(AgentWebSocketEvents.UserStartedSpeaking, on_user_started_speaking)
        connection.on(AgentWebSocketEvents.AgentThinking, on_agent_thinking)
        connection.on(AgentWebSocketEvents.AgentStartedSpeaking, on_agent_started_speaking)
        connection.on(AgentWebSocketEvents.Close, on_close)
        connection.on(AgentWebSocketEvents.Error, on_error)
        connection.on(AgentWebSocketEvents.Unhandled, on_unhandled)
        print("Event handlers registered")

        # Start the connection
        print("Starting WebSocket connection...")
        if not connection.start(options):
            print("Failed to start connection")
            return
        print("WebSocket connection started successfully")

        # Stream audio
        print("Downloading and sending audio...")
        response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
        # Skip the 44-byte WAV header so only raw PCM audio is sent
        header = response.raw.read(44)

        # Verify WAV header
        if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
            print("Invalid WAV header")
            return

        # Extract sample rate from header
        sample_rate = int.from_bytes(header[24:28], 'little')

        chunk_size = 8192
        total_bytes_sent = 0
        chunk_count = 0
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                connection.send(chunk)
                total_bytes_sent += len(chunk)
                chunk_count += 1
                time.sleep(0.1)  # Small delay between chunks

        print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
        print("Waiting for agent response...")

        # Wait for processing
        print("Waiting for processing to complete...")
        start_time = time.time()
        timeout = 30  # 30 second timeout

        while not processing_complete and (time.time() - start_time) < timeout:
            time.sleep(1)
            print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

        if not processing_complete:
            print("Processing timed out after 30 seconds")
        else:
            print("Processing complete. Check output-*.wav and chatlog.txt for results.")

        # Cleanup
        connection.finish()
        print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1):
    """Create a WAV header with the specified parameters"""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = b'\x00\x00\x00\x00'  # File size (to be updated later)
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = b'\x00\x00\x00\x00'  # Subchunk2Size (to be updated later)

    return header

if __name__ == "__main__":
    main()
```

Implementation Examples

To better understand how to build a more complex Voice Agent, check out the following examples for working code.

| Use Case | Runtime / Language | Repo |
| --- | --- | --- |
| Voice agent basic demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Demo |
| Voice agent medical assistant demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Medical Assistant Demo |
| Voice agent demo with Twilio | Python | Python Twilio > Voice Agent Demo |
| Voice agent demo with text input | Node, TypeScript, JavaScript | Deepgram Conversational AI Demo |
| Voice agent with Azure OpenAI Services | Python | Deepgram Voice Agent with OpenAI Azure |
| Voice agent with Function Calling using Python Flask | Python / Flask | Python Flask Agent Function Calling Demo |

Rate Limits

For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.

Usage Tracking

Usage is calculated based on WebSocket connection time: 1 hour of WebSocket connection time = 1 hour of API usage.
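Because usage equals connection time, estimating it from your session durations is simple arithmetic. A quick illustration (the session lengths here are made up for the example):

```python
def usage_hours(connection_seconds: float) -> float:
    """1 hour of WebSocket connection time = 1 hour of API usage."""
    return connection_seconds / 3600.0

# Three hypothetical sessions: 12, 45, and 90 minutes of connection time
sessions = [12 * 60, 45 * 60, 90 * 60]
total_hours = sum(usage_hours(s) for s in sessions)
print(f"Total usage: {total_hours:.2f} hours")  # Total usage: 2.45 hours
```

Note that keep-alive messages hold the WebSocket open, so an idle-but-connected agent still accrues connection time.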