Getting Started

An introduction to using Deepgram's Voice Agent API to build interactive voice agents.

In this guide, you’ll learn how to create a very basic voice agent using Deepgram’s Agent API. Visit the API Reference for more details on how to use the Agent API.

To continue using the Voice Agent API, you will need to migrate to the new Voice Agent API V1. Please refer to the Voice Agent API Migration Guide for more information.

Build a Basic Voice Agent

Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you choose to use a Deepgram SDK.

1. Set up your environment

In the steps below, you’ll use the Terminal to:

  1. Create a new directory for your project
  2. Create a new file for your code
  3. Export your Deepgram API key to the environment so you can use it in your code
  4. Run any additional commands to initialize your project
mkdir deepgram-agent-demo
cd deepgram-agent-demo
touch index.py
export DEEPGRAM_API_KEY="your_Deepgram_API_key_here"

2. Install the Deepgram SDK

Deepgram has several SDKs that can make it easier to build a Voice Agent. Follow the steps below to use one of our SDKs to make your first Deepgram Voice Agent request.

In your terminal, navigate to the location on your drive where you created your project above, and install the Deepgram SDK and any other dependencies.

pip install deepgram-sdk
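To confirm the install succeeded before writing any code, you can query the installed package version with the standard library. This is an optional check; the `installed_version` helper below is illustrative, not part of the SDK:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package: str):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Prints the installed SDK version, or "not installed" if pip install failed.
print("deepgram-sdk:", installed_version("deepgram-sdk") or "not installed")
```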

3. Import dependencies and set up the main function

Next, import the necessary dependencies and set up your main application function.

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

# Import dependencies and set up the main function
import requests
import time
import os
import json
import threading

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.extensions.types.sockets import (
    AgentV1Agent,
    AgentV1AudioConfig,
    AgentV1AudioInput,
    AgentV1AudioOutput,
    AgentV1DeepgramSpeakProvider,
    AgentV1Listen,
    AgentV1ListenProvider,
    AgentV1OpenAiThinkProvider,
    AgentV1SettingsMessage,
    AgentV1SocketClientResponse,
    AgentV1SpeakProviderConfig,
    AgentV1Think,
)

4. Initialize the Voice Agent

Now initialize the Voice Agent: read your API key from the environment, create a Deepgram client, and open a WebSocket connection to the Agent API as a context manager. The code in the following steps will run inside this connection block.

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API Key found")

        # Initialize Deepgram client
        client = DeepgramClient(api_key=api_key)

        # Use connection as a context manager
        with client.agent.v1.connect() as connection:
            print("Created WebSocket connection...")

            # The code in the following steps will go here

5. Configure the Agent

Next, set up a simplified Settings message to configure your Agent's behavior with the required options.

To learn more about all settings options available for an Agent, refer to the Configure the Voice Agent documentation.

            # Configure the Agent
            settings = AgentV1SettingsMessage(
                audio=AgentV1AudioConfig(
                    input=AgentV1AudioInput(
                        encoding="linear16",
                        sample_rate=24000,
                    ),
                    output=AgentV1AudioOutput(
                        encoding="linear16",
                        sample_rate=24000,
                        container="wav",
                    ),
                ),
                agent=AgentV1Agent(
                    language="en",
                    listen=AgentV1Listen(
                        provider=AgentV1ListenProvider(
                            type="deepgram",
                            model="nova-3",
                        )
                    ),
                    think=AgentV1Think(
                        provider=AgentV1OpenAiThinkProvider(
                            type="open_ai",
                            model="gpt-4o-mini",
                        ),
                        prompt="You are a friendly AI assistant.",
                    ),
                    speak=AgentV1SpeakProviderConfig(
                        provider=AgentV1DeepgramSpeakProvider(
                            type="deepgram",
                            model="aura-2-thalia-en",
                        )
                    ),
                    greeting="Hello! How can I help you today?",
                ),
            )
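Under the hood, these typed objects serialize to a JSON Settings message sent over the WebSocket. The sketch below shows the approximate payload implied by the fields above; treat the API Reference as the authoritative schema:

```python
import json

# Approximate JSON equivalent of the typed settings objects above.
# Field names mirror the snippet; consult the API Reference for the
# exact, authoritative schema.
settings = {
    "type": "Settings",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 24000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "wav"},
    },
    "agent": {
        "language": "en",
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
            "prompt": "You are a friendly AI assistant.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        "greeting": "Hello! How can I help you today?",
    },
}

print(json.dumps(settings, indent=2))
```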

6. Send the Settings Message

Next, send the Settings message you configured above over the WebSocket. The agent applies these settings before processing any audio and confirms with a SettingsApplied event.

            # Send settings to configure the agent
            print("Sending settings configuration...")
            connection.send_settings(settings)
            print("Settings sent successfully")
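The Voice Agent API also supports a KeepAlive message that you can send periodically (for example, every 5 seconds) to keep the WebSocket open during quiet periods. The exact send method depends on your SDK version, so check the API Reference; the `start_keep_alive` helper below is a generic, hypothetical sketch that invokes whatever sender you pass it on a background thread:

```python
import threading
import time

def start_keep_alive(send_fn, interval=5.0):
    """Call send_fn every `interval` seconds on a daemon thread.

    Pass a zero-argument callable that sends a KeepAlive message on your
    connection. Returns a threading.Event; call .set() on it to stop.
    """
    stop = threading.Event()

    def loop():
        # wait() returns False on timeout (keep going) and True once stopped.
        while not stop.wait(interval):
            send_fn()

    threading.Thread(target=loop, daemon=True).start()
    return stop

# Demo with a stand-in sender that just records each call.
sent = []
stop = start_keep_alive(lambda: sent.append(time.time()), interval=0.1)
time.sleep(0.35)
stop.set()
print(f"sent {len(sent)} keep-alives")
```

In the agent script you would pass a callable that sends `{"type": "KeepAlive"}` on the connection, and call `stop.set()` before closing.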

7. Set Up Event Handlers and Other Functions

Next, set up event handlers to manage the entire conversation lifecycle, from connection open to close. The handlers collect audio chunks into a buffer and save each completed response as a WAV file, while also logging conversation events and errors to chatlog.txt. The code then starts a listener thread, streams a sample audio file (spacewalk.wav) to the agent in chunks, waits for processing to complete, and defines a helper that builds the WAV header for saved responses.

            # Setup Event Handlers
            audio_buffer = bytearray()
            file_counter = 0
            processing_complete = False

            def on_open(event):
                print("Connection opened")

            def on_message(message: AgentV1SocketClientResponse):
                nonlocal audio_buffer, file_counter, processing_complete

                # Handle binary audio data
                if isinstance(message, bytes):
                    audio_buffer.extend(message)
                    print(f"Received audio data: {len(message)} bytes")
                    return

                # Handle different message types
                msg_type = getattr(message, "type", "Unknown")
                print(f"Received {msg_type} event")

                # Handle specific event types
                if msg_type == "Welcome":
                    print(f"Welcome: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Welcome: {message}\n")

                elif msg_type == "SettingsApplied":
                    print(f"Settings applied: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Settings applied: {message}\n")

                elif msg_type == "ConversationText":
                    print(f"Conversation: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"{json.dumps(message.__dict__)}\n")

                elif msg_type == "UserStartedSpeaking":
                    print("User started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("User started speaking\n")

                elif msg_type == "AgentThinking":
                    print("Agent thinking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent thinking\n")

                elif msg_type == "AgentStartedSpeaking":
                    audio_buffer = bytearray()  # Reset buffer for new response
                    print("Agent started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent started speaking\n")

                elif msg_type == "AgentAudioDone":
                    print("Agent audio done")
                    if len(audio_buffer) > 0:
                        with open(f"output-{file_counter}.wav", 'wb') as f:
                            f.write(create_wav_header(data_size=len(audio_buffer)))
                            f.write(audio_buffer)
                        print(f"Created output-{file_counter}.wav")
                        audio_buffer = bytearray()
                        file_counter += 1
                    processing_complete = True

            def on_error(error):
                print(f"Error: {error}")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write(f"Error: {error}\n")

            def on_close(event):
                print("Connection closed")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write("Connection closed\n")

            # Register event handlers
            connection.on(EventType.OPEN, on_open)
            connection.on(EventType.MESSAGE, on_message)
            connection.on(EventType.ERROR, on_error)
            connection.on(EventType.CLOSE, on_close)
            print("Event handlers registered")

            # Start listening for events in a background thread
            print("Starting event listener...")
            listener_thread = threading.Thread(target=connection.start_listening, daemon=True)
            listener_thread.start()

            # Wait a moment for connection to establish
            time.sleep(1)

            # Stream audio
            print("Downloading and sending audio...")
            response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
            # Skip WAV header
            header = response.raw.read(44)

            # Verify WAV header
            if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
                print("Invalid WAV header")
                return

            chunk_size = 8192
            total_bytes_sent = 0
            chunk_count = 0
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:
                    print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                    connection.send_media(chunk)
                    total_bytes_sent += len(chunk)
                    chunk_count += 1
                    time.sleep(0.1)  # Small delay between chunks

            print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
            print("Waiting for agent response...")

            # Wait for processing
            print("Waiting for processing to complete...")
            start_time = time.time()
            timeout = 30  # 30 second timeout

            while not processing_complete and (time.time() - start_time) < timeout:
                time.sleep(1)
                print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

            if not processing_complete:
                print("Processing timed out after 30 seconds")
            else:
                print("Processing complete. Check output-*.wav and chatlog.txt for results.")

            print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1, data_size=0):
    """Create a 44-byte WAV header for the given format and data size"""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = (36 + data_size).to_bytes(4, 'little')  # File size minus 8 bytes
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = data_size.to_bytes(4, 'little')  # Subchunk2Size

    return header

if __name__ == "__main__":
    main()
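A quick sanity check on the streaming loop above: assuming 24 kHz, 16-bit, mono input (the format declared in the Settings message), each 8192-byte chunk carries about 0.17 s of audio, so a 0.1 s delay between chunks streams somewhat faster than real time:

```python
# How fast does the streaming loop send audio, assuming 24 kHz 16-bit mono
# input (the format declared in the Settings message)?
chunk_size = 8192                 # bytes per chunk
bytes_per_second = 24000 * 2 * 1  # sample_rate * bytes_per_sample * channels
chunk_duration = chunk_size / bytes_per_second

print(f"each chunk holds {chunk_duration:.3f} s of audio")       # ~0.171 s
print(f"sent every 0.1 s -> ~{chunk_duration / 0.1:.1f}x real time")
```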

8. Run the Voice Agent

Now that you have your complete code, you can run the Voice Agent. If it works, you should see the conversation transcript in chatlog.txt and the agent's audio responses in output-0.wav (and subsequent numbered files). These files are saved in the same directory as your main application file.

python index.py
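To sanity-check a generated file, you can read it back with Python's standard-library wave module. The demo below writes a short silent file in the same format the agent produces (24 kHz, 16-bit, mono) and inspects it; point the helper at output-0.wav to check a real response (the `inspect_wav` name is illustrative):

```python
import wave

def inspect_wav(path):
    """Print basic parameters of a WAV file and return its sample rate."""
    with wave.open(path, "rb") as w:
        print(f"channels={w.getnchannels()} rate={w.getframerate()} "
              f"width={w.getsampwidth() * 8}-bit frames={w.getnframes()}")
        return w.getframerate()

# Demo: write one second of 24 kHz 16-bit mono silence, then inspect it.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000)

inspect_wav("demo.wav")     # for agent output: inspect_wav("output-0.wav")
```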

9. Putting it all together

Below is the final code for the Voice Agent you just built. If you saw any errors after running your Agent, compare the code below to the code you wrote in the steps above to find and fix them.

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

# Copyright 2025 Deepgram SDK contributors. All Rights Reserved.
# Use of this source code is governed by a MIT license that can be found in the LICENSE file.
# SPDX-License-Identifier: MIT

# Import dependencies and set up the main function
import requests
import time
import os
import json
import threading

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.extensions.types.sockets import (
    AgentV1Agent,
    AgentV1AudioConfig,
    AgentV1AudioInput,
    AgentV1AudioOutput,
    AgentV1DeepgramSpeakProvider,
    AgentV1Listen,
    AgentV1ListenProvider,
    AgentV1OpenAiThinkProvider,
    AgentV1SettingsMessage,
    AgentV1SocketClientResponse,
    AgentV1SpeakProviderConfig,
    AgentV1Think,
)

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API Key found")

        # Initialize Deepgram client
        client = DeepgramClient(api_key=api_key)

        # Use connection as a context manager
        with client.agent.v1.connect() as connection:
            print("Created WebSocket connection...")

            # Configure the Agent
            settings = AgentV1SettingsMessage(
                audio=AgentV1AudioConfig(
                    input=AgentV1AudioInput(
                        encoding="linear16",
                        sample_rate=24000,
                    ),
                    output=AgentV1AudioOutput(
                        encoding="linear16",
                        sample_rate=24000,
                        container="wav",
                    ),
                ),
                agent=AgentV1Agent(
                    language="en",
                    listen=AgentV1Listen(
                        provider=AgentV1ListenProvider(
                            type="deepgram",
                            model="nova-3",
                        )
                    ),
                    think=AgentV1Think(
                        provider=AgentV1OpenAiThinkProvider(
                            type="open_ai",
                            model="gpt-4o-mini",
                        ),
                        prompt="You are a friendly AI assistant.",
                    ),
                    speak=AgentV1SpeakProviderConfig(
                        provider=AgentV1DeepgramSpeakProvider(
                            type="deepgram",
                            model="aura-2-thalia-en",
                        )
                    ),
                    greeting="Hello! How can I help you today?",
                ),
            )

            # Setup Event Handlers
            audio_buffer = bytearray()
            file_counter = 0
            processing_complete = False

            def on_open(event):
                print("Connection opened")

            def on_message(message: AgentV1SocketClientResponse):
                nonlocal audio_buffer, file_counter, processing_complete

                # Handle binary audio data
                if isinstance(message, bytes):
                    audio_buffer.extend(message)
                    print(f"Received audio data: {len(message)} bytes")
                    return

                # Handle different message types
                msg_type = getattr(message, "type", "Unknown")
                print(f"Received {msg_type} event")

                # Handle specific event types
                if msg_type == "Welcome":
                    print(f"Welcome: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Welcome: {message}\n")

                elif msg_type == "SettingsApplied":
                    print(f"Settings applied: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Settings applied: {message}\n")

                elif msg_type == "ConversationText":
                    print(f"Conversation: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"{json.dumps(message.__dict__)}\n")

                elif msg_type == "UserStartedSpeaking":
                    print("User started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("User started speaking\n")

                elif msg_type == "AgentThinking":
                    print("Agent thinking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent thinking\n")

                elif msg_type == "AgentStartedSpeaking":
                    audio_buffer = bytearray()  # Reset buffer for new response
                    print("Agent started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent started speaking\n")

                elif msg_type == "AgentAudioDone":
                    print("Agent audio done")
                    if len(audio_buffer) > 0:
                        with open(f"output-{file_counter}.wav", 'wb') as f:
                            f.write(create_wav_header(data_size=len(audio_buffer)))
                            f.write(audio_buffer)
                        print(f"Created output-{file_counter}.wav")
                        audio_buffer = bytearray()
                        file_counter += 1
                    processing_complete = True

            def on_error(error):
                print(f"Error: {error}")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write(f"Error: {error}\n")

            def on_close(event):
                print("Connection closed")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write("Connection closed\n")

            # Register event handlers
            connection.on(EventType.OPEN, on_open)
            connection.on(EventType.MESSAGE, on_message)
            connection.on(EventType.ERROR, on_error)
            connection.on(EventType.CLOSE, on_close)
            print("Event handlers registered")

            # Send settings to configure the agent
            print("Sending settings configuration...")
            connection.send_settings(settings)
            print("Settings sent successfully")

            # Start listening for events in a background thread
            print("Starting event listener...")
            listener_thread = threading.Thread(target=connection.start_listening, daemon=True)
            listener_thread.start()

            # Wait a moment for connection to establish
            time.sleep(1)

            # Stream audio
            print("Downloading and sending audio...")
            response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
            # Skip WAV header
            header = response.raw.read(44)

            # Verify WAV header
            if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
                print("Invalid WAV header")
                return

            chunk_size = 8192
            total_bytes_sent = 0
            chunk_count = 0
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:
                    print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                    connection.send_media(chunk)
                    total_bytes_sent += len(chunk)
                    chunk_count += 1
                    time.sleep(0.1)  # Small delay between chunks

            print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
            print("Waiting for agent response...")

            # Wait for processing
            print("Waiting for processing to complete...")
            start_time = time.time()
            timeout = 30  # 30 second timeout

            while not processing_complete and (time.time() - start_time) < timeout:
                time.sleep(1)
                print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

            if not processing_complete:
                print("Processing timed out after 30 seconds")
            else:
                print("Processing complete. Check output-*.wav and chatlog.txt for results.")

            print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1, data_size=0):
    """Create a 44-byte WAV header for the given format and data size"""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = (36 + data_size).to_bytes(4, 'little')  # File size minus 8 bytes
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = data_size.to_bytes(4, 'little')  # Subchunk2Size

    return header

if __name__ == "__main__":
    main()

Implementation Examples

To better understand how to build a more complex Voice Agent, check out the following working examples.

Use Case | Runtime / Language | Repo
Voice agent basic demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Demo
Voice agent medical assistant demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Medical Assistant Demo
Voice agent demo with Twilio | Python | Python Twilio > Voice Agent Demo
Voice agent demo with text input | Node, TypeScript, JavaScript | Deepgram Conversational AI Demo
Voice agent with Azure OpenAI Services | Python | Deepgram Voice Agent with OpenAI Azure
Voice agent with Function Calling using Python Flask | Python / Flask | Python Flask Agent Function Calling Demo

Rate Limits

For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.

Usage Tracking

Usage is calculated based on WebSocket connection time: one hour of WebSocket connection time equals one hour of API usage.
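For example, you can estimate usage by summing the durations of your WebSocket sessions (the session times below are made up for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical session log: (start, end) pairs for WebSocket connections.
sessions = [
    (datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 1, 9, 45)),    # 45 min
    (datetime(2025, 1, 1, 13, 0), datetime(2025, 1, 1, 14, 30)),  # 90 min
]

# Total connection time equals total API usage.
total = sum((end - start for start, end in sessions), timedelta())
hours = total.total_seconds() / 3600
print(f"API usage: {hours:.2f} hours")  # 0.75 + 1.5 hours
```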