Getting Started

An introduction to using Deepgram's Voice Agent API to build interactive voice agents.

In this guide, you’ll learn how to create a very basic voice agent using Deepgram’s Agent API. Visit the API Reference for more details on how to use the Agent API.

Build a Basic Voice Agent

Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and to configure your environment if you choose to use a Deepgram SDK.

1. Set up your environment

In the steps below, you’ll use the Terminal to:

  1. Create a new directory for your project
  2. Create a new file for your code
  3. Export your Deepgram API key to the environment so you can use it in your code
  4. Run any additional commands to initialize your project
$ mkdir deepgram-agent-demo
$ cd deepgram-agent-demo
$ touch index.py
$ export DEEPGRAM_API_KEY="your_Deepgram_API_key_here"

2. Install the Deepgram SDK

Deepgram has several SDKs that make it easier to build a Voice Agent. Follow the steps below to use one of our SDKs to make your first Deepgram Voice Agent request.

In your terminal, navigate to the location on your drive where you created your project above, and install the Deepgram SDK along with the requests library, which this guide uses to download the sample audio file.

$ pip install deepgram-sdk requests

3. Import dependencies and set up the main function

Next, import the necessary dependencies and set up your main application function.

# For more Python SDK migration guides, visit:
# https://github.com/deepgram/deepgram-python-sdk/tree/main/docs

# Import dependencies and set up the main function
import requests
import time
import os
import json
import threading

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.agent.v1.types import (
    AgentV1Settings,
    AgentV1SettingsAgent,
    AgentV1SettingsAudio,
    AgentV1SettingsAudioInput,
    AgentV1SettingsAudioOutput,
    AgentV1SettingsAgentListen,
    AgentV1SettingsAgentListenProvider_V1,
)
from deepgram.types.think_settings_v1 import ThinkSettingsV1
from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi
from deepgram.types.speak_settings_v1 import SpeakSettingsV1
from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram

4. Initialize the Voice Agent

Now you can initialize the voice agent: read your Deepgram API key from the environment, create a Deepgram client, and open a WebSocket connection to the Agent API, used here as a context manager. The code from the following steps will go inside this connection block.

# For more Python SDK migration guides, visit:
# https://github.com/deepgram/deepgram-python-sdk/tree/main/docs

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API Key found")

        # Initialize Deepgram client
        client = DeepgramClient(api_key=api_key)

        # Use connection as a context manager
        with client.agent.v1.connect() as connection:
            print("Created WebSocket connection...")

            # The code in the following steps will go here

5. Configure the Agent

Next, set up a simplified version of the Settings message to configure your Agent’s behavior and the required settings options.

To learn more about all settings options available for an Agent, refer to the Configure the Voice Agent documentation.

            # Configure the Agent
            settings = AgentV1Settings(
                audio=AgentV1SettingsAudio(
                    input=AgentV1SettingsAudioInput(
                        encoding="linear16",
                        sample_rate=24000,
                    ),
                    output=AgentV1SettingsAudioOutput(
                        encoding="linear16",
                        sample_rate=24000,
                        container="wav",
                    ),
                ),
                agent=AgentV1SettingsAgent(
                    language="en",
                    listen=AgentV1SettingsAgentListen(
                        provider=AgentV1SettingsAgentListenProvider_V1(
                            type="deepgram",
                            model="nova-3",
                        )
                    ),
                    think=ThinkSettingsV1(
                        provider=ThinkSettingsV1Provider_OpenAi(
                            type="open_ai",
                            model="gpt-4o-mini",
                        ),
                        prompt="You are a friendly AI assistant.",
                    ),
                    speak=SpeakSettingsV1(
                        provider=SpeakSettingsV1Provider_Deepgram(
                            type="deepgram",
                            model="aura-2-thalia-en",
                        )
                    ),
                    greeting="Hello! How can I help you today?",
                ),
            )
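Under the hood, the SDK serializes the typed settings object above into a JSON Settings message that is sent over the WebSocket. As a rough sketch of that shape (illustrative only, mirroring the fields configured above rather than the authoritative wire format):

```python
# Approximate JSON shape of the Settings message built above (illustrative).
settings_message = {
    "type": "Settings",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 24000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "wav"},
    },
    "agent": {
        "language": "en",
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
            "prompt": "You are a friendly AI assistant.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        "greeting": "Hello! How can I help you today?",
    },
}
```

Seeing the JSON form can help when debugging: the SettingsApplied event you receive later confirms the server accepted a message of this shape.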

6. Send the Settings Message

Next, send the Settings message you built in the previous step to configure the agent. (In a long-running application you would also send periodic KeepAlive messages to prevent the WebSocket from timing out between utterances; this demo streams audio continuously, so the connection stays active on its own. Later steps fetch the sample audio file spacewalk.wav and stream it to the agent in chunks.)

            # Send settings to configure the agent
            print("Sending settings configuration...")
            connection.send_settings(settings)
            print("Settings sent successfully")
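If you later build a long-running agent, you can keep the WebSocket alive during quiet periods by sending a KeepAlive message (a JSON text frame of `{"type": "KeepAlive"}`) every few seconds. A minimal sketch, assuming a `send_text(payload)` callable that writes a text frame to the agent WebSocket (the exact SDK method name may differ from this assumption):

```python
import json
import threading

def start_keep_alive(send_text, interval=5.0):
    """Send a KeepAlive message every `interval` seconds on a daemon thread.

    `send_text` is assumed to write a text frame to the agent WebSocket.
    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop is set
        while not stop.wait(interval):
            send_text(json.dumps({"type": "KeepAlive"}))

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Call `.set()` on the returned event before closing the connection. This demo streams audio continuously, so keep-alives are not strictly needed here.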

7. Set Up Event Handlers and Other Functions

Next, set up event handlers to manage the entire conversation lifecycle, from connection opening to closing. The handlers collect audio chunks into a buffer and save each completed response as a WAV file, while also logging conversation events, interruptions, and errors to chatlog.txt.

            # Setup Event Handlers
            audio_buffer = bytearray()
            file_counter = 0
            processing_complete = False

            def on_open(event):
                print("Connection opened")

            def on_message(message):
                nonlocal audio_buffer, file_counter, processing_complete

                # Handle binary audio data
                if isinstance(message, bytes):
                    audio_buffer.extend(message)
                    print(f"Received audio data: {len(message)} bytes")
                    return

                # Handle different message types
                msg_type = getattr(message, "type", "Unknown")
                print(f"Received {msg_type} event")

                # Handle specific event types
                if msg_type == "Welcome":
                    print(f"Welcome: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Welcome: {message}\n")

                elif msg_type == "SettingsApplied":
                    print(f"Settings applied: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Settings applied: {message}\n")

                elif msg_type == "ConversationText":
                    print(f"Conversation: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"{json.dumps(message.__dict__)}\n")

                elif msg_type == "UserStartedSpeaking":
                    print("User started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("User started speaking\n")

                elif msg_type == "AgentThinking":
                    print("Agent thinking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent thinking\n")

                elif msg_type == "AgentStartedSpeaking":
                    audio_buffer = bytearray()  # Reset buffer for new response
                    print("Agent started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent started speaking\n")

                elif msg_type == "AgentAudioDone":
                    print("Agent audio done")
                    if len(audio_buffer) > 0:
                        with open(f"output-{file_counter}.wav", 'wb') as f:
                            f.write(create_wav_header(data_size=len(audio_buffer)))
                            f.write(audio_buffer)
                        print(f"Created output-{file_counter}.wav")
                        audio_buffer = bytearray()
                        file_counter += 1
                    processing_complete = True

            def on_error(error):
                print(f"Error: {error}")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write(f"Error: {error}\n")

            def on_close(event):
                print("Connection closed")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write("Connection closed\n")

            # Register event handlers
            connection.on(EventType.OPEN, on_open)
            connection.on(EventType.MESSAGE, on_message)
            connection.on(EventType.ERROR, on_error)
            connection.on(EventType.CLOSE, on_close)
            print("Event handlers registered")

            # Send settings to configure the agent
            print("Sending settings configuration...")
            connection.send_settings(settings)
            print("Settings sent successfully")

            # Start listening for events in a background thread
            print("Starting event listener...")
            listener_thread = threading.Thread(target=connection.start_listening, daemon=True)
            listener_thread.start()

            # Wait a moment for the connection to establish
            time.sleep(1)

            # Stream audio
            print("Downloading and sending audio...")
            response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
            # Read (and skip) the 44-byte WAV header
            header = response.raw.read(44)

            # Verify WAV header
            if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
                print("Invalid WAV header")
                return

            chunk_size = 8192
            total_bytes_sent = 0
            chunk_count = 0
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:
                    print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                    connection.send_media(chunk)
                    total_bytes_sent += len(chunk)
                    chunk_count += 1
                    time.sleep(0.1)  # Small delay between chunks

            print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
            print("Waiting for agent response...")

            # Wait for processing
            print("Waiting for processing to complete...")
            start_time = time.time()
            timeout = 30  # 30 second timeout

            while not processing_complete and (time.time() - start_time) < timeout:
                time.sleep(1)
                print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

            if not processing_complete:
                print("Processing timed out after 30 seconds")
            else:
                print("Processing complete. Check output-*.wav and chatlog.txt for results.")

        print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1, data_size=0):
    """Create a 44-byte WAV header with the specified parameters."""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = (36 + data_size).to_bytes(4, 'little')  # File size minus 8 bytes
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = data_size.to_bytes(4, 'little')  # Subchunk2Size

    return header

if __name__ == "__main__":
    main()

8. Run the Voice Agent

Now that you have your complete code, you can run the Voice Agent! If it works, you should find the conversation text and audio in the files output-0.wav and chatlog.txt, saved in the same directory as your main application file.

$ python index.py

9. Putting it all together

Below is the final code for the Voice Agent you just built. If you saw any errors after running your Agent, you can compare the code below to the code you wrote in the steps above to find and fix the errors.

# For more Python SDK migration guides, visit:
# https://github.com/deepgram/deepgram-python-sdk/tree/main/docs

# Copyright 2025 Deepgram SDK contributors. All Rights Reserved.
# Use of this source code is governed by a MIT license that can be found in the LICENSE file.
# SPDX-License-Identifier: MIT

# Import dependencies and set up the main function
import requests
import time
import os
import json
import threading

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.agent.v1.types import (
    AgentV1Settings,
    AgentV1SettingsAgent,
    AgentV1SettingsAudio,
    AgentV1SettingsAudioInput,
    AgentV1SettingsAudioOutput,
    AgentV1SettingsAgentListen,
    AgentV1SettingsAgentListenProvider_V1,
)
from deepgram.types.think_settings_v1 import ThinkSettingsV1
from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi
from deepgram.types.speak_settings_v1 import SpeakSettingsV1
from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram

def main():
    try:
        # Initialize the Voice Agent
        api_key = os.getenv("DEEPGRAM_API_KEY")
        if not api_key:
            raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
        print("API Key found")

        # Initialize Deepgram client
        client = DeepgramClient(api_key=api_key)

        # Use connection as a context manager
        with client.agent.v1.connect() as connection:
            print("Created WebSocket connection...")

            # Configure the Agent
            settings = AgentV1Settings(
                audio=AgentV1SettingsAudio(
                    input=AgentV1SettingsAudioInput(
                        encoding="linear16",
                        sample_rate=24000,
                    ),
                    output=AgentV1SettingsAudioOutput(
                        encoding="linear16",
                        sample_rate=24000,
                        container="wav",
                    ),
                ),
                agent=AgentV1SettingsAgent(
                    language="en",
                    listen=AgentV1SettingsAgentListen(
                        provider=AgentV1SettingsAgentListenProvider_V1(
                            type="deepgram",
                            model="nova-3",
                        )
                    ),
                    think=ThinkSettingsV1(
                        provider=ThinkSettingsV1Provider_OpenAi(
                            type="open_ai",
                            model="gpt-4o-mini",
                        ),
                        prompt="You are a friendly AI assistant.",
                    ),
                    speak=SpeakSettingsV1(
                        provider=SpeakSettingsV1Provider_Deepgram(
                            type="deepgram",
                            model="aura-2-thalia-en",
                        )
                    ),
                    greeting="Hello! How can I help you today?",
                ),
            )

            # Setup Event Handlers
            audio_buffer = bytearray()
            file_counter = 0
            processing_complete = False

            def on_open(event):
                print("Connection opened")

            def on_message(message):
                nonlocal audio_buffer, file_counter, processing_complete

                # Handle binary audio data
                if isinstance(message, bytes):
                    audio_buffer.extend(message)
                    print(f"Received audio data: {len(message)} bytes")
                    return

                # Handle different message types
                msg_type = getattr(message, "type", "Unknown")
                print(f"Received {msg_type} event")

                # Handle specific event types
                if msg_type == "Welcome":
                    print(f"Welcome: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Welcome: {message}\n")

                elif msg_type == "SettingsApplied":
                    print(f"Settings applied: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"Settings applied: {message}\n")

                elif msg_type == "ConversationText":
                    print(f"Conversation: {message}")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write(f"{json.dumps(message.__dict__)}\n")

                elif msg_type == "UserStartedSpeaking":
                    print("User started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("User started speaking\n")

                elif msg_type == "AgentThinking":
                    print("Agent thinking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent thinking\n")

                elif msg_type == "AgentStartedSpeaking":
                    audio_buffer = bytearray()  # Reset buffer for new response
                    print("Agent started speaking")
                    with open("chatlog.txt", 'a') as chatlog:
                        chatlog.write("Agent started speaking\n")

                elif msg_type == "AgentAudioDone":
                    print("Agent audio done")
                    if len(audio_buffer) > 0:
                        with open(f"output-{file_counter}.wav", 'wb') as f:
                            f.write(create_wav_header(data_size=len(audio_buffer)))
                            f.write(audio_buffer)
                        print(f"Created output-{file_counter}.wav")
                        audio_buffer = bytearray()
                        file_counter += 1
                    processing_complete = True

            def on_error(error):
                print(f"Error: {error}")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write(f"Error: {error}\n")

            def on_close(event):
                print("Connection closed")
                with open("chatlog.txt", 'a') as chatlog:
                    chatlog.write("Connection closed\n")

            # Register event handlers
            connection.on(EventType.OPEN, on_open)
            connection.on(EventType.MESSAGE, on_message)
            connection.on(EventType.ERROR, on_error)
            connection.on(EventType.CLOSE, on_close)
            print("Event handlers registered")

            # Send settings to configure the agent
            print("Sending settings configuration...")
            connection.send_settings(settings)
            print("Settings sent successfully")

            # Start listening for events in a background thread
            print("Starting event listener...")
            listener_thread = threading.Thread(target=connection.start_listening, daemon=True)
            listener_thread.start()

            # Wait a moment for the connection to establish
            time.sleep(1)

            # Stream audio
            print("Downloading and sending audio...")
            response = requests.get("https://dpgr.am/spacewalk.wav", stream=True)
            # Read (and skip) the 44-byte WAV header
            header = response.raw.read(44)

            # Verify WAV header
            if header[0:4] != b'RIFF' or header[8:12] != b'WAVE':
                print("Invalid WAV header")
                return

            chunk_size = 8192
            total_bytes_sent = 0
            chunk_count = 0
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:
                    print(f"Sending chunk {chunk_count}: {len(chunk)} bytes")
                    connection.send_media(chunk)
                    total_bytes_sent += len(chunk)
                    chunk_count += 1
                    time.sleep(0.1)  # Small delay between chunks

            print(f"Total audio data sent: {total_bytes_sent} bytes in {chunk_count} chunks")
            print("Waiting for agent response...")

            # Wait for processing
            print("Waiting for processing to complete...")
            start_time = time.time()
            timeout = 30  # 30 second timeout

            while not processing_complete and (time.time() - start_time) < timeout:
                time.sleep(1)
                print(f"Still waiting for agent response... ({int(time.time() - start_time)}s elapsed)")

            if not processing_complete:
                print("Processing timed out after 30 seconds")
            else:
                print("Processing complete. Check output-*.wav and chatlog.txt for results.")

        print("Finished")

    except Exception as e:
        print(f"Error: {str(e)}")

# WAV Header Functions
def create_wav_header(sample_rate=24000, bits_per_sample=16, channels=1, data_size=0):
    """Create a 44-byte WAV header with the specified parameters."""
    byte_rate = sample_rate * channels * (bits_per_sample // 8)
    block_align = channels * (bits_per_sample // 8)

    header = bytearray(44)
    # RIFF header
    header[0:4] = b'RIFF'
    header[4:8] = (36 + data_size).to_bytes(4, 'little')  # File size minus 8 bytes
    header[8:12] = b'WAVE'
    # fmt chunk
    header[12:16] = b'fmt '
    header[16:20] = b'\x10\x00\x00\x00'  # Subchunk1Size (16 for PCM)
    header[20:22] = b'\x01\x00'  # AudioFormat (1 for PCM)
    header[22:24] = channels.to_bytes(2, 'little')  # NumChannels
    header[24:28] = sample_rate.to_bytes(4, 'little')  # SampleRate
    header[28:32] = byte_rate.to_bytes(4, 'little')  # ByteRate
    header[32:34] = block_align.to_bytes(2, 'little')  # BlockAlign
    header[34:36] = bits_per_sample.to_bytes(2, 'little')  # BitsPerSample
    # data chunk
    header[36:40] = b'data'
    header[40:44] = data_size.to_bytes(4, 'little')  # Subchunk2Size

    return header

if __name__ == "__main__":
    main()

Implementation Examples

To better understand how to build a more complex Voice Agent, check out the following examples for working code.

Use Case | Runtime / Language | Repo
Voice agent basic demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Demo
Voice agent medical assistant demo | Node, TypeScript, JavaScript | Deepgram Voice Agent Medical Assistant Demo
Voice agent demo with Twilio | Python | Python Twilio > Voice Agent Demo
Voice agent demo with text input | Node, TypeScript, JavaScript | Deepgram Conversational AI Demo
Voice agent with Azure Open AI Services | Python | Deepgram Voice Agent with OpenAI Azure
Voice agent with Function Calling using Python Flask | Python / Flask | Python Flask Agent Function Calling Demo

Rate Limits

For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.

Usage Tracking

Usage is calculated based on WebSocket connection time: 1 hour of WebSocket connection time = 1 hour of API usage.
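Since billing is linear in connection time, total usage is just the sum of your session durations. A small illustrative helper (any rounding or minimum-billing rules are not specified here, so this shows the raw proportion only):

```python
def usage_hours(connection_seconds):
    """Convert total WebSocket connection time (in seconds) to hours of API usage.

    Billing is proportional to connection time: 3600 seconds connected
    equals 1 hour of usage. Rounding rules, if any, are not modeled here.
    """
    return connection_seconds / 3600.0

# Example: three sessions of 30 min, 60 min, and 15 min of connection time
sessions = [1800, 3600, 900]
total = usage_hours(sum(sessions))  # 6300 s -> 1.75 hours of usage
```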