Getting Started

An introduction to using Deepgram’s Voice Agent API to build interactive voice agents.

Voice Agent

In this guide, you’ll learn how to develop applications using Deepgram’s Agent API. Visit the API Specification for more details on how to interact with this interface, view control messages available, and obtain examples for responses from Deepgram.

Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.

API Playground

First, quickly explore Deepgram’s Voice Agent in our API Playground.


Interact with Deepgram’s Voice Agent API

Follow these steps to connect to Deepgram’s Voice Agent API and enable real-time, interactive voice interactions:

  1. Open a websocket connection to wss://agent.deepgram.com/agent.
  2. Include the HTTP header Authorization: Token YOUR_DEEPGRAM_API_KEY.
  3. Send a SettingsConfiguration message over the websocket with your desired settings.
  4. Start streaming audio from your device, microphone, or audio source of your choice. Audio should be sent as binary messages over the websocket.
  5. Play the audio stream that the server sends back, which will be binary messages containing the agent’s speech.
  6. Exchange real-time messages with the server, including Server Events (e.g., ConversationText when the agent hears speech, UserStartedSpeaking when the user begins speaking) and Client Messages (e.g., UpdateInstructions to modify behavior, InjectAgentMessage to trigger an immediate response).
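The steps above can be sketched without an SDK. The snippet below uses only the standard library to build the messages involved; the payload fields are illustrative assumptions (consult the API Specification for the full schema), and you would send them as text frames over the websocket opened in steps 1–2 using any websocket client.

```python
import json

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"

# Steps 1-2: connect to wss://agent.deepgram.com/agent with this header
# (using any websocket client library of your choice).
AUTH_HEADER = {"Authorization": f"Token {DEEPGRAM_API_KEY}"}

def build_settings() -> str:
    # Step 3: a minimal SettingsConfiguration message, sent as a text frame.
    # Field names below are illustrative; see the API Specification.
    return json.dumps({
        "type": "SettingsConfiguration",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": 16000},
            "output": {"encoding": "linear16", "sample_rate": 24000},
        },
        "agent": {
            "think": {
                "provider": {"type": "open_ai"},
                "model": "gpt-4o-mini",
                "instructions": "You are a helpful AI assistant.",
            },
        },
    })

def update_instructions(instructions: str) -> str:
    # Step 6: a client message that modifies agent behavior mid-conversation.
    return json.dumps({"type": "UpdateInstructions", "instructions": instructions})

print(build_settings())
print(update_instructions("Keep your answers brief."))
```

Steps 4 and 5 are then a loop over the open connection: send your microphone audio as binary frames, and play back any binary frames the server returns.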

FYI

  1. Whenever you receive a UserStartedSpeaking message from the server, we recommend you discard any queued agent speech. This message indicates that the user has started speaking, so you should cancel or ignore any responses the agent has queued up but hasn’t yet spoken.
  2. The system is designed to allow “barge-in” functionality, where the user can interrupt the bot while it’s speaking. By discarding the queued response, you ensure that the bot immediately stops talking and starts listening, creating a more natural, interactive experience for the user.
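The discard-on-barge-in behavior can be sketched with a small playback buffer. This is an illustrative structure, not part of the API: the class and method names are invented for the example.

```python
from collections import deque

class AgentAudioQueue:
    """Buffers agent speech chunks; cleared when the user barges in."""

    def __init__(self):
        self._chunks = deque()

    def enqueue(self, chunk: bytes):
        # Binary audio frames from the server are queued for playback.
        self._chunks.append(chunk)

    def next_chunk(self):
        # The playback loop pulls chunks from here.
        return self._chunks.popleft() if self._chunks else None

    def handle_server_message(self, message: dict):
        # UserStartedSpeaking means the user interrupted: drop any
        # queued-but-unspoken agent audio so the agent stops talking.
        if message.get("type") == "UserStartedSpeaking":
            self._chunks.clear()

q = AgentAudioQueue()
q.enqueue(b"agent-audio-1")
q.enqueue(b"agent-audio-2")
q.handle_server_message({"type": "UserStartedSpeaking"})
print(q.next_chunk())  # None: the queued speech was discarded
```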

Implementation Examples

| Use Case | Runtime / Language | Repo |
| --- | --- | --- |
| End-to-end demo of a working Voice Agent using Deepgram’s Voice Agent API | Node, TypeScript, JavaScript | Deepgram Voice Agent Demo |
| Combines Text-to-Speech and Speech-to-Text into a conversational AI agent | Node, TypeScript, JavaScript | Deepgram Conversational AI Demo |
| A basic example of using Deepgram’s Voice Agent with Azure Open AI Services | Python | Deepgram Voice Agent with OpenAI Azure |
| A demo of Amazon Bedrock & Deepgram Voice Agent API | JavaScript / HTML | Deepgram Voice Agent with Amazon Bedrock |
| A demo of a simple Python Flask Voice Agent using Function Calling | Python / Flask | Python Flask Agent Function Calling Demo |
| A demo using Twilio with a Deepgram Voice Agent | Python | Python Twilio > Voice Agent Demo |

SDK Code Examples

Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram Voice Agent request.

Install the SDK

Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.

The current versions of the SDK referenced below are considered a BETA release.

$ pip install deepgram-sdk

Make the Request with the SDK

from deepgram.utils import verboselogs

from deepgram import (
    DeepgramClient,
    DeepgramClientOptions,
    AgentWebSocketEvents,
    SettingsConfigurationOptions,
)

# Controls the one-time notice printed when binary audio first arrives
warning_notice = True

def main():
    try:
        # example of setting up a client config. logging values: WARNING, VERBOSE, DEBUG, SPAM
        config: DeepgramClientOptions = DeepgramClientOptions(
            options={
                "keepalive": "true",
                "microphone_record": "true",
                "speaker_playback": "true",
            },
            # verbose=verboselogs.DEBUG,
        )
        deepgram: DeepgramClient = DeepgramClient("", config)

        # Create a websocket connection to Deepgram
        dg_connection = deepgram.agent.websocket.v("1")

        def on_open(self, open, **kwargs):
            print(f"\n\n{open}\n\n")

        def on_binary_data(self, data, **kwargs):
            global warning_notice
            if warning_notice:
                print("Received binary data")
                print("You can do something with the binary data here")
                print("OR")
                print(
                    "If you want to simply play the audio, set speaker_playback to true in the options for DeepgramClientOptions"
                )
                warning_notice = False

        def on_welcome(self, welcome, **kwargs):
            print(f"\n\n{welcome}\n\n")

        def on_settings_applied(self, settings_applied, **kwargs):
            print(f"\n\n{settings_applied}\n\n")

        def on_conversation_text(self, conversation_text, **kwargs):
            print(f"\n\n{conversation_text}\n\n")

        def on_user_started_speaking(self, user_started_speaking, **kwargs):
            print(f"\n\n{user_started_speaking}\n\n")

        def on_agent_thinking(self, agent_thinking, **kwargs):
            print(f"\n\n{agent_thinking}\n\n")

        def on_function_calling(self, function_calling, **kwargs):
            print(f"\n\n{function_calling}\n\n")

        def on_agent_started_speaking(self, agent_started_speaking, **kwargs):
            print(f"\n\n{agent_started_speaking}\n\n")

        def on_agent_audio_done(self, agent_audio_done, **kwargs):
            print(f"\n\n{agent_audio_done}\n\n")

        def on_close(self, close, **kwargs):
            print(f"\n\n{close}\n\n")

        def on_error(self, error, **kwargs):
            print(f"\n\n{error}\n\n")

        def on_unhandled(self, unhandled, **kwargs):
            print(f"\n\n{unhandled}\n\n")

        # Register handlers for server events
        dg_connection.on(AgentWebSocketEvents.Open, on_open)
        dg_connection.on(AgentWebSocketEvents.AudioData, on_binary_data)
        dg_connection.on(AgentWebSocketEvents.Welcome, on_welcome)
        dg_connection.on(AgentWebSocketEvents.SettingsApplied, on_settings_applied)
        dg_connection.on(AgentWebSocketEvents.ConversationText, on_conversation_text)
        dg_connection.on(
            AgentWebSocketEvents.UserStartedSpeaking, on_user_started_speaking
        )
        dg_connection.on(AgentWebSocketEvents.AgentThinking, on_agent_thinking)
        dg_connection.on(AgentWebSocketEvents.FunctionCalling, on_function_calling)
        dg_connection.on(
            AgentWebSocketEvents.AgentStartedSpeaking, on_agent_started_speaking
        )
        dg_connection.on(AgentWebSocketEvents.AgentAudioDone, on_agent_audio_done)
        dg_connection.on(AgentWebSocketEvents.Close, on_close)
        dg_connection.on(AgentWebSocketEvents.Error, on_error)
        dg_connection.on(AgentWebSocketEvents.Unhandled, on_unhandled)

        # Configure the agent, then connect to the websocket
        options: SettingsConfigurationOptions = SettingsConfigurationOptions()
        options.agent.think.provider.type = "open_ai"
        options.agent.think.model = "gpt-4o-mini"
        options.agent.think.instructions = "You are a helpful AI assistant."

        if dg_connection.start(options) is False:
            print("Failed to start connection")
            return

        print("\n\nPress Enter to stop...\n\n")
        input()

        # Close the connection
        dg_connection.finish()

        print("Finished")

    except ValueError as e:
        print(f"Invalid value encountered: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

if __name__ == "__main__":
    main()

Non-SDK Code Examples

If you would like to try out making a Deepgram Voice Agent request in a specific language without using Deepgram’s SDKs, we offer a library of code samples in this GitHub repo. However, we recommend trying our SDKs first.

Rate Limits

For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.

Usage Tracking

Usage is calculated based on websocket connection time. 1 hour of websocket connection time = 1 hour of API usage.

Pricing

  • Pay-as-you-go pricing for the standard tier of the Voice Agent API is $4.50 per hour.
  • If you bring your own LLM, standard-tier pricing is $3.90 per hour.
  • Time is measured from when you open the websocket to when the socket closes.
  • We bill per millisecond of usage.
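As a quick check of the arithmetic, here is a sketch of the per-millisecond cost calculation, using the rates from the bullets above (the function name is illustrative, not part of any API):

```python
def session_cost_usd(connection_ms: int, hourly_rate: float = 4.50) -> float:
    # Billed per millisecond: 3,600,000 ms per hour at the given hourly rate.
    return connection_ms * hourly_rate / 3_600_000

# A 30-minute standard-tier session:
print(session_cost_usd(30 * 60 * 1000))        # 2.25
# The same session when bringing your own LLM:
print(session_cost_usd(30 * 60 * 1000, 3.90))  # ~1.95
```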

What’s Next?
