Getting Started
An introduction to using Deepgram’s Voice Agent API to build interactive voice agents.
In this guide, you’ll learn how to develop applications using Deepgram’s Voice Agent API. Visit the API Specification for more details on how to interact with this interface, the control messages available, and example responses from Deepgram.
Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you choose to use a Deepgram SDK, to configure your environment.
API Playground
First, quickly explore Deepgram’s Voice Agent in our API Playground.
Interact with Deepgram’s Voice Agent API
Follow these steps to connect to Deepgram’s Voice Agent API and enable real-time, interactive voice conversations:
- Open a websocket connection to wss://agent.deepgram.com/agent.
- Include the HTTP header Authorization: Token YOUR_DEEPGRAM_API_KEY in the connection request.
- Send a SettingsConfiguration message over the websocket with your desired settings.
- Start streaming audio from your device, microphone, or audio source of your choice. Audio should be sent as binary messages over the websocket.
- Play the audio stream that the server sends back, which arrives as binary messages containing the agent’s speech.
- Exchange real-time messages with the server, including Server Events (e.g., ConversationText when the agent hears speech, UserStartedSpeaking when the user begins speaking) and Client Messages (e.g., UpdateInstructions to modify the agent’s behavior, InjectAgentMessage to trigger an immediate response).
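The steps above can be sketched in Python without committing to a particular websocket library. This minimal sketch builds the connection URL and Authorization header from the steps, serializes a SettingsConfiguration message plus the UpdateInstructions and InjectAgentMessage client messages, and splits raw audio into binary frames. The settings fields, model names, client-message payload fields, and the helper names (connection_headers, settings_configuration, update_instructions, inject_agent_message, audio_frames) are assumptions for illustration; confirm the exact schema in the API Specification. The frame size assumes 16-bit mono linear PCM at 16 kHz sent in 20 ms chunks.

```python
import json

# Endpoint and header format from the steps above.
AGENT_URL = "wss://agent.deepgram.com/agent"


def connection_headers(api_key: str) -> dict[str, str]:
    """Build the HTTP headers for the websocket handshake."""
    return {"Authorization": f"Token {api_key}"}


def settings_configuration() -> str:
    """Serialize a SettingsConfiguration message.

    ASSUMPTION: the field names, models, and values below are illustrative;
    check the API Specification for the schema your account supports.
    """
    message = {
        "type": "SettingsConfiguration",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": 16000},
            "output": {"encoding": "linear16", "sample_rate": 24000},
        },
        "agent": {
            "listen": {"model": "nova-2"},
            "think": {
                "provider": {"type": "open_ai"},
                "model": "gpt-4o-mini",
                "instructions": "You are a helpful voice assistant.",
            },
            "speak": {"model": "aura-asteria-en"},
        },
    }
    return json.dumps(message)


def update_instructions(instructions: str) -> str:
    """Client message that modifies the agent's behavior.

    ASSUMPTION: the payload field is named "instructions".
    """
    return json.dumps({"type": "UpdateInstructions", "instructions": instructions})


def inject_agent_message(text: str) -> str:
    """Client message that makes the agent speak immediately.

    ASSUMPTION: the payload field is named "message".
    """
    return json.dumps({"type": "InjectAgentMessage", "message": text})


def audio_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20):
    """Split raw 16-bit mono PCM into fixed-duration chunks to send as binary messages."""
    bytes_per_frame = sample_rate * 2 * frame_ms // 1000  # 2 bytes per 16-bit sample
    for start in range(0, len(pcm), bytes_per_frame):
        yield pcm[start:start + bytes_per_frame]
```

With a websocket client library of your choice, you would open AGENT_URL with the headers from connection_headers(), send settings_configuration() as a text message, and then send each chunk from audio_frames() as a binary message. At 16 kHz, each 20 ms frame is 640 bytes.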
FYI
- Whenever you receive a UserStartedSpeaking message from the server, we recommend you discard any queued agent speech. This message indicates that the user has started speaking, so you should cancel or ignore any responses the agent has queued up but hasn’t yet spoken.
- The system is designed to support “barge-in,” where the user can interrupt the agent while it’s speaking. By discarding the queued response, you ensure that the agent immediately stops talking and starts listening, creating a more natural, interactive experience for the user.
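A minimal sketch of the barge-in handling described above, assuming JSON text frames carry server events with a "type" field and binary frames carry agent speech. AgentAudioPlayer, on_message, and next_chunk are hypothetical names, and the actual audio output device is left out.

```python
import json
from collections import deque


class AgentAudioPlayer:
    """Queue agent speech and drop it when the user barges in.

    ASSUMPTION: server events arrive as JSON text frames with a "type" field,
    and agent speech arrives as binary frames.
    """

    def __init__(self) -> None:
        self.queue: deque = deque()

    def on_message(self, message) -> None:
        if isinstance(message, (bytes, bytearray)):
            # Binary frame: agent speech waiting to be played.
            self.queue.append(bytes(message))
            return
        event = json.loads(message)
        if event.get("type") == "UserStartedSpeaking":
            # Barge-in: discard speech that was queued but not yet played.
            self.queue.clear()

    def next_chunk(self):
        """Return the next chunk for the audio output device, or None if idle."""
        return self.queue.popleft() if self.queue else None
```

In a real client, also flush any audio you have already handed to the output device when UserStartedSpeaking arrives, so the agent stops mid-sentence rather than after the buffered audio finishes.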
Implementation Examples
SDK Code Examples
Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram Voice Agent request.
Install the SDK
Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.
The current versions of the SDKs referenced below are BETA releases.
Make the Request with the SDK
Non-SDK Code Examples
If you would like to make a Deepgram Voice Agent request in a specific language without using Deepgram’s SDKs, we offer a library of code samples in this GitHub repo. However, we recommend trying our SDKs first.
Rate Limits
For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.
Usage Tracking
Usage is calculated based on websocket connection time: one hour of websocket connection time equals one hour of API usage.
Pricing
- Pay-as-you-go pricing for the standard tier of the Voice Agent API is $4.50 per hour.
- If you bring your own LLM, standard tier pricing is $3.90 per hour.
- Billable time is measured from when you open a websocket to when the socket closes.
- Usage is billed per millisecond.
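Because usage is billed per millisecond of connection time, you can estimate the cost of a session from its duration. This sketch assumes you record the connection time yourself (for example, with time.monotonic() when the socket opens and closes) and uses the hourly rates listed above. estimate_cost is a hypothetical helper, and rounding to cents is for display only, not a statement of how Deepgram invoices.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hourly rates from the pricing list above (USD).
STANDARD_RATE = Decimal("4.50")
BYO_LLM_RATE = Decimal("3.90")
MS_PER_HOUR = 60 * 60 * 1000


def estimate_cost(connected_ms: int, bring_your_own_llm: bool = False) -> Decimal:
    """Estimate the charge for one websocket session billed per millisecond."""
    rate = BYO_LLM_RATE if bring_your_own_llm else STANDARD_RATE
    cost = rate * Decimal(connected_ms) / Decimal(MS_PER_HOUR)
    # ASSUMPTION: rounding to cents for display; actual invoicing may differ.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a 10-minute session on the standard tier gives estimate_cost(600_000) == Decimal("0.75"), or $0.75. Decimal avoids the binary floating-point rounding that float arithmetic would introduce into currency values.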
What’s Next?
- Check out the API Specification for more details on the Voice Agent API.
- Review the Agent API Feature Overview.
- Review the audio format guide to determine the format of your audio.
- Learn how to measure latency when streaming audio in real time.