Getting Started
An introduction to using Deepgram’s Voice Agent API to build interactive voice agents.
In this guide, you’ll learn how to create a very basic voice agent using Deepgram’s Agent API. Visit the API Reference for more details on how to use the Agent API.
To continue using the Voice Agent API, you need to migrate to the new Voice Agent API V1. Refer to the Voice Agent API Migration Guide for more information.
Build a Basic Voice Agent
Before you start, follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and, if you plan to use a Deepgram SDK, to configure your environment.
1. Set up your environment
In the steps below, you’ll use the Terminal to:
- Create a new directory for your project
- Create a new file for your code
- Export your Deepgram API key to the environment so you can use it in your code
- Run any additional commands to initialize your project
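Once the key is exported, your code can read it back from the environment. The following is a minimal sketch, not the guide's own code: the variable name `DEEPGRAM_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration, so use whatever name you exported.

```python
import os


def load_api_key(env=None):
    """Return the Deepgram API key from the environment, or raise a clear error.

    DEEPGRAM_API_KEY is an assumed variable name; match it to your export.
    """
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; export it before running the agent."
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error later, when the WebSocket connection is attempted.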
2. Install the Deepgram SDK
Deepgram has several SDKs that can make it easier to build a Voice Agent. Follow the steps below to use one of our SDKs to make your first Deepgram Voice Agent request.
In your terminal, navigate to the location on your drive where you created your project above, and install the Deepgram SDK and any other dependencies.
3. Import dependencies and set up the main function
Next, import the necessary dependencies and set up your main application function.
4. Initialize the Voice Agent
Now you can initialize the voice agent by creating an empty audio buffer to store incoming audio data, setting up a counter for output file naming, and defining a sample audio file URL. You can then establish a connection to Deepgram and set up a welcome handler to log when the connection is successfully established.
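The initialization state described above can be sketched without an SDK. This example is illustrative only: the endpoint URL, the `Authorization: Token ...` header, and the `Welcome` message shape (including `request_id`) are assumptions to confirm against the API Reference, and `AgentSession`, `connection_request`, and `handle_welcome` are hypothetical names.

```python
import json
from dataclasses import dataclass, field

# Assumed V1 Voice Agent WebSocket endpoint; confirm it in the API Reference.
AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"


@dataclass
class AgentSession:
    """State kept for one agent connection."""
    audio_buffer: bytearray = field(default_factory=bytearray)  # incoming audio
    file_counter: int = 0  # used to name output-0.wav, output-1.wav, ...


def connection_request(api_key):
    """Return the URL and headers a WebSocket client would connect with."""
    # The token-style Authorization header is an assumption.
    return AGENT_URL, {"Authorization": f"Token {api_key}"}


def handle_welcome(raw_message):
    """Log and return True if a text frame is a Welcome message."""
    message = json.loads(raw_message)
    if message.get("type") == "Welcome":
        print("Connected to the Voice Agent:", message.get("request_id"))
        return True
    return False
```

With an SDK, the connection and the welcome event are typically handled for you; the sketch only shows which pieces of state the rest of the guide relies on.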
5. Configure the Agent
Next, create a simplified Settings message that configures your Agent’s behavior and sets the required options.
To learn more about all settings options available for an Agent, refer to the Configure the Voice Agent documentation.
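As a rough illustration, a simplified Settings message might be built as a plain dictionary and serialized to JSON before sending. Every field name and value below (encodings, sample rates, provider types, and model names) is an assumption based on the general shape of the V1 schema, and `build_settings` is a hypothetical helper; treat the Configure the Voice Agent documentation as the source of truth.

```python
import json


def build_settings(prompt="You are a friendly AI assistant.", greeting=None):
    """Build a simplified Settings message (all fields are assumptions)."""
    settings = {
        "type": "Settings",
        "audio": {
            # Audio you send to the agent and audio you want back.
            "input": {"encoding": "linear16", "sample_rate": 24000},
            "output": {"encoding": "linear16", "sample_rate": 24000},
        },
        "agent": {
            # Speech-to-text, LLM, and text-to-speech providers (example values).
            "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
            "think": {
                "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
                "prompt": prompt,
            },
            "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        },
    }
    if greeting:
        settings["agent"]["greeting"] = greeting
    return settings


# Serialize to the JSON text frame that would be sent over the WebSocket.
settings_json = json.dumps(build_settings(greeting="Hello! How can I help?"))
```

Keeping the settings in one function makes it easy to swap models or prompts without touching the connection code.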
6. Send Keep Alive messages
Next, send a keep-alive message every 5 seconds to maintain the WebSocket connection. This prevents the connection from timing out during long audio processing. You will also fetch the sample audio file, spacewalk.wav, from its URL and stream the audio data to the Agent in chunks, sending each chunk as soon as it becomes available in the readable stream.
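The two ideas in this step, a periodic keep-alive and chunked streaming, can be sketched independently of any SDK. In this example `send` stands in for whatever function your WebSocket client uses to send a frame, the `{"type": "KeepAlive"}` payload is an assumption to verify against the API Reference, and `start_keep_alive` and `iter_chunks` are hypothetical helper names.

```python
import json
import threading

# Assumed keep-alive payload; verify the message type in the API Reference.
KEEP_ALIVE = json.dumps({"type": "KeepAlive"})


def start_keep_alive(send, interval=5.0):
    """Call send(KEEP_ALIVE) every `interval` seconds on a background thread.

    Returns a threading.Event; set it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            send(KEEP_ALIVE)

    threading.Thread(target=loop, daemon=True).start()
    return stop


def iter_chunks(stream, chunk_size=4096):
    """Yield successive chunks from a binary file-like object until exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

A caller would typically open the audio URL as a binary stream (for example with `urllib.request.urlopen`), pass it to `iter_chunks`, and send each chunk as a binary frame, then set the returned event once the connection closes.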
7. Set up event handlers and other functions
Next, set up event handlers for the voice agent to manage the entire conversation lifecycle, from the connection opening to its closing. The handlers process audio by collecting chunks into a buffer and saving each completed response as a WAV file. They also manage interruptions, log the conversation, and handle errors.
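The buffering and file-writing logic described above might look like the following sketch. The message type names (`ConversationText`, `UserStartedSpeaking`, `AgentAudioDone`, `Error`), the `role` and `content` fields, and the 24 kHz mono 16-bit output format are assumptions that must match your Settings and the API Reference; `AgentHandlers` is a hypothetical class.

```python
import json
import os
import wave


class AgentHandlers:
    """Collect agent audio and conversation text for one connection."""

    def __init__(self, sample_rate=24000, out_dir="."):
        self.audio_buffer = bytearray()
        self.file_counter = 0
        self.sample_rate = sample_rate  # must match the configured output rate
        self.out_dir = out_dir

    def on_audio(self, chunk):
        """Binary frame: append raw audio to the buffer."""
        self.audio_buffer.extend(chunk)

    def on_message(self, raw):
        """Text frame: dispatch on the (assumed) message type."""
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "ConversationText":
            path = os.path.join(self.out_dir, "chatlog.txt")
            with open(path, "a", encoding="utf-8") as log:
                log.write(f"{message.get('role')}: {message.get('content')}\n")
        elif kind == "UserStartedSpeaking":
            self.audio_buffer.clear()  # interruption: drop unfinished audio
        elif kind == "AgentAudioDone":
            self.flush_wav()
        elif kind == "Error":
            print("Agent error:", message)

    def flush_wav(self):
        """Write buffered audio to output-N.wav and reset the buffer."""
        if not self.audio_buffer:
            return None
        path = os.path.join(self.out_dir, f"output-{self.file_counter}.wav")
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)   # mono (assumed)
            wav.setsampwidth(2)   # 16-bit linear PCM
            wav.setframerate(self.sample_rate)
            wav.writeframes(bytes(self.audio_buffer))
        self.audio_buffer.clear()
        self.file_counter += 1
        return path
```

Using the standard `wave` module avoids hand-building a WAV header, at the cost of assuming the agent returns headerless linear16 audio.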
8. Run the Voice Agent
Now that you have your complete code, you can run the Voice Agent! If it works, you should see the agent’s audio in output-0.wav and the conversation text in chatlog.txt. These files will be saved in the same directory as your main application file.
9. Putting it all together
Below is the final code for the Voice Agent you just built. If you saw any errors after running your Agent, you can compare the code below to the code you wrote in the steps above to find and fix the errors.
Implementation Examples
To better understand how to build a more complex Voice Agent, check out the following examples for working code.
Rate Limits
For information on Deepgram’s Concurrency Rate Limits, refer to our API Rate Limits Documentation.
Usage Tracking
Usage is calculated based on WebSocket connection time: 1 hour of WebSocket connection time equals 1 hour of API usage.
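As a small worked example of this rule, the helper below (a hypothetical name, not part of any SDK) converts a list of connection durations in seconds into hours of usage.

```python
def usage_hours(connection_seconds):
    """Sum WebSocket connection durations (in seconds) and return hours of usage."""
    return sum(connection_seconds) / 3600.0


# Two connections of 30 and 90 minutes add up to 2 hours of usage.
total = usage_hours([30 * 60, 90 * 60])
```

Because usage follows connection time rather than audio duration, closing idle connections promptly keeps usage down.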