Getting Started
An introduction to using Deepgram's Voice Agent API to build interactive voice agents.
In this guide, you'll learn how to develop applications using Deepgram's Agent API. Visit the API Specification for more details on how to interact with this interface, view control messages available, and obtain examples for responses from Deepgram.
Before you start, you'll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.
API Playground
First, quickly explore Deepgram's Voice Agent in our API Playground.
Try this feature out in our API Playground!
Transcribe Audio From an Audio Stream
Follow these steps to transcribe audio from an audio stream using one of Deepgram's SDKs.
- Open a websocket connection to
wss://agent.deepgram.com/agent
. - Include the HTTP header
Authorization: Token YOUR_DEEPGRAM_API_KEY
. - Send a SettingsConfiguration message over the websocket with your desired settings.
- Start streaming audio from your device, microphone, or audio source of your choice. Audio should be sent as binary messages over the websocket.
- Play the audio stream that the server sends back, which will be binary messages containing the agent's speech.
- Whenever you receive a UserStartedSpeaking message from the server we recommend you discard any queued agent speech. This is important because a
UserStartedSpeaking
message from the server indicates that the user has started speaking and you should cancel or ignore any responses that the bot or agent has queued up but hasn't yet spoken. - The system is designed to allow "barge-in" functionality, where the user can interrupt the bot while it's speaking. By discarding the queued response, you ensure that the bot immediately stops talking and starts listening, creating a more natural, interactive experience for the user.
Implementation Examples
Use Case | Runtime / Language | Repo |
---|---|---|
End to End demo of a working Voice Agent using Deepgram's Voice Agent API | Node, TypeScript, JavaScript | Deepgram Voice Agent Demo |
Combines Text-to-Speech and Speech-to-Text into a conversational AI agent | Node, TypeScript, JavaScript | Deepgram Conversational AI Demo |
A basic example of using Deepgram's Voice Agent with Azure Open AI Services | Python | Deepgram Voice Agent with OpenAI Azure |
A demo of Amazon Bedrock & Deepgram Voice Agent API | JavaScript / HTML | Deepgram Voice Agent with Amazon Bedrock |
SDK Code Examples
Deepgram has several SDKs that can make it easier to use the API. Follow these steps to use the SDK of your choice to make a Deepgram Voice Agent request.
Install the SDK
Open your terminal, navigate to the location on your drive where you want to create your project, and install the Deepgram SDK.
The current versions of the SDK referenced below are considered a BETA release.
# EA Beta Branch
# https://github.com/deepgram/deepgram-python-sdk/tree/sts-agent-api-ea
# currently only available via unstable builds
pip install deepgram-unstable-sdk==3.8.0.dev4
# EA Beta Branch
# https://github.com/deepgram/deepgram-js-sdk/releases/tag/v3.9.0-beta.1
# currently only available via unstable builds
npm install @deepgram/[email protected]
# EA Beta Branch
# https://github.com/deepgram/deepgram-dotnet-sdk/tree/sts-agent-api-ea
# currently only available via unstable builds
dotnet add package Deepgram.Unstable.SDK.Builds --version 4.5.0-dev.2
# If you are using the Microphone use:
dotnet add package Deepgram.Unstable.Microphone.Builds --version 4.5.0-dev.2
# EA Beta Branch
# https://github.com/deepgram/deepgram-go-sdk/tree/sts-agent-api-ea
# currently only available via unstable builds
go get github.com/deepgram/[email protected]
Make the Request with the SDK
from deepgram.utils import verboselogs
from deepgram import (
DeepgramClient,
DeepgramClientOptions,
AgentWebSocketEvents,
SettingsConfigurationOptions,
)
def main():
try:
# example of setting up a client config. logging values: WARNING, VERBOSE, DEBUG, SPAM
config: DeepgramClientOptions = DeepgramClientOptions(
options={
"keepalive": "true",
"microphone_record": "true",
"speaker_playback": "true",
},
# verbose=verboselogs.DEBUG,
)
deepgram: DeepgramClient = DeepgramClient("", config)
# Create a websocket connection to Deepgram
dg_connection = deepgram.agent.websocket.v("1")
def on_open(self, open, **kwargs):
print(f"\n\n{open}\n\n")
def on_binary_data(self, data, **kwargs):
global warning_notice
if warning_notice:
print("Received binary data")
print("You can do something with the binary data here")
print("OR")
print(
"If you want to simply play the audio, set speaker_playback to true in the options for DeepgramClientOptions"
)
warning_notice = False
def on_welcome(self, welcome, **kwargs):
print(f"\n\n{welcome}\n\n")
def on_settings_applied(self, settings_applied, **kwargs):
print(f"\n\n{settings_applied}\n\n")
def on_conversation_text(self, conversation_text, **kwargs):
print(f"\n\n{conversation_text}\n\n")
def on_user_started_speaking(self, user_started_speaking, **kwargs):
print(f"\n\n{user_started_speaking}\n\n")
def on_agent_thinking(self, agent_thinking, **kwargs):
print(f"\n\n{agent_thinking}\n\n")
def on_function_calling(self, function_calling, **kwargs):
print(f"\n\n{function_calling}\n\n")
def on_agent_started_speaking(self, agent_started_speaking, **kwargs):
print(f"\n\n{agent_started_speaking}\n\n")
def on_agent_audio_done(self, agent_audio_done, **kwargs):
print(f"\n\n{agent_audio_done}\n\n")
def on_close(self, close, **kwargs):
print(f"\n\n{close}\n\n")
def on_error(self, error, **kwargs):
print(f"\n\n{error}\n\n")
def on_unhandled(self, unhandled, **kwargs):
print(f"\n\n{unhandled}\n\n")
dg_connection.on(AgentWebSocketEvents.Open, on_open)
dg_connection.on(AgentWebSocketEvents.AudioData, on_binary_data)
dg_connection.on(AgentWebSocketEvents.Welcome, on_welcome)
dg_connection.on(AgentWebSocketEvents.SettingsApplied, on_settings_applied)
dg_connection.on(AgentWebSocketEvents.ConversationText, on_conversation_text)
dg_connection.on(
AgentWebSocketEvents.UserStartedSpeaking, on_user_started_speaking
)
dg_connection.on(AgentWebSocketEvents.AgentThinking, on_agent_thinking)
dg_connection.on(AgentWebSocketEvents.FunctionCalling, on_function_calling)
dg_connection.on(
AgentWebSocketEvents.AgentStartedSpeaking, on_agent_started_speaking
)
dg_connection.on(AgentWebSocketEvents.AgentAudioDone, on_agent_audio_done)
dg_connection.on(AgentWebSocketEvents.Close, on_close)
dg_connection.on(AgentWebSocketEvents.Error, on_error)
dg_connection.on(AgentWebSocketEvents.Unhandled, on_unhandled)
# connect to websocket
options: SettingsConfigurationOptions = SettingsConfigurationOptions()
options.agent.think.provider.type = "open_ai"
options.agent.think.model = "gpt-4o-mini"
options.agent.think.instructions = "You are a helpful AI assistant."
if dg_connection.start(options) is False:
print("Failed to start connection")
return
print("\n\nPress Enter to stop...\n\n")
input()
# Close the connection
dg_connection.finish()
print("Finished")
except ValueError as e:
print(f"Invalid value encountered: {e}")
except Exception as e:
print(f"An unexpected error occurred: {e}")
if __name__ == "__main__":
main()
const { writeFile, appendFile } = require("fs/promises");
const { createClient, AgentEvents } = require("../../dist/main/index");
const fetch = require("cross-fetch");
const { join } = require("path");
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);
const agent = async () => {
let audioBuffer = Buffer.alloc(0);
let i = 0;
const url = "https://dpgr.am/spacewalk.wav";
const connection = deepgram.agent();
connection.on(AgentEvents.Open, async () => {
console.log("Connection opened");
await connection.configure({
audio: {
input: {
encoding: "linear16",
sampleRate: 44100,
},
output: {
encoding: "linear16",
sampleRate: 16000,
container: "wav",
},
},
agent: {
listen: {
model: "nova-2",
},
speak: {
model: "aura-asteria-en",
},
think: {
provider: {
type: "anthropic",
},
model: "claude-3-haiku-20240307",
},
},
});
console.log("Deepgram Agent configured.");
setInterval(() => {
console.log("Keep alive!");
void connection.keepAlive();
}, 5000);
fetch(url)
.then((r) => r.body)
.then((res) => {
res.on("readable", () => {
connection.send(res.read());
});
});
});
connection.on(AgentEvents.Close, () => {
console.log("Connection closed");
process.exit(0);
});
connection.on(AgentEvents.ConversationText, async (data) => {
await appendFile(join(__dirname, `chatlog.txt`), JSON.stringify(data) + "\n");
});
connection.on(AgentEvents.UserStartedSpeaking, () => {
if (audioBuffer.length) {
console.log("Interrupting agent.");
audioBuffer = Buffer.alloc(0);
}
});
connection.on(AgentEvents.Metadata, (data) => {
console.dir(data, { depth: null });
});
connection.on(AgentEvents.Audio, (data) => {
// Concatenate the audio chunks into a single buffer
const buffer = Buffer.from(data);
audioBuffer = Buffer.concat([audioBuffer, buffer]);
});
connection.on(AgentEvents.Error, (err) => {
console.error("Error!");
console.error(err);
console.error(err.message);
});
connection.on(AgentEvents.AgentAudioDone, async () => {
await writeFile(join(__dirname, `output-${i}.wav`), audioBuffer);
audioBuffer = Buffer.alloc(0);
i++;
});
connection.on(AgentEvents.Unhandled, (data) => {
console.dir(data, { depth: null });
});
};
void agent();
using Deepgram.Logger;
using Deepgram.Microphone;
using Deepgram.Models.Authenticate.v1;
using Deepgram.Models.Agent.v2.WebSocket;
namespace SampleApp
{
class Program
{
static async Task Main(string[] args)
{
try
{
// Initialize Library with default logging
// Normal logging is "Info" level
Deepgram.Library.Initialize();
// Initialize the microphone library
Deepgram.Microphone.Library.Initialize();
Console.WriteLine("\n\nPress any key to stop and exit...\n\n\n");
// Set "DEEPGRAM_API_KEY" environment variable to your Deepgram API Key
var agentClient = ClientFactory.CreateAgentWebSocketClient();
// current time
var lastAudioTime = DateTime.Now;
var audioFileCount = 0;
// Subscribe to the EventResponseReceived event
await agentClient.Subscribe(new EventHandler<OpenResponse>((sender, e) =>
{
Console.WriteLine($"----> {e.Type} received");
}));
await agentClient.Subscribe(new EventHandler<AudioResponse>((sender, e) =>
{
Console.WriteLine($"----> {e.Type} received");
// if the last audio response is more than 5 seconds ago, add a wav header
if (DateTime.Now.Subtract(lastAudioTime).TotalSeconds > 7)
{
audioFileCount = audioFileCount + 1; // increment the audio file count
// delete the file if it exists
if (File.Exists($"output_{audioFileCount}.wav"))
{
File.Delete($"output_{audioFileCount}.wav");
}
using (BinaryWriter writer = new BinaryWriter(File.Open($"output_{audioFileCount}.wav", FileMode.Append)))
{
Console.WriteLine("Adding WAV header to output.wav");
byte[] wavHeader = new byte[44];
int sampleRate = 48000;
short bitsPerSample = 16;
short channels = 1;
int byteRate = sampleRate * channels * (bitsPerSample / 8);
short blockAlign = (short)(channels * (bitsPerSample / 8));
wavHeader[0] = 0x52; // R
wavHeader[1] = 0x49; // I
wavHeader[2] = 0x46; // F
wavHeader[3] = 0x46; // F
wavHeader[4] = 0x00; // Placeholder for file size (will be updated later)
wavHeader[5] = 0x00; // Placeholder for file size (will be updated later)
wavHeader[6] = 0x00; // Placeholder for file size (will be updated later)
wavHeader[7] = 0x00; // Placeholder for file size (will be updated later)
wavHeader[8] = 0x57; // W
wavHeader[9] = 0x41; // A
wavHeader[10] = 0x56; // V
wavHeader[11] = 0x45; // E
wavHeader[12] = 0x66; // f
wavHeader[13] = 0x6D; // m
wavHeader[14] = 0x74; // t
wavHeader[15] = 0x20; // Space
wavHeader[16] = 0x10; // Subchunk1Size (16 for PCM)
wavHeader[17] = 0x00; // Subchunk1Size
wavHeader[18] = 0x00; // Subchunk1Size
wavHeader[19] = 0x00; // Subchunk1Size
wavHeader[20] = 0x01; // AudioFormat (1 for PCM)
wavHeader[21] = 0x00; // AudioFormat
wavHeader[22] = (byte)channels; // NumChannels
wavHeader[23] = 0x00; // NumChannels
wavHeader[24] = (byte)(sampleRate & 0xFF); // SampleRate
wavHeader[25] = (byte)((sampleRate >> 8) & 0xFF); // SampleRate
wavHeader[26] = (byte)((sampleRate >> 16) & 0xFF); // SampleRate
wavHeader[27] = (byte)((sampleRate >> 24) & 0xFF); // SampleRate
wavHeader[28] = (byte)(byteRate & 0xFF); // ByteRate
wavHeader[29] = (byte)((byteRate >> 8) & 0xFF); // ByteRate
wavHeader[30] = (byte)((byteRate >> 16) & 0xFF); // ByteRate
wavHeader[31] = (byte)((byteRate >> 24) & 0xFF); // ByteRate
wavHeader[32] = (byte)blockAlign; // BlockAlign
wavHeader[33] = 0x00; // BlockAlign
wavHeader[34] = (byte)bitsPerSample; // BitsPerSample
wavHeader[35] = 0x00; // BitsPerSample
wavHeader[36] = 0x64; // d
wavHeader[37] = 0x61; // a
wavHeader[38] = 0x74; // t
wavHeader[39] = 0x61; // a
wavHeader[40] = 0x00; // Placeholder for data chunk size (will be updated later)
wavHeader[41] = 0x00; // Placeholder for data chunk size (will be updated later)
wavHeader[42] = 0x00; // Placeholder for data chunk size (will be updated later)
wavHeader[43] = 0x00; // Placeholder for data chunk size (will be updated later)
writer.Write(wavHeader);
}
}
if (e.Stream != null)
{
using (BinaryWriter writer = new BinaryWriter(File.Open($"output_{audioFileCount}.wav", FileMode.Append)))
{
writer.Write(e.Stream.ToArray());
}
}
// record the last audio time
lastAudioTime = DateTime.Now;
}));
await agentClient.Subscribe(new EventHandler<AgentAudioDoneResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<AgentStartedSpeakingResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<AgentThinkingResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<ConversationTextResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<FunctionCallingResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<FunctionCallRequestResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<UserStartedSpeakingResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<WelcomeResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<CloseResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<UnhandledResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received");
}));
await agentClient.Subscribe(new EventHandler<ErrorResponse>((sender, e) =>
{
Console.WriteLine($"----> {e} received. Error: {e.Message}");
}));
// Start the connection
var settingsConfiguration = new SettingsConfigurationSchema();
settingsConfiguration.Agent.Think.Provider.Type = "open_ai";
settingsConfiguration.Agent.Think.Model = "gpt-4o-mini";
settingsConfiguration.Agent.Think.Instructions = "You are a helpful AI assistant.";
settingsConfiguration.Audio.Output.SampleRate = 48000;
settingsConfiguration.Audio.Input.SampleRate = 16000;
bool bConnected = await agentClient.Connect(settingsConfiguration);
if (!bConnected)
{
Console.WriteLine("Failed to connect to Deepgram WebSocket server.");
return;
}
// Microphone streaming
var microphone = new Microphone(agentClient.SendBinary);
microphone.Start();
// Wait for the user to press a key
Console.ReadKey();
// Stop the microphone
microphone.Stop();
// Stop the connection
await agentClient.Stop();
// Terminate Libraries
Deepgram.Microphone.Library.Terminate();
Deepgram.Library.Terminate();
}
catch (Exception ex)
{
Console.WriteLine($"Exception: {ex.Message}");
}
}
}
}
// Copyright 2024 Deepgram SDK contributors. All Rights Reserved.
// Use of this source code is governed by a MIT license that can be found in the LICENSE file.
// SPDX-License-Identifier: MIT
package main
// streaming
import (
"bufio"
"context"
"fmt"
"os"
"sync"
"time"
msginterfaces "github.com/deepgram/deepgram-go-sdk/pkg/api/agent/v1/websocket/interfaces"
microphone "github.com/deepgram/deepgram-go-sdk/pkg/audio/microphone"
client "github.com/deepgram/deepgram-go-sdk/pkg/client/agent"
interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
)
type MyHandler struct {
binaryChan chan *[]byte
openChan chan *msginterfaces.OpenResponse
welcomeResponse chan *msginterfaces.WelcomeResponse
conversationTextResponse chan *msginterfaces.ConversationTextResponse
userStartedSpeakingResponse chan *msginterfaces.UserStartedSpeakingResponse
agentThinkingResponse chan *msginterfaces.AgentThinkingResponse
functionCallRequestResponse chan *msginterfaces.FunctionCallRequestResponse
functionCallingResponse chan *msginterfaces.FunctionCallingResponse
agentStartedSpeakingResponse chan *msginterfaces.AgentStartedSpeakingResponse
agentAudioDoneResponse chan *msginterfaces.AgentAudioDoneResponse
closeChan chan *msginterfaces.CloseResponse
errorChan chan *msginterfaces.ErrorResponse
unhandledChan chan *[]byte
}
func NewMyHandler() *MyHandler {
handler := &MyHandler{
binaryChan: make(chan *[]byte),
openChan: make(chan *msginterfaces.OpenResponse),
welcomeResponse: make(chan *msginterfaces.WelcomeResponse),
conversationTextResponse: make(chan *msginterfaces.ConversationTextResponse),
userStartedSpeakingResponse: make(chan *msginterfaces.UserStartedSpeakingResponse),
agentThinkingResponse: make(chan *msginterfaces.AgentThinkingResponse),
functionCallRequestResponse: make(chan *msginterfaces.FunctionCallRequestResponse),
functionCallingResponse: make(chan *msginterfaces.FunctionCallingResponse),
agentStartedSpeakingResponse: make(chan *msginterfaces.AgentStartedSpeakingResponse),
agentAudioDoneResponse: make(chan *msginterfaces.AgentAudioDoneResponse),
closeChan: make(chan *msginterfaces.CloseResponse),
errorChan: make(chan *msginterfaces.ErrorResponse),
unhandledChan: make(chan *[]byte),
}
go func() {
handler.Run()
}()
return handler
}
// GetBinary returns the binary channels
func (dch MyHandler) GetBinary() []*chan *[]byte {
return []*chan *[]byte{&dch.binaryChan}
}
// GetOpen returns the open channels
func (dch MyHandler) GetOpen() []*chan *msginterfaces.OpenResponse {
return []*chan *msginterfaces.OpenResponse{&dch.openChan}
}
// GetWelcomeResponse returns the welcome response channels
func (dch MyHandler) GetWelcome() []*chan *msginterfaces.WelcomeResponse {
return []*chan *msginterfaces.WelcomeResponse{&dch.welcomeResponse}
}
// GetConversationTextResponse returns the conversation text response channels
func (dch MyHandler) GetConversationText() []*chan *msginterfaces.ConversationTextResponse {
return []*chan *msginterfaces.ConversationTextResponse{&dch.conversationTextResponse}
}
// GetUserStartedSpeakingResponse returns the user started speaking response channels
func (dch MyHandler) GetUserStartedSpeaking() []*chan *msginterfaces.UserStartedSpeakingResponse {
return []*chan *msginterfaces.UserStartedSpeakingResponse{&dch.userStartedSpeakingResponse}
}
// GetAgentThinkingResponse returns the agent thinking response channels
func (dch MyHandler) GetAgentThinking() []*chan *msginterfaces.AgentThinkingResponse {
return []*chan *msginterfaces.AgentThinkingResponse{&dch.agentThinkingResponse}
}
// GetFunctionCallRequestResponse returns the function call request response channels
func (dch MyHandler) GetFunctionCallRequest() []*chan *msginterfaces.FunctionCallRequestResponse {
return []*chan *msginterfaces.FunctionCallRequestResponse{&dch.functionCallRequestResponse}
}
// GetFunctionCallingResponse returns the function calling response channels
func (dch MyHandler) GetFunctionCalling() []*chan *msginterfaces.FunctionCallingResponse {
return []*chan *msginterfaces.FunctionCallingResponse{&dch.functionCallingResponse}
}
// GetAgentStartedSpeakingResponse returns the agent started speaking response channels
func (dch MyHandler) GetAgentStartedSpeaking() []*chan *msginterfaces.AgentStartedSpeakingResponse {
return []*chan *msginterfaces.AgentStartedSpeakingResponse{&dch.agentStartedSpeakingResponse}
}
// GetAgentAudioDoneResponse returns the agent audio done response channels
func (dch MyHandler) GetAgentAudioDone() []*chan *msginterfaces.AgentAudioDoneResponse {
return []*chan *msginterfaces.AgentAudioDoneResponse{&dch.agentAudioDoneResponse}
}
func (dch MyHandler) GetEndOfThought() []*chan *msginterfaces.EndOfThoughtResponse {
return []*chan *msginterfaces.EndOfThoughtResponse{&dch.endOfThoughtResponse}
}
// GetClose returns the close channels
func (dch MyHandler) GetClose() []*chan *msginterfaces.CloseResponse {
return []*chan *msginterfaces.CloseResponse{&dch.closeChan}
}
// GetError returns the error channels
func (dch MyHandler) GetError() []*chan *msginterfaces.ErrorResponse {
return []*chan *msginterfaces.ErrorResponse{&dch.errorChan}
}
// GetUnhandled returns the unhandled event channels
func (dch MyHandler) GetUnhandled() []*chan *[]byte {
return []*chan *[]byte{&dch.unhandledChan}
}
// Open is the callback for when the connection opens
// golintci: funlen
func (dch MyHandler) Run() error {
wgReceivers := sync.WaitGroup{}
// binary channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
counter := 0
lastBytesReceived := time.Now().Add(-7 * time.Second)
for br := range dch.binaryChan {
fmt.Printf("\n\n[Binary Data]\n\n")
fmt.Printf("Size: %d\n\n", len(*br))
if lastBytesReceived.Add(5 * time.Second).Before(time.Now()) {
counter = counter + 1
file, err := os.OpenFile(fmt.Sprintf("output_%d.wav", counter), os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o666)
if err != nil {
fmt.Printf("Failed to open file. Err: %v\n", err)
continue
}
// Add a wav audio container header to the file if you want to play the audio
// using a media player like VLC, Media Player, or Apple Music
header := []byte{
0x52, 0x49, 0x46, 0x46, // "RIFF"
0x00, 0x00, 0x00, 0x00, // Placeholder for file size
0x57, 0x41, 0x56, 0x45, // "WAVE"
0x66, 0x6d, 0x74, 0x20, // "fmt "
0x10, 0x00, 0x00, 0x00, // Chunk size (16)
0x01, 0x00, // Audio format (1 for PCM)
0x01, 0x00, // Number of channels (1)
0x80, 0x3e, 0x00, 0x00, // Sample rate (16000)
0x00, 0x7d, 0x00, 0x00, // Byte rate (16000 * 2)
0x02, 0x00, // Block align (2)
0x10, 0x00, // Bits per sample (16)
0x64, 0x61, 0x74, 0x61, // "data"
0x00, 0x00, 0x00, 0x00, // Placeholder for data size
}
_, err = file.Write(header)
if err != nil {
fmt.Printf("Failed to write header to file. Err: %v\n", err)
continue
}
file.Close()
}
fmt.Printf("Dumping to WAV file\n")
file, err := os.OpenFile(fmt.Sprintf("output_%d.wav", counter), os.O_APPEND|os.O_WRONLY, 0o644)
if err != nil {
fmt.Printf("Failed to open file. Err: %v\n", err)
continue
}
_, err = file.Write(*br)
file.Close()
if err != nil {
fmt.Printf("Failed to write to file. Err: %v\n", err)
continue
}
lastBytesReceived = time.Now()
}
}()
// open channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.openChan {
fmt.Printf("\n\n[OpenResponse]\n\n")
}
}()
// welcome response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.welcomeResponse {
fmt.Printf("\n\n[WelcomeResponse]\n\n")
}
}()
// conversation text response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for ctr := range dch.conversationTextResponse {
fmt.Printf("\n\n[ConversationTextResponse]\n")
fmt.Printf("%s: %s\n\n", ctr.Role, ctr.Content)
}
}()
// user started speaking response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.userStartedSpeakingResponse {
fmt.Printf("\n\n[UserStartedSpeakingResponse]\n\n")
}
}()
// agent thinking response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.agentThinkingResponse {
fmt.Printf("\n\n[AgentThinkingResponse]\n\n")
}
}()
// function call request response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.functionCallRequestResponse {
fmt.Printf("\n\n[FunctionCallRequestResponse]\n\n")
}
}()
// function calling response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.functionCallingResponse {
fmt.Printf("\n\n[FunctionCallingResponse]\n\n")
}
}()
// agent started speaking response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.agentStartedSpeakingResponse {
fmt.Printf("\n\n[AgentStartedSpeakingResponse]\n\n")
}
}()
// agent audio done response channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.agentAudioDoneResponse {
fmt.Printf("\n\n[AgentAudioDoneResponse]\n\n")
}
}()
// close channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for _ = range dch.closeChan {
fmt.Printf("\n\n[CloseResponse]\n\n")
}
}()
// error channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for er := range dch.errorChan {
fmt.Printf("\n[ErrorResponse]\n")
fmt.Printf("\nError.Type: %s\n", er.ErrCode)
fmt.Printf("Error.Message: %s\n", er.ErrMsg)
fmt.Printf("Error.Description: %s\n\n", er.Description)
fmt.Printf("Error.Variant: %s\n\n", er.Variant)
}
}()
// unhandled event channel
wgReceivers.Add(1)
go func() {
defer wgReceivers.Done()
for byData := range dch.unhandledChan {
fmt.Printf("\n[UnhandledEvent]")
fmt.Printf("Dump:\n%s\n\n", string(*byData))
}
}()
// wait for all receivers to finish
wgReceivers.Wait()
return nil
}
func main() {
// init library
microphone.Initialize()
// print instructions
fmt.Print("\n\nPress ENTER to exit!\n\n")
/*
DG Streaming API
*/
// init library
client.Init(client.InitLib{
LogLevel: client.LogLevelDefault, // LogLevelDefault, LogLevelFull, LogLevelDebug, LogLevelTrace
})
// Go context
ctx := context.Background()
// client options
cOptions := &interfaces.ClientOptions{
EnableKeepAlive: true,
}
// set the Transcription options
tOptions := client.NewSettingsConfigurationOptions()
tOptions.Agent.Think.Provider.Type = "open_ai"
tOptions.Agent.Think.Model = "gpt-4o-mini"
tOptions.Agent.Think.Instructions = "You are a helpful AI assistant."
// implement your own callback
var callback msginterfaces.AgentMessageChan
callback = *NewMyHandler()
// create a Deepgram client
dgClient, err := client.NewWSUsingChan(ctx, "", cOptions, tOptions, callback)
if err != nil {
fmt.Println("ERROR creating LiveTranscription connection:", err)
return
}
// connect the websocket to Deepgram
fmt.Printf("Starting Agent...\n")
bConnected := dgClient.Connect()
if !bConnected {
fmt.Println("Client.Connect failed")
os.Exit(1)
}
/*
Microphone package
*/
// mic stuf
mic, err := microphone.New(microphone.AudioConfig{
InputChannels: 1,
SamplingRate: 16000,
})
if err != nil {
fmt.Printf("Initialize failed. Err: %v\n", err)
os.Exit(1)
}
// start the mic
fmt.Printf("Starting Microphone...\n")
err = mic.Start()
if err != nil {
fmt.Printf("mic.Start failed. Err: %v\n", err)
os.Exit(1)
}
go func() {
// feed the microphone stream to the Deepgram client (this is a blocking call)
mic.Stream(dgClient)
}()
// wait for user input to exit
input := bufio.NewScanner(os.Stdin)
input.Scan()
// close mic stream
fmt.Printf("Stopping Microphone...\n")
err = mic.Stop()
if err != nil {
fmt.Printf("mic.Stop failed. Err: %v\n", err)
os.Exit(1)
}
// teardown library
microphone.Teardown()
// close DG client
fmt.Printf("Stopping Agent...\n")
dgClient.Stop()
fmt.Printf("\n\nProgram exiting...\n")
}
Non-SDK Code Examples
If you would like to try out making a Deepgram Voice Agent request in a specific language (but not using Deepgram's SDKs), we offer a library of code-samples in this Github repo. However, we recommend first trying out our SDKs.
Rate Limits
EA projects start at 1 concurrent connection for testing. This can be raised upon request or with a contract.
Usage Tracking
Usage is calculated based on websocket connection time. 1 hour of websocket connection time = 1 hour of API usage.
Pricing
- Pay-as-you-go pricing for the standard tier of the voice agent api is $4.50 per hour
- If you bring your own LLM, standard tier pricing is $3.90 per hour.
What's Next?
- Check out the API Specification for more details on the Voice Agent API.
- Review the Agent API Feature Overview
- Review this audio format guide to determine the format of your audio.
- Learn how to measure streaming latency in real-time streaming of audio.
Updated about 15 hours ago