Build a Voice Agent with Go

Create a real-time voice agent using the Deepgram Go SDK.

This tutorial walks you through building a basic voice agent using Go and the Deepgram SDK. You will learn how to connect to the Agent API, configure its behavior, and stream audio for processing.

Prerequisites

Before you begin, ensure you have the following:

  • A Deepgram API key. You can get one in the Deepgram Console.
  • Go installed on your machine.

1. Set up your environment

Create a new directory for your project and initialize a Go module.

$ mkdir deepgram-agent-demo
$ cd deepgram-agent-demo
$ go mod init deepgram-agent-demo
$ touch main.go

Export your Deepgram API key as an environment variable.

$ export DEEPGRAM_API_KEY="your_api_key"
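
The program in step 3 passes an empty string as the API key, so it relies on this variable being visible to the process. As a minimal sketch, a check like the following could confirm that before you connect. The helper name requireAPIKey is hypothetical and is not part of the Deepgram SDK; it accepts a lookup function with the same shape as os.LookupEnv so it can be exercised without changing the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// requireAPIKey is a hypothetical helper (not part of the Deepgram SDK).
// It returns the value of DEEPGRAM_API_KEY, or an error if the variable
// is unset or contains only whitespace.
func requireAPIKey(lookup func(string) (string, bool)) (string, error) {
	key, ok := lookup("DEEPGRAM_API_KEY")
	if !ok || strings.TrimSpace(key) == "" {
		return "", errors.New("DEEPGRAM_API_KEY is not set or is empty")
	}
	return key, nil
}

func main() {
	// os.LookupEnv distinguishes an unset variable from an empty one.
	if _, err := requireAPIKey(os.LookupEnv); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("DEEPGRAM_API_KEY is set")
}
```

Note that export only affects the current shell session, so run the program from the same terminal in which you exported the key.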

2. Install the Deepgram SDK

Install the Deepgram Go SDK. The /v3 suffix is required because Go module paths for major version 2 and later must end with the major version.

$ go get github.com/deepgram/deepgram-go-sdk/v3

3. Create the Voice Agent

Open main.go and add the following code. This program connects to Deepgram, configures the agent, and streams a sample audio file.

package main

import (
	"bufio"
	"context"
	"fmt"
	"net/http"
	"os"

	msginterfaces "github.com/deepgram/deepgram-go-sdk/v3/pkg/api/agent/v1/websocket/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/v3/pkg/client/agent"
	"github.com/deepgram/deepgram-go-sdk/v3/pkg/client/interfaces"
)

// MyHandler exposes the channels on which the SDK delivers agent events.
// Only binary (audio) messages are handled here; every other getter returns nil.
type MyHandler struct {
	binaryChan chan *[]byte
}

func (dch MyHandler) GetBinary() []*chan *[]byte { return []*chan *[]byte{&dch.binaryChan} }
func (dch MyHandler) GetOpen() []*chan *msginterfaces.OpenResponse { return nil }
func (dch MyHandler) GetWelcome() []*chan *msginterfaces.WelcomeResponse { return nil }
func (dch MyHandler) GetConversationText() []*chan *msginterfaces.ConversationTextResponse { return nil }
func (dch MyHandler) GetUserStartedSpeaking() []*chan *msginterfaces.UserStartedSpeakingResponse { return nil }
func (dch MyHandler) GetAgentThinking() []*chan *msginterfaces.AgentThinkingResponse { return nil }
func (dch MyHandler) GetAgentStartedSpeaking() []*chan *msginterfaces.AgentStartedSpeakingResponse { return nil }
func (dch MyHandler) GetAgentAudioDone() []*chan *msginterfaces.AgentAudioDoneResponse { return nil }
func (dch MyHandler) GetClose() []*chan *msginterfaces.CloseResponse { return nil }
func (dch MyHandler) GetError() []*chan *msginterfaces.ErrorResponse { return nil }
func (dch MyHandler) GetUnhandled() []*chan *[]byte { return nil }
func (dch MyHandler) GetInjectionRefused() []*chan *msginterfaces.InjectionRefusedResponse { return nil }
func (dch MyHandler) GetKeepAlive() []*chan *msginterfaces.KeepAlive { return nil }
func (dch MyHandler) GetFunctionCallRequest() []*chan *msginterfaces.FunctionCallRequestResponse { return nil }
func (dch MyHandler) GetSettingsApplied() []*chan *msginterfaces.SettingsAppliedResponse { return nil }

func main() {
	ctx := context.Background()

	// Initialize the SDK with the default log level.
	client.Init(client.InitLib{LogLevel: client.LogLevelDefault})

	// Connection options: keep the WebSocket alive between messages.
	cOptions := &interfaces.ClientOptions{EnableKeepAlive: true}

	// Agent settings: output audio format plus the listen, think, and speak providers.
	tOptions := client.NewSettingsConfigurationOptions()
	tOptions.Audio.Output.Encoding = "linear16"
	tOptions.Audio.Output.SampleRate = 24000
	tOptions.Audio.Output.Container = "wav"
	tOptions.Agent.Language = "en"
	tOptions.Agent.Greeting = "Hello! How can I help you today?"
	tOptions.Agent.Listen.Provider = map[string]interface{}{
		"type":  "deepgram",
		"model": "nova-3",
	}
	tOptions.Agent.Think.Provider = map[string]interface{}{
		"type":  "open_ai",
		"model": "gpt-4o-mini",
	}
	tOptions.Agent.Think.Prompt = "You are a friendly AI assistant."
	tOptions.Agent.Speak.Provider = map[string]interface{}{
		"type":  "deepgram",
		"model": "aura-2-thalia-en",
	}

	handler := &MyHandler{binaryChan: make(chan *[]byte)}

	// Write each binary message from the agent to its own output_N.wav file.
	go func() {
		counter := 0
		for br := range handler.binaryChan {
			if br == nil {
				continue
			}
			counter++
			name := fmt.Sprintf("output_%d.wav", counter)
			if err := os.WriteFile(name, *br, 0o644); err != nil {
				fmt.Printf("failed to write %s: %v\n", name, err)
			}
		}
	}()

	// The API key argument is empty, so the key is taken from DEEPGRAM_API_KEY.
	dgClient, err := client.NewWSUsingChan(ctx, "", cOptions, tOptions, msginterfaces.AgentMessageChan(*handler))
	if err != nil {
		fmt.Printf("failed to create the agent client: %v\n", err)
		os.Exit(1)
	}
	if !dgClient.Connect() {
		fmt.Println("failed to connect to Deepgram")
		os.Exit(1)
	}

	// Download the sample audio and stream it to the agent.
	resp, err := http.Get("https://dpgr.am/spacewalk.wav")
	if err != nil {
		fmt.Printf("failed to download the sample audio: %v\n", err)
		dgClient.Stop()
		os.Exit(1)
	}
	defer resp.Body.Close()

	if err := dgClient.Stream(bufio.NewReader(resp.Body)); err != nil {
		fmt.Printf("streaming stopped: %v\n", err)
	}

	fmt.Println("Press ENTER to exit")
	bufio.NewScanner(os.Stdin).Scan()
	dgClient.Stop()
}
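
The goroutine in main.go that saves audio can be hard to test because it depends on a live connection. The following is a minimal sketch of the same receive loop pulled into a standalone function, under the assumption that chunks arrive on a chan *[]byte as they do in MyHandler. The names drainAudio and writeWAV are hypothetical and are not part of the Deepgram SDK.

```go
package main

import (
	"fmt"
	"os"
)

// drainAudio is a hypothetical helper (not part of the Deepgram SDK).
// It reads chunks until ch is closed, skips nil or empty chunks, and passes
// each remaining chunk to sink with a 1-based index. It keeps reading after
// a sink error so the sender is never left blocked, and it returns the
// number of chunks the sink accepted along with the first error seen.
func drainAudio(ch <-chan *[]byte, sink func(n int, data []byte) error) (int, error) {
	saved, seen := 0, 0
	var firstErr error
	for chunk := range ch {
		if chunk == nil || len(*chunk) == 0 {
			continue
		}
		seen++
		if err := sink(seen, *chunk); err != nil {
			if firstErr == nil {
				firstErr = fmt.Errorf("chunk %d: %w", seen, err)
			}
			continue
		}
		saved++
	}
	return saved, firstErr
}

// writeWAV is a sink that mirrors the tutorial's behavior:
// one output_N.wav file per chunk in the current directory.
func writeWAV(n int, data []byte) error {
	return os.WriteFile(fmt.Sprintf("output_%d.wav", n), data, 0o644)
}

func main() {
	// A buffered channel stands in for the handler's binary channel.
	ch := make(chan *[]byte, 3)
	first, second := []byte("chunk-1"), []byte("chunk-22")
	ch <- &first
	ch <- nil // skipped by drainAudio
	ch <- &second
	close(ch)

	// An in-memory sink keeps this demo from writing files; pass writeWAV
	// instead to save chunks the way the tutorial does.
	saved, err := drainAudio(ch, func(n int, data []byte) error {
		fmt.Printf("chunk %d: %d bytes\n", n, len(data))
		return nil
	})
	fmt.Println("saved:", saved, "error:", err)
}
```

Passing the sink as a function keeps the loop independent of the file system, so the same code can write files in the agent and collect bytes in memory in a test.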

4. Run the Voice Agent

Run the program using the Go CLI.

$ go run main.go

The agent processes the streamed audio and generates spoken responses. The program saves each audio message it receives as an output_*.wav file in your project directory, and it keeps running until you press ENTER.
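
To confirm that responses were saved, you could list the generated files from a separate small program. This is a sketch only; the helper name listOutputs is hypothetical, and it assumes the files follow the output_*.wav naming used in main.go.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listOutputs is a hypothetical helper that returns the paths in dir that
// match the tutorial's output_*.wav naming pattern. filepath.Glob returns
// matches in lexical order, so output_10.wav sorts before output_2.wav.
func listOutputs(dir string) ([]string, error) {
	return filepath.Glob(filepath.Join(dir, "output_*.wav"))
}

func main() {
	files, err := listOutputs(".")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	if len(files) == 0 {
		fmt.Println("no output_*.wav files found yet")
		return
	}
	// Print each file with its size so empty files are easy to spot.
	for _, name := range files {
		info, err := os.Stat(name)
		if err != nil {
			fmt.Println(name, "stat failed:", err)
			continue
		}
		fmt.Printf("%s\t%d bytes\n", name, info.Size())
	}
}
```

Save this in its own directory or module, since a Go package can contain only one main function.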

Next steps

Now that you have built a basic agent, you can customize its behavior: