Build a Voice Agent with Pipecat and Deepgram

Learn how to build a real-time voice agent using Deepgram for speech-to-text and text-to-speech, with Pipecat for pipeline orchestration.

If you already use Pipecat for voice AI pipelines, you can add Deepgram’s speech-to-text and text-to-speech models to your Pipecat agent. Pipecat’s pipeline architecture connects separate STT, LLM, and TTS services, so this guide pairs Deepgram’s audio models with OpenAI for language understanding — though any Pipecat-compatible LLM works.

For a standalone voice agent without Pipecat or an external LLM, see the Deepgram Voice Agent API, which bundles STT, LLM routing, and TTS in a single WebSocket connection.

For a CLI-scaffolded approach using pipecat init, see the Pipecat and Deepgram integration guide.

Before You Begin

This guide assumes you are familiar with Python and have a basic understanding of how voice agents work.

You’ll need a Deepgram account and an API key. Signing up is free and includes $200 in credit.

Get OpenAI Credentials

This tutorial uses OpenAI for its LLM. You’ll need to sign up for an OpenAI account and obtain an API key.

Requirements

Python 3.11+
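
If you want to confirm the interpreter version before installing anything, a small check like the one below can help. This is a minimal sketch and not part of the original project; the helper name meets_minimum is hypothetical.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Report whether the interpreter running this script is new enough.
print("Python version OK" if meets_minimum() else "Python 3.11+ is required")
```

Run it with the same python command you plan to use for the virtual environment.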

Set Up Your Project

This implementation is a starting reference for building your own voice agent with Pipecat and Deepgram. It is not designed for production deployments.

Create a new directory, set up a virtual environment, and install Pipecat with the Deepgram, OpenAI, Silero, WebRTC, and runner extras:

$ mkdir deepgram-pipecat-agent
$ cd deepgram-pipecat-agent
$ python -m venv venv
$ source venv/bin/activate  # On Windows: venv\Scripts\activate
$ pip install "pipecat-ai[deepgram,openai,silero,webrtc,runner]" python-dotenv

The runner extra includes a built-in WebRTC transport with a browser-based test client, so no external account is needed.

Set Environment Variables

Create a .env file in your project root with the credentials you collected earlier. The agent reads these at startup to authenticate with each service:

DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
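
A missing or empty key usually only shows up as an authentication error once the agent tries to reach a service. As a rough sketch, a check like the following could run right after load_dotenv() to fail early. The helper name missing_keys is hypothetical and not part of Pipecat or the original bot.py.

```python
import os

# The two variables defined in the .env file above.
REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"DEEPGRAM_API_KEY": "abc"}))  # prints ['OPENAI_API_KEY']
```

In bot.py you might call missing_keys() with no arguments and exit with a clear message if the returned list is not empty.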

Build the Agent

The agent creates a pipeline that connects audio input to Deepgram for transcription, OpenAI for a response, and Deepgram again for speech synthesis. Pipecat handles the audio transport, turn-taking, and interruption detection.

The key components are:

Pipeline — connects frame processors in sequence: transport input, STT, context aggregation, LLM, TTS, and transport output.
LLMContextAggregatorPair — manages conversation context and uses Silero VAD to detect when the user starts and stops speaking.
LLMRunFrame — triggers the LLM to generate a response. Used here to make the agent greet the user on connect.
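
To make the idea of conversation context concrete, the sketch below models it as a plain list of role/content messages. This is an illustration only: it does not use Pipecat's LLMContext class, the add_message helper is hypothetical, and the assistant and user texts are made-up sample values.

```python
def add_message(context, role, content):
    """Append one message to the running conversation context."""
    context.append({"role": role, "content": content})
    return context


context = []

# On connect, a developer instruction asks the model to speak first.
add_message(context, "developer", "Greet the user and ask how you can help.")

# The assistant side records what the model said (sample text).
add_message(context, "assistant", "Hi there! How can I help you today?")

# The user side records each completed utterance (sample text).
add_message(context, "user", "What can you do?")

print([message["role"] for message in context])
# prints ['developer', 'assistant', 'user']
```

Each new LLM call sees the whole list, which is how the agent keeps track of earlier turns.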

Create bot.py:

# bot.py

import os

from dotenv import load_dotenv

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams

# Load DEEPGRAM_API_KEY and OPENAI_API_KEY from the .env file.
load_dotenv(override=True)

# Transport settings, keyed by the transport name passed with -t.
transport_params = {
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Speech-to-text: Deepgram Nova-3.
    stt = DeepgramSTTService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
        settings=DeepgramSTTService.Settings(
            model="nova-3-general",
            language="en",
            punctuate=True,
            smart_format=True,
        ),
    )

    # LLM: OpenAI GPT-4o with a system prompt tuned for spoken output.
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        settings=OpenAILLMService.Settings(
            model="gpt-4o",
            system_instruction=(
                "You are a friendly, helpful voice assistant. "
                "Keep your responses concise; aim for 1-3 sentences "
                "unless the user asks for detail. "
                "Your responses will be spoken aloud, so avoid emojis, "
                "bullet points, or other formatting that can't be spoken."
            ),
        ),
    )

    # Text-to-speech: Deepgram Aura-2.
    tts = DeepgramTTSService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
        settings=DeepgramTTSService.Settings(
            voice="aura-2-thalia-en",
        ),
    )

    # Shared conversation context for the user and assistant aggregators.
    context = LLMContext()

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    # Processors run in the order listed here.
    pipeline = Pipeline([
        transport.input(),
        stt,
        user_aggregator,
        llm,
        tts,
        transport.output(),
        assistant_aggregator,
    ])

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Prompt the LLM to greet the user as soon as they connect.
        context.add_message(
            {"role": "developer", "content": "Greet the user and ask how you can help."}
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        # Stop the pipeline when the browser client disconnects.
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    # Entry point called by the Pipecat runner.
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()

How the pipeline works

Frames flow through the pipeline in the order the processors are listed:

transport.input() captures microphone audio from the browser.
stt (Deepgram Nova-3) transcribes audio into text in real time.
user_aggregator collects transcription frames and uses Silero VAD to detect when the user finishes speaking, then adds the complete utterance to the conversation context.
llm (OpenAI GPT-4o) generates a text response based on the conversation context.
tts (Deepgram Aura-2) converts the response text to speech.
transport.output() sends the audio back to the browser.
assistant_aggregator records the assistant’s response in the conversation context for future turns.
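
The stage-by-stage flow described above can be pictured with a toy model in which each stage is a plain function and the output of one feeds the next. This sketch does not use Pipecat and leaves out the transport and aggregator stages. The names fake_stt, fake_llm, fake_tts, and run_pipeline are hypothetical stand-ins, and real frame processors are asynchronous and stream frames rather than handling one value at a time.

```python
from functools import reduce


def fake_stt(audio):
    # Stand-in for transcription: treat the audio bytes as UTF-8 text.
    return audio.decode("utf-8")


def fake_llm(text):
    # Stand-in for the LLM: produce a reply based on the transcript.
    return f"You said: {text}"


def fake_tts(text):
    # Stand-in for speech synthesis: turn the reply back into bytes.
    return text.encode("utf-8")


def run_pipeline(stages, frame):
    """Pass `frame` through each stage in list order."""
    return reduce(lambda value, stage: stage(value), stages, frame)


stages = [fake_stt, fake_llm, fake_tts]
output = run_pipeline(stages, b"hello")
print(output)  # prints b'You said: hello'
```

Reordering the list changes the behavior, which is why the order of processors passed to Pipeline matters.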

When a user speaks over the agent, Silero VAD detects the interruption and cancels the current TTS output so the agent stops and listens.
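
The interruption behavior can be illustrated with plain asyncio: a playback task runs until a signal arrives, and then it is cancelled. This is a sketch of the general idea under simplified assumptions, not Pipecat's internal implementation. The speak and demo functions are hypothetical, and an asyncio.Event stands in for the VAD detecting user speech.

```python
import asyncio


async def speak(words, spoken, started):
    """Pretend to play a response one word at a time."""
    for word in words:
        spoken.append(word)
        started.set()             # signal that playback has begun
        await asyncio.sleep(1.0)  # stand-in for the time audio takes to play


async def demo():
    spoken = []
    started = asyncio.Event()
    playback = asyncio.create_task(speak(["one", "two", "three"], spoken, started))

    await started.wait()  # stand-in for "user speech detected"
    playback.cancel()     # interruption: stop the response in progress
    try:
        await playback
    except asyncio.CancelledError:
        pass              # expected: playback was cancelled on purpose
    return spoken


spoken = asyncio.run(demo())
print(spoken)  # prints ['one']
```

Only the first word is played because the task is cancelled while it waits, so the remaining words are never reached.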

Run the Agent

Start the agent with the WebRTC transport. The -t webrtc flag launches a built-in browser client for testing:

$ python bot.py -t webrtc

The first run takes about 20 seconds to download the Silero VAD model. Subsequent starts are faster.

Test the Agent

Once the agent is running, open http://localhost:7860/client in your browser.

Allow microphone access when prompted
Click to connect — the agent greets you automatically
Start talking — the agent responds in real time

Try interrupting the agent mid-sentence. Silero VAD detects your speech and cancels the current response so the agent listens to you instead.
