Build a Voice Agent with LiveKit and Deepgram

If you already use LiveKit for WebRTC audio transport, you can add Deepgram’s speech-to-text and text-to-speech models to your LiveKit agent pipeline. The pipeline in this guide takes a separate provider for each of the STT, LLM, and TTS stages, so it pairs Deepgram’s audio models with OpenAI for language understanding, though any LiveKit-compatible LLM works.

For a standalone voice agent without LiveKit or an external LLM, see the Deepgram Voice Agent API, which bundles STT, LLM routing, and TTS in a single WebSocket connection.

Before You Begin

This guide assumes you are familiar with Python or Node.js and have a basic understanding of how voice agents work.

You’ll need a Deepgram account and an API key. Signup is free and includes $200 in credit.

Get OpenAI Credentials

This tutorial uses OpenAI for its LLM. You’ll need to sign up for an OpenAI account and obtain an API key.

Get LiveKit Credentials

You’ll need a LiveKit Cloud account with your LiveKit URL, API Key, and API Secret.

Requirements

Python 3.10+ or Node.js 18+

Set Up Your Project

This implementation is a starting reference for building your own voice agent with LiveKit and Deepgram. It is not designed for production deployments.

Create a new directory, set up a virtual environment, and install the LiveKit agents framework with its OpenAI extra, the Deepgram and Silero plugins, and python-dotenv:

$ mkdir deepgram-livekit-agent
$ cd deepgram-livekit-agent
$ python -m venv venv
$ source venv/bin/activate  # On Windows: venv\Scripts\activate
$ pip install "livekit-agents[openai]" livekit-plugins-deepgram livekit-plugins-silero python-dotenv

Set Environment Variables

Create a .env file in your project root with the credentials you collected earlier. The agent reads these at startup to authenticate with each service:

DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

LIVEKIT_URL is the WebSocket endpoint for your LiveKit Cloud project. You can find it along with your API key and secret in the LiveKit dashboard.
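
A missing or blank credential usually surfaces later as an authentication error from one of the services, which can be hard to trace. The following is a minimal sketch of a startup check, assuming the five variable names from the .env file above. The helper names missing_env_vars and check_env are hypothetical and are not part of LiveKit or Deepgram.

```python
import os

# Variable names taken from the .env file in this guide.
REQUIRED_VARS = (
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def check_env(env=None):
    """Raise a readable error if any credential is missing."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

One way to use it is to call check_env() right after the .env file has been loaded, so the agent fails fast with the names of the missing variables.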

Build the Agent

The agent connects to a LiveKit room, creates a session with Deepgram for audio processing and OpenAI for language understanding, and starts listening for speech.

The key components are:

Agent — defines the agent’s personality and instructions.
AgentSession — wires together the STT, LLM, TTS, and VAD providers into a pipeline. When a user speaks, audio flows through Deepgram Nova-3 for transcription, OpenAI GPT-4o for a response, and Deepgram Aura for speech synthesis.
generate_reply — triggers the agent’s first message so it greets the user without waiting for input.

Create agent.py (Python) or agent.ts (Node.js). The Python version is shown here:

# agent.py

from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    AgentServer,
    JobContext,
    cli,
)
from livekit.plugins import deepgram, openai, silero

# Load the credentials from .env into the process environment.
load_dotenv()

server = AgentServer()


class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a friendly, helpful voice assistant. "
                "Keep your responses concise; aim for 1-3 sentences "
                "unless the user asks for detail."
            ),
        )


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    # Join the LiveKit room assigned to this job.
    await ctx.connect()

    # Wire the STT, LLM, TTS, and VAD providers into one pipeline.
    session = AgentSession(
        stt=deepgram.STT(
            model="nova-3",
            language="en",
            punctuate=True,
            smart_format=True,
            interim_results=True,
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=deepgram.TTS(model="aura-2-thalia-en"),
        vad=silero.VAD.load(),
    )

    await session.start(
        agent=VoiceAssistant(),
        room=ctx.room,
    )

    # Have the agent speak first instead of waiting for user input.
    await session.generate_reply(
        instructions="Greet the user and ask how you can help.",
        allow_interruptions=True,
    )


if __name__ == "__main__":
    cli.run_app(server)

Run the Agent

Start the agent in development mode. The dev subcommand connects the agent to your LiveKit Cloud project and automatically registers it to handle incoming sessions:

$ python agent.py dev

Test the Agent

LiveKit provides the Agents Playground — a browser-based tool for testing agents. It includes video, chat, and other features, but for this tutorial you only need the microphone.

Go to agents-playground.livekit.io
Enter your LiveKit Cloud URL and a participant token, then connect
Allow microphone access when prompted
Start talking — the agent should respond in real time

You can generate a participant token from the LiveKit dashboard or using the LiveKit CLI.
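
To show what a participant token contains, here is a minimal standard-library sketch. It assumes the token is an HS256-signed JWT whose iss claim is the API key, whose sub claim is the participant identity, and whose video claim carries the room grant. That layout is an assumption about LiveKit’s token format, so check it against the LiveKit documentation. The function name make_participant_token and the identity and room values are hypothetical. For real use, prefer the dashboard, the LiveKit CLI, or a LiveKit server SDK.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_participant_token(api_key, api_secret, identity, room,
                           ttl_seconds=3600, now=None):
    """Build an HS256-signed JWT carrying a room-join grant (assumed layout)."""
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,            # assumed: the LiveKit API key
        "sub": identity,           # assumed: the participant identity
        "nbf": now,
        "exp": now + ttl_seconds,  # keep test tokens short-lived
        "video": {"room": room, "roomJoin": True},  # assumed grant keys
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        api_secret.encode("utf-8"),
        signing_input.encode("ascii"),
        hashlib.sha256,
    ).digest()
    return signing_input + "." + _b64url(signature)
```

Calling make_participant_token with your LIVEKIT_API_KEY and LIVEKIT_API_SECRET, an identity such as "tester", and a room name returns a string you could paste into the playground. Never embed the API secret in client-side code.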

The agent greets you automatically on connect. Silero VAD detects when you stop speaking and triggers the STT-to-LLM-to-TTS pipeline. You can interrupt the agent mid-sentence — VAD handles barge-in automatically.
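
The end-of-speech idea can be illustrated with a simplified sketch. This is not Silero’s model or LiveKit’s turn-detection logic. It assumes a VAD that emits one speech probability per audio frame and treats a run of consecutive low-probability frames after speech as the end of a turn. The function name, threshold, and frame count are hypothetical.

```python
def find_end_of_speech(speech_probs, threshold=0.5, min_silence_frames=3):
    """Return the index of the frame that ends the turn, or None.

    A turn ends once speech has been heard and is then followed by
    `min_silence_frames` consecutive frames below `threshold`.
    """
    in_speech = False
    silent_run = 0
    for index, prob in enumerate(speech_probs):
        if prob >= threshold:
            # Speech frame: start (or continue) the turn, reset the silence count.
            in_speech = True
            silent_run = 0
        elif in_speech:
            # Silence after speech: count toward the end-of-turn decision.
            silent_run += 1
            if silent_run >= min_silence_frames:
                return index
    return None
```

In this sketch a short pause does not end the turn, because any new speech frame resets the silence counter. A real pipeline would hand the buffered audio or transcript to the next stage at the returned frame.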