Build a Voice Agent with LiveKit and Deepgram
Learn how to build a real-time voice agent using Deepgram for speech-to-text and text-to-speech, with LiveKit for WebRTC audio transport.
If you already use LiveKit for WebRTC audio transport, you can add Deepgram's speech-to-text and text-to-speech models to your LiveKit agent pipeline. LiveKit's framework requires separate STT, LLM, and TTS providers, so this guide pairs Deepgram's audio models with OpenAI for language understanding, though any LiveKit-compatible LLM works.
For a standalone voice agent without LiveKit or an external LLM, see the Deepgram Voice Agent API, which bundles STT, LLM routing, and TTS in a single WebSocket connection.
Before You Begin
This guide assumes you are familiar with Python or Node.js and have a basic understanding of how voice agents work.
You'll need a Deepgram account and an API key. Signup is free and includes $200 in credit.
Get OpenAI Credentials
This tutorial uses OpenAI as its LLM. You'll need to sign up for an OpenAI account and obtain an API key.
Get LiveKit Credentials
You'll need a LiveKit Cloud account with your LiveKit URL, API Key, and API Secret.
Requirements
- Python 3.10+ or Node.js 18+
Set Up Your Project
This implementation is a starting reference for building your own voice agent with LiveKit and Deepgram. It is not designed for production deployments.
Create a new directory, set up your environment, and install the LiveKit agents framework along with the Deepgram and Silero plugins:
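For the Python path, the setup might look like the following. The plugin-extras syntax and version pin are assumptions based on the livekit-agents 1.x packaging; check the LiveKit install docs for your version:

```shell
mkdir voice-agent && cd voice-agent

# Isolate dependencies in a virtual environment
python3 -m venv .venv && source .venv/bin/activate

# LiveKit agents framework plus the Deepgram, OpenAI, and Silero plugins,
# and python-dotenv for loading credentials from .env
pip install "livekit-agents[deepgram,openai,silero]~=1.0" python-dotenv
```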
Set Environment Variables
Create a .env file in your project root with the credentials you collected earlier. The agent reads these at startup to authenticate with each service:
LIVEKIT_URL is the WebSocket endpoint for your LiveKit Cloud project. You can find it along with your API key and secret in the LiveKit dashboard.
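A sketch of the expected .env layout; every value shown is a placeholder for your own credentials:

```shell
DEEPGRAM_API_KEY=your-deepgram-api-key
OPENAI_API_KEY=your-openai-api-key
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
```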
Build the Agent
The agent connects to a LiveKit room, creates a session with Deepgram for audio processing and OpenAI for language understanding, and starts listening for speech.
The key components are:
- Agent: defines the agent's personality and instructions.
- AgentSession: wires together the STT, LLM, TTS, and VAD providers into a pipeline. When a user speaks, audio flows through Deepgram Nova-3 for transcription, OpenAI GPT-4o for a response, and Deepgram Aura for speech synthesis.
- generate_reply: triggers the agent's first message so it greets the user without waiting for input.
Create agent.py (Python) or agent.ts (Node.js):
Run the Agent
Start the agent in development mode. The dev subcommand connects the agent to your LiveKit Cloud project and automatically registers it to handle incoming sessions:
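Assuming the Python agent above and the CLI bundled with livekit-agents (the download-files subcommand, which prefetches the Silero VAD weights, is an optional first step):

```shell
# Download model files (Silero VAD weights) once before the first run
python3 agent.py download-files

# Start the agent worker in development mode
python3 agent.py dev
```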
Test the Agent
LiveKit provides the Agents Playground, a browser-based tool for testing agents. It includes video, chat, and other features, but for this tutorial you only need the microphone.
- Go to agents-playground.livekit.io
- Enter your LiveKit Cloud URL and a participant token, then connect
- Allow microphone access when prompted
- Start talking; the agent should respond in real time
You can generate a participant token from the LiveKit dashboard or using the LiveKit CLI.
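For illustration, a participant token is an HS256-signed JWT whose claims carry a room-join grant. The stdlib-only sketch below shows the token's shape under my understanding of LiveKit's claim format (iss for the API key, sub for the identity, a video grant object); in practice, prefer the LiveKit CLI or a server SDK, which also handle expiry and grant options for you:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def livekit_token(api_key: str, api_secret: str,
                  room: str, identity: str, ttl: int = 3600) -> str:
    """Build a LiveKit participant token: an HS256 JWT with a room-join grant."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,    # your LiveKit API key
        "sub": identity,   # the participant identity shown in the room
        "nbf": now,
        "exp": now + ttl,  # token validity window
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = (
        f"{b64url(json.dumps(header).encode())}"
        f".{b64url(json.dumps(claims).encode())}"
    )
    # Sign header.claims with the API secret
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"
```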
The agent greets you automatically on connect. Silero VAD detects when you stop speaking and triggers the STT-to-LLM-to-TTS pipeline. You can interrupt the agent mid-sentence; VAD handles barge-in automatically.
Further Reading
- Deepgram Voice Agent API: build voice agents without an external LLM or transport layer
- LiveKit Agents Documentation: LiveKit's agent framework reference