Build a Voice Agent with LiveKit and Deepgram
Learn how to build a real-time voice agent using Deepgram for speech-to-text and text-to-speech, with LiveKit for WebRTC audio transport.
If you already use LiveKit for WebRTC audio transport, you can add Deepgram’s speech-to-text and text-to-speech models to your LiveKit agent pipeline. LiveKit’s framework requires separate STT, LLM, and TTS providers, so this guide pairs Deepgram’s audio models with OpenAI for language understanding — though any LiveKit-compatible LLM works.
For a standalone voice agent without LiveKit or an external LLM, see the Deepgram Voice Agent API, which bundles STT, LLM routing, and TTS in a single WebSocket connection.
This guide assumes you are familiar with Python or Node.js and have a basic understanding of how voice agents work.
You’ll need a Deepgram account and an API key. Signup is free and includes $200 in credit.
This tutorial uses OpenAI for its LLM. You’ll need to sign up for an OpenAI account and obtain an API key.
You’ll need a LiveKit Cloud account with your LiveKit URL, API Key, and API Secret.
This implementation is a starting reference for building your own voice agent with LiveKit and Deepgram. It is not designed for production deployments.
Create a new directory, set up your environment, and install the LiveKit agents framework along with the Deepgram and Silero plugins:
Create a .env file in your project root with the credentials you collected earlier. The agent reads these at startup to authenticate with each service:
LIVEKIT_URL is the WebSocket endpoint for your LiveKit Cloud project. You can find it along with your API key and secret in the LiveKit dashboard.
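As a quick sanity check before starting the agent, the sketch below reports which credentials are missing from the environment. Only LIVEKIT_URL is named in this guide; the other variable names (LIVEKIT_API_KEY, LIVEKIT_API_SECRET, DEEPGRAM_API_KEY, OPENAI_API_KEY) are assumed to follow the conventional names the SDKs read, and the helper functions are hypothetical, so adjust them to match your own .env file.

```python
import os

# Assumed variable names; only LIVEKIT_URL is named in this guide.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def looks_like_websocket_url(url):
    """LIVEKIT_URL is a WebSocket endpoint, so expect a ws:// or wss:// scheme."""
    return url.startswith(("ws://", "wss://"))


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    elif not looks_like_websocket_url(os.environ["LIVEKIT_URL"]):
        print("LIVEKIT_URL should start with ws:// or wss://")
    else:
        print("All credentials are set.")
```

Note that this reads the process environment directly, so values stored only in .env must be exported or loaded first.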
The agent connects to a LiveKit room, creates a session with Deepgram for audio processing and OpenAI for language understanding, and starts listening for speech.
The key components are:
Agent: defines the agent’s personality and instructions.
AgentSession: wires together the STT, LLM, TTS, and VAD providers into a pipeline. When a user speaks, audio flows through Deepgram Nova-3 for transcription, OpenAI GPT-4o for a response, and Deepgram Aura for speech synthesis.
generate_reply: triggers the agent’s first message so it greets the user without waiting for input.
Create agent.py (Python) or agent.ts (Node.js):
Start the agent in development mode by running it with the dev subcommand (for the Python version, python agent.py dev). The dev subcommand connects the agent to your LiveKit Cloud project and automatically registers it to handle incoming sessions.
LiveKit provides the Agents Playground — a browser-based tool for testing agents. It includes video, chat, and other features, but for this tutorial you only need the microphone.
You can generate a participant token from the LiveKit dashboard or using the LiveKit CLI.
The agent greets you automatically on connect. Silero VAD detects when you stop speaking and triggers the STT-to-LLM-to-TTS pipeline. You can interrupt the agent mid-sentence — VAD handles barge-in automatically.