LiveKit and Deepgram
Build a real-time voice AI agent using LiveKit Agents with Deepgram speech-to-text and text-to-speech.
This guide walks you through building a voice AI agent that uses LiveKit Agents for real-time audio transport and Deepgram for speech-to-text (STT) and text-to-speech (TTS). By the end, you will have a working voice agent that listens to a user, generates a response with an LLM, and speaks back in real time.
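Deepgram is available in LiveKit Agents through two paths: LiveKit Inference, which hosts Deepgram models through LiveKit Cloud and needs no additional API keys, and the Deepgram Plugin, which connects directly to Deepgram's API with your own API key.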
This guide starts with LiveKit Inference for the fastest setup, then shows how to switch to the Deepgram Plugin for direct API access and advanced features.
Before you can use Deepgram, you need to create a Deepgram account. Signup is free and includes $200 in credit.
You also need a LiveKit Cloud account and a Python or Node.js development environment.
The fastest way to scaffold a new agent project is with the LiveKit CLI (lk). Alternatively, clone a starter template from GitHub (Python, Node.js).
When the CLI finishes, it prints a sandbox URL (for example, https://my-agent-xxxxx.sandbox.livekit.io). Note this URL. You will use it to test your agent later.
The CLI creates a .env.local file during setup. Open it and confirm your API keys are set:
If you used the CLI’s guided setup, these values are already populated. If not, add them from your LiveKit Cloud dashboard and OpenAI dashboard.
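If you want to verify the file from a script rather than by eye, a minimal sketch follows. It assumes the standard LiveKit variable names (LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET) plus OPENAI_API_KEY; the helper names parse_env and missing_keys are illustrative and not part of any SDK. Adjust the list to match what your template actually generated.

```python
from pathlib import Path

# Assumed variable names. The LiveKit ones are the SDK's usual environment
# variables; OPENAI_API_KEY is assumed from the OpenAI dashboard step.
REQUIRED_KEYS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or set to an empty value."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    path = Path(".env.local")
    text = path.read_text() if path.exists() else ""
    missing = missing_keys(text)
    print("All keys set" if not missing else f"Missing: {', '.join(missing)}")
```

Run it from the project root. An empty value counts as missing, which catches placeholders that were never filled in.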
The starter template uses Deepgram for STT but Cartesia for TTS. To use Deepgram for both, find the tts argument in the AgentSession constructor in src/agent.py (Python) or src/main.ts (Node.js) and replace it:
The agent now uses Deepgram for both STT and TTS. No additional dependencies or API keys are needed because both run through LiveKit Inference.
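As a rough Python sketch of what the session looks like after the swap: LiveKit Inference accepts model descriptor strings of the form provider/model:variant. The specific descriptors below (deepgram/nova-3:multi, openai/gpt-4.1-mini, deepgram/aura-2:athena) are assumptions for illustration, so confirm the exact names against your template and the LiveKit model list. The providers and build_session helpers are hypothetical.

```python
# Assumed LiveKit Inference descriptors in "provider/model:variant" form.
# Verify the exact model and voice names before relying on them.
INFERENCE_MODELS = {
    "stt": "deepgram/nova-3:multi",
    "llm": "openai/gpt-4.1-mini",
    "tts": "deepgram/aura-2:athena",  # replaces the template's Cartesia entry
}


def providers(models: dict[str, str]) -> dict[str, str]:
    """Map each role (stt, llm, tts) to the provider part of its descriptor."""
    return {role: desc.split("/", 1)[0] for role, desc in models.items()}


def build_session(**extra):
    """Create an AgentSession from the descriptors above.

    The SDK is imported lazily so this module loads without it installed.
    Pass vad=..., turn_detection=..., and so on through **extra.
    """
    from livekit.agents import AgentSession

    return AgentSession(**INFERENCE_MODELS, **extra)


if __name__ == "__main__":
    # Quick sanity check that both speech roles now point at Deepgram.
    print(providers(INFERENCE_MODELS))
```

Keeping the descriptors in one dictionary makes it easy to see at a glance which provider serves each role.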
Browse available voices in the Deepgram voice library.
Download the required model files (VAD, turn detection), then start the agent in development mode:
The dev command connects your agent to LiveKit Cloud’s sandbox environment. Open your sandbox URL in your browser to talk to your agent.
Once the agent is running, you should hear a greeting. Try these interactions to verify everything works:
If any of these fail, check your API keys and confirm the agent process is running.
The steps above use LiveKit Inference, which hosts Deepgram models through LiveKit Cloud. If you need direct access to Deepgram features like speaker diarization, keyterm prompting, or fine-grained parameter control, use the Deepgram Plugin instead. This connects directly to Deepgram’s API with your own API key.
Add DEEPGRAM_API_KEY to your .env.local file:
Replace YOUR_DEEPGRAM_API_KEY with the API key from your Deepgram Console. The plugin reads this variable automatically at startup.
Add the Deepgram import at the top of your entrypoint file, then replace the stt and tts arguments in the AgentSession:
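A minimal Python sketch of the plugin-based session, assuming the livekit-plugins-deepgram package is installed and DEEPGRAM_API_KEY is already loaded into the process environment. The model names nova-3 and aura-2-andromeda-en are example choices, and deepgram_key_present and build_plugin_session are hypothetical helper names.

```python
import os


def deepgram_key_present(environ=os.environ) -> bool:
    """Return True if DEEPGRAM_API_KEY is set to a non-blank value."""
    return bool(environ.get("DEEPGRAM_API_KEY", "").strip())


def build_plugin_session(**extra):
    """Create an AgentSession that calls Deepgram directly for STT and TTS.

    Imports are lazy so this module loads without the SDK installed.
    Pass llm=..., vad=..., and so on through **extra.
    """
    from livekit.agents import AgentSession
    from livekit.plugins import deepgram

    if not deepgram_key_present():
        raise RuntimeError("DEEPGRAM_API_KEY is not set; add it to .env.local")

    return AgentSession(
        # Example models; swap in whichever Deepgram models you need.
        stt=deepgram.STT(model="nova-3", language="multi"),
        tts=deepgram.TTS(model="aura-2-andromeda-en"),
        **extra,
    )
```

Failing early on a missing key gives a clearer error than waiting for the first request to Deepgram to be rejected.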
Flux is Deepgram’s conversational STT model with built-in turn detection. It uses acoustic and semantic cues to determine when a speaker has finished their turn, resulting in more natural conversations with fewer awkward pauses.
To use Flux, replace the stt configuration with STTv2 and set turn detection to "stt":
Even when Flux handles turn detection, include a VAD (Voice Activity Detection) plugin such as Silero. VAD is required for interruption handling. Without it, the agent cannot detect when a user speaks over the agent's response.
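The Flux setup described above might look like the following in Python. This is a sketch under assumptions: the model name flux-general-en is an example, the livekit-plugins-silero package is assumed to be installed, and FLUX_SETTINGS, check_settings, and build_flux_session are hypothetical names used only to make the two requirements (turn detection set to "stt", VAD kept on) explicit.

```python
# Illustrative settings; the model name is an assumed example.
FLUX_SETTINGS = {
    "stt_model": "flux-general-en",
    "turn_detection": "stt",  # let the STT model decide when a turn ends
    "use_vad": True,          # still needed so interruptions are detected
}


def check_settings(settings: dict) -> list[str]:
    """Return a list of problems with a Flux configuration (empty if fine)."""
    problems = []
    if settings.get("turn_detection") != "stt":
        problems.append("turn_detection must be 'stt' for Flux to end turns")
    if not settings.get("use_vad"):
        problems.append("VAD is disabled; interruptions will not be detected")
    return problems


def build_flux_session(settings=FLUX_SETTINGS, **extra):
    """Create an AgentSession using Flux (STTv2) plus Silero VAD.

    Imports are lazy so this module loads without the SDK installed.
    """
    from livekit.agents import AgentSession
    from livekit.plugins import deepgram, silero

    problems = check_settings(settings)
    if problems:
        raise ValueError("; ".join(problems))

    return AgentSession(
        stt=deepgram.STTv2(model=settings["stt_model"]),
        turn_detection=settings["turn_detection"],
        vad=silero.VAD.load(),
        **extra,
    )
```

The check mirrors the two rules in the text, so a configuration that drops VAD fails before the session starts rather than silently ignoring interruptions.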
Deepgram offers 60+ voices across seven languages. Replace the model parameter in TTS with any supported voice:
Browse all available voices in the Deepgram voice library.
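Aura-2 voice identifiers follow the pattern aura-2-voice-language, for example aura-2-thalia-en. The small helper below is a hypothetical convenience for building that string in Python; the voice names in the comments are examples, so confirm that a given voice and language pair exists in the voice library before using it.

```python
def aura2_model(voice: str, language: str = "en") -> str:
    """Build an Aura-2 model identifier such as 'aura-2-thalia-en'.

    This only formats the string. It does not check that the voice exists.
    """
    voice = voice.strip().lower()
    language = language.strip().lower()
    if not voice or not language:
        raise ValueError("voice and language must be non-empty")
    return f"aura-2-{voice}-{language}"


if __name__ == "__main__":
    # Example voices; verify availability in the Deepgram voice library.
    print(aura2_model("Thalia"))
    print(aura2_model("celeste", "es"))
```

Pass the resulting string as the model argument of the plugin's TTS constructor, for example deepgram.TTS(model=aura2_model("thalia")).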