LiveKit and Deepgram
Build a real-time voice AI agent using LiveKit Agents with Deepgram speech-to-text and text-to-speech.
This guide walks you through building a voice AI agent that uses LiveKit Agents for real-time audio transport and Deepgram for speech-to-text (STT) and text-to-speech (TTS). By the end, you will have a working voice agent that listens to a user, generates a response with an LLM, and speaks back in real time.
Deepgram is available in LiveKit Agents through two paths:
- LiveKit Inference: Deepgram models hosted through LiveKit Cloud, with no separate Deepgram API key required.
- The Deepgram Plugin: a direct connection to Deepgram's API using your own API key, with access to the full set of Deepgram features and parameters.
This guide starts with LiveKit Inference for the fastest setup, then shows how to switch to the Deepgram Plugin for direct API access and advanced features.
Before you begin
Before you can use Deepgram, you need to create a Deepgram account. Signup is free and includes $200 in credit.
You need:
- A LiveKit Cloud account (or a self-hosted LiveKit server)
- An LLM API key. This guide uses OpenAI, but LiveKit Agents supports other providers
- Python 3.10+ or Node.js 18+
Step 1: Create the project
The fastest way to scaffold a new agent project is with the LiveKit CLI (lk). Alternatively, clone a starter template from GitHub (Python, Node.js).
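A minimal scaffold with the CLI might look like the following. The template name is an assumption based on LiveKit's starter templates; run `lk app create --help` to see what is available to you.

```shell
# Install the LiveKit CLI first, then scaffold a new agent project.
# The template name below is an assumption -- check `lk app create --help`
# for the current list of templates.
lk app create --template agent-starter-python my-voice-agent
cd my-voice-agent
```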
When the CLI finishes, it prints a sandbox URL (for example, https://my-agent-xxxxx.sandbox.livekit.io). Note this URL. You will use it to test your agent later.
Step 2: Configure your environment
The CLI creates a .env.local file during setup. Open it and confirm your API keys are set:
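The file should contain entries like these (placeholder values shown; the variable name for your LLM provider may differ if you are not using OpenAI):

```shell
# .env.local -- populated by the CLI's guided setup
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
OPENAI_API_KEY=your_openai_api_key
```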
If you used the CLI's guided setup, these values are already populated. If not, add them from your LiveKit Cloud dashboard and OpenAI dashboard.
Step 3: Use Deepgram for TTS
The starter template uses Deepgram for STT but Cartesia for TTS. To use Deepgram for both, find the tts argument in the AgentSession constructor in src/agent.py (Python) or src/main.ts (Node.js) and replace it:
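The change is a single argument swap. A sketch of the Python version, assuming the template configures `AgentSession` with LiveKit Inference model strings; the exact model identifiers below are assumptions, so confirm them against the LiveKit Inference model catalog:

```python
from livekit.agents import AgentSession

# Replace the Cartesia TTS model string with a Deepgram one. The model
# identifiers here are assumptions -- check the LiveKit Inference catalog
# for current names before relying on them.
session = AgentSession(
    stt="deepgram/nova-3",               # already Deepgram in the starter template
    llm="openai/gpt-4.1-mini",           # any supported LLM provider works here
    tts="deepgram/aura-2-andromeda-en",  # was a cartesia/... model string
)
```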
The agent now uses Deepgram for both STT and TTS. No additional dependencies or API keys are needed because both run through LiveKit Inference.
Browse available voices in the Deepgram voice library.
Step 4: Run the agent
Download the required model files (VAD, turn detection), then start the agent in development mode:
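For the Python starter template, that typically looks like the commands below; adjust the entrypoint path if yours differs:

```shell
# Download model files used locally (VAD, turn detection),
# then start the agent in development mode.
python src/agent.py download-files
python src/agent.py dev
```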
The dev command connects your agent to LiveKit Cloud's sandbox environment. Open your sandbox URL in your browser to talk to your agent.
Step 5: Test the conversation
Once the agent is running, you should hear a greeting. Try these interactions to verify everything works:
- Ask a question and confirm the agent responds with speech.
- Start speaking while the agent is talking. It should stop and listen.
- Pause after speaking. The agent should detect the end of your turn and respond.
If any of these fail, check your API keys and confirm the agent process is running.
Use the Deepgram Plugin for advanced features
The steps above use LiveKit Inference, which hosts Deepgram models through LiveKit Cloud. If you need direct access to Deepgram features like speaker diarization, keyterm prompting, or fine-grained parameter control, use the Deepgram Plugin instead. This connects directly to Deepgram's API with your own API key.
Install the plugin
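For Python, the plugin ships as an extra of the agents package; the Node.js package name below is an assumption, so check npm for the current name:

```shell
# Python: install the Deepgram plugin for LiveKit Agents
pip install "livekit-agents[deepgram]"

# Node.js equivalent (package name is an assumption; verify on npm):
# npm install @livekit/agents-plugin-deepgram
```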
Set your Deepgram API key
Add DEEPGRAM_API_KEY to your .env.local file:
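```shell
# .env.local
DEEPGRAM_API_KEY=YOUR_DEEPGRAM_API_KEY
```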
Replace YOUR_DEEPGRAM_API_KEY with the API key from your Deepgram Console. The plugin reads this variable automatically at startup.
Update the AgentSession
Add the Deepgram import at the top of your entrypoint file, then replace the stt and tts arguments in the AgentSession:
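A Python sketch of the swap from Inference model strings to plugin instances. The `model=` parameter names follow the plugin's common usage, but treat the specific model names as assumptions and check the plugin reference for your version:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram

# Replace the Inference model strings with Deepgram plugin instances,
# which read DEEPGRAM_API_KEY from the environment automatically.
session = AgentSession(
    stt=deepgram.STT(model="nova-3"),
    llm="openai/gpt-4.1-mini",
    tts=deepgram.TTS(model="aura-2-andromeda-en"),
)
```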
Use Flux for turn detection
Flux is Deepgram's conversational STT model with built-in turn detection. It uses acoustic and semantic cues to determine when a speaker has finished their turn, resulting in more natural conversations with fewer awkward pauses.
To use Flux, replace the stt configuration with STTv2 and set turn detection to "stt":
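A Python sketch, with the Flux model identifier as an assumption (check Deepgram's docs for the current name). Silero VAD stays in the session so the agent can still detect interruptions while it is speaking:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, silero

# Flux handles end-of-turn detection itself, so turn_detection points at
# the STT stage. The Flux model name below is an assumption.
session = AgentSession(
    stt=deepgram.STTv2(model="flux-general-en"),
    turn_detection="stt",
    vad=silero.VAD.load(),  # still required for interruption handling
    llm="openai/gpt-4.1-mini",
    tts=deepgram.TTS(model="aura-2-andromeda-en"),
)
```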
Even when using Flux for turn detection, include a VAD (Voice Activity Detection) plugin like Silero. Flux handles turn detection, but VAD is required for interruption handling: without it, the agent cannot detect when a user speaks over the agent's response.
Choose a different voice
Deepgram offers 60+ voices across seven languages. Replace the model parameter in TTS with any supported voice:
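For example, using the Deepgram Plugin (the voice name below is one Aura-2 voice; any supported voice identifier works the same way):

```python
from livekit.plugins import deepgram

# Swap in a different Aura-2 voice by changing the model identifier.
tts = deepgram.TTS(model="aura-2-thalia-en")
```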
Browse all available voices in the Deepgram voice library.
Go further with Deepgram
- Use keyterm prompting to improve recognition of domain-specific vocabulary.
- Enable speaker diarization to assign a speaker label to each word in the transcript.
- See the full list of plugin parameters in the LiveKit Deepgram reference for Python and Node.js.