LiveKit and Deepgram

This guide walks you through building a voice AI agent that uses LiveKit Agents for real-time audio transport and Deepgram for speech-to-text (STT) and text-to-speech (TTS). By the end, you will have a working voice agent that listens to a user, generates a response with an LLM, and speaks back in real time.

Deepgram is available in LiveKit Agents through two paths:

Path	Description
LiveKit Inference	Deepgram models hosted and billed through LiveKit Cloud. Supports Nova-3, Nova-2 variants (`nova-2`, `nova-2-medical`, `nova-2-phonecall`), Flux, and Aura-2. No Deepgram API key required.
Deepgram Plugin	Connects directly to Deepgram’s API with your own API key. Includes diarization, keyterm prompting, and advanced parameters.

This guide starts with LiveKit Inference for the fastest setup, then shows how to switch to the Deepgram Plugin for direct API access and advanced features.

Before you begin

To use the Deepgram Plugin, you need a Deepgram account. Signup is free and includes $200 in credit. The LiveKit Inference path does not require a Deepgram API key.

You need:

A LiveKit Cloud account (or a self-hosted LiveKit server)
An LLM. The starter uses LiveKit Inference (inference.LLM(model="openai/chat-latest")), which runs through LiveKit Cloud and requires no separate provider key. A dedicated LLM API key is only needed if you bring your own provider, such as OpenAI; LiveKit Agents supports other providers too.
Python 3.10+ or Node.js 20+

Step 1: Create the project

The fastest way to scaffold a new agent project is with the LiveKit CLI (lk). Alternatively, clone a starter template from GitHub (Python, Node.js).

$ lk agent init my-agent --template agent-starter-python
$ cd my-agent
$ uv sync

When the CLI finishes, your agent is registered with LiveKit Cloud. You’ll test it later from the Agent Console.

Step 2: Configure your environment

The CLI creates a .env.local file during setup. Open it and confirm your API keys are set:

LIVEKIT_URL=YOUR_LIVEKIT_CLOUD_URL
LIVEKIT_API_KEY=YOUR_LIVEKIT_API_KEY
LIVEKIT_API_SECRET=YOUR_LIVEKIT_API_SECRET
# Only required if you switch the LLM to OpenAI directly instead of using LiveKit Inference:
OPENAI_API_KEY=YOUR_OPENAI_API_KEY

If you used the CLI’s guided setup, these values are already populated. If not, copy them from your LiveKit Cloud dashboard. OPENAI_API_KEY is optional: the starter uses LiveKit Inference by default, so you need it only if you bring your own key from the OpenAI dashboard.
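Before starting the agent, it can help to confirm that .env.local defines the required LiveKit variables. The following is a minimal sketch and is not part of the starter template: the helper names parse_env and missing_keys are hypothetical, and the parser handles only simple KEY=VALUE lines (no `export` prefix, no multi-line values).

```python
from pathlib import Path

# The three LiveKit variables from the .env.local example in this step.
REQUIRED = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and any wrapping quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text: str, required=REQUIRED) -> list[str]:
    """Return required names that are absent, empty, or still placeholders."""
    env = parse_env(text)
    return [
        name
        for name in required
        if not env.get(name) or env[name].startswith("YOUR_")
    ]


if __name__ == "__main__":
    path = Path(".env.local")
    text = path.read_text() if path.exists() else ""
    problems = missing_keys(text)
    print("Missing or placeholder values:", problems or "none")
```

Run it from the project root. Any name it reports is either missing from the file or still set to its YOUR_... placeholder.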

Step 3: Use Deepgram for TTS

At the time of writing, the starter template uses Deepgram for STT but Cartesia for TTS. These defaults change frequently and aren’t guaranteed, so check the generated code. To use Deepgram for both, find the tts argument in the AgentSession constructor in src/agent.py (Python) or src/main.ts (Node.js) and replace it:

# Replace the existing tts line with:
tts=inference.TTS(model="deepgram/aura-2", voice="thalia"),

The agent now uses Deepgram for both STT and TTS. No additional dependencies or API keys are needed because both run through LiveKit Inference.

Browse available voices in the Deepgram voice library.

Step 4: Run the agent

Start the agent in development mode. The required model files (VAD, turn detection) are now downloaded automatically:

$ uv run src/agent.py dev

The dev command connects your agent to LiveKit Cloud. Open the Agent Console in your browser to talk to your agent.

Step 5: Test the conversation

Once the agent is running, you should hear a greeting. Try these interactions to verify everything works:

Ask a question and confirm the agent responds with speech.
Start speaking while the agent is talking. It should stop and listen.
Pause after speaking. The agent should detect the end of your turn and respond.

If any of these fail, check your API keys and confirm the agent process is running.

Use the Deepgram Plugin for advanced features

The steps above use LiveKit Inference, which hosts Deepgram models through LiveKit Cloud. If you need direct access to Deepgram features like speaker diarization, keyterm prompting, or fine-grained parameter control, use the Deepgram Plugin instead. This connects directly to Deepgram’s API with your own API key.

Install the plugin

$ uv add "livekit-agents[deepgram]~=1.4"

Set your Deepgram API key

Add DEEPGRAM_API_KEY to your .env.local file:

DEEPGRAM_API_KEY=YOUR_DEEPGRAM_API_KEY

Replace YOUR_DEEPGRAM_API_KEY with the API key from your Deepgram Console. The plugin reads this variable automatically at startup.
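If you want the agent to fall back to LiveKit Inference when no Deepgram key is configured, you can make that choice explicit at startup. This is a sketch under stated assumptions: choose_deepgram_path is a hypothetical helper, not a LiveKit or Deepgram API, and it returns only a label that your own code would use to decide which stt and tts arguments to build.

```python
import os
from typing import Mapping


def choose_deepgram_path(env: Mapping[str, str] = os.environ) -> str:
    """Return "plugin" when a Deepgram key is configured, else "inference".

    The labels are illustrative; they mirror the two paths in this guide.
    """
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    # Treat the unedited placeholder from the example as "not set".
    if key and key != "YOUR_DEEPGRAM_API_KEY":
        return "plugin"
    return "inference"


if __name__ == "__main__":
    print("Deepgram path:", choose_deepgram_path())
```

Call the helper after your environment file has been loaded into the process environment. Otherwise it always reports "inference".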

Update the AgentSession

Add the Deepgram import at the top of your entrypoint file, then replace the stt and tts arguments in the AgentSession:

from livekit.plugins import deepgram

# Replace the stt and tts lines in your AgentSession:
stt=deepgram.STT(
    model="nova-3",
    language="en",
    punctuate=True,
    interim_results=True,
),
tts=deepgram.TTS(model="aura-2-thalia-en"),

Use Flux for turn detection

Flux is Deepgram’s conversational STT model with built-in turn detection. It uses acoustic and semantic cues to determine when a speaker has finished their turn, resulting in more natural conversations with fewer awkward pauses.

To use Flux, replace the stt configuration with STTv2 and set turn detection to "stt":

# Replace the stt line and add turn_handling:
stt=deepgram.STTv2(model="flux-general-en"),
turn_handling=TurnHandlingOptions(
    turn_detection="stt",
),

Even when Flux handles turn detection, voice activity detection (VAD) is still required for interruption handling: without it, the agent cannot detect when a user speaks over its response. If you don’t specify a VAD, LiveKit provisions one automatically. This default is not the Silero plugin itself but a bundled inference VAD that is also based on Silero. You can add the Silero plugin explicitly if you prefer to manage VAD yourself.

Choose a different voice

Deepgram offers 60+ voices across seven languages. Replace the model parameter in TTS with any supported voice:

tts = deepgram.TTS(model="aura-2-andromeda-en")

Browse all available voices in the Deepgram voice library.
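The two plugin model IDs shown in this guide, aura-2-thalia-en and aura-2-andromeda-en, follow the pattern aura-2-<voice>-<language>. A small helper can build the ID from a voice name. This is a sketch that assumes the pattern holds for the voice you choose, so confirm the exact ID in the voice library before relying on it; aura2_model_id is a hypothetical name.

```python
def aura2_model_id(voice: str, language: str = "en") -> str:
    """Build a model ID of the form aura-2-<voice>-<language>.

    Assumption: the ID pattern is inferred from the examples in this guide.
    """
    voice = voice.strip().lower()
    language = language.strip().lower()
    if not voice.isalpha() or not language.isalpha():
        raise ValueError("voice and language must be plain alphabetic names")
    return f"aura-2-{voice}-{language}"


print(aura2_model_id("Thalia"))     # aura-2-thalia-en
print(aura2_model_id("andromeda"))  # aura-2-andromeda-en
```

The result can be passed as the model argument, for example deepgram.TTS(model=aura2_model_id("andromeda")).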

Go further with Deepgram

Use keyterm prompting to improve recognition of domain-specific vocabulary.
Enable speaker diarization to assign a speaker label to each word in the transcript.
See the full list of plugin parameters in the LiveKit Deepgram reference for Python and Node.js.