Pipecat and Deepgram

Build a real-time voice AI agent using Pipecat with Deepgram speech-to-text and text-to-speech.

This guide walks you through building a voice AI agent that uses Pipecat for pipeline orchestration and Deepgram for speech-to-text (STT) and text-to-speech (TTS). By the end, you'll have a working voice agent that listens to a user, generates a response with an LLM, and speaks back in real time.

Pipecat is an open-source Python framework for building voice and multimodal AI agents. It connects STT, LLM, and TTS services into a real-time pipeline and handles audio transport, turn-taking, and interruption detection.
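Conceptually, a cascade pipeline chains processors so each stage's output feeds the next. The sketch below is plain Python for illustration only, not the Pipecat API; the real framework passes typed frames asynchronously and handles interruptions, but the data flow is the same:

```python
# Conceptual sketch of a cascade voice pipeline (NOT the Pipecat API):
# audio in -> STT -> LLM -> TTS -> audio out. Each stage transforms the
# frame produced by the previous stage.

def run_cascade(frame, stages):
    """Pass a frame through each stage in order."""
    for stage in stages:
        frame = stage(frame)
    return frame

# Stand-in stages; in the real pipeline these are Deepgram and OpenAI services.
stt = lambda audio: f"transcript({audio})"
llm = lambda text: f"reply-to({text})"
tts = lambda text: f"speech({text})"

result = run_cascade("mic-audio", [stt, llm, tts])
print(result)  # speech(reply-to(transcript(mic-audio)))
```

Pipecat's `Pipeline` additionally wires in the transport (audio I/O) and context aggregation around these three stages.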

Before you begin

Before you can use Deepgram, you need to create a Deepgram account. Signup is free and includes $200 in credit.

You need:

  • A Deepgram API key
  • An LLM API key. This guide uses OpenAI, but Pipecat supports other providers including Anthropic, Google, and Groq
  • Python 3.11+
  • Node.js 18+
  • uv (for installing the Pipecat CLI)

Step 1: Create the project

Install the Pipecat CLI, then scaffold a new project:

$ uv tool install pipecat-ai-cli
$ pipecat init --name my-bot --bot-type web --transport daily --mode cascade --stt deepgram_stt --llm openai_llm --tts deepgram_tts --no-deploy-to-cloud --client-framework vanilla

Step 2: Install dependencies

Navigate to the server directory inside your new project, create a virtual environment, and install the dependencies:

$ cd my-bot/server
$ uv venv --python 3.11
$ uv pip install "pipecat-ai[daily,deepgram,openai,runner,silero]"

Step 3: Configure your environment

Copy the example environment file and fill in your API keys:

$ cp .env.example .env

Replace the placeholder values with your API keys:
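Your generated .env.example lists the exact variable names; at minimum, the services configured above need keys along these lines (variable names are assumed from the services used, so check them against your generated file):

```
# Keys for the STT/TTS and LLM services (names per your .env.example)
DEEPGRAM_API_KEY=your-deepgram-api-key
OPENAI_API_KEY=your-openai-api-key

# The Daily transport typically requires its own key as well
DAILY_API_KEY=your-daily-api-key
```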

The remaining values are defaults you can change later.

Step 4: Run the agent

You need two terminals: one for the server and one for the client.

In the first terminal, start the bot from the server directory:

$ uv run python bot.py --transport daily

The first run takes about 20 seconds to download the Silero VAD model. Subsequent starts are faster.

In a second terminal, install the client dependencies and start the dev server:

$ cd my-bot/client
$ npm install
$ npm run dev

Step 5: Test the conversation

Open the local URL shown in your terminal after npm run dev, allow microphone access, and start talking to your agent. Try these interactions:

  1. Ask a question and confirm the agent responds with speech.
  2. Start speaking while the agent is talking. It should stop and listen.
  3. Pause after speaking. The agent should detect the end of your turn and respond.

Using the quickstart repo instead

If you prefer not to use the CLI, clone the quickstart repo. The quickstart uses Deepgram for STT but Cartesia for TTS. To switch TTS to Deepgram, open bot.py and find the Cartesia TTS setup:

# Remove this:
from pipecat.services.cartesia.tts import CartesiaTTSService

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaTTSService.Settings(
        voice="71a7ad14-091c-4e8e-a314-022ece01c121",
    ),
)

Replace it with:

from pipecat.services.deepgram.tts import DeepgramTTSService

tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramTTSService.Settings(
        voice="aura-2-helena-en",
    ),
)

You can also remove CARTESIA_API_KEY from your .env file since it is no longer needed. No other changes are required. The STT service already uses Deepgram and the rest of the pipeline stays the same.

Use Flux for turn detection

Flux is Deepgram's conversational STT model with built-in turn detection. It uses acoustic and semantic cues to determine when a speaker has finished their turn, resulting in more natural conversations.

To use Flux, replace DeepgramSTTService with DeepgramFluxSTTService in your bot.py:

from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramFluxSTTService.Settings(
        eager_eot_threshold=0.5,
        eot_threshold=0.8,
    ),
)

Even when using Flux for turn detection, keep the Silero VAD analyzer in your pipeline. Flux handles turn detection, but VAD is required for interruption handling. Without it, the agent cannot detect when a user speaks over the agent's response.
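The two thresholds split end-of-turn handling into a speculative stage and a commit stage. The toy sketch below illustrates the decision logic the settings imply (it is not Flux's internals): confidence above `eager_eot_threshold` lets the pipeline start LLM inference early, while crossing `eot_threshold` commits the turn.

```python
# Toy illustration of eager vs. final end-of-turn (EOT) thresholds.
# This mirrors the settings passed to DeepgramFluxSTTService above,
# but the classification logic here is illustrative only.

EAGER_EOT_THRESHOLD = 0.5  # speculatively start LLM inference
EOT_THRESHOLD = 0.8        # commit: the user's turn is over

def classify_eot(confidence):
    """Map an end-of-turn confidence score to a pipeline action."""
    if confidence >= EOT_THRESHOLD:
        return "finished"   # respond now
    if confidence >= EAGER_EOT_THRESHOLD:
        return "eager"      # pre-run the LLM, but keep listening
    return "speaking"       # user is still mid-turn

print(classify_eot(0.3))  # speaking
print(classify_eot(0.6))  # eager
print(classify_eot(0.9))  # finished
```

Lowering `eager_eot_threshold` starts responses sooner at the cost of more discarded speculative LLM calls; raising `eot_threshold` reduces premature interruptions at the cost of added latency.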

Go further with Deepgram

  • Voices: Deepgram offers 60+ voices across seven languages. Browse the voice library and update DEEPGRAM_VOICE_ID in your .env file.
  • Keyterm prompting: Improve recognition of domain-specific vocabulary by passing keyterms to Nova-3 via the STT service settings.
  • Speaker diarization: Assign a speaker identifier to each word in the transcript using diarization via the STT service settings.
  • Dynamic STT settings: Pipecat supports updating Deepgram STT settings without reconnecting. See the Pipecat Deepgram STT guide for details.

Resources