Pipecat and Deepgram

Build a real-time voice AI agent using Pipecat with Deepgram speech-to-text and text-to-speech.

This guide walks you through building a voice AI agent that uses Pipecat for pipeline orchestration and Deepgram for speech-to-text (STT) and text-to-speech (TTS). By the end, you will have a working voice agent that listens to a user, generates a response with an LLM, and speaks back in real time.

Pipecat is an open-source Python framework for building voice and multimodal AI agents. It connects STT, LLM, and TTS services into a real-time pipeline and handles audio transport, turn-taking, and interruption detection.
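To make the STT -> LLM -> TTS flow concrete, here is a toy sketch of a cascade pipeline built only on Python's asyncio queues. It is not Pipecat's API: fake_stt, fake_llm, fake_tts, and run_pipeline are hypothetical stand-ins, and real Pipecat processors exchange typed frames and also handle audio transport, turn-taking, and interruptions.

```python
import asyncio

# Toy cascade pipeline: STT -> LLM -> TTS.
# Every stage is a hypothetical stub, NOT a Pipecat class or function.


async def fake_stt(audio_chunk: bytes) -> str:
    # Stand-in for a speech-to-text service such as Deepgram STT.
    return audio_chunk.decode("utf-8")


async def fake_llm(text: str) -> str:
    # Stand-in for an LLM call.
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    # Stand-in for a text-to-speech service such as Deepgram TTS.
    return text.encode("utf-8")


async def run_pipeline(chunks: list[bytes]) -> list[bytes]:
    stt_out: asyncio.Queue = asyncio.Queue()
    llm_out: asyncio.Queue = asyncio.Queue()
    spoken: list[bytes] = []

    async def stt_stage() -> None:
        for chunk in chunks:
            await stt_out.put(await fake_stt(chunk))
        await stt_out.put(None)  # end-of-stream marker

    async def llm_stage() -> None:
        while (text := await stt_out.get()) is not None:
            await llm_out.put(await fake_llm(text))
        await llm_out.put(None)  # forward end-of-stream

    async def tts_stage() -> None:
        while (reply := await llm_out.get()) is not None:
            spoken.append(await fake_tts(reply))

    # Stages run concurrently: a later stage can start on the first item
    # before an earlier stage has finished the whole stream.
    await asyncio.gather(stt_stage(), llm_stage(), tts_stage())
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([b"hello", b"what time is it"])))
```

Because each stage consumes from a queue while the others keep running, the LLM stage can start on the first transcript while STT is still working through later audio. A real-time pipeline relies on this streaming structure to keep response latency down.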

Before you begin

Before you can use Deepgram, you need to create a Deepgram account. Signup is free and includes $200 in credit.

This guide uses Daily as the WebRTC transport layer that carries audio between the browser and your agent. See the Pipecat Daily transport guide for more details.

You need:

Install or update the Pipecat CLI:

$ uv tool install pipecat-ai-cli

To update the CLI, use:

$ uv tool upgrade pipecat-ai-cli

Choose your developer experience

You can build a Pipecat + Deepgram integration in several ways. Choose the developer experience below that best fits your style. All paths share the same prerequisite: the Pipecat CLI.

  1. Build with a Coding Agent
  2. Use the quickstart CLI command
  3. Scaffold a new Pipecat project with the CLI

Build with a Coding Agent

You can use AI coding tools like Claude Code or Codex to generate your Pipecat agent code. Rather than relying on the tool’s training data, you give it live context from the Pipecat documentation.

  1. Follow the Pipecat getting started guide to set up AI tools, connect the Pipecat Context Hub, and initialize a project.
  2. Start a coding session with a prompt like the example below.
I'm building a phone assistant for my flower shop, Field & Flower, that
takes customer orders.
The bot should be able to:
- list the available bouquets
- check if a specific flower is in stock
- add a flower to the order
- get a summary of the order
- set the delivery details
- place the order
- end the call
When the call starts, the bot greets the caller with exactly:
"This is Field & Flower, your local flower shop. How can I help you today?"
Services:
- Twilio for phone calls
- STT: Deepgram Flux
- LLM: OpenAI
- TTS: Deepgram
- Deploy to Pipecat Cloud
This is a demo: use a mock backend for the flower data, and "place the
order" only needs to log the order.

The pipecat init command creates a GETTING_STARTED.md file with additional guidance for your coding agent.

Use the quickstart CLI command

The quickstart uses Deepgram for STT but Cartesia for TTS. Follow the instructions in the Pipecat Quickstart documentation, then switch TTS to Deepgram using the steps below.

To switch TTS to Deepgram, open bot.py and find the Cartesia TTS setup:

# Remove this:
from pipecat.services.cartesia.tts import CartesiaTTSService

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    settings=CartesiaTTSService.Settings(
        voice="71a7ad14-091c-4e8e-a314-022ece01c121",
    ),
)

Replace it with:

from pipecat.services.deepgram.tts import DeepgramTTSService

tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramTTSService.Settings(
        voice="aura-2-helena-en",
    ),
)

You can also remove CARTESIA_API_KEY from your .env file because it is no longer needed. No other changes are required: the STT service already uses Deepgram, and the rest of the pipeline stays the same.
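The snippets above read the key with os.getenv, which returns None when the variable is missing, so a forgotten DEEPGRAM_API_KEY may only surface later as an authentication error. A small helper such as the hypothetical require_env below (not part of Pipecat) fails fast with a clear message; you could call it in place of os.getenv in bot.py.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, raising if it is unset or empty.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Example use in bot.py, replacing os.getenv(...):
# tts = DeepgramTTSService(
#     api_key=require_env("DEEPGRAM_API_KEY"),
#     settings=DeepgramTTSService.Settings(voice="aura-2-helena-en"),
# )
```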

Continue building by adding a Pipecat Client

Use Flux for turn detection

Flux is Deepgram’s conversational STT model with built-in turn detection. It uses acoustic and semantic cues to determine when a speaker has finished their turn, resulting in more natural conversations.

To use Flux, replace DeepgramSTTService with DeepgramFluxSTTService in your bot.py:

import os

from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService

stt = DeepgramFluxSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    settings=DeepgramFluxSTTService.Settings(
        min_confidence=0.3,
    ),
)

Since Deepgram Flux provides its own user turn start and end detection, you should use ExternalUserTurnStrategies to let Flux handle turn management. See User Turn Strategies for configuration details.

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)

user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,  # the existing LLMContext from your bot.py
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=ExternalUserTurnStrategies(),
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
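Conceptually, as we understand it, ExternalUserTurnStrategies means the user aggregator no longer decides on its own when a turn begins and ends; it follows turn signals coming from upstream, here from Flux. The stdlib-only sketch below models that idea. TurnStarted, Transcript, TurnEnded, and aggregate_external_turns are hypothetical names, not Pipecat classes, and the real aggregator operates on Pipecat frames.

```python
from dataclasses import dataclass
from typing import Iterable


# Hypothetical events standing in for the turn signals and transcripts an
# STT service with built-in turn detection emits. NOT Pipecat classes.
@dataclass
class TurnStarted:
    pass


@dataclass
class Transcript:
    text: str


@dataclass
class TurnEnded:
    pass


def aggregate_external_turns(events: Iterable[object]) -> list[str]:
    """Group transcripts into user turns using only external start/end signals.

    The aggregator never infers turn boundaries itself (no silence timers);
    it buffers text between TurnStarted and TurnEnded and ignores text
    that arrives outside a turn.
    """
    turns: list[str] = []
    buffer: list[str] = []
    in_turn = False
    for event in events:
        if isinstance(event, TurnStarted):
            in_turn = True
            buffer = []
        elif isinstance(event, Transcript) and in_turn:
            buffer.append(event.text)
        elif isinstance(event, TurnEnded) and in_turn:
            # A completed turn is what would be handed to the LLM.
            turns.append(" ".join(buffer))
            in_turn = False
    return turns
```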

Scaffold a new Pipecat project with the CLI

Step 1: Create the project

Scaffold a new project using the Pipecat CLI.

$ pipecat init

Project directory [pipecat-bot]: [enter]
  Wrote pipecat-bot/AGENTS.md
  Wrote pipecat-bot/GETTING_STARTED.md
  Wrote pipecat-bot/CLAUDE.md
$ cd pipecat-bot

$ pipecat create --name pipecat-deepgram --bot-type web --transport daily --mode cascade --stt deepgram_flux_stt --llm openai_llm --tts deepgram_tts --no-deploy-to-cloud
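If you script project creation (for example in CI), it can help to keep the options from the command above in one place. The hypothetical helper below only rebuilds that same argument list from a dict; it adds no flags beyond those shown and assumes nothing about other values the CLI accepts.

```python
import shlex

# The options from the `pipecat create` command above, one entry per flag.
CREATE_OPTIONS = {
    "name": "pipecat-deepgram",
    "bot-type": "web",
    "transport": "daily",
    "mode": "cascade",
    "stt": "deepgram_flux_stt",
    "llm": "openai_llm",
    "tts": "deepgram_tts",
}


def build_create_command(options: dict[str, str]) -> list[str]:
    """Rebuild the `pipecat create` argument list (hypothetical helper).

    Each key becomes a `--key value` pair, and `--no-deploy-to-cloud` is
    always appended, matching the command shown in this guide.
    """
    cmd = ["pipecat", "create"]
    for flag, value in options.items():
        cmd += [f"--{flag}", value]
    cmd.append("--no-deploy-to-cloud")
    return cmd


if __name__ == "__main__":
    # Prints the same command as above. To run it, pass the list to
    # subprocess.run(cmd, check=True); that requires the Pipecat CLI on PATH.
    print(shlex.join(build_create_command(CREATE_OPTIONS)))
```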

Step 2: Install dependencies

Navigate to the server directory inside your new project and install the dependencies. uv sync creates a virtual environment if one does not already exist and installs the project's dependencies into it:

$ cd pipecat-deepgram/server
$ uv sync

Step 3: Configure your environment

Copy the example environment file, then fill in your API keys and set the default Deepgram voice:

$ cp .env.example .env

Replace the placeholder values with your API keys:

The remaining values are defaults you can change later.
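Before running the bot, you can check that the placeholders were actually replaced. This minimal sketch compares .env with .env.example and flags any key ending in _API_KEY whose value is empty or still identical to the example. The _API_KEY suffix rule is an assumption; adjust it to match the keys in your .env.example. The parser is deliberately simple and does not handle export prefixes, inline comments, or multi-line values.

```python
from pathlib import Path


def parse_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file.

    Minimal sketch: skips blank lines and # comments and strips one pair of
    matching quotes; no `export`, inline comments, or multi-line values.
    """
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def unfilled_api_keys(env_path: str, example_path: str) -> list[str]:
    """List *_API_KEY entries that are empty or unchanged from the example file.

    Assumption: every key ending in _API_KEY in .env.example must be filled in.
    """
    example = parse_env_file(example_path)
    actual = parse_env_file(env_path)
    return [
        key
        for key, placeholder in example.items()
        if key.endswith("_API_KEY")
        and (not actual.get(key) or actual[key] == placeholder)
    ]


if __name__ == "__main__":
    if Path(".env").exists() and Path(".env.example").exists():
        missing = unfilled_api_keys(".env", ".env.example")
        print("Still to fill in:", missing or "nothing")
    else:
        print("Run this from the server directory after `cp .env.example .env`.")
```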

Step 4: Run the agent

Start the bot from the server directory:

$ uv run python bot.py --transport daily

Step 5: Test the conversation

Open the local URL printed in your terminal, then:

  1. Select Daily from the Transport list and click Connect.
  2. Allow microphone access and speak to your agent.
  3. Ask a question and confirm the agent responds with speech.
  4. Speak while the agent is talking — it should stop and listen.
  5. Pause after speaking — the agent should detect the end of your turn and respond.

Continue building by adding a Pipecat Client

Next Steps

Continue building with a coding agent.

Follow the Pipecat getting started guide and connect the Pipecat Context Hub.

Prompt your coding agent to add a Pipecat client framework.

Example prompt:

Add a pipecat client for React

Go further with Deepgram

  • Voices — Deepgram offers 60+ voices across seven languages. Browse the voice library and update DEEPGRAM_VOICE_ID in your .env file.
  • Keyterm prompting — Improve recognition of domain-specific vocabulary by passing keyterms to Nova-3 via the STT service settings.
  • Speaker diarization — Assign a speaker identifier to each word in the transcript using diarization via the STT service settings.
  • Dynamic STT settings — Pipecat supports updating Deepgram STT settings without reconnecting. See the Pipecat Deepgram STT guide for details.

Resources