Build a Multi-Agent Architecture

For more information and to use our reference implementation, visit the Deepgram Multi-Agent repo.

Why Multi-Agent Architecture?

Traditional single-agent voice systems face fundamental limitations as complexity grows. This implementation demonstrates how to overcome these challenges by treating the conversation as a sequence of specialized phases/states, where each phase has its own:

System instructions (the agent’s prompt)
Transformed context (summarized from previous conversation)
Specific tools (2-4 focused functions per agent)

This approach solves critical problems:

Context Management: Each agent starts fresh rather than accumulating entire conversation history
Focused Responsibility: Agents excel at their specific task instead of juggling everything
Better Reliability: Fewer functions per agent means clearer decision-making for the LLM
Easier Debugging: Issues are isolated to specific agents and transitions

Architecture Overview

┌─────────────────────────────────────────┐
│      Twilio Phone Call (Persistent)     │
│         WebSocket Connection            │
└──────────────────┬──────────────────────┘
                   │ Audio Stream
         ┌─────────▼──────────┐
         │  CallOrchestrator  │ ← Central coordinator
         │ ┌────────────────┐ │
         │ │ Audio Forward  │ │ ← Persistent task
         │ │     Task       │ │
         │ └────────────────┘ │
         └─────────┬──────────┘
                   │ Manages lifecycle
    ┌──────────────┼──────────────┐
    │              │              │
    ▼              ▼              ▼
┌─────────┐   ┌─────────┐   ┌─────────┐
│Qualifier│──►│ Advisor │──►│ Closer  │
│  Agent  │   │  Agent  │   │  Agent  │  ← Ephemeral Voice Agent sessions
└─────────┘   └─────────┘   └─────────┘
    │              │              │
    └──────────────┼──────────────┘
                   │
            Groq API (Llama 3.3 70B by default)
          Context Summarization

Key Architecture Points:

Persistent: Twilio WebSocket and audio forwarding task remain active throughout
Ephemeral: Voice Agent sessions are created/destroyed per agent
Orchestration: CallOrchestrator manages all transitions and state
Context Transfer: A small and fast LLM (via Groq API) summarizes conversation between agents

Key Technical Concepts

Session-Based Agent Switching

This implementation creates separate Voice Agent sessions for each specialized agent. The CallOrchestrator (orchestrator/call_orchestrator.py) manages these transitions while maintaining the Twilio connection.

Audio Task Persistence Pattern

A critical implementation detail: the audio forwarding task starts once and persists throughout all agent transitions. This ensures the Twilio WebSocket remains active while agents switch:

1 # Audio task starts ONCE and persists
2 if not self.audio_task or self.audio_task.done():
3     self.audio_task = asyncio.create_task(self.forward_twilio_audio())

Context Manager Lifecycle

Voice Agent connections are async context managers that require manual lifecycle management:

1 # Creating connection
2 self.current_agent_context = self.current_client.agent.v1.connect()
3 self.current_agent_connection = await self.current_agent_context.__aenter__()
4 
5 # Closing connection (keep audio task alive during transitions)
6 await self.close_current_agent(keep_audio_task=True)

Function Call Response Pattern

Functions must be called with proper timing to maintain conversation flow:

1 # 1. Send response to current agent
2 response = AgentV1FunctionCallResponseMessage(...)
3 await self.current_agent_connection.send_function_call_response(response)
4 
5 # 2. Brief pause for processing
6 await asyncio.sleep(0.5)
7 
8 # 3. Then perform the action (e.g., transition)
9 await self.transition_to_agent(next_agent)

The Three Agents

Each agent represents a distinct conversation phase with focused responsibilities:

1. Qualifier Agent (`agents/qualifier/config.py`)

Purpose: Initial contact and lead qualification
Functions: handoff_to_next_agent, end_conversation
Collects: Name, location, specific needs

2. Advisor Agent (`agents/advisor/config.py`)

Purpose: Provide consultation and recommendations
Functions: handoff_to_next_agent, end_conversation
Context: Receives summary from qualifier

3. Closer Agent (`agents/closer/config.py`)

Purpose: Schedule follow-up and gather feedback
Functions: schedule_followup, record_satisfaction, end_conversation
Context: Receives summary from advisor

Implementation Details

How Agent Transitions Work

When an agent calls handoff_to_next_agent, the orchestrator:

Summarizes the current conversation using Groq AI
Closes the current Voice Agent session (but keeps audio task running)
Starts a new Voice Agent session with the summarized context
Continues audio forwarding to the new agent seamlessly

Creating an Agent Configuration

Each agent is configured with settings for STT, LLM, TTS, and functions:

1 from deepgram.agent.v1.types import AgentV1Settings
2 
3 def get_qualifier_config(context: str = "") -> AgentV1Settings:
4     return AgentV1Settings(
5         audio=AgentV1SettingsAudio(...),  # Audio encoding settings
6         agent=AgentV1SettingsAgent(
7             listen=AgentV1SettingsAgentListen(...),   # Deepgram Flux STT
8             think=ThinkSettingsV1(           # LLM configuration
9                 model="gpt-4o-mini",
10                 prompt=QUALIFIER_PROMPT,
11                 functions=QUALIFIER_FUNCTIONS
12             ),
13             speak=SpeakSettingsV1(...),  # Deepgram Aura TTS
14             greeting="Hi, this is Alex..."
15         )
16     )

Function Definitions

Functions include detailed descriptions to guide the LLM’s behavior. See agents/shared/functions.py for examples. For comprehensive function definition best practices, refer to docs/FUNCTION_GUIDE.md.

Key pattern: Functions should wait for customer confirmation:

1 HANDOFF_FUNCTION = AgentV1Function(
2     name="handoff_to_next_agent",
3     description="""...
4     CORRECT PATTERN:
5     1. You ask: "I can connect you with [next agent]. Would that work?"
6     2. WAIT for customer response
7     3. Customer says: "Yes"
8     4. You IMMEDIATELY call this function WITHOUT additional text
9     """
10 )

Quick Start

Prerequisites

Python 3.8+
Twilio account with phone number (with outbound calling enabled)
Deepgram API key
Groq API key (free tier available at https://console.groq.com/)
Public tunnel (ngrok, zrok, etc.)

1. Install Dependencies

This implementation uses the Deepgram Python SDK:

$ # Create virtual environment
$ python -m venv venv
$ source venv/bin/activate  # On Windows: venv\Scripts\activate
$ 
$ # Install packages
$ pip install -r requirements.txt

2. Configure Environment

Copy .env.example to .env and add your API keys:

$ cp .env.example .env

The key environment variables (config.py manages these):

DEEPGRAM_API_KEY - Your Deepgram API key
GROQ_API_KEY - Groq API key for conversation summarization
TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER - Twilio credentials
LEAD_SERVER_EXTERNAL_URL - Your public tunnel URL
LEAD_PHONE_NUMBER - Phone number to call

3. Start Public Tunnel

$ # Using zrok (free, easy)
$ zrok share public localhost:8000
$ 
$ # Or ngrok
$ ngrok http 8000 --scheme=ws

Copy the public URL and update LEAD_SERVER_EXTERNAL_URL in your .env file.

4. Run the System

$ python main.py

The system will:

Start a WebSocket server on port 8000
Initiate an outbound call to LEAD_PHONE_NUMBER
Begin the conversation with the Qualifier agent

Example Conversation Flow

Here’s a simplified flow showing agent transitions and function calls:

Phase 1: Qualifier Agent

Agent: "Hi, this is Alex calling from our advisory services.
        Is now a good time to chat briefly?"
Customer: "Sure, I have a few minutes"
Agent: "Great! May I get your name?"
Customer: "John Smith"
Agent: "Where are you located?"
Customer: "Seattle"
Agent: "What brings you to call us today?"
Customer: "I need help with retirement planning"
Agent: "I can connect you with one of our advisors. Would that work?"
Customer: "Yes, that'd be great"
[Agent calls handoff_to_next_agent function]

Transition

Groq summarizes: “John Smith from Seattle needs retirement planning advice”
Qualifier session closes
Advisor session starts with context

Phase 2: Advisor Agent

Agent: "Hi John! I understand you're interested in retirement planning.
        How can I help?"
Customer: "I'm 45 and want to retire by 60"
Agent: "Based on your timeline, I recommend a formal consultation.
        Can I connect you with our team to schedule that?"
Customer: "Yes please"
[Agent calls handoff_to_next_agent function]

Transition

Groq summarizes: “John Smith from Seattle would like to schedule a formal consultation”
Advisor session closes
Closer session starts with context

Phase 3: Closer Agent

Agent: "Thanks for speaking with our advisor.
        When would work best for your consultation?"
Customer: "Next Wednesday afternoon"
[Agent calls schedule_followup function]
Agent: "Perfect! How would you rate your experience today from 1 to 5?"
Customer: "5"
[Agent calls record_satisfaction function]
Agent: "Thank you! Have a great day!"
[Agent calls end_conversation function]

Project Structure

deepgram-voice-agent-multi-agent-/
├── main.py                      # Entry point - starts server, initiates calls
├── config.py                    # Environment variable management
│
├── orchestrator/
│   └── call_orchestrator.py    # Core orchestration logic
│
├── agents/
│   ├── shared/
│   │   └── functions.py        # Shared function definitions
│   ├── qualifier/               # Qualifier agent configuration
│   ├── advisor/                 # Advisor agent configuration
│   └── closer/                  # Closer agent configuration
│
├── utils/
│   └── context_summarizer.py   # Groq LLM summarization
│
├── call_handling/
│   └── twilio_client.py        # Twilio API wrapper
│
└── docs/
    ├── PROMPT_GUIDE.md          # Voice agent prompt best practices
    └── FUNCTION_GUIDE.md        # Function definition best practices

Customization

Modify Agent Behavior

Edit the prompts in each agent’s config file:

Qualifier (agents/qualifier/config.py):

Change greeting message
Adjust information gathering flow
Modify qualification criteria

Advisor (agents/advisor/config.py):

Customize consultation approach
Change expertise area (retirement, investments, etc.)
Adjust handoff triggers

Closer (agents/closer/config.py):

Modify scheduling questions
Change satisfaction survey format
Customize closing message

Important: See docs/PROMPT_GUIDE.md for voice-specific prompt engineering best practices.

Add New Agents

Quick overview:

Create agents/your_agent/config.py with agent configuration
Update orchestrator/call_orchestrator.py transition logic
Add agent-specific function handlers if needed

Change Summarization

Edit utils/context_summarizer.py to:

Switch LLM models (currently Llama 3.3 70B on Groq). See other available Groq models.
Modify summarization prompts for better context extraction
Change which data points are captured

Additional Resources

docs/PROMPT_GUIDE.md - Voice agent prompt engineering best practices
docs/FUNCTION_GUIDE.md - Function definition best practices

Why Multi-Agent Architecture?

Architecture Overview

Key Technical Concepts

Session-Based Agent Switching

Audio Task Persistence Pattern

Context Manager Lifecycle

Function Call Response Pattern

The Three Agents

1. Qualifier Agent (agents/qualifier/config.py)

2. Advisor Agent (agents/advisor/config.py)

3. Closer Agent (agents/closer/config.py)

Implementation Details

How Agent Transitions Work

Creating an Agent Configuration

Function Definitions

Quick Start

Prerequisites

1. Install Dependencies

2. Configure Environment

3. Start Public Tunnel

4. Run the System

Example Conversation Flow

Phase 1: Qualifier Agent

Transition

Phase 2: Advisor Agent

Transition

Phase 3: Closer Agent

Project Structure

Customization

Modify Agent Behavior

Add New Agents

Change Summarization

Additional Resources

1. Qualifier Agent (`agents/qualifier/config.py`)

2. Advisor Agent (`agents/advisor/config.py`)

3. Closer Agent (`agents/closer/config.py`)