Build a Multi-Agent Architecture
Create a multi-agent architecture using Deepgramβs Voice Agent API, where specialized agents handle different phases of customer interactions through seamless handoffs.
For more information and to use our reference implementation, visit the Deepgram Multi-Agent repo.
Why Multi-Agent Architecture?
Traditional single-agent voice systems face fundamental limitations as complexity grows. This implementation demonstrates how to overcome these challenges by treating the conversation as a sequence of specialized phases/states, where each phase has its own:
- System instructions (the agentβs prompt)
- Transformed context (summarized from previous conversation)
- Specific tools (2-4 focused functions per agent)
This approach solves critical problems:
- Context Management: Each agent starts fresh rather than accumulating entire conversation history
- Focused Responsibility: Agents excel at their specific task instead of juggling everything
- Better Reliability: Fewer functions per agent means clearer decision-making for the LLM
- Easier Debugging: Issues are isolated to specific agents and transitions
Architecture Overview
Key Architecture Points:
- Persistent: Twilio WebSocket and audio forwarding task remain active throughout
- Ephemeral: Voice Agent sessions are created/destroyed per agent
- Orchestration: CallOrchestrator manages all transitions and state
- Context Transfer: A small and fast LLM (via Groq API) summarizes conversation between agents
Key Technical Concepts
Session-Based Agent Switching
This implementation creates separate Voice Agent sessions for each specialized agent. The CallOrchestrator (orchestrator/call_orchestrator.py) manages these transitions while maintaining the Twilio connection.
Audio Task Persistence Pattern
A critical implementation detail: the audio forwarding task starts once and persists throughout all agent transitions. This ensures the Twilio WebSocket remains active while agents switch:
Context Manager Lifecycle
Voice Agent connections are async context managers that require manual lifecycle management:
Function Call Response Pattern
Functions must be called with proper timing to maintain conversation flow:
The Three Agents
Each agent represents a distinct conversation phase with focused responsibilities:
1. Qualifier Agent (agents/qualifier/config.py)
- Purpose: Initial contact and lead qualification
- Functions:
handoff_to_next_agent,end_conversation - Collects: Name, location, specific needs
2. Advisor Agent (agents/advisor/config.py)
- Purpose: Provide consultation and recommendations
- Functions:
handoff_to_next_agent,end_conversation - Context: Receives summary from qualifier
3. Closer Agent (agents/closer/config.py)
- Purpose: Schedule follow-up and gather feedback
- Functions:
schedule_followup,record_satisfaction,end_conversation - Context: Receives summary from advisor
Implementation Details
How Agent Transitions Work
When an agent calls handoff_to_next_agent, the orchestrator:
- Summarizes the current conversation using Groq AI
- Closes the current Voice Agent session (but keeps audio task running)
- Starts a new Voice Agent session with the summarized context
- Continues audio forwarding to the new agent seamlessly
Creating an Agent Configuration
Each agent is configured with settings for STT, LLM, TTS, and functions:
Function Definitions
Functions include detailed descriptions to guide the LLMβs behavior. See agents/shared/functions.py for examples. For comprehensive function definition best practices, refer to docs/FUNCTION_GUIDE.md.
Key pattern: Functions should wait for customer confirmation:
Quick Start
Prerequisites
- Python 3.8+
- Twilio account with phone number (with outbound calling enabled)
- Deepgram API key
- Groq API key (free tier available at https://console.groq.com/)
- Public tunnel (ngrok, zrok, etc.)
1. Install Dependencies
This implementation uses the Deepgram Python SDK:
2. Configure Environment
Copy .env.example to .env and add your API keys:
The key environment variables (config.py manages these):
DEEPGRAM_API_KEY- Your Deepgram API keyGROQ_API_KEY- Groq API key for conversation summarizationTWILIO_ACCOUNT_SID,TWILIO_AUTH_TOKEN,TWILIO_PHONE_NUMBER- Twilio credentialsLEAD_SERVER_EXTERNAL_URL- Your public tunnel URLLEAD_PHONE_NUMBER- Phone number to call
3. Start Public Tunnel
Copy the public URL and update LEAD_SERVER_EXTERNAL_URL in your .env file.
4. Run the System
The system will:
- Start a WebSocket server on port 8000
- Initiate an outbound call to
LEAD_PHONE_NUMBER - Begin the conversation with the Qualifier agent
Example Conversation Flow
Hereβs a simplified flow showing agent transitions and function calls:
Phase 1: Qualifier Agent
Transition
- Groq summarizes: βJohn Smith from Seattle needs retirement planning adviceβ
- Qualifier session closes
- Advisor session starts with context
Phase 2: Advisor Agent
Transition
- Groq summarizes: βJohn Smith from Seattle would like to schedule a formal consultationβ
- Advisor session closes
- Closer session starts with context
Phase 3: Closer Agent
Project Structure
Customization
Modify Agent Behavior
Edit the prompts in each agentβs config file:
Qualifier (agents/qualifier/config.py):
- Change greeting message
- Adjust information gathering flow
- Modify qualification criteria
Advisor (agents/advisor/config.py):
- Customize consultation approach
- Change expertise area (retirement, investments, etc.)
- Adjust handoff triggers
Closer (agents/closer/config.py):
- Modify scheduling questions
- Change satisfaction survey format
- Customize closing message
Important: See docs/PROMPT_GUIDE.md for voice-specific prompt engineering best practices.
Add New Agents
Quick overview:
- Create
agents/your_agent/config.pywith agent configuration - Update
orchestrator/call_orchestrator.pytransition logic - Add agent-specific function handlers if needed
Change Summarization
Edit utils/context_summarizer.py to:
- Switch LLM models (currently Llama 3.3 70B on Groq). See other available Groq models.
- Modify summarization prompts for better context extraction
- Change which data points are captured
Additional Resources
- docs/PROMPT_GUIDE.md - Voice agent prompt engineering best practices
- docs/FUNCTION_GUIDE.md - Function definition best practices