Build an Outbound Telephony Agent
For more information and to use our reference implementation, visit the Deepgram Outbound Telephony Agent repo.
Outbound Telephony Voice Agent
A reference implementation for building an outbound telephony voice agent using Deepgram's Voice Agent API and Twilio. Uses Deepgram Flux for speech-to-text with native turn-taking optimized for real-time voice agent conversations. Includes endpoint authentication and answering machine detection (AMD) with voicemail delivery.
The server initiates outbound calls via a REST API. When the call connects, an AI voice agent follows up on a homeowners insurance quote request: it verifies lead information, gathers additional details, and schedules a consultation with a licensed agent. If voicemail is detected, it delivers a personalized message instead. While this agent is specific to a homeowners insurance use case, it's a great jumping-off point for any outbound telephony use case: simply replace the prompt and function calls with your relevant details.
Architecture
Call Initiation Flow
- An external system (CRM, CLI script, webhook) sends `POST /make-call` with a phone number and lead context
- The server calls the Twilio REST API with inline TwiML containing `<Connect><Stream>` pointing back to its own WebSocket endpoint
- Twilio dials the recipient's phone and opens a WebSocket back to the server to stream audio
- Twilio's async AMD runs in the background and POSTs the result to `/amd-result`
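The call-creation step above can be sketched as follows. This is a hedged sketch, not the repo's actual code: the parameter names match Twilio's Call resource, but the WebSocket path (`/twilio-stream`) and callback path shown here are illustrative placeholders.

```python
# Sketch of placing the outbound call via Twilio's REST API with inline TwiML.
# The /twilio-stream and /amd-result paths are illustrative, not necessarily
# the repo's actual routes.

def build_call_kwargs(to_number: str, server_url: str, caller_id: str) -> dict:
    """Build keyword arguments for Twilio's calls.create()."""
    ws_url = server_url.replace("https://", "wss://") + "/twilio-stream"
    twiml = f'<Response><Connect><Stream url="{ws_url}"/></Connect></Response>'
    return {
        "to": to_number,
        "from_": caller_id,
        "twiml": twiml,                  # inline TwiML, no hosted webhook needed
        "machine_detection": "Enable",   # or "DetectMessageEnd" to wait for the greeting to finish
        "async_amd": "true",             # run AMD asynchronously so the call connects immediately
        "async_amd_status_callback": server_url + "/amd-result",
    }

# With the real SDK this would be sent as:
#   from twilio.rest import Client
#   Client(account_sid, auth_token).calls.create(**build_call_kwargs(...))
kwargs = build_call_kwargs("+15551230000", "https://example.fly.dev", "+15559870000")
```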
Audio Flow (Human Path)
- Recipient speaks into their phone
- Twilio captures the audio and streams it as base64-encoded mulaw over WebSocket
- The application server decodes the base64 and sends raw mulaw bytes to Deepgramβs Voice Agent API
- Deepgram handles the full pipeline: speech-to-text, LLM reasoning, text-to-speech
- Deepgram sends back raw mulaw audio bytes
- The application server encodes to base64 and sends as JSON to Twilio
- Twilio plays the audio to the recipient
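The base64 bridging in steps 3 and 6 can be sketched as below. The JSON field names (`event`, `streamSid`, `media.payload`) follow Twilio's documented Media Streams protocol; the function names are illustrative.

```python
# Translating between Twilio's JSON media events and the raw mulaw bytes
# that Deepgram's Voice Agent WebSocket consumes and produces.
import base64
import json

def twilio_media_to_mulaw(message: str) -> bytes:
    """Decode a Twilio 'media' event into raw mulaw bytes for Deepgram."""
    event = json.loads(message)
    assert event["event"] == "media"
    return base64.b64decode(event["media"]["payload"])

def mulaw_to_twilio_media(mulaw: bytes, stream_sid: str) -> str:
    """Wrap raw mulaw bytes from Deepgram in a Twilio 'media' event."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw).decode("ascii")},
    })

# Round trip: raw bytes -> Twilio frame -> raw bytes
frame = mulaw_to_twilio_media(b"\xff\x7f\x00", "MZxxxx")
assert twilio_media_to_mulaw(frame) == b"\xff\x7f\x00"
```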
AMD + Voicemail Flow
When the call connects, the server doesn't know if it's a human or voicemail. It buffers incoming audio while Twilio's AMD runs (~2-4 seconds). When the result arrives:
- Human: Connect to Deepgram Voice Agent API, flush buffered audio, start conversation
- Voicemail: Deliver a personalized message via Deepgram Aura-2 TTS, then hang up
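The buffer-then-branch logic above can be sketched minimally as follows. The real session class (`VoiceAgentSession`) is more involved; the callbacks here are illustrative stand-ins for connecting the agent and delivering voicemail.

```python
# Minimal sketch of buffering audio until the AMD result arrives, then
# either flushing it into the voice agent (human) or delivering voicemail.
class AmdBuffer:
    def __init__(self):
        self.pending: list[bytes] = []
        self.resolved = False

    def on_audio(self, chunk: bytes, send_to_agent):
        if self.resolved:
            send_to_agent(chunk)        # normal path once AMD has answered
        else:
            self.pending.append(chunk)  # hold audio until the AMD result arrives

    def on_amd_result(self, answered_by: str, send_to_agent, deliver_voicemail):
        self.resolved = True
        if answered_by == "human":
            for chunk in self.pending:  # flush buffered audio into the agent
                send_to_agent(chunk)
            self.pending.clear()
        else:
            deliver_voicemail()         # machine: TTS message, then hang up

# Usage: audio arriving before the AMD result is held, then flushed.
sent: list[bytes] = []
buf = AmdBuffer()
buf.on_audio(b"a", sent.append)
buf.on_audio(b"b", sent.append)
buf.on_amd_result("human", sent.append, lambda: None)
```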
Key Technical Concepts
- Outbound call initiation: The server actively places calls via the Twilio REST API with inline TwiML. This is the inverse of inbound, where the server passively receives calls. The server still needs a public URL because Twilio opens a WebSocket back to stream audio.
- Single WebSocket bridge: The core of the system is `VoiceAgentSession`, which bridges two WebSocket connections (one to Twilio, one to Deepgram). It translates between Twilio's JSON-based protocol and Deepgram's binary audio protocol.
- Lead context injection: The agent's system prompt is built dynamically with lead data from the `POST /make-call` request. The agent knows the caller's name, property details, and quote request before the conversation starts.
- Barge-in: When the Deepgram Voice Agent detects that the user started speaking, the server sends a Twilio "clear" event to immediately stop playing agent audio.
- Function calls: The Deepgram Voice Agent API supports tool use. The agent can check consultation availability, book appointments, and post structured call outcomes back to a CRM-like backend.
- Structured call outcomes: The `update_lead` function captures the full call outcome (disposition, verified info, new info gathered, and a natural-language summary) and logs it to the console. In production, this payload would go to a CRM, webhook, or database.
- Answering machine detection: Twilio's async AMD detects whether a human or voicemail answered. The session buffers audio until the result arrives, then branches accordingly. If AMD detects a machine after the voice agent has already started (late detection), the session tears down the Deepgram connection and switches to voicemail delivery mid-call.
- Silence detection: A silence monitor tracks whether the caller is responding. If the caller goes silent for 60 seconds, the agent prompts them ("Are you still there?") and eventually ends the call. Uses Deepgram's `InjectAgentMessage` to make the agent speak each prompt naturally.
For a deeper look at the call flow, session lifecycle, and component details, see docs/ARCHITECTURE.md.
Quick Start
The fastest path to a working outbound voice agent is the setup wizard, which configures Twilio and deploys to Fly.io.
Prerequisites
- Python 3.12+
- A Deepgram account and API key
- $200 free credits, no credit card required
- A Twilio account
- New accounts come with trial credits.
- A Fly.io account and flyctl installed and authenticated (`flyctl auth login`)
  - Fly.io's free allowance is more than enough for this reference implementation with the default suspend-on-idle configuration.
Note: Twilio trial accounts play a short disclaimer before connecting callers.
1. Clone and Install
2. Configure
Edit .env and add your Deepgram API key:
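A sketch of the entry, assuming the variable is named `DEEPGRAM_API_KEY` (check the repo's example env file for the actual name):

```
DEEPGRAM_API_KEY=your-deepgram-api-key
```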
3. Run the Setup Wizard
The wizard will:
- Prompt for your Twilio account credentials (from console.twilio.com)
- Let you pick an existing phone number or purchase a new one
- Generate an endpoint secret for securing the `/make-call` endpoint
- Deploy your voice agent to Fly.io
4. Place a Test Call
Your phone rings. The voice agent runs through the insurance lead follow-up conversation. Check the server logs for the full conversation transcript and structured call outcome.
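The shape of that call-triggering request can be sketched as below. The payload field names (`phone_number`, `lead`) and the Bearer auth scheme are assumptions for illustration; check `make_call.py` for the actual shape.

```python
# Hypothetical sketch of the POST /make-call request the server expects.
import json
import urllib.request

def build_make_call_request(server_url, secret, phone_number, lead):
    body = json.dumps({"phone_number": phone_number, "lead": lead}).encode()
    return urllib.request.Request(
        server_url + "/make-call",
        data=body,
        headers={
            "Authorization": f"Bearer {secret}",  # the wizard's endpoint secret
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Build (but don't send) a request; sending would use urllib.request.urlopen(req).
req = build_make_call_request(
    "https://your-app.fly.dev", "s3cret", "+15551230000",
    {"first_name": "Avery", "property_type": "single-family"},
)
```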
Viewing Logs
After deployment, view your application logs with:
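Assuming a standard Fly.io deployment, logs can be tailed with flyctl (replace `<app-name>` with your app's name):

```shell
flyctl logs -a <app-name>
```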
The app name is shown in the setup wizard output and in `python setup.py --status`.
Cold Starts
The default configuration uses Fly.io's suspend mode. Your server suspends when idle and wakes in 1-3 seconds on incoming requests.
If you want instant response times for demos or production use, set `min_machines_running = 1` in `fly.toml` and redeploy to keep one VM always warm:
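The relevant `fly.toml` fragment looks like this, assuming the default `[http_service]` section that flyctl generates:

```toml
# fly.toml: keep one machine always running to avoid cold starts
[http_service]
  min_machines_running = 1
```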
See Fly.io pricing for details.
Twilio Regulatory Requirements
Depending on your region, Twilio may require address verification before you can purchase a phone number. The setup wizard will surface any errors from Twilio. Follow the instructions in the Twilio console if prompted.
Alternative: Tunnel + Twilio
If you prefer to run the server locally instead of deploying to Fly.io, you can use a tunnel to expose your locally running voice agent server via a public URL.
1. Start a Tunnel
Copy the public URL (e.g., https://xxxx.ngrok.io).
2. Update Configuration
Add to your .env:
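A sketch of the entry, using the `SERVER_EXTERNAL_URL` variable that `make_call.py` reads:

```
SERVER_EXTERNAL_URL=https://xxxx.ngrok.io
```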
3. Start the Server
4. Place a Call
Or use the setup wizard with --twilio-only to handle Twilio configuration:
See docs/LOCAL_DEVELOPMENT.md for the full local development guide.
Using make_call.py
The `make_call.py` CLI script is the primary way to trigger outbound calls during development.
The script reads `SERVER_EXTERNAL_URL` and `ENDPOINT_SECRET` from `.env` automatically.
Example Conversation
After the call, the server logs show the structured call outcome:
Project Structure
Customization
Change the Agentβs Personality
Edit `_build_system_prompt()` in `voice_agent/agent_config.py`. The prompt is built dynamically with lead context data, so the agent knows who it's calling and why. See docs/PROMPT_GUIDE.md for voice-specific prompt best practices.
Add or Modify Functions
- Define the function in `voice_agent/agent_config.py` (in the `FUNCTIONS` list)
- Add the handler in `voice_agent/function_handlers.py`
- Implement the backend logic in `backend/lead_service.py`
See docs/FUNCTION_GUIDE.md for function definition best practices.
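The steps above can be sketched with a hypothetical `send_followup_email` function. The definition shape (name/description/parameters JSON schema) follows Deepgram's function-calling format; the handler wiring illustrates the dispatch pattern, not the repo's exact code.

```python
# 1. In voice_agent/agent_config.py, append a definition to the FUNCTIONS list:
SEND_FOLLOWUP_EMAIL = {
    "name": "send_followup_email",
    "description": "Email the caller a summary of their quote after the call.",
    "parameters": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Caller's email address"},
        },
        "required": ["email"],
    },
}

# 2. In voice_agent/function_handlers.py, add a handler the dispatcher can
#    look up by function name:
def handle_send_followup_email(params: dict) -> dict:
    # 3. In backend/lead_service.py this would call the real email service;
    #    here we just return a structured result for the agent to speak from.
    return {"status": "queued", "email": params["email"]}

HANDLERS = {"send_followup_email": handle_send_followup_email}
```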
Swap the LLM or Voice
Set environment variables in .env:
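A sketch of what such overrides might look like. These variable names are hypothetical (the repo defines the real ones); the model and voice IDs shown (`gpt-4o-mini`, `aura-2-thalia-en`) are real options from the Deepgram docs linked below:

```
# Hypothetical variable names - check the repo's configuration for the actual ones
LLM_MODEL=gpt-4o-mini
TTS_VOICE=aura-2-thalia-en
```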
For Fly.io deployments, run `python setup.py --redeploy` to sync any `.env` changes to the deployed app.
For available LLM models, see the Voice Agent LLM docs. For available TTS models, see the Voice Agent TTS docs.
Multilingual Support
This reference implementation is configured for English using Deepgram Flux (`flux-general-en`) for STT, which provides native turn-taking optimized for voice agents. For building voice agents in other languages, see the Deepgram multilingual voice agent guide.
Replace the Mock Backend
The `backend/` directory contains an in-memory mock. To connect to a real CRM or scheduling system:
- Keep the same method signatures in `lead_service.py`
- Replace the method bodies with HTTP calls to your real API
- The voice agent layer doesn't need to change
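The swap above can be sketched as follows. The method name `get_lead` and the CRM URL are hypothetical; mirror whatever signatures `backend/lead_service.py` actually exposes.

```python
# Sketch: same interface as the mock, body now delegates to a real HTTP API.
import json
import urllib.request

class LeadService:
    def __init__(self, base_url: str, fetch=None):
        self.base_url = base_url
        # 'fetch' seam lets tests inject a fake transport instead of real HTTP
        self._fetch = fetch or self._http_fetch

    def _http_fetch(self, url: str) -> dict:
        with urllib.request.urlopen(url) as resp:  # real HTTP call in production
            return json.load(resp)

    def get_lead(self, lead_id: str) -> dict:
        # Same signature the mock exposed; only the body changed.
        return self._fetch(f"{self.base_url}/leads/{lead_id}")

# The voice agent layer keeps calling get_lead() exactly as before.
fake = lambda url: {"id": url.rsplit("/", 1)[-1], "first_name": "Avery"}
svc = LeadService("https://crm.example.com", fetch=fake)
```

The injected-transport seam is one way to keep the boundary testable without standing up a real CRM.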
The function dispatch in `voice_agent/function_handlers.py` uses lazy imports, making the boundary between voice agent and backend explicit.
Customize the Voicemail Message
Edit `_build_voicemail_text()` in `voice_agent/voicemail.py`. The message is personalized with the lead's first name from the lead context.
Compliance Note
This reference implementation is a technical reference, not a compliance reference. Outbound voice calling is often subject to regulations (e.g. TCPA or equivalent regulations in other jurisdictions).
This implementation includes a few compliance-aware design choices:
- AI disclosure at the start of every call
- Statement that the agent is not a licensed insurance agent
- The scenario assumes prior express consent (quote request submitted)
These are merely examples; they do not constitute actual compliance guidance.
Developers deploying outbound voice agents should consult appropriate legal and/or regulatory guidance to ensure compliance with applicable regulations.
Additional Resources
- docs/ARCHITECTURE.md - Detailed architecture, data flow diagrams, and component details
- docs/PROMPT_GUIDE.md - Best practices for writing voice agent prompts
- docs/FUNCTION_GUIDE.md - Best practices for defining agent functions
- docs/LOCAL_DEVELOPMENT.md - Step-by-step local development setup
- Deepgram Voice Agent API Docs - Official API documentation