For more information and to use our reference implementation, visit the Deepgram Outbound Telephony Agent repo.
A reference implementation for building an outbound telephony voice agent using Deepgram’s Voice Agent API and Twilio. Uses Deepgram Flux for speech-to-text with native turn-taking optimized for real-time voice agent conversations. Includes endpoint authentication and answering machine detection (AMD) with voicemail delivery.
The server initiates outbound calls via a REST API. When the call connects, an AI voice agent follows up on a homeowners insurance quote request: it verifies lead information, gathers additional details, and schedules a consultation with a licensed agent. If voicemail is detected, it delivers a personalized message instead. While this agent is specific to a homeowners insurance use case, it’s a great jumping-off point for any outbound telephony use case: simply replace the prompt and function calls with your own details.
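1. A client sends POST /make-call with a phone number and lead context.
2. The server places the call through the Twilio REST API with inline TwiML: a <Connect><Stream> pointing back to its own WebSocket endpoint.
3. Twilio’s asynchronous AMD reports its result to /amd-result.

When the call connects, the server doesn’t know whether a human or voicemail answered. It buffers incoming audio while Twilio’s AMD runs (~2-4 seconds). When the result arrives, the session branches: if a human answered, it starts the voice agent; if voicemail answered, it delivers a personalized voicemail message instead.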
Outbound call initiation: The server actively places calls via the Twilio REST API with inline TwiML. This is the inverse of an inbound agent, where the server passively receives calls. The server still needs a public URL because Twilio opens a WebSocket back to it to stream audio.
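A minimal sketch of the inline TwiML and call parameters for an outbound call with async AMD. The WebSocket path "/twilio", the "DetectMessageEnd" AMD mode, and the helper names are hypothetical assumptions, not taken from the reference repo; only /amd-result comes from the flow above.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(server_url: str, ws_path: str = "/twilio") -> str:
    """Return TwiML that connects call audio to our own WebSocket endpoint.

    ws_path is a hypothetical route name; use whatever your server exposes.
    """
    # Twilio Media Streams need a wss:// URL, so swap the scheme of the public URL.
    base = server_url.rstrip("/")
    if base.startswith("https://"):
        base = "wss://" + base[len("https://"):]
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=base + ws_path)
    return ET.tostring(response, encoding="unicode")


def build_call_params(to: str, from_: str, server_url: str) -> dict:
    """Keyword arguments for the Twilio client's calls.create(...)."""
    return {
        "to": to,
        "from_": from_,
        "twiml": build_stream_twiml(server_url),
        # Async AMD lets the call connect immediately; Twilio POSTs the
        # AnsweredBy result to the callback URL when detection finishes.
        # "DetectMessageEnd" (wait for the greeting to end) is an assumption.
        "machine_detection": "DetectMessageEnd",
        "async_amd": "true",
        "async_amd_status_callback": server_url.rstrip("/") + "/amd-result",
        "async_amd_status_callback_method": "POST",
    }
```

With the twilio package installed, the dictionary could be passed as `Client(account_sid, auth_token).calls.create(**build_call_params(...))`. Building the TwiML inline avoids hosting a separate TwiML endpoint.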
Single WebSocket bridge: The core of the system is VoiceAgentSession, which bridges two WebSocket connections (one to Twilio, one to Deepgram). It translates between Twilio’s JSON-based protocol, which carries base64-encoded audio inside JSON messages, and Deepgram’s binary audio frames.
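The translation step can be sketched as two small functions. This assumes the Deepgram agent is configured for the same 8 kHz mu-law encoding Twilio uses in both directions, so the audio bytes pass through unchanged; the function names are hypothetical.

```python
import base64
import json
from typing import Optional


def twilio_to_deepgram(message: str) -> Optional[bytes]:
    """Return raw audio bytes from a Twilio 'media' event, or None for other events."""
    event = json.loads(message)
    if event.get("event") != "media":
        return None  # 'start', 'stop', 'mark', etc. are handled elsewhere
    # Twilio sends audio as base64 text inside the JSON message.
    return base64.b64decode(event["media"]["payload"])


def deepgram_to_twilio(audio: bytes, stream_sid: str) -> str:
    """Wrap a binary audio frame from Deepgram in a Twilio 'media' event."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })
```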
Lead context injection: The agent’s system prompt is built dynamically with lead data from the POST /make-call request. The agent knows the lead’s name, property details, and quote request before the conversation starts.
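A minimal sketch of per-call prompt injection. The LeadContext fields and the prompt wording are hypothetical; the repo’s request schema and _build_system_prompt() may differ.

```python
from dataclasses import dataclass


@dataclass
class LeadContext:
    """Hypothetical lead fields; the real POST /make-call schema may differ."""
    first_name: str
    last_name: str
    property_address: str
    quote_type: str = "homeowners insurance"


def build_system_prompt(lead: LeadContext) -> str:
    """Fill a base prompt with per-call lead details before the agent connects."""
    return (
        "You are a friendly assistant calling on behalf of an insurance agency.\n"
        f"You are calling {lead.first_name} {lead.last_name}, who requested a "
        f"{lead.quote_type} quote for {lead.property_address}.\n"
        "Verify their details, gather any missing information, and offer to "
        "schedule a consultation with a licensed agent."
    )
```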
Barge-in: When the Deepgram Voice Agent detects that the user started speaking, the server sends a Twilio “clear” event to immediately stop playing agent audio.
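A sketch of the barge-in mapping, assuming the Deepgram Voice Agent API emits a JSON event with type "UserStartedSpeaking" when the user starts talking. Twilio’s "clear" message drops any agent audio it has already buffered.

```python
import json
from typing import Optional


def handle_agent_event(message: str, stream_sid: str) -> Optional[str]:
    """Map a Deepgram Voice Agent JSON event to a Twilio message, if one is needed."""
    event = json.loads(message)
    if event.get("type") == "UserStartedSpeaking":
        # Flush queued agent audio on Twilio's side so playback stops immediately.
        return json.dumps({"event": "clear", "streamSid": stream_sid})
    return None  # other events need no Twilio-side action here
```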
Function calls: The Deepgram Voice Agent API supports tool use. The agent can check consultation availability, book appointments, and post structured call outcomes back to a CRM-like backend.
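A sketch of client-side function dispatch. The message shapes (FunctionCallRequest with a functions list of id/name/arguments/client_side, answered by FunctionCallResponse with id/name/content) follow our reading of the Voice Agent API and should be checked against current docs; check_availability and book_appointment are hypothetical handler names.

```python
import json
from typing import Any, Callable, Dict, List


def check_availability(args: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical handler: a real one would query a scheduling backend.
    return {"available_slots": ["Tuesday 10:00", "Tuesday 14:00"]}


def book_appointment(args: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical handler: echoes the requested slot as confirmed.
    return {"confirmed": True, "slot": args.get("slot")}


HANDLERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "check_availability": check_availability,
    "book_appointment": book_appointment,
}


def handle_function_call_request(message: str) -> List[str]:
    """Run each requested function and build one FunctionCallResponse per call."""
    request = json.loads(message)
    responses = []
    for call in request.get("functions", []):
        if call.get("client_side") is False:
            continue  # assumed: server-side functions are executed by Deepgram
        handler = HANDLERS.get(call["name"])
        if handler is None:
            result: Dict[str, Any] = {"error": f"unknown function {call['name']}"}
        else:
            # Arguments arrive as a JSON-encoded string.
            result = handler(json.loads(call.get("arguments") or "{}"))
        responses.append(json.dumps({
            "type": "FunctionCallResponse",
            "id": call["id"],
            "name": call["name"],
            "content": json.dumps(result),
        }))
    return responses
```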
Structured call outcomes: The update_lead function captures the full call outcome (disposition, verified info, newly gathered info, and a natural-language summary) and logs it to the console. In production, this payload would go to a CRM, webhook, or database.
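A minimal sketch of a structured outcome record. The CallOutcome class and its field names are hypothetical, modeled on the four parts listed above; the log call is the spot where a CRM or webhook client would go.

```python
import json
import logging
from dataclasses import asdict, dataclass, field
from typing import Dict

logger = logging.getLogger("call_outcome")


@dataclass
class CallOutcome:
    """Hypothetical shape of an update_lead payload."""
    disposition: str  # e.g. "scheduled", "not_interested"
    verified_info: Dict[str, str] = field(default_factory=dict)
    new_info: Dict[str, str] = field(default_factory=dict)
    summary: str = ""


def record_outcome(outcome: CallOutcome) -> str:
    """Serialize the outcome and log it; swap the log call for a CRM or webhook."""
    payload = json.dumps(asdict(outcome), indent=2)
    logger.info("Call outcome:\n%s", payload)
    return payload
```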
Answering machine detection: Twilio’s async AMD detects whether a human or voicemail answered. The session buffers audio until the result arrives, then branches accordingly. If AMD detects a machine after the voice agent has already started (late detection), the session tears down the Deepgram connection and switches to voicemail delivery mid-call.
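The buffering and branching can be sketched as a small state machine. The AnsweredBy values are Twilio’s; treating "unknown" as a human, replaying the buffer to the agent, and the on_timeout() fallback (which is what makes late detection possible) are assumptions of this sketch, and AmdGate is a hypothetical name.

```python
from typing import Callable, List, Optional

# Twilio AnsweredBy values that indicate a machine rather than a person.
MACHINE_RESULTS = {"machine_start", "machine_end_beep", "machine_end_silence",
                   "machine_end_other", "fax"}


class AmdGate:
    """Buffer caller audio until AMD reports, then route to the agent or voicemail."""

    def __init__(self, start_agent: Callable[[List[bytes]], None],
                 start_voicemail: Callable[[], None]) -> None:
        self._start_agent = start_agent
        self._start_voicemail = start_voicemail
        self._buffer: List[bytes] = []
        self.mode: Optional[str] = None  # None until decided, then "agent" or "voicemail"

    def on_audio(self, chunk: bytes) -> bool:
        """Return True if the chunk should be forwarded to the agent now."""
        if self.mode is None:
            self._buffer.append(chunk)
            return False
        return self.mode == "agent"

    def _go_agent(self) -> None:
        self.mode = "agent"
        buffered, self._buffer = self._buffer, []
        self._start_agent(buffered)  # assumed: buffered audio is replayed to the agent

    def on_timeout(self) -> None:
        """Hypothetical fallback: start the agent if AMD is slow to answer."""
        if self.mode is None:
            self._go_agent()

    def on_amd_result(self, answered_by: str) -> None:
        is_machine = answered_by in MACHINE_RESULTS
        if self.mode is None:
            if is_machine:
                self.mode = "voicemail"
                self._buffer = []
                self._start_voicemail()
            else:
                self._go_agent()  # "human" and "unknown" both go to the agent
        elif self.mode == "agent" and is_machine:
            # Late detection: the agent already started; the real session would
            # close the Deepgram connection here before delivering voicemail.
            self.mode = "voicemail"
            self._start_voicemail()
```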
Silence detection: A silence monitor tracks whether the person on the line is responding. If they stay silent for 60 seconds, the agent prompts them (“Are you still there?”) and eventually ends the call if there is still no response. It uses Deepgram’s InjectAgentMessage so the agent speaks each prompt naturally.
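A sketch of the silence logic with time passed in explicitly (for example from time.monotonic()) so it is easy to test. The 60-second threshold comes from the description above; max_prompts=2 is a hypothetical choice, and the InjectAgentMessage field name "message" is our reading of the API that should be verified.

```python
import json
from typing import Optional


class SilenceMonitor:
    """Decide when to nudge a silent person and when to give up."""

    def __init__(self, timeout_s: float = 60.0, max_prompts: int = 2) -> None:
        self.timeout_s = timeout_s
        self.max_prompts = max_prompts  # hypothetical limit before hanging up
        self.prompts_sent = 0
        self.last_activity: Optional[float] = None

    def on_activity(self, now: float) -> None:
        """Call whenever the person on the line speaks."""
        self.last_activity = now
        self.prompts_sent = 0

    def check(self, now: float) -> Optional[str]:
        """Return an InjectAgentMessage JSON string, "hangup", or None."""
        if self.last_activity is None or now - self.last_activity < self.timeout_s:
            return None
        if self.prompts_sent >= self.max_prompts:
            return "hangup"
        self.prompts_sent += 1
        self.last_activity = now  # restart the clock after each prompt
        return json.dumps({"type": "InjectAgentMessage",
                           "message": "Are you still there?"})
```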
For a deeper look at the call flow, session lifecycle, and component details, see docs/ARCHITECTURE.md.
The fastest path to a working outbound voice agent is the setup wizard, which configures Twilio and deploys to Fly.io.
A Fly.io account with the flyctl CLI installed and authenticated (flyctl auth login).
Note: Twilio trial accounts play a short disclaimer before connecting callers.
Edit .env and add your Deepgram API key:
The wizard will:
Trigger a test call through the /make-call endpoint. Your phone rings, and the voice agent runs through the insurance lead follow-up conversation. Check the server logs for the full conversation transcript and structured call outcome.
After deployment, view your application logs with:
The app name is shown in the setup wizard output and in python setup.py --status.
The default configuration uses Fly.io’s suspend mode. Your server suspends when idle and wakes in 1-3 seconds on incoming requests.
If you want instant response times for demos or production use, set min_machines_running = 1 in fly.toml and redeploy to keep one VM always warm:
See Fly.io pricing for details.
Depending on your region, Twilio may require address verification before you can purchase a phone number. The setup wizard will surface any errors from Twilio. Follow the instructions in the Twilio console if prompted.
If you prefer to run the server locally instead of deploying to Fly.io, you can use a tunnel to expose your locally running voice agent server via a public URL.
Copy the public URL (e.g., https://xxxx.ngrok.io).
Add the URL to your .env as SERVER_EXTERNAL_URL.
Or run the setup wizard with python setup.py --twilio-only to handle only the Twilio configuration.
See docs/LOCAL_DEVELOPMENT.md for the full local development guide.
make_call.py

The make_call.py CLI script is the primary way to trigger outbound calls during development.
The script reads SERVER_EXTERNAL_URL and ENDPOINT_SECRET from .env automatically.
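A standard-library sketch of the kind of request a client like make_call.py sends. The JSON body fields and the Bearer-token header are hypothetical assumptions; check the repo for the actual request schema and how ENDPOINT_SECRET is transmitted.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Dict


def load_env(path: str = ".env") -> Dict[str, str]:
    """Minimal KEY=VALUE parser (no quoting or 'export' support)."""
    env: Dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env


def build_make_call_request(env: Dict[str, str], phone: str,
                            lead: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /make-call request."""
    body = json.dumps({"phone_number": phone, "lead": lead}).encode("utf-8")
    return urllib.request.Request(
        env["SERVER_EXTERNAL_URL"].rstrip("/") + "/make-call",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme for ENDPOINT_SECRET.
            "Authorization": f"Bearer {env['ENDPOINT_SECRET']}",
        },
    )
```

Sending it is then `urllib.request.urlopen(build_make_call_request(load_env(), "+1...", {...}))`.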
After the call, the server logs show the structured call outcome:
Edit _build_system_prompt() in voice_agent/agent_config.py. The prompt is built dynamically with lead context data, so the agent knows who it’s calling and why. See docs/PROMPT_GUIDE.md for voice-specific prompt best practices.
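To add a new function:

1. Define it in voice_agent/agent_config.py (in the FUNCTIONS list).
2. Add its handler in voice_agent/function_handlers.py.
3. Implement any backend logic in backend/lead_service.py.

See docs/FUNCTION_GUIDE.md for function definition best practices.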
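A sketch of one function definition in the common name/description/JSON Schema shape, plus a check that every function the LLM can call has a handler. The check_availability entry and the missing_handlers() helper are hypothetical; the repo’s FUNCTIONS entries may differ in detail.

```python
from typing import Any, Callable, Dict, List

# Hypothetical definition; the description tells the LLM *when* to call it.
FUNCTIONS: List[Dict[str, Any]] = [
    {
        "name": "check_availability",
        "description": "Look up open consultation slots with a licensed agent. "
                       "Call this before offering times to the customer.",
        "parameters": {
            "type": "object",
            "properties": {
                "preferred_day": {
                    "type": "string",
                    "description": "Day the customer prefers, e.g. 'Tuesday'.",
                },
            },
            "required": [],
        },
    },
]

HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "check_availability": lambda args: {"slots": []},
}


def missing_handlers(functions: List[Dict[str, Any]],
                     handlers: Dict[str, Callable[..., Any]]) -> List[str]:
    """Return names the LLM can call that have no handler registered."""
    return [f["name"] for f in functions if f["name"] not in handlers]
```

Running a check like this at startup catches a definition that was added without its handler.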
Set environment variables in .env:
For Fly.io deployments, run python setup.py --redeploy to sync any .env changes to the deployed app.
For available LLM models, see the Voice Agent LLM docs. For available TTS models, see the Voice Agent TTS docs.
This reference implementation is configured for English using Deepgram Flux (flux-general-en) for STT, which provides native turn-taking optimized for voice agents. To add multilingual support, switch the STT model to flux-general-multi and set language_hints to an array of BCP-47 codes to bias toward expected languages. For building voice agents in other languages, see the Deepgram multilingual voice agent guide.
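A sketch of switching an agent settings dictionary to the multilingual model. The nesting agent.listen.provider and placing language_hints next to model are assumptions to confirm against the Voice Agent docs; the regex is only a loose BCP-47 shape check, and make_multilingual() is a hypothetical helper.

```python
import copy
import re
from typing import Any, Dict, List

# Loose BCP-47 shape: primary language plus optional subtags, e.g. "en", "es-MX".
_BCP47 = re.compile(r"^[a-zA-Z]{2,3}(-[a-zA-Z0-9]{2,8})*$")


def make_multilingual(settings: Dict[str, Any], hints: List[str]) -> Dict[str, Any]:
    """Return a copy of the agent settings switched to flux-general-multi."""
    bad = [h for h in hints if not _BCP47.match(h)]
    if bad:
        raise ValueError(f"not BCP-47 language tags: {bad}")
    updated = copy.deepcopy(settings)  # leave the caller's settings untouched
    provider = updated["agent"]["listen"]["provider"]
    provider["model"] = "flux-general-multi"
    # Assumed location for language_hints.
    provider["language_hints"] = list(hints)
    return updated
```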
The backend/ directory contains an in-memory mock. To connect to a real CRM or scheduling system, replace the mock implementations in backend/, such as lead_service.py, with calls to your real system.

The function dispatch in voice_agent/function_handlers.py uses lazy imports, making the boundary between voice agent and backend explicit.
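One way to keep that boundary swappable, sketched with a typing.Protocol interface instead of the repo’s lazy imports. LeadService, InMemoryLeadService, and handle_update_lead are hypothetical names; a CRM-backed class with the same two methods could replace the mock without touching the handler.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class LeadService(Protocol):
    """Hypothetical interface the function handlers depend on."""

    def get_lead(self, lead_id: str) -> Dict[str, str]: ...

    def update_lead(self, lead_id: str, outcome: Dict[str, str]) -> None: ...


@dataclass
class InMemoryLeadService:
    """Mock backend that keeps leads in a dict."""
    leads: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def get_lead(self, lead_id: str) -> Dict[str, str]:
        return self.leads.get(lead_id, {})

    def update_lead(self, lead_id: str, outcome: Dict[str, str]) -> None:
        self.leads.setdefault(lead_id, {}).update(outcome)


def handle_update_lead(service: LeadService, lead_id: str,
                       outcome: Dict[str, str]) -> str:
    # The handler only sees the interface, so any backend can be swapped in.
    service.update_lead(lead_id, outcome)
    return "ok"
```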
Edit _build_voicemail_text() in voice_agent/voicemail.py. The message is personalized with the lead’s first name from the lead context.
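A sketch of first-name personalization with a fallback when the name is missing. build_voicemail_text() here is a hypothetical stand-in for _build_voicemail_text(), and the message wording and company placeholder are illustrative.

```python
from typing import Mapping


def build_voicemail_text(lead: Mapping[str, str], company: str = "our agency") -> str:
    """Personalize the voicemail with the lead's first name when available."""
    first_name = (lead.get("first_name") or "").strip()
    greeting = f"Hi {first_name}," if first_name else "Hi,"
    return (
        f"{greeting} this is a follow-up from {company} about the homeowners "
        "insurance quote you requested. We'd be happy to help, so please give "
        "us a call back at your convenience. Thank you!"
    )
```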
This reference implementation is a technical reference, not a compliance reference. Outbound voice calling is often subject to regulations (e.g., the TCPA in the United States, or equivalent regulations in other jurisdictions).
This implementation includes a few compliance-aware design choices:
These are merely examples; they do not constitute compliance guidance.
Developers deploying outbound voice agents should consult appropriate legal and/or regulatory guidance to ensure compliance with applicable regulations.