Build an Inbound Telephony Agent
For more information and to use our reference implementation, visit the Deepgram Inbound Telephony Agent repo.
Inbound Telephony Voice Agent
A reference implementation for building a secure inbound telephony voice agent using Deepgram's Voice Agent API and Twilio. It uses Deepgram Flux for speech-to-text, with native turn-taking optimized for real-time voice agent conversations, and includes webhook endpoint protection and Twilio request signature validation out of the box.
Callers dial a phone number and talk to an AI receptionist that can check appointment availability, book appointments, look up existing appointments, and cancel appointments, all through natural voice conversation. While this example is specific to a dental office, it's a great jumping-off point for any inbound use case: simply edit the prompts and function calls accordingly.
Architecture
Audio Flow
- Caller speaks into their phone
- Twilio captures the audio and streams it as base64-encoded mulaw over WebSocket
- The application server decodes the base64 and sends raw mulaw bytes to Deepgram's Voice Agent API
- Deepgram handles the full pipeline: speech-to-text, LLM reasoning, text-to-speech
- Deepgram sends back raw mulaw audio bytes
- The application server encodes to base64 and sends as JSON to Twilio
- Twilio plays the audio to the caller
Key Technical Concepts
Single WebSocket bridge: The core of the system is VoiceAgentSession, which bridges two WebSocket connections (one to Twilio, one to Deepgram). It translates between Twilio's JSON-based protocol and Deepgram's binary audio protocol.
Barge-in: When the Deepgram Voice Agent detects that the user started speaking, the server sends a Twilio "clear" event to immediately stop playing agent audio. This prevents the agent from talking over the caller.
Function calls: The Deepgram Voice Agent API supports tool use. When the LLM decides to call a function (like checking appointment availability), Deepgram sends a function call event. The server executes it against the backend service and sends the result back to Deepgram, which incorporates it into the agent's next response.
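The round trip might look roughly like the sketch below. The event shapes are assumptions, not confirmed by this document: a request carrying a "functions" list whose entries have id, name, and a JSON-string arguments field, answered by a "FunctionCallResponse" per call. The handler and its data are hypothetical stand-ins for the project's real handlers.

```python
import json
from typing import Any, Callable


# Hypothetical handler; the real ones live in voice_agent/function_handlers.py.
def check_availability(date: str) -> dict[str, Any]:
    return {"date": date, "slots": ["09:00", "14:30"]}


HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    "check_availability": check_availability,
}


def handle_function_call(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Run each requested function and build one response message per call."""
    responses = []
    for call in event.get("functions", []):
        handler = HANDLERS.get(call["name"])
        if handler is None:
            result = {"error": f"unknown function: {call['name']}"}
        else:
            try:
                # Arguments are assumed to arrive as a JSON-encoded string.
                result = handler(**json.loads(call["arguments"] or "{}"))
            except Exception as exc:
                # Report failures to the agent instead of dropping the call.
                result = {"error": str(exc)}
        responses.append({
            "type": "FunctionCallResponse",
            "id": call["id"],
            "name": call["name"],
            "content": json.dumps(result),
        })
    return responses
```

Returning errors as content, rather than raising, lets the agent tell the caller that something went wrong instead of going silent.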
Protocol-agnostic server: The server doesn't know whether audio comes from a real Twilio call or from the local development client (dev_client.py). Both send identical WebSocket messages. This means you can develop and test without a phone or Twilio account.
For a deeper look at the call flow, session lifecycle, and component details, see docs/ARCHITECTURE.md.
Quick Start
The fastest path to a working telephony voice agent is the setup wizard, which configures Twilio and deploys to Fly.io.
Prerequisites
- Python 3.12+
- A Deepgram account and API key
- $200 free credits, no credit card required
- A Twilio account
- New accounts come with trial credits.
- A Fly.io account and flyctl installed and authenticated (flyctl auth login)
- Fly.io's free allowance is more than enough for this reference implementation with the default suspend-on-idle configuration.
Note: Twilio trial accounts play a short disclaimer before connecting callers.
1. Clone and Install
2. Configure
Edit .env and add your Deepgram API key:
3. Run the Setup Wizard
The wizard will:
- Prompt for your Twilio account credentials (from console.twilio.com)
- Let you pick an existing phone number or purchase a new one
- Deploy your voice agent to Fly.io
- Automatically configure Twilio to route calls to your deployed voice agent
When it's done, call your phone number and talk to your agent.
Viewing Logs
After deployment, view your application logs with:
The app name is shown in the setup wizard output and in the output of python setup.py --status.
Cold Starts
The default configuration uses Fly.io's suspend mode. Your server suspends when idle and wakes in 1-3 seconds on incoming requests.
If you want instant response times for demos or production use, set min_machines_running = 1 in fly.toml and redeploy to keep one VM always warm:
See Fly.io pricing for details.
Twilio Regulatory Requirements
Depending on your region, Twilio may require address verification before you can purchase a phone number. The setup wizard will surface any errors from Twilio. Follow the instructions in the Twilio console if prompted.
Alternative: Tunnel + Twilio
If you prefer to run the server locally instead of deploying to Fly.io, you can use a tunnel to expose your locally running voice agent server via a public URL.
1. Start a Tunnel
Copy the public URL (e.g., https://xxxx.ngrok.io).
2. Update Configuration
Add to your .env:
3. Start the Server
4. Configure Twilio
You can use the setup wizard with --twilio-only to handle Twilio configuration:
Or configure manually:
- Get a Twilio phone number at twilio.com/console
- In the phone number settings, set the webhook for incoming calls to:
Method: HTTP POST
- Call your Twilio number from any phone
The server logs will show the call connecting in your terminal.
Local Development (No Twilio Required)
You can test the full voice agent conversation without a phone or Twilio account using dev_client.py, which connects to the server over WebSocket and streams audio from your microphone.
Speak into your microphone to have a conversation with the agent. See docs/LOCAL_DEVELOPMENT.md for more details and troubleshooting.
Note: dev_client.py is for local development only. If you've deployed to Fly.io, call the phone number directly instead.
Example Conversation
Project Structure
Customization
Change the Agent's Personality
Edit the SYSTEM_PROMPT in voice_agent/agent_config.py. This is where you define who the agent is, what it knows, and how it behaves. See docs/PROMPT_GUIDE.md for voice-specific prompt best practices.
Add or Modify Functions
- Define the function in voice_agent/agent_config.py (in the FUNCTIONS list)
- Add the handler in voice_agent/function_handlers.py
- Implement the backend logic in backend/scheduling_service.py
See docs/FUNCTION_GUIDE.md for function definition best practices.
Swap the LLM or Voice
Set environment variables in .env:
Multilingual Support
This reference implementation is configured for English using Deepgram Flux (flux-general-en) for STT, which provides native turn-taking optimized for voice agents. For building voice agents in other languages, see the Deepgram multilingual voice agent guide.
Replace the Mock Backend
The backend/ directory contains an in-memory mock. To connect to a real scheduling system:
- Keep the same method signatures in scheduling_service.py
- Replace the method bodies with HTTP calls to your real API
- The voice agent layer doesn't need to change
The function dispatch in voice_agent/function_handlers.py uses lazy imports, making the boundary between voice agent and backend explicit.
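As an illustration of swapping the backend while keeping signatures stable, here is a minimal sketch. The class names, the check_availability method, the /availability endpoint, and the response shape are all hypothetical; the real signatures are whatever scheduling_service.py defines.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable


class MockSchedulingService:
    """In-memory stand-in, similar in spirit to the mock in backend/."""

    def __init__(self) -> None:
        self._slots = {"2025-01-15": ["09:00", "14:30"]}

    def check_availability(self, date: str) -> list[str]:
        return self._slots.get(date, [])


def _http_get_json(url: str) -> dict:
    """Fetch a URL and parse the body as JSON."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


class HttpSchedulingService:
    """Same method signature, but the body calls a (hypothetical) REST API."""

    def __init__(self, base_url: str,
                 get_json: Callable[[str], dict] = _http_get_json) -> None:
        self._base_url = base_url.rstrip("/")
        self._get_json = get_json  # injectable so tests need no network

    def check_availability(self, date: str) -> list[str]:
        query = urllib.parse.urlencode({"date": date})
        payload = self._get_json(f"{self._base_url}/availability?{query}")
        return payload.get("slots", [])
```

Because both classes expose the same method, the function handlers can call either one without changes. Keep real API calls fast and bounded by a timeout, since the caller is waiting on the line.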
Additional Resources
- docs/ARCHITECTURE.md - Detailed architecture, data flow diagrams, and component details
- docs/PROMPT_GUIDE.md - Best practices for writing voice agent prompts
- docs/FUNCTION_GUIDE.md - Best practices for defining agent functions
- docs/LOCAL_DEVELOPMENT.md - Step-by-step local development setup
- Deepgram Voice Agent API Docs - Official API documentation