For more information and to use our reference implementation, visit the Deepgram Inbound Telephony Agent repo.
A reference implementation for building a secure inbound telephony voice agent using Deepgram’s Voice Agent API and Twilio. Uses Deepgram Flux for speech-to-text with native turn-taking optimized for real-time voice agent conversations. Includes webhook endpoint protection and Twilio request signature validation out of the box.
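Twilio request signature validation follows Twilio's documented scheme. Twilio takes the full webhook URL, appends each POST parameter's name and value sorted by name, computes an HMAC-SHA1 over that string keyed with your auth token, and sends the base64 result in the X-Twilio-Signature header. Below is a minimal stdlib sketch of that check. It handles single-valued form parameters only. The function names are illustrative and the repo's own implementation may differ. Production code can use the RequestValidator class from the official twilio Python package instead.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Expected X-Twilio-Signature for a form-encoded POST webhook (single-valued params)."""
    # Full URL, then each POST parameter's name and value, in sorted-name order.
    payload = url + "".join(f"{name}{params[name]}" for name in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], signature: str
) -> bool:
    """True if the X-Twilio-Signature header value matches the request."""
    expected = compute_twilio_signature(auth_token, url, params)
    # compare_digest runs in constant time, so timing does not reveal matching prefixes.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Requests that fail this check should be rejected before any TwiML is returned. This keeps arbitrary callers from opening sessions against your Deepgram account.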
Callers dial a phone number and talk to an AI receptionist that can check appointment availability, book appointments, look up existing appointments, and cancel appointments, all through natural voice conversation. While this example is specific to a dental office, it's a good starting point for any inbound use case: edit the prompts and function calls to fit your domain.
Single WebSocket bridge: The core of the system is VoiceAgentSession, which bridges two WebSocket connections (one to Twilio, one to Deepgram). It translates between Twilio's JSON-based protocol, where audio arrives as base64 text inside JSON messages, and Deepgram's binary audio frames.
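The two translation directions can be sketched as a pair of pure functions. This is a minimal illustration that assumes the Deepgram agent is configured for the same 8 kHz mu-law audio Twilio uses, so the bytes pass through without resampling. The function names are hypothetical, not VoiceAgentSession's actual methods.

```python
import base64
import json
from typing import Optional


def twilio_media_to_audio(message: str) -> Optional[bytes]:
    """Extract raw audio bytes from a Twilio Media Streams 'media' event.

    Returns None for other events (connected, start, stop, mark, ...).
    """
    event = json.loads(message)
    if event.get("event") != "media":
        return None
    # Twilio carries the audio as base64 text inside the JSON payload.
    return base64.b64decode(event["media"]["payload"])


def audio_to_twilio_media(audio: bytes, stream_sid: str) -> str:
    """Wrap a chunk of agent audio from Deepgram in a Twilio 'media' event."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })
```

The bytes from twilio_media_to_audio would be sent to Deepgram as a binary WebSocket frame. The string from audio_to_twilio_media would be sent to Twilio as a text frame.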
Barge-in: When the Deepgram Voice Agent detects that the user started speaking, the server sends a Twilio “clear” event to immediately stop playing agent audio. This prevents the agent from talking over the caller.
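A minimal sketch of the barge-in handling follows. It assumes Deepgram signals speech onset with a JSON message whose type is UserStartedSpeaking. Confirm the event name against the Voice Agent API reference for your API version. The Twilio side uses Media Streams' clear message, which discards audio Twilio has buffered but not yet played.

```python
import json
from typing import Optional


def handle_deepgram_event(message: str, stream_sid: str) -> Optional[str]:
    """Map a Deepgram Voice Agent JSON event to a Twilio message, if one is needed."""
    event = json.loads(message)
    # Assumed event name for "the caller started talking".
    if event.get("type") == "UserStartedSpeaking":
        # 'clear' flushes agent audio queued on Twilio's side so the caller isn't talked over.
        return json.dumps({"event": "clear", "streamSid": stream_sid})
    return None
```

Because Deepgram usually streams agent audio faster than real time, Twilio may hold several seconds of buffered speech. Without the clear message, the agent would keep talking after the caller interrupts.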
Function calls: The Deepgram Voice Agent API supports tool use. When the LLM decides to call a function (like checking appointment availability), Deepgram sends a function call event. The server executes it against the backend service and sends the result back to Deepgram, which incorporates it into the agent’s next response.
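The round trip can be sketched as: parse the function call event, run the matching handler, and send one response per call. The message shapes are modeled on Deepgram's FunctionCallRequest and FunctionCallResponse messages, but the exact field names (functions, id, name, arguments, content) are assumptions to check against the API reference. check_availability and its return value are hypothetical stand-ins for the real backend.

```python
import json
from typing import Any, Callable, Dict, List


def check_availability(date: str) -> Dict[str, Any]:
    """Hypothetical handler; a real one would query the scheduling backend."""
    return {"date": date, "slots": ["09:00", "14:30"]}


# Hypothetical registry mapping function names (as the LLM sees them) to handlers.
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "check_availability": check_availability,
}


def handle_function_call_request(message: str) -> List[str]:
    """Run each requested function and build one response message per call."""
    request = json.loads(message)
    responses: List[str] = []
    for call in request.get("functions", []):
        handler = HANDLERS.get(call["name"])
        result: Dict[str, Any]
        if handler is None:
            result = {"error": f"unknown function: {call['name']}"}
        else:
            try:
                # Arguments arrive as a JSON-encoded string produced by the LLM.
                result = handler(**json.loads(call.get("arguments") or "{}"))
            except Exception as exc:  # report failures to the LLM instead of dropping the call
                result = {"error": str(exc)}
        responses.append(json.dumps({
            "type": "FunctionCallResponse",
            "id": call["id"],
            "name": call["name"],
            "content": json.dumps(result),
        }))
    return responses
```

Returning errors as content, rather than raising, lets the agent tell the caller something went wrong instead of going silent.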
Protocol-agnostic server: The server doesn’t know whether audio comes from a real Twilio call or from the local development client (dev_client.py). Both send identical WebSocket messages. This means you can develop and test without a phone or Twilio account.
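A client can stand in for Twilio by emitting the same start, media, and stop sequence a Twilio Media Stream sends. This sketch assumes 8 kHz mu-law audio split into 160-byte (20 ms) frames, which matches Twilio's typical chunking. It omits fields real Twilio messages carry, such as sequenceNumber, accountSid, and the initial connected event. It is illustrative, not a copy of what dev_client.py sends.

```python
import base64
import json
from typing import Iterator

FRAME_BYTES = 160  # 20 ms of 8 kHz mu-law audio (1 byte per sample)


def twilio_style_messages(stream_sid: str, call_sid: str, mulaw_audio: bytes) -> Iterator[str]:
    """Yield a simplified start/media/stop sequence shaped like Twilio's."""
    yield json.dumps({
        "event": "start",
        "streamSid": stream_sid,
        "start": {
            "streamSid": stream_sid,
            "callSid": call_sid,
            "mediaFormat": {"encoding": "audio/x-mulaw", "sampleRate": 8000, "channels": 1},
        },
    })
    # One media event per 20 ms frame; the last frame may be shorter.
    for offset in range(0, len(mulaw_audio), FRAME_BYTES):
        chunk = mulaw_audio[offset:offset + FRAME_BYTES]
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })
    yield json.dumps({"event": "stop", "streamSid": stream_sid})
```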
For a deeper look at the call flow, session lifecycle, and component details, see docs/ARCHITECTURE.md.
The fastest path to a working telephony voice agent is the setup wizard, which configures Twilio and deploys to Fly.io.
Make sure the Fly.io CLI is installed and logged in (flyctl auth login).
Note: Twilio trial accounts play a short disclaimer before connecting callers.
Edit .env and add your Deepgram API key, then run the setup wizard (python setup.py).
When it's done, call your phone number and talk to your agent.
After deployment, view your application logs with fly logs -a <app-name>.
The app name is shown in the setup wizard output and in python setup.py --status.
The default configuration uses Fly.io’s suspend mode. Your server suspends when idle and wakes in 1-3 seconds on incoming requests.
For instant response times in demos or production, keep one VM always warm: set min_machines_running = 1 in fly.toml and redeploy.
See Fly.io pricing for details.
Depending on your region, Twilio may require address verification before you can purchase a phone number. The setup wizard will surface any errors from Twilio. Follow the instructions in the Twilio console if prompted.
If you prefer to run the server locally instead of deploying to Fly.io, you can use a tunnel to expose your locally running voice agent server via a public URL.
Copy the public URL (e.g., https://xxxx.ngrok.io).
Add the public URL to your .env.
To have the setup wizard handle the Twilio configuration, run it with --twilio-only.
Alternatively, configure Twilio manually.
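If you configure Twilio manually, the phone number's incoming-call webhook must answer with TwiML that connects the call to a bidirectional Media Stream over wss://. Twilio requires a secure WebSocket URL. The sketch below derives that URL from a tunnel's https:// address and builds the TwiML with the standard library. The /twilio/stream path is a hypothetical placeholder. Use whatever endpoint the server actually exposes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def stream_url(public_url: str, path: str = "/twilio/stream") -> str:
    """Turn the tunnel's https:// URL into the wss:// URL Twilio should stream to."""
    parts = urlsplit(public_url)
    if parts.scheme != "https":
        raise ValueError("Twilio needs a public https:// URL (streams must use wss://)")
    # Hypothetical default path; replace with the server's real WebSocket endpoint.
    return urlunsplit(("wss", parts.netloc, path, "", ""))


def connect_twiml(ws_url: str) -> str:
    """TwiML that connects the call to a bidirectional Media Stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=ws_url)
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")
```

Building the XML with ElementTree rather than string formatting escapes any special characters in the URL automatically.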
The server logs will show the call connecting in your terminal.
You can test the full voice agent conversation without a phone or Twilio account using dev_client.py, which connects to the server over WebSocket and streams audio from your microphone.
Speak into your microphone to have a conversation with the agent. See docs/LOCAL_DEVELOPMENT.md for more details and troubleshooting.
Note: dev_client.py is for local development only. If you've deployed to Fly.io, call the phone number directly instead.
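For the dev client's messages to be identical to Twilio's, microphone audio has to end up in Twilio's format: 8 kHz, 8-bit G.711 mu-law. Below is a minimal pure-Python mu-law encoder for 16-bit little-endian PCM, written from scratch because the audioop module that used to provide lin2ulaw was removed in Python 3.13. It assumes the microphone is already captured (or resampled) at 8 kHz mono. Whether dev_client.py encodes this way is an assumption.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search (G.711)
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear16_to_mulaw(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit PCM samples as G.711 mu-law bytes."""
    out = bytearray()
    for (sample,) in struct.iter_unpack("<h", pcm):
        sign = 0x80 if sample < 0 else 0x00
        magnitude = min(abs(sample), CLIP) + BIAS
        # Exponent: position of the highest set bit among bits 7..14, minus 7.
        exponent = 7
        mask = 0x4000
        while exponent > 0 and not (magnitude & mask):
            exponent -= 1
            mask >>= 1
        # Four bits just below the leading bit form the mantissa.
        mantissa = (magnitude >> (exponent + 3)) & 0x0F
        # mu-law bytes are stored bit-inverted.
        out.append(~(sign | (exponent << 4) | mantissa) & 0xFF)
    return bytes(out)
```

Silence encodes to 0xFF, the loudest positive sample to 0x80, and the loudest negative sample to 0x00.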
Edit the SYSTEM_PROMPT in voice_agent/agent_config.py. This is where you define who the agent is, what it knows, and how it behaves. See docs/PROMPT_GUIDE.md for voice-specific prompt best practices.
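Adding or changing a function touches three files:
1. Define it in voice_agent/agent_config.py (in the FUNCTIONS list).
2. Handle it in voice_agent/function_handlers.py.
3. Implement the backend logic in backend/scheduling_service.py.

See docs/FUNCTION_GUIDE.md for function definition best practices.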
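Because the definition and its handler live in different files, they can drift apart. The sketch below shows a hypothetical FUNCTIONS entry in the common name / description / JSON Schema layout (an assumption about the exact shape agent_config.py uses). It then adds a small consistency check confirming that every function the LLM can call has a handler accepting its required arguments. cancel_appointment and appointment_id are illustrative names only.

```python
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical definition; the description and parameter names are what the LLM reads.
FUNCTIONS: List[Dict[str, Any]] = [
    {
        "name": "cancel_appointment",
        "description": "Cancel an existing appointment by its confirmation ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "appointment_id": {"type": "string", "description": "Confirmation ID given to the caller."},
            },
            "required": ["appointment_id"],
        },
    },
]


def cancel_appointment(appointment_id: str) -> Dict[str, Any]:
    """Hypothetical handler; a real one would call the scheduling backend."""
    return {"cancelled": appointment_id}


HANDLERS: Dict[str, Callable[..., Any]] = {"cancel_appointment": cancel_appointment}


def definition_problems(functions: List[Dict[str, Any]], handlers: Dict[str, Callable[..., Any]]) -> List[str]:
    """List functions with no handler, or whose handler rejects a required argument."""
    problems: List[str] = []
    for fn in functions:
        handler = handlers.get(fn["name"])
        if handler is None:
            problems.append(f"{fn['name']}: no handler")
            continue
        accepted = set(inspect.signature(handler).parameters)
        for arg in fn["parameters"].get("required", []):
            if arg not in accepted:
                problems.append(f"{fn['name']}: handler does not accept {arg!r}")
    return problems
```

A check like this can run in a unit test so that a renamed parameter fails in CI rather than during a live call.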
Set environment variables in .env.
This reference implementation is configured for English using Deepgram Flux (flux-general-en) for STT, which provides native turn-taking optimized for voice agents. To add multilingual support, switch the STT model to flux-general-multi and set language_hints to an array of BCP-47 codes to bias toward expected languages. For building voice agents in other languages, see the Deepgram multilingual voice agent guide.
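The model switch described above amounts to changing the speech-to-text provider block in the agent settings. The sketch below builds that block for both cases. The surrounding keys ("type": "deepgram", and language_hints sitting next to "model") are assumptions about the settings shape. Check them against agent_config.py and the Deepgram Voice Agent settings reference before relying on them.

```python
from typing import Any, Dict, Optional, Sequence


def listen_provider(language_hints: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """STT provider block: English Flux by default, multilingual Flux when hints are given."""
    if not language_hints:
        return {"type": "deepgram", "model": "flux-general-en"}
    return {
        "type": "deepgram",
        "model": "flux-general-multi",
        # BCP-47 codes such as "en", "es", or "fr-CA" bias recognition toward expected languages.
        "language_hints": list(language_hints),
    }
```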
The backend/ directory contains an in-memory mock. To connect to a real scheduling system, replace the mock logic in backend/scheduling_service.py.
The function dispatch in voice_agent/function_handlers.py uses lazy imports, which makes the boundary between the voice agent and the backend explicit.
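One way to express lazy-import dispatch is a registry of "module:attribute" strings resolved with importlib only when a function is first called. This is a hypothetical sketch, not the repo's actual code. The voice agent module then imports cleanly even if the backend is missing, and pointing a function at a different backend means editing one string.

```python
import importlib
from typing import Any, Callable, Dict

# Hypothetical registry: function name -> "module:attribute", imported on first use.
DISPATCH: Dict[str, str] = {
    "check_availability": "backend.scheduling_service:check_availability",
}


def resolve(target: str) -> Callable[..., Any]:
    """Import the target's module at call time and return the named callable."""
    module_name, _, attr = target.partition(":")
    module = importlib.import_module(module_name)  # cached in sys.modules after the first call
    return getattr(module, attr)


def dispatch(name: str, **kwargs: Any) -> Any:
    """Look up a function by name and call it with keyword arguments."""
    return resolve(DISPATCH[name])(**kwargs)
```

The trade-off is that a broken backend import surfaces when the function is first called rather than at startup, so the dispatcher's caller should turn exceptions into an error result the agent can speak.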