Twilio and Deepgram STT
A starter server and a self-hosted solution for integrating speech-to-text with Twilio and Deepgram.
Integrate Twilio with Deepgram for real-time speech recognition in Programmable Voice calls.
To get started, choose one of two integration options:
Starter Server - Python or Node scripts that proxy audio between Twilio and Deepgram. View on GitHub.
Docker Image - Production-ready deepgram/twilio-proxy:beta for self-hosted deployments. Contact your Account Executive for access.
The code for the starter server is available in this GitHub repo.
The starter server includes two scripts that act as proxy servers between Twilio and Deepgram.
twilio-proxy-mono: Runs the proxy server for the inbound Twilio track, which represents the audio Twilio receives from the call. To learn more about Twilio tracks, see Twilio’s track documentation.
twilio-proxy-stereo: Runs the proxy server for both the inbound and outbound Twilio tracks, which represent the audio Twilio receives from the call and the audio Twilio sends to the call. To learn more about Twilio tracks, see Twilio’s track documentation.
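Both scripts receive call audio from Twilio as JSON messages over a WebSocket. The sketch below illustrates the per-message work under stated assumptions. It follows Twilio’s Media Streams format: a `media` event carries Base64-encoded 8 kHz G.711 mu-law audio in `media.payload` and is labeled with a `track`. The function and constant names are hypothetical and are not taken from the starter scripts. A mono proxy can forward the decoded inbound bytes to Deepgram unchanged. A stereo proxy must first merge the two tracks into one two-channel stream. The scripts list pydub as a dependency and may perform that merge differently. This version decodes mu-law by hand so it needs only the standard library.

```python
import base64
import json
import struct
from typing import Optional, Tuple


def parse_twilio_media(raw: str) -> Optional[Tuple[str, bytes]]:
    """Return (track, mu-law bytes) for a Twilio media event, else None."""
    message = json.loads(raw)
    if message.get("event") != "media":
        return None  # connected, start, stop, etc. carry no audio
    media = message["media"]
    return media.get("track", "inbound"), base64.b64decode(media["payload"])


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    byte = ~byte & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def interleave_stereo(inbound: bytes, outbound: bytes) -> bytes:
    """Merge equal-length mu-law chunks into 16-bit little-endian stereo PCM.

    Channel 0 holds the inbound track, channel 1 the outbound track.
    """
    if len(inbound) != len(outbound):
        raise ValueError("both tracks must contain the same number of samples")
    frames = bytearray()
    for left, right in zip(inbound, outbound):
        frames += struct.pack("<hh", ulaw_to_linear(left), ulaw_to_linear(right))
    return bytes(frames)


# Assumed Deepgram streaming query strings for each output format
# (illustrative, not copied from the starter scripts).
MONO_PARAMS = "encoding=mulaw&sample_rate=8000"
STEREO_PARAMS = "encoding=linear16&sample_rate=8000&channels=2&multichannel=true"

if __name__ == "__main__":
    payload = base64.b64encode(b"\xff\x80").decode("ascii")
    raw = json.dumps({"event": "media", "media": {"track": "inbound", "payload": payload}})
    track, audio = parse_twilio_media(raw)
    print(track, struct.unpack("<4h", interleave_stereo(audio, b"\x00\xff")))
```

In a live call, the two tracks arrive as separate messages. A stereo proxy would therefore buffer each track and merge only aligned chunks of equal length. This sketch leaves out that buffering and the WebSocket plumbing.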
Dependencies: websockets and pydub
Before running the scripts, replace the authentication credentials in them with your Deepgram username and password.
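Username-and-password credentials are commonly sent as an HTTP Basic `Authorization` header. If the scripts you cloned do this (check how they build their request headers), the header value is the Base64 encoding of `username:password`. The following is a minimal sketch; the helper name is hypothetical and the credentials are placeholders.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials: substitute your own Deepgram username and password.
    print(basic_auth_header("YOUR_USERNAME", "YOUR_PASSWORD"))
```

How you attach this header to the WebSocket connection depends on your websockets version. The legacy client accepts it as `extra_headers`, and the newer asyncio client accepts it as `additional_headers`.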
To run the scripts:
Clone the GitHub repo to your local machine.
From the command line, navigate to the cloned repository.
Run the script using one of the following commands:
Mono
Stereo
Finally, forward call audio to the proxy scripts. You can either configure Twilio to stream WebSocket data to the server running the proxy scripts, or initiate a call between two people and forward its data directly to the proxy scripts.
See the Start Streaming Audio section of Twilio’s tutorial: “Consume a real-time Media Stream using WebSockets, Python, and Flask”. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.
When you call your Twilio number, Twilio forwards the call to the number set in your TwiML Bin and forks the conversation audio to the twilio-proxy-mono or twilio-proxy-stereo app. The app sends the audio to Deepgram, receives transcriptions, and prints them to the screen. In a real implementation, you will likely want to send transcriptions to a callback instead.
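The following sketch shows the callback approach. It assumes Deepgram’s streaming response shape: a `Results` message with `is_final`, `channel_index`, and `channel.alternatives[].transcript`. The helpers keep only final transcripts and POST each one as JSON to a callback URL. The function names, the JSON body, and the callback endpoint are all hypothetical.

```python
import json
import urllib.request
from typing import Optional, Tuple


def extract_final_transcript(raw: str) -> Optional[Tuple[int, str]]:
    """Return (channel, transcript) for a final Deepgram result, else None."""
    message = json.loads(raw)
    if message.get("type", "Results") != "Results" or not message.get("is_final"):
        return None  # skip metadata messages and interim results
    alternatives = message.get("channel", {}).get("alternatives", [])
    if not alternatives or not alternatives[0].get("transcript"):
        return None  # nothing was recognized in this segment
    channel = message.get("channel_index", [0])[0]
    return channel, alternatives[0]["transcript"]


def post_to_callback(callback_url: str, channel: int, transcript: str) -> int:
    """POST one transcript as JSON to a (hypothetical) callback endpoint."""
    body = json.dumps({"channel": channel, "transcript": transcript}).encode("utf-8")
    request = urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    sample = json.dumps({
        "type": "Results",
        "channel_index": [1, 2],
        "is_final": True,
        "channel": {"alternatives": [{"transcript": "thanks for calling", "confidence": 0.97}]},
    })
    print(extract_final_transcript(sample))
```

Dropping interim results is a design choice. If you want live captions rather than a clean record, forward interim results as well.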
Sample TwiML Bin files are as follows:
Mono
Stereo
For stereo, an additional track parameter exists.
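The sample TwiML Bin files are not reproduced here. The sketch below shows their likely shape. It generates TwiML that starts a `<Stream>` to the proxy and then dials the forwarding number. For stereo, it sets Twilio’s `track` attribute to `both_tracks`. The proxy URL and phone number are placeholders, and Deepgram’s actual sample files may differ.

```python
import xml.etree.ElementTree as ET


def build_twiml(proxy_url: str, forward_to: str, stereo: bool = False) -> str:
    """Return TwiML that forks call audio to proxy_url, then dials forward_to."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    stream = ET.SubElement(start, "Stream", url=proxy_url)
    if stereo:
        # Stream the caller's audio and the audio sent back to the caller.
        stream.set("track", "both_tracks")
    dial = ET.SubElement(response, "Dial")
    dial.text = forward_to
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values: use your proxy's public wss:// URL and a real number.
    print(build_twiml("wss://proxy.example.com", "+15555550123"))
    print(build_twiml("wss://proxy.example.com", "+15555550123", stereo=True))
```

The printed XML is the kind of content you would paste into a TwiML Bin. Generating it in code helps keep the mono and stereo variants consistent.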
Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream:
Replace TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN with your Twilio account information.
Replace the url variable with the URL of the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
Replace the to and from_ variables with person A’s phone number and your Twilio voice number, respectively.
We provide the Docker image deepgram/twilio-proxy:beta, which you can request from your Account Executive.
This solution is for Deepgram’s self-hosted customers. Please contact us if you would like to learn more about our self-hosted solutions.
Use Docker Compose to run the Twilio-proxy server. Choose the configuration that matches your deployment model.
This sample includes:
api and engine sections - fill these using the deepgram-self-hosted repository template. See Self-Hosted Introduction for details.
deepgram/actix-ws-echo:beta - a WebSockets echo server for viewing streaming responses. Replace with your own callback URL as needed.
Environment Variables:
Configure Twilio to send audio to your proxy server using one of these methods:
Configure Twilio to forward data to the proxy server. To do this, see the Start Streaming Audio section of Twilio’s tutorial: Consume a real-time Media Stream using WebSockets, Python, and Flask. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.
When you call your Twilio number, Twilio forwards the call to the number set in your TwiML Bin and forks the conversation audio to the Twilio-Deepgram proxy app. The app sends the audio to Deepgram, receives transcriptions, and prints them to the screen. In a real implementation, you will likely want to send transcriptions to a callback instead.
Sample TwiML Bin files are as follows:
Mono
Stereo
For stereo, an additional track parameter exists.
Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream.py:
Replace TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN with your Twilio account information.
Replace the url variable with the URL of the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
Replace the to and from_ variables with person A’s phone number and your Twilio voice number, respectively.