Twilio and Deepgram STT
A starter server and a self-hosted solution for integrating speech-to-text with Twilio and Deepgram.
Integrate Twilio with Deepgram for real-time speech recognition in Programmable Voice calls.
Before You Begin
You’ll need:
- A Deepgram account with an API key (free signup includes $200 credit)
- A Twilio account with a phone number that has Voice functionality
Solutions
Choose from two integration options:
- Starter Server - Python or Node scripts that proxy audio between Twilio and Deepgram. View on GitHub.
- Docker Image - Production-ready deepgram/twilio-proxy:beta for self-hosted deployments. Contact your Account Executive for access.
Starter Server
The code for the starter server can be accessed in this GitHub repo.
For our starter server, we offer two scripts that can work as proxy servers to help Twilio and Deepgram share data.
- twilio-proxy-mono: Runs the proxy server for the inbound Twilio track, which represents the audio Twilio receives from the call. To learn more about Twilio tracks, see Twilio’s track documentation.
- twilio-proxy-stereo: Runs the proxy server for both the inbound and outbound Twilio tracks, which represent the audio Twilio receives from the call and the audio Twilio generates for the call. To learn more about Twilio tracks, see Twilio’s track documentation.
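To make the track distinction concrete, here is a minimal sketch of how a proxy might read one Twilio Media Streams WebSocket message and separate the track label from the audio. It assumes Twilio's documented JSON message shape (an `event` field, and for `media` events a `media` object with `track` and a base64 `payload`). The function name `parse_twilio_media` is hypothetical and is not taken from the starter server.

```python
import base64
import json
from typing import Optional, Tuple


def parse_twilio_media(message: str) -> Optional[Tuple[str, bytes]]:
    """Return (track, audio_bytes) for a Twilio "media" event, else None.

    Hypothetical helper for illustration; not part of the starter server.
    """
    data = json.loads(message)
    if data.get("event") != "media":
        # Events such as "connected", "start", and "stop" carry no audio.
        return None
    media = data["media"]
    # Twilio base64-encodes the raw audio in the "payload" field.
    return media["track"], base64.b64decode(media["payload"])


if __name__ == "__main__":
    sample = json.dumps({
        "event": "media",
        "media": {
            "track": "inbound",
            "payload": base64.b64encode(b"\xff\x7f").decode(),
        },
    })
    print(parse_twilio_media(sample))
```

A mono proxy would only ever see `inbound` here, while a stereo proxy would see both `inbound` and `outbound` and needs to keep them apart.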
Setup
Dependencies:
- Python: Python 3.6+ with websockets and pydub
- Node: Node.js with ws
Configure the Scripts
Before running the scripts, replace the authentication values in each script with your Deepgram username and password.
Run the Scripts
To run the scripts:
- Clone the GitHub repo to your local machine.
- From the command line, navigate to the cloned repository.
- Run the script using one of the following commands:
Mono
Stereo
Forward Data to Proxy
Finally, you will need to forward data to the proxy scripts. You can do this by configuring Twilio to send WebSockets data to the server running the proxy scripts or by initiating a call between two people and directly forwarding the data to the proxy scripts.
Configure Twilio to Use WebSockets
See the Start Streaming Audio section of Twilio’s tutorial: “Consume a real-time Media Stream using WebSockets, Python, and Flask”. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.
When calling your Twilio number, the call will be forwarded to the number you set in your TwiML Bin. The conversation will then be forked to the twilio-proxy-mono or twilio-proxy-stereo app, which will send the audio to Deepgram, receive transcriptions, and print the transcriptions to the screen. In a real implementation, you will likely want to provide a callback to which transcriptions can be sent.
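The "receive transcriptions and print them" step can be sketched as follows. This assumes the JSON shape of Deepgram's streaming results, where the text sits under `channel.alternatives[0].transcript`. The helper name `extract_transcript` is hypothetical, and in a real deployment the `print` call would likely be replaced by a request to your callback.

```python
import json
from typing import Optional


def extract_transcript(message: str) -> Optional[str]:
    """Return the top transcript from a Deepgram streaming result, if any.

    Hypothetical helper for illustration; field names assume Deepgram's
    streaming response format.
    """
    data = json.loads(message)
    alternatives = data.get("channel", {}).get("alternatives", [])
    if not alternatives:
        # Messages without alternatives (for example metadata) have no text.
        return None
    transcript = alternatives[0].get("transcript", "")
    # Silence produces an empty transcript; treat it as "nothing to print".
    return transcript or None


if __name__ == "__main__":
    result = json.dumps({
        "is_final": True,
        "channel": {"alternatives": [{"transcript": "hello world"}]},
    })
    text = extract_transcript(result)
    if text is not None:
        print(text)
```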
Sample TwiML Bin files are as follows:
Mono
Stereo
The stereo sample sets an additional track parameter.
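As a rough illustration of what such a TwiML document looks like, the sketch below builds one with Python's standard library. It assumes the structure used in Twilio's Media Streams tutorial (a `Start` verb containing a `Stream`, followed by a `Dial`) and Twilio's `track` attribute, for which `both_tracks` requests inbound and outbound audio. The URL and phone number are placeholders, and `build_twiml` is a hypothetical name, not the repository's own sample.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_twiml(stream_url: str, dial_number: str, track: Optional[str] = None) -> str:
    """Build TwiML that forks call audio to stream_url, then dials a number.

    Hypothetical helper for illustration. Leave track as None for the mono
    case (Twilio then streams the inbound track by default).
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    stream = ET.SubElement(start, "Stream", {"url": stream_url})
    if track is not None:
        # For stereo, ask Twilio for both sides of the call.
        stream.set("track", track)
    dial = ET.SubElement(response, "Dial")
    dial.text = dial_number
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL and number; substitute your proxy and destination.
    print(build_twiml("wss://example.com/twilio", "+15555550100"))
    print(build_twiml("wss://example.com/twilio", "+15555550100", track="both_tracks"))
```

The printed XML can be pasted into a TwiML Bin after replacing the placeholder values.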
Send Call Data to Proxy Scripts
Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream:
- Be sure to replace TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN with your Twilio account information.
- Replace the url variable with the URL to the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
- Replace the to and from_ variables with person A’s phone number, and your Twilio voice number, respectively.
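For readers who want to see what initiating such a call involves, here is a minimal sketch that prepares the request against Twilio's REST API using only the standard library. It is not the repository's script: it assumes Twilio's Calls resource with the `To`, `From`, and `Twiml` form parameters and HTTP Basic authentication using the account SID and auth token. The function name and all sample values are placeholders, and the request is built but not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_call_request(account_sid: str, auth_token: str, to: str,
                       from_: str, twiml: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST that asks Twilio to place a call.

    Hypothetical helper for illustration; the repository script may use
    Twilio's helper library instead.
    """
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls.json"
    body = urllib.parse.urlencode({"To": to, "From": from_, "Twiml": twiml}).encode()
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {credentials}")
    return request


if __name__ == "__main__":
    # Placeholder values: person A is "to", your Twilio number is "from_".
    req = build_call_request(
        "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", "your_auth_token",
        to="+15555550101", from_="+15555550102",
        twiml="<Response><Dial>+15555550103</Dial></Response>",
    )
    print(req.get_method(), req.full_url)
    # To actually place the call: urllib.request.urlopen(req)
```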
Docker Image
We provide the Docker image deepgram/twilio-proxy:beta, which you can request from your Account Executive.
This solution is for Deepgram’s self-hosted customers. Please contact us if you would like to learn more about our self-hosted solutions.
Run the Docker Image
Use Docker Compose to run the Twilio-proxy server. Choose the configuration that matches your deployment model.
Self-Hosted
This sample includes:
- Stubbed api and engine sections - fill these using the deepgram-self-hosted repository template. See Self-Hosted Introduction for details.
- deepgram/actix-ws-echo:beta - a WebSockets echo server for viewing streaming responses. Replace with your own callback URL as needed.
Environment Variables:
Hosted
This sample uses deepgram/actix-ws-echo:beta as a WebSockets echo server for viewing responses. Replace with your own callback URL as needed.
Forward Data to Proxy
Configure Twilio to send audio to your proxy server using one of these methods:
Option 1: TwiML Bins
Configure Twilio to forward data to the proxy server. To do this, see the Start Streaming Audio section of Twilio’s tutorial: Consume a real-time Media Stream using WebSockets, Python, and Flask. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.
When calling your Twilio number, the call will be forwarded to the number you set in your TwiML Bin. The conversation will then be forked to the Twilio-Deepgram proxy app, which will send the audio to Deepgram, receive transcriptions, and print the transcriptions to the screen. In a real implementation, you will likely want to provide a callback to which transcriptions can be sent.
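To illustrate the "send the audio to Deepgram" step, the sketch below builds a streaming URL for Twilio call audio. It assumes Twilio's 8 kHz mu-law audio format and Deepgram's `encoding`, `sample_rate`, `channels`, and `multichannel` query parameters. Whether the Docker image uses exactly these settings is not stated here. The `base_url` default is Deepgram's hosted endpoint, so a self-hosted deployment would pass its own address, and `deepgram_listen_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def deepgram_listen_url(stereo: bool,
                        base_url: str = "wss://api.deepgram.com/v1/listen") -> str:
    """Build a Deepgram streaming URL suited to Twilio call audio.

    Hypothetical helper for illustration; parameter choices are assumptions.
    """
    params = {
        "encoding": "mulaw",      # Twilio streams mu-law audio
        "sample_rate": 8000,      # at 8 kHz
        "channels": 2 if stereo else 1,
    }
    if stereo:
        # Transcribe each channel (caller and callee) separately.
        params["multichannel"] = "true"
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(deepgram_listen_url(stereo=False))
    print(deepgram_listen_url(stereo=True))
```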
Sample TwiML Bin files are as follows:
Mono
Stereo
The stereo sample sets an additional track parameter.
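When both tracks are streamed, the two sides of the call arrive as separate audio and must be combined before they can be sent as two-channel audio. One simple way to do this is sketched below, assuming 8-bit mu-law samples (one byte per sample) and chunks that are already time-aligned. This is an illustrative approach only, not necessarily how the proxy merges tracks, and `interleave_tracks` is a hypothetical name.

```python
def interleave_tracks(inbound: bytes, outbound: bytes) -> bytes:
    """Interleave two mono 8-bit tracks into one two-channel byte stream.

    Hypothetical helper for illustration. Channel 0 is inbound and
    channel 1 is outbound; extra samples on the longer track are dropped.
    """
    length = min(len(inbound), len(outbound))
    stereo = bytearray(2 * length)
    stereo[0::2] = inbound[:length]   # even positions: inbound samples
    stereo[1::2] = outbound[:length]  # odd positions: outbound samples
    return bytes(stereo)


if __name__ == "__main__":
    print(interleave_tracks(b"\x01\x02\x03", b"\x0a\x0b"))
```

A real proxy would also need to buffer the leftover samples and handle timing gaps between the tracks, which this sketch ignores.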
Option 2: Send Call Data to Twilio-Deepgram Proxy
Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream.py:
- Be sure to replace TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN with your Twilio account information.
- Replace the url variable with the URL to the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
- Replace the to and from_ variables with person A’s phone number, and your Twilio voice number, respectively.
What’s Next