Twilio and Deepgram STT
A starter server and a self-hosted solution for integrating speech-to-text with Twilio and Deepgram.
Integrate Twilio with Deepgram for real-time automatic speech recognition in Programmable Voice calls.
Before you Begin
Before you can use Deepgram, you’ll need to create a Deepgram account. Signup is free and includes $200 in free credit and access to all of Deepgram’s features!
Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.
Solutions
To help you integrate between Twilio and Deepgram, we provide the following solutions:
- A starter server in either Python or Node. A conversation streamed to your Twilio number is directed to our script, which sends the audio to Deepgram, then receives and prints transcriptions to the screen. In a real implementation, you will likely want to provide a callback to which transcriptions can be sent.
- A Docker image (deepgram/twilio-proxy:beta) that fully integrates with our self-hosted products using the same robust Rust architecture that our other services use. For access to the Docker image, ask your Account Executive.
Starter Server
The code for the starter server can be accessed in this GitHub repo.
For our starter server, we offer two scripts that can work as proxy servers to help Twilio and Deepgram share data.
- twilio-proxy-mono: Runs the proxy server for the inbound Twilio track, which represents the audio Twilio receives from the call.
- twilio-proxy-stereo: Runs the proxy server for both the inbound and outbound Twilio tracks, which represent the audio Twilio receives from the call and the audio Twilio generates to the call.
To learn more about Twilio tracks, see Twilio’s track documentation.
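To make the mono and stereo distinction concrete, here is a minimal Python sketch of the first step such a proxy performs. It is not the code from the repository. It assumes the message shape of Twilio Media Streams (JSON text frames whose media events carry a track name and a base64-encoded payload of 8-bit mu-law audio at 8000 Hz). The helper names extract_audio and interleave are hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


def extract_audio(message: str) -> Optional[Tuple[str, bytes]]:
    """Return (track, raw_audio) for a Twilio "media" message, else None."""
    event = json.loads(message)
    if event.get("event") != "media":
        # Events such as "connected", "start", and "stop" carry no audio.
        return None
    media = event["media"]
    return media["track"], base64.b64decode(media["payload"])


def interleave(inbound: bytes, outbound: bytes) -> bytes:
    """Interleave two 8-bit mono buffers into one two-channel buffer.

    Each mu-law sample is one byte, so channel 0 takes the even byte
    positions and channel 1 the odd ones. Extra bytes in the longer
    buffer are dropped in this sketch.
    """
    n = min(len(inbound), len(outbound))
    stereo = bytearray(2 * n)
    stereo[0::2] = inbound[:n]
    stereo[1::2] = outbound[:n]
    return bytes(stereo)
```

A mono proxy would forward only the bytes returned for the inbound track. A stereo proxy would buffer both tracks and forward the interleaved result as two-channel audio.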
Before You Begin
Create a Deepgram Account and Get Your Deepgram API Key
Before you can use Deepgram products, you’ll need to create a Deepgram account. After you’ve signed up for your free account, create a Deepgram API Key. Make note of your API Key; you will need it later.
Create a Twilio Account
Before you can use Twilio products, you’ll need to create a Twilio account. In addition, if you don’t currently own a Twilio phone number with Voice functionality, you’ll need to purchase one.
Configure Environment
We provide sample scripts in Python and Node, and assume you have already configured a Python (3.6 or greater) or Node development environment.
Install Dependencies
For Python, we use websockets and pydub.
For Node, we use ws.
Configure the Scripts
Prior to running the scripts, you must replace the authentication with your Deepgram username and password.
Run the Scripts
To run the scripts:
- Clone the GitHub repo to your local machine.
- From the command line, navigate to the cloned repository.
- Run the script using one of the following commands:
Mono
Stereo
Forward Data to Proxy
Finally, you will need to forward data to the proxy scripts. You can do this by configuring Twilio to send WebSockets data to the server running the proxy scripts or by initiating a call between two people and directly forwarding the data to the proxy scripts.
Configure Twilio to Use WebSockets
See the Start Streaming Audio section of Twilio’s tutorial: “Consume a real-time Media Stream using WebSockets, Python, and Flask”. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.
When calling your Twilio number, the call will be forwarded to the number you set in your TwiML Bin. The conversation will then be forked to the twilio-proxy-mono or twilio-proxy-stereo app, which will send the audio to Deepgram, receive transcriptions, and print the transcriptions to the screen. In a real implementation, you will likely want to provide a callback to which transcriptions can be sent.
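The "receive and print" step can be sketched in Python as follows. This is an illustration, not the repository's code. It assumes Deepgram's streaming responses are JSON objects with a channel.alternatives list and an optional channel_index field. The function name format_result is hypothetical.

```python
import json
from typing import Optional


def format_result(message: str) -> Optional[str]:
    """Turn one streaming response into a printable line, or None.

    Messages without a transcript (for example metadata messages or
    results for silence) return None so the caller can skip them.
    """
    result = json.loads(message)
    alternatives = result.get("channel", {}).get("alternatives", [])
    if not alternatives or not alternatives[0].get("transcript"):
        return None
    # channel_index is assumed to be [channel, total_channels].
    channel = result.get("channel_index", [0])[0]
    return f"[channel {channel}] {alternatives[0]['transcript']}"
```

In a real implementation, the returned line (or the full response) would be sent to your callback instead of being printed.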
Sample TwiML Bin files are as follows:
Mono
Stereo
For stereo, an additional track parameter exists.
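As a rough illustration of what such TwiML looks like, the Python sketch below generates a document with Twilio's Start, Stream, and Dial verbs. It is not a copy of the sample files. The function name build_twiml, the example URL, and the phone number are hypothetical, and the track value both_tracks is taken from Twilio's Stream documentation rather than from this guide.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_twiml(stream_url: str, dial_number: str, track: Optional[str] = None) -> str:
    """Build TwiML that forks call audio to stream_url, then dials dial_number."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    stream = ET.SubElement(start, "Stream", {"url": stream_url})
    if track is not None:
        # Only the stereo case sets this attribute, e.g. "both_tracks".
        stream.set("track", track)
    dial = ET.SubElement(response, "Dial")
    dial.text = dial_number
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL and number; replace them with your own values.
    print(build_twiml("wss://example.com/twilio", "+15550100"))
    print(build_twiml("wss://example.com/twilio", "+15550100", track="both_tracks"))
```

The output can be pasted into a TwiML Bin. When the track attribute is omitted, Twilio is expected to stream only the inbound track.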
Send Call Data to Proxy Scripts
Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream:
- Be sure to replace TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN with your Twilio account information.
- Replace the url variable with the URL to the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
- Replace the to and from_ variables with person A’s phone number and your Twilio voice number, respectively.
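The replacements above can be sketched as follows. This is a hypothetical outline, not the repository's script. It assumes the twilio Python helper library and its Client(...).calls.create(to=..., from_=..., twiml=...) call, and it reads the credentials from environment variables. The function names and phone numbers are placeholders.

```python
import os
from xml.sax.saxutils import escape, quoteattr


def build_call_kwargs(proxy_url: str, person_a: str, person_b: str, twilio_number: str) -> dict:
    """Assemble the arguments for a call that streams audio to the proxy."""
    twiml = (
        "<Response>"
        f"<Start><Stream url={quoteattr(proxy_url)} /></Start>"
        f"<Dial>{escape(person_b)}</Dial>"
        "</Response>"
    )
    # Twilio calls person A from your Twilio number, then dials person B.
    return {"twiml": twiml, "to": person_a, "from_": twilio_number}


def place_call(kwargs: dict):
    """Create the call through Twilio's REST API (needs the twilio package)."""
    # Imported here so build_call_kwargs works without twilio installed.
    from twilio.rest import Client

    client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
    return client.calls.create(**kwargs)
```

Calling place_call(build_call_kwargs(...)) with your proxy URL and the three phone numbers would start the call. Keeping the argument assembly separate makes it easy to inspect the TwiML before any call is placed.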
Docker Image
We provide the Docker image deepgram/twilio-proxy:beta, which you can request from your Account Executive.
This solution is for Deepgram’s self-hosted customers. Please contact us if you would like to learn more about our self-hosted solutions.
Before You Begin
Create a Deepgram Account
Before you can use Deepgram products, you’ll need to create a Deepgram account.
Create a Twilio Account
Before you can use Twilio products, you’ll need to create a Twilio account. In addition, if you don’t currently own a Twilio phone number with Voice functionality, you’ll need to purchase one.
Run the Docker Image
We recommend running the Twilio-proxy server in a Docker container configured using a Docker Compose file. We provide the following sample Compose files to show you how to do this for a Deepgram API using either our hosted or self-hosted deployment models.
Self-Hosted
To deploy the Twilio proxy server to a self-hosted Deepgram environment using Docker Compose, you can use the following sample Compose file.
This sample references stubbed out api and engine sections. Fill in these sections with the template in the deepgram-self-hosted repository. To learn more, see the Self-Hosted Introduction and other self-hosted guides, or talk to your Account Executive.
This sample references the deepgram/actix-ws-echo:beta Docker image, which is a WebSockets echo server. You can use it to see streaming Deepgram STT responses, or you can configure a callback URL to send Deepgram STT responses elsewhere.
The Docker Compose file references the following environment variables:
Hosted
To deploy the Twilio proxy server to a Deepgram-Hosted installation using Docker Compose, you can use the following sample Compose file.
This sample references the deepgram/actix-ws-echo:beta Docker image, which is a WebSockets echo server. You can use it to see streaming Deepgram ASR responses, or you can configure a callback URL to send Deepgram ASR responses elsewhere.
Forward Data to Proxy
Finally, you will need to forward data to the Twilio-Deepgram proxy. You can do this by configuring Twilio to send WebSockets data to the server running the Twilio-Deepgram proxy or by initiating a call between two people and directly forwarding the data to the Twilio-Deepgram proxy.
Configure Twilio to Use WebSockets
To use the Docker image, you must configure Twilio to forward data to the server running the Rust program. To do this, see the Start Streaming Audio section of Twilio’s tutorial: “Consume a real-time Media Stream using WebSockets, Python, and Flask”. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.
When calling your Twilio number, the call will be forwarded to the number you set in your TwiML Bin. The conversation will then be forked to the Twilio-Deepgram proxy app, which will send the audio to Deepgram, receive transcriptions, and print the transcriptions to the screen. In a real implementation, you will likely want to provide a callback to which transcriptions can be sent.
Sample TwiML Bin files are as follows:
Mono
Stereo
For stereo, an additional track parameter exists.
Send Call Data to Twilio-Deepgram Proxy
Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream.py:
- Be sure to replace TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN with your Twilio account information.
- Replace the url variable with the URL to the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
- Replace the to and from_ variables with person A’s phone number and your Twilio voice number, respectively.
What’s Next