Twilio and Deepgram STT

A starter server and a self-hosted solution for integrating speech-to-text with Twilio and Deepgram.

Integrate Twilio with Deepgram for real-time speech recognition in Programmable Voice calls.

Before You Begin

You’ll need:

Solutions

Choose from two integration options:

  • Starter Server - Python or Node scripts that proxy audio between Twilio and Deepgram. View on GitHub.

  • Docker Image - Production-ready deepgram/twilio-proxy:beta for self-hosted deployments. Contact your Account Executive for access.

Starter Server

The code for the starter server can be accessed in this GitHub repo.

For our starter server, we offer two scripts that can work as proxy servers to help Twilio and Deepgram share data.

  • twilio-proxy-mono: Runs the proxy server for the inbound Twilio track, which represents the audio Twilio receives from the call. To learn more about Twilio tracks, see Twilio’s track documentation.

  • twilio-proxy-stereo: Runs the proxy server for both the inbound and outbound Twilio tracks, which represent the audio Twilio received from the call and the audio generated by Twilio to the call. To learn more about Twilio tracks, see Twilio’s track documentation.
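
To make the difference between the two scripts concrete, here is a minimal sketch (not the code from the repo) of how audio for each track could be pulled out of Twilio Media Streams messages and combined into two-channel audio. It assumes Twilio's message shape, where "media" events carry a track name ("inbound" or "outbound") and a base64-encoded payload of 8-bit mu-law audio. The names split_tracks, interleave, and MULAW_SILENCE are hypothetical.

```python
import base64
import json

# Assumption: 0xFF is used as the silence value for 8-bit mu-law audio.
MULAW_SILENCE = 0xFF


def split_tracks(messages):
    """Collect raw audio bytes per track from Twilio Media Streams messages."""
    tracks = {"inbound": bytearray(), "outbound": bytearray()}
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("event") != "media":
            continue  # skip non-audio events such as "start" and "stop"
        media = msg["media"]
        if media.get("track") in tracks:
            tracks[media["track"]].extend(base64.b64decode(media["payload"]))
    return bytes(tracks["inbound"]), bytes(tracks["outbound"])


def interleave(inbound, outbound):
    """Interleave two mono 8-bit tracks into one two-channel buffer."""
    length = max(len(inbound), len(outbound))
    # Pad the shorter track with silence so both channels have equal length.
    left = inbound.ljust(length, bytes([MULAW_SILENCE]))
    right = outbound.ljust(length, bytes([MULAW_SILENCE]))
    stereo = bytearray(length * 2)
    stereo[0::2] = left   # channel 0: inbound track
    stereo[1::2] = right  # channel 1: outbound track
    return bytes(stereo)
```

A mono proxy would only need the inbound bytes. A stereo proxy would forward the interleaved buffer so that each speaker ends up on a separate channel.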

Setup

Dependencies:

  • Python: Python 3.6+ with websockets and pydub
  • Node: Node.js with ws

Configure the Scripts

Before running the scripts, replace the authentication placeholder with your Deepgram API key:

'Authorization': 'Token YOUR_DEEPGRAM_API_KEY'
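
As an illustration of where this header fits, the sketch below builds the connection URL and headers a proxy might use when opening a WebSocket to Deepgram. It is an assumption-laden example, not the repo's code: the function name deepgram_request and the DEEPGRAM_API_KEY environment variable are hypothetical, and the query parameters assume Twilio's 8 kHz mu-law audio.

```python
import os
from urllib.parse import urlencode


def deepgram_request(api_key, channels=1):
    """Return the (url, headers) pair for a Deepgram streaming connection."""
    params = {
        "encoding": "mulaw",   # assumed: Twilio streams 8-bit mu-law audio
        "sample_rate": 8000,   # assumed: 8 kHz telephone audio
        "channels": channels,
    }
    if channels > 1:
        # Ask for a separate transcript per channel (stereo case).
        params["multichannel"] = "true"
    url = "wss://api.deepgram.com/v1/listen?" + urlencode(params)
    headers = {"Authorization": f"Token {api_key}"}
    return url, headers


if __name__ == "__main__":
    # DEEPGRAM_API_KEY is a hypothetical environment variable name.
    key = os.environ.get("DEEPGRAM_API_KEY", "YOUR_DEEPGRAM_API_KEY")
    url, _headers = deepgram_request(key)
    print(url)
```

Reading the key from the environment rather than editing it into the script keeps it out of version control.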

Run the Scripts

To run the scripts:

  1. Clone the GitHub repo to your local machine.

  2. From the command line, navigate to the cloned repository.

  3. Run the script using one of the following commands:

Mono

python3 twilio-proxy-mono.py

Stereo

python3 twilio-proxy-stereo.py

Forward Data to Proxy

Finally, you will need to forward data to the proxy scripts. You can do this by configuring Twilio to send WebSockets data to the server running the proxy scripts or by initiating a call between two people and directly forwarding the data to the proxy scripts.

Configure Twilio to Use WebSockets

See the Start Streaming Audio section of Twilio’s tutorial: “Consume a real-time Media Stream using WebSockets, Python, and Flask”. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.

When calling your Twilio number, the call will be forwarded to the number you set in your TwiML Bin. The conversation will then be forked to the twilio-proxy-mono or twilio-proxy-stereo app, which will send the audio to Deepgram, receive transcriptions, and print the transcriptions to the screen. In a real implementation, you will likely want to provide a callback to which transcriptions can be sent.
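
A callback of this kind could look like the following minimal sketch. It assumes Deepgram's streaming result shape, in which the transcript sits under channel.alternatives[0].transcript. The function names extract_transcript and handle_result are hypothetical, and on_transcript stands in for whatever your application does with the text.

```python
import json


def extract_transcript(raw_message):
    """Return the transcript text from one Deepgram message, or None."""
    result = json.loads(raw_message)
    alternatives = result.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None  # not a transcription result (for example, metadata)
    transcript = alternatives[0].get("transcript", "")
    return transcript or None  # treat empty transcripts as "nothing to send"


def handle_result(raw_message, on_transcript=print):
    """Pass any non-empty transcript to the given callback."""
    transcript = extract_transcript(raw_message)
    if transcript is not None:
        on_transcript(transcript)
```

With the default callback this prints transcripts to the screen, matching the starter scripts' behavior. Passing a different function would let you forward them elsewhere.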

Sample TwiML Bin files are as follows:

Mono

XML
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://my-server-address" />
  </Start>
  <Dial>my-phone-number</Dial>
</Response>

Stereo

For stereo, an additional track parameter exists.

XML
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://my-server-address" track="both_tracks" />
  </Start>
  <Dial>my-phone-number</Dial>
</Response>

Send Call Data to Proxy Scripts

Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream.py:

# twilio helper library
from twilio.rest import Client

# other imports
import time
import requests
import json
import os
import uuid

# your account sid and auth token from twilio.com/console
account_sid = os.environ['TWILIO_ACCOUNT_SID']
auth_token = os.environ['TWILIO_AUTH_TOKEN']
# the twilio client
client = Client(account_sid, auth_token)
# make the outgoing call
call = client.calls.create(
    # replace the url with your proxy's address and the Dial number with person B's number
    twiml='<Response><Start><Stream url="wss://url.to.deepgram.twilio.proxy" track="both_tracks" /></Start><Dial>+11231231234</Dial></Response>',
    to='+11231231234',  # person A
    from_='+11231231234',  # your twilio number
)
  • Set the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables to your Twilio account SID and auth token.
  • Replace the Stream url attribute with the URL of the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
  • Replace the to and from_ values with person A’s phone number and your Twilio voice number, respectively.
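
If you would rather not edit the TwiML string by hand, one option is to generate it. The sketch below uses Python's standard xml.etree.ElementTree module to build the same Response/Start/Stream/Dial structure. The function name build_stream_twiml and the example URL and number are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(proxy_url, dial_number, both_tracks=True):
    """Build TwiML that forks call audio to a proxy and dials a number."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    stream = ET.SubElement(start, "Stream", url=proxy_url)
    if both_tracks:
        # Stereo case: stream the inbound and outbound tracks.
        stream.set("track", "both_tracks")
    ET.SubElement(response, "Dial").text = dial_number
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values; replace with your proxy URL and person B's number.
    print(build_stream_twiml("wss://proxy.example.com", "+15550100"))
```

The returned string can be passed as the twiml argument to client.calls.create. Building the document with an XML library also escapes characters such as & in the proxy URL.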

Docker Image

We provide the Docker image deepgram/twilio-proxy:beta, which you can request from your Account Executive.

This solution is for Deepgram’s self-hosted customers. Please contact us if you would like to learn more about our self-hosted solutions.

Run the Docker Image

Use Docker Compose to run the Twilio-proxy server. Choose the configuration that matches your deployment model.

Self-Hosted

This sample includes:

yaml
services:
  api:
    ...
  engine:
    ...

  proxy:
    image: deepgram/twilio-proxy:beta
    ports:
      - '8080:8080'
    environment:
      - RUST_LOG=TRACE
      - PROXY_URL=0.0.0.0:8080
      - STEM_URL=ws://api:8080/v2/listen
      - CALLBACK_URL=ws://echo:8080/
    command: ''

  echo:
    image: deepgram/actix-ws-echo:beta
    environment:
      - ECHO_URL=0.0.0.0:8080
    command: ''

Environment Variables:

| Environment Variable | Description |
| --- | --- |
| RUST_LOG (optional) | Sets the logging verbosity. Can be TRACE, DEBUG, INFO, WARN, or ERROR. |
| PROXY_URL | Sets the URL of the twilio-proxy server. Should follow the format 0.0.0.0:8080. |
| STEM_URL | Sets the URL of the Deepgram endpoint. |
| CALLBACK_URL (optional) | URL to which Deepgram ASR results should be sent. If not specified, Deepgram ASR results are logged by the twilio-proxy server. |

Hosted

This sample uses deepgram/actix-ws-echo:beta as a WebSockets echo server for viewing responses. Replace with your own callback URL as needed.

yaml
version: '2.4'

services:
  proxy:
    image: deepgram/twilio-proxy:beta
    ports:
      - '8080:8080'
    environment:
      - RUST_LOG=TRACE
      - PROXY_URL=0.0.0.0:8080
      - STEM_URL=wss://api.deepgram.com/v1/listen
      - STEM_BAUTH=YOUR_DEEPGRAM_API_KEY
      - CALLBACK_URL=ws://echo:8080/
      - CALLBACK_BAUTH=base64-encoded-callback-username:password
    command: ''

  echo:
    image: deepgram/actix-ws-echo:beta
    environment:
      - ECHO_URL=0.0.0.0:8080
    command: ''

Environment Variables:

| Environment Variable | Description |
| --- | --- |
| RUST_LOG (optional) | Sets the logging verbosity. Can be TRACE, DEBUG, INFO, WARN, or ERROR. |
| PROXY_URL | Sets the URL of the twilio-proxy server. Should follow the format 0.0.0.0:8080. |
| STEM_URL | Sets the URL of the Deepgram endpoint. |
| STEM_BAUTH (optional) | Your Deepgram project’s API Key. This is the value stored in key. |
| CALLBACK_URL (optional) | URL to which Deepgram ASR results should be sent. If not specified, Deepgram ASR results are logged by the twilio-proxy server. |
| CALLBACK_BAUTH (optional) | If using a callback server to receive Deepgram ASR results, the base64-encoded value of username:password for that server. |
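
One way to produce the base64-encoded value for CALLBACK_BAUTH is with Python's standard base64 module, as in this short sketch. The function name and the example credentials are hypothetical.

```python
import base64


def encode_basic_credentials(username, password):
    """Return the base64 encoding of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Example credentials only; substitute your callback server's values.
    print(encode_basic_credentials("user", "pass"))  # prints dXNlcjpwYXNz
```

The printed value is what you would place after CALLBACK_BAUTH= in the Compose file.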

Forward Data to Proxy

Configure Twilio to send audio to your proxy server using one of these methods:

Option 1: TwiML Bins

Configure Twilio to forward data to the proxy server. To do this, see the Start Streaming Audio section of Twilio’s tutorial: Consume a real-time Media Stream using WebSockets, Python, and Flask. In this tutorial, you will use TwiML Bins, a serverless solution that helps you provide Twilio-hosted instructions to your Twilio applications, to begin streaming your call’s audio.

When calling your Twilio number, the call will be forwarded to the number you set in your TwiML Bin. The conversation will then be forked to the Twilio-Deepgram proxy app, which will send the audio to Deepgram, receive transcriptions, and print the transcriptions to the screen. In a real implementation, you will likely want to provide a callback to which transcriptions can be sent.

Sample TwiML Bin files are as follows:

Mono

XML
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://my-server-address" />
  </Start>
  <Dial>my-phone-number</Dial>
</Response>

Stereo

For stereo, an additional track parameter exists.

XML
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://my-server-address" track="both_tracks" />
  </Start>
  <Dial>my-phone-number</Dial>
</Response>

Send Call Data to Twilio-Deepgram Proxy

Alternatively, you can initiate a call between two people and forward the call data to the Twilio-Deepgram proxy (as seen in the Twilio + Deepgram demo) using the script in twilio/twilio-api-scripts/stream.py:

Python
# twilio helper library
from twilio.rest import Client

# other imports
import time
import requests
import json
import os
import uuid

# your account sid and auth token from twilio.com/console
account_sid = os.environ['TWILIO_ACCOUNT_SID']
auth_token = os.environ['TWILIO_AUTH_TOKEN']
# the twilio client
client = Client(account_sid, auth_token)
# make the outgoing call
call = client.calls.create(
    # replace the url with your proxy's address and the Dial number with person B's number
    twiml='<Response><Start><Stream url="wss://url.to.deepgram.twilio.proxy" track="both_tracks" /></Start><Dial>+11231231234</Dial></Response>',
    to='+11231231234',  # person A
    from_='+11231231234',  # your twilio number
)
  • Set the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables to your Twilio account SID and auth token.
  • Replace the Stream url attribute with the URL of the Deepgram-Twilio proxy server, and the Dial number with person B’s phone number.
  • Replace the to and from_ values with person A’s phone number and your Twilio voice number, respectively.
