Twilio and Deepgram TTS
Learn how to use Twilio with Deepgram Aura TTS.
Streaming audio from Deepgram Aura Text-to-Speech (TTS) into an ongoing Twilio phone call requires the use of the Twilio streaming API.
Before you Begin
Before you can use Deepgram, you'll need to create a Deepgram account. Signup is free and includes $200 in free credit and access to all of Deepgram's features!
Before you start, you'll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key, and configure your environment if you are choosing to use a Deepgram SDK.
Prerequisites
For the complete code used in this guide, please check out this repository.
You will need:
- A free Twilio account with a Twilio phone number.
- ngrok to let Twilio access a local server OR your own hosted server.
- Understanding of Python and using Python virtual environments.
TwiML Bin Setup
First, you will need to set up a TwiML Bin. You can refer to the docs on how to do that in the Twilio Console.
Deepgram Aura TTS is not available via the Twilio <Say> verb. Instead, you will point a <Stream> at the URL of your own websocket server.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say language="en">"This call may be monitored or recorded."</Say>
  <Connect>
    <Stream url="wss://a127-75-172-116-97.ngrok-free.app/twilio" />
  </Connect>
</Response>
Replace the url with the address where you deploy the server we are about to create, and make sure /twilio is at the end of the URL.
In the TwiML Bin example above, ngrok is used to expose the server running locally.
Using ngrok
ngrok is recommended for quick development and testing but shouldn't be used for production instances.
To use ngrok, see their documentation.
Make sure the port matches the one the server code listens on (5000) by starting ngrok with this command:
ngrok http 5000
If you restart your ngrok server, your URL will change, and you will need to update your TwiML Bin to match.
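Since the ngrok URL changes on every restart, it can be convenient to regenerate the TwiML from the current forwarding URL instead of editing it by hand. The following is a minimal sketch and is not part of the repository: stream_url and render_twiml are hypothetical helper names, the example hostname is made up, and the sketch assumes the server keeps the /twilio path used in this guide.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import quoteattr


def stream_url(public_url: str, path: str = "/twilio") -> str:
    """Turn a public http(s) or ws(s) URL into the wss:// URL used by <Stream>."""
    parts = urlsplit(public_url)
    if not parts.netloc:
        raise ValueError(f"expected a full URL with a host, got {public_url!r}")
    return f"wss://{parts.netloc}{path}"


def render_twiml(public_url: str) -> str:
    """Render the TwiML from this guide with the given public URL filled in."""
    url_attr = quoteattr(stream_url(public_url))  # adds quotes and escapes
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<Response>\n"
        '  <Say language="en">"This call may be monitored or recorded."</Say>\n'
        "  <Connect>\n"
        f"    <Stream url={url_attr} />\n"
        "  </Connect>\n"
        "</Response>\n"
    )


if __name__ == "__main__":
    # Hypothetical forwarding URL; paste the one ngrok prints on startup.
    print(render_twiml("https://example.ngrok-free.app"))
```

The printed document can then be pasted into the TwiML Bin in the Twilio Console.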
Connecting a Twilio phone number
Your TwiML Bin must then be connected to one of your Twilio phone numbers so that it gets executed whenever someone calls that number. If you need to set up a phone number and connect it to your TwiML Bin, refer to the Twilio Docs.
In your TwiML Bin, the <Connect> verb is required for bidirectional communication: to send audio from Aura TTS to Twilio, you must use this verb.
Building the Server
Copy the twilio.py code from the repository and save it locally as twilio.py. We will use this file in the steps below.
At this point you'll want to start up a virtual environment for Python. Please refer to documentation for how to do that based on your personal Python preferences.
Depending on your situation, you may also need to install the third-party packages this code imports, such as websockets and requests.
pip install package_name
If your TwiML Bin is set up correctly, you can now navigate to this file's location in your terminal and run the server with the following command:
python twilio.py
OR
python3 twilio.py
You can then start making calls to the phone number your TwiML Bin is using. Without any further modifications, you should hear Deepgram Aura say simply: "Hello, how are you today?"
Code Tour
Let's dive into the code used in the twilio.py file.
First, we have some import statements:
import asyncio
import base64
import json
import sys
import websockets
import ssl
import requests
- We are using asyncio and websockets to build an asynchronous websocket server.
- We will use base64 to handle encoding audio from Aura to pass data to Twilio.
- We will use json to deal with parsing text messages from Twilio.
- We will use requests to make HTTP requests to Deepgram's Aura/TTS endpoint.
Next we have:
async def twilio_handler(twilio_ws):
    streamsid_queue = asyncio.Queue()
- We will be spinning up asynchronous tasks for receiving messages from, and sending messages to, Twilio.
- We will use this streamsid_queue to pass the streamsid from the twilio_receiver task to the twilio_sender task.
- We need to specify this stream sid to ensure that audio from Deepgram Aura is routed correctly to the corresponding phone call.
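The handoff between the two tasks can be tried in isolation. The sketch below is a simplified stand-in, not the code from twilio.py: receiver, sender, and demo are illustrative names, the sid is made up, and asyncio.gather is used here for brevity.

```python
import asyncio


async def receiver(queue: asyncio.Queue) -> None:
    """Stand-in for twilio_receiver: publishes the stream sid once it is known."""
    await asyncio.sleep(0.01)  # simulate waiting for Twilio's "start" message
    queue.put_nowait("MZ-example-stream-sid")  # made-up sid for illustration


async def sender(queue: asyncio.Queue) -> str:
    """Stand-in for twilio_sender: waits until the receiver supplies the sid."""
    streamsid = await queue.get()
    return streamsid


async def demo() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    # Run both coroutines concurrently; gather returns results in argument order.
    _, streamsid = await asyncio.gather(receiver(queue), sender(queue))
    return streamsid


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The sender task cannot proceed until the receiver has put a value on the queue, which is the ordering the handler relies on.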
The twilio_receiver task is defined next:
    async def twilio_receiver(twilio_ws):
        async for message in twilio_ws:
            try:
                data = json.loads(message)
                if data['event'] == 'start':
                    streamsid_queue.put_nowait(data['start']['streamSid'])
            except Exception:
                # stop reading if a message cannot be parsed
                break
- This task simply loops over incoming websocket messages from Twilio and extracts the stream sid when it gets it.
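To see the extraction step on its own, here is a small sketch that pulls the stream sid out of a message string. extract_streamsid is a hypothetical helper, not a function in twilio.py, and the two sample messages are trimmed, made-up payloads that contain only the fields the receiver reads.

```python
import json
from typing import Optional


def extract_streamsid(message: str) -> Optional[str]:
    """Return the stream sid from a Twilio "start" message, else None."""
    try:
        data = json.loads(message)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("event") != "start":
        return None
    return data.get("start", {}).get("streamSid")


# Trimmed, made-up messages shaped like the ones the receiver loop handles.
start_message = json.dumps(
    {"event": "start", "start": {"streamSid": "MZ-example-stream-sid"}}
)
media_message = json.dumps({"event": "media", "media": {"payload": "AAAA"}})

if __name__ == "__main__":
    print(extract_streamsid(start_message))
    print(extract_streamsid(media_message))
```

Returning None for anything other than a "start" message lets the caller keep looping instead of breaking on unrelated events.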
Next we have the twilio_sender task:
    async def twilio_sender(twilio_ws):
        print('twilio_sender started')
        # wait to receive the streamsid for this connection from one of Twilio's messages
        streamsid = await streamsid_queue.get()
- We first wait to receive the stream sid.
        # make a Deepgram Aura TTS request specifying that we want raw mulaw audio as the output
        url = 'https://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=mulaw&sample_rate=8000&container=none'
        headers = {
            'Authorization': 'Token YOUR_DEEPGRAM_API_KEY',
            'Content-Type': 'application/json'
        }
        payload = {
            'text': 'Hello, how are you today?'
        }
        tts_response = requests.post(url, headers=headers, json=payload)
- Then we make a request to Deepgram Aura TTS to say "Hello, how are you today?" specifying an audio format of raw, 8000 Hz, mulaw.
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
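As an alternative to pasting the key into the source file, the request pieces can be assembled from parameters and an environment variable. This is a sketch under assumptions: build_speak_url and build_headers are hypothetical helpers, and DEEPGRAM_API_KEY is an assumed variable name rather than something twilio.py reads. The query parameters are the same ones used in the URL above.

```python
import os
from urllib.parse import urlencode

SPEAK_ENDPOINT = "https://api.deepgram.com/v1/speak"


def build_speak_url(model: str = "aura-asteria-en") -> str:
    """Build the TTS URL requesting raw (containerless) 8000 Hz mulaw audio."""
    params = {
        "model": model,
        "encoding": "mulaw",
        "sample_rate": 8000,
        "container": "none",
    }
    return f"{SPEAK_ENDPOINT}?{urlencode(params)}"


def build_headers(env_var: str = "DEEPGRAM_API_KEY") -> dict:
    """Read the API key from the environment instead of hardcoding it."""
    api_key = os.environ.get(env_var)  # env_var name is an assumption
    if not api_key:
        raise RuntimeError(f"set the {env_var} environment variable first")
    return {
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    }
```

The results can be passed to requests.post in place of the hand-written url and headers values, which keeps the key out of version control.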
Next we have:
        if tts_response.status_code == 200:
            raw_mulaw = tts_response.content
            # construct a Twilio media message with the raw mulaw (see https://www.twilio.com/docs/voice/twiml/stream#websocket-messages---to-twilio)
            media_message = {
                'event': 'media',
                'streamSid': streamsid,
                'media': {
                    'payload': base64.b64encode(raw_mulaw).decode('ascii')
                }
            }
            # send the TTS audio to the attached phonecall
            await twilio_ws.send(json.dumps(media_message))
- Here we package up the Deepgram Aura TTS audio in the format Twilio expects.
- We specify the stream sid.
- We send that audio back to Twilio via the websocket connection.
- To better understand what is occurring at this step please refer to the Twilio docs for more details.
Additionally, if your application requires the bot to stop speaking at any point, you can do that simply by sending a "clear" message to Twilio.
await twilio_ws.send(json.dumps({"event": "clear", "streamSid": streamsid}))
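The two outgoing message shapes shown above can be built and checked without a live call. The sketch below uses hypothetical helper names (media_message, clear_message), a made-up sid, and placeholder bytes instead of real TTS audio. It only constructs the JSON strings and does not send anything.

```python
import base64
import json


def media_message(streamsid: str, raw_mulaw: bytes) -> str:
    """Wrap raw mulaw bytes in a Twilio "media" message (a JSON string)."""
    payload = base64.b64encode(raw_mulaw).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": streamsid, "media": {"payload": payload}}
    )


def clear_message(streamsid: str) -> str:
    """Build the "clear" message used to stop the bot speaking."""
    return json.dumps({"event": "clear", "streamSid": streamsid})


if __name__ == "__main__":
    fake_audio = b"\xff\x7f\x00"  # placeholder bytes, not real speech
    print(media_message("MZ-example-stream-sid", fake_audio))
    print(clear_message("MZ-example-stream-sid"))
```

Either string can be passed directly to twilio_ws.send inside the sender task.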
To close out our websocket handler, we run these two asynchronous tasks with asyncio:
    await asyncio.wait([
        asyncio.ensure_future(twilio_receiver(twilio_ws)),
        asyncio.ensure_future(twilio_sender(twilio_ws))
    ])
    await twilio_ws.close()
Finally, some scaffolding spins up the server and routes incoming requests to the handler above:
async def router(websocket, path):
    if path == '/twilio':
        print('twilio connection incoming')
        await twilio_handler(websocket)

def main():
    # use this if using ssl
    # ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # ssl_context.load_cert_chain('cert.pem', 'key.pem')
    # server = websockets.serve(router, '0.0.0.0', 443, ssl=ssl_context)
    # use this if not using ssl
    server = websockets.serve(router, 'localhost', 5000)
    asyncio.get_event_loop().run_until_complete(server)
    asyncio.get_event_loop().run_forever()

if __name__ == '__main__':
    sys.exit(main() or 0)
To learn more about sending Twilio phone call audio to Deepgram for Speech-to-Text (STT) see the following guide.