Twilio and Deepgram TTS

Learn how to use Twilio with Deepgram Aura TTS.

Streaming audio from Deepgram Aura Text-to-Speech (TTS) into an ongoing Twilio phone call requires Twilio Media Streams, Twilio's WebSocket-based streaming API.

Prerequisites

For the complete code used in this guide, please check out this repository.

You will need:

  • A free Twilio account with a Twilio phone number.
  • A Deepgram API Key. If you don't have one, you can get an API Key here.
  • ngrok to let Twilio access a local server, or your own hosted server.
  • Familiarity with Python and Python virtual environments.

TwiML Bin Setup

First, you will need to set up a TwiML Bin. You can refer to the docs on how to do that in the Twilio Console.

📘

Deepgram Aura TTS is not available through the Twilio <Say> verb. Instead, you will use the <Connect> and <Stream> verbs to point Twilio at a WebSocket URL on your own server.

<?xml version="1.0" encoding="UTF-8"?>

<Response>
    <Say language="en">"This call may be monitored or recorded."</Say>
    <Connect>
        <Stream url="wss://a127-75-172-116-97.ngrok-free.app/twilio" />
    </Connect>
</Response>

Replace the url with the address where you deploy the server we are about to create, and make sure the URL ends with /twilio (the path the server listens on).
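If you would rather generate this TwiML in code than paste it into a TwiML Bin (for example, to swap in a new ngrok hostname), a small helper can enforce the wss:// scheme and the /twilio path. The sketch below uses only Python's standard library; the build_twiml name and the example.ngrok-free.app hostname are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_twiml(public_host: str,
                greeting: str = "This call may be monitored or recorded.") -> str:
    """Return TwiML that speaks a greeting, then streams the call to wss://<host>/twilio.

    `public_host` should be a bare hostname (e.g. a hypothetical ngrok domain);
    a pasted scheme or trailing slash is tolerated and stripped.
    """
    host = public_host.split("://", 1)[-1].rstrip("/")

    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", language="en")
    say.text = greeting
    connect = ET.SubElement(response, "Connect")
    # Twilio expects a secure websocket URL; our server routes on the /twilio path.
    ET.SubElement(connect, "Stream", url=f"wss://{host}/twilio")

    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_twiml("example.ngrok-free.app"))
```

You could paste the printed output into your TwiML Bin, or serve it from your own webhook.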

In the TwiML Bin example above, ngrok is used to expose the server running locally.

Using ngrok

ngrok is recommended for quick development and testing but shouldn't be used for production instances.

To install and run ngrok, see the ngrok documentation.

When you start ngrok, make sure the port matches the one the server code listens on (5000):

ngrok http 5000

📘

If you restart ngrok, your URL will change, and you will need to update the Stream url in your TwiML Bin.

Connecting a Twilio phone number

Your TwiML Bin must then be connected to one of your Twilio phone numbers so that it gets executed whenever someone calls that number. If you need to set up a phone number and connect it to your TwiML Bin, refer to the Twilio Docs.

📘

In your TwiML Bin, the <Connect> verb is required for bi-directional communication: to send audio from Aura TTS back to Twilio, you must use this verb.

Building the Server

Copy the twilio.py code from the repository and save it locally in a file named twilio.py; we will walk through it in the steps below.

At this point, create and activate a Python virtual environment. Refer to the Python documentation, or the documentation for your preferred environment tool, for how to do that.

You will also need to install the two third-party packages the code imports, websockets and requests (the other imports are part of the Python standard library):

pip install websockets requests

If your TwiML Bin is set up correctly, you can now navigate to this file's location in your terminal and run the server with the following command:

python twilio.py

OR

python3 twilio.py

You can then start making calls to the phone number your TwiML Bin is using. Without any further modifications, you should hear Deepgram Aura say simply: "Hello, how are you today?"

Code Tour

Let's dive into the code used in the twilio.py file.

First, we have some import statements:

import asyncio
import base64
import json
import sys
import websockets
import ssl
import requests
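  • We are using asyncio and websockets to build an asynchronous websocket server.
  • We will use base64 to encode the audio from Aura before passing it to Twilio.
  • We will use json to parse the text messages from Twilio and to build the messages we send back.
  • We will use requests to make HTTP requests to Deepgram's Aura TTS endpoint.
  • We only need ssl if you serve TLS (wss://) directly from this script; see the commented-out lines in main.
  • We use sys to exit the script cleanly.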

Next we have:

async def twilio_handler(twilio_ws):
    streamsid_queue = asyncio.Queue()
  • We will be spinning up asynchronous tasks for receiving messages from, and sending messages to, Twilio.
  • We will use this streamsid_queue to pass the stream sid from the twilio_receiver task to the twilio_sender task.
  • We need to specify this stream sid to ensure that audio from Deepgram Aura is routed correctly to the corresponding phone call.
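The handoff between the two tasks is an ordinary producer/consumer pattern built on asyncio.Queue. Here is a minimal, self-contained sketch of it with no network involved; the inner tasks are stand-ins for twilio_receiver and twilio_sender, and the sid value is made up.

```python
import asyncio


async def demo_streamsid_handoff() -> str:
    """Show one task handing the stream sid to another through an asyncio.Queue."""
    streamsid_queue = asyncio.Queue()

    async def receiver() -> None:
        # Stand-in for twilio_receiver: pretend a 'start' event just arrived.
        await asyncio.sleep(0.01)
        streamsid_queue.put_nowait("MZ_example_sid")  # hypothetical sid

    async def sender() -> str:
        # Stand-in for twilio_sender: blocks until the receiver publishes the sid.
        return await streamsid_queue.get()

    # Run both concurrently; the sender can only finish after the receiver runs.
    _, sid = await asyncio.gather(receiver(), sender())
    return sid


print(asyncio.run(demo_streamsid_handoff()))  # MZ_example_sid
```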

The twilio_receiver task is defined next:

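    async def twilio_receiver(twilio_ws):
        async for message in twilio_ws:
            try:
                data = json.loads(message)

                # the 'start' event carries the stream sid for this call
                if data['event'] == 'start':
                    streamsid_queue.put_nowait(data['start']['streamSid'])
            except (json.JSONDecodeError, KeyError):
                # stop reading if a message is not the JSON we expect
                break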
  • This task loops over incoming websocket messages from Twilio and pulls the stream sid out of the 'start' event; all other messages (including the caller's audio in 'media' events) are ignored.
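Twilio sends a connected event first, then a start event whose start.streamSid field holds the stream sid, followed by media events carrying the caller's audio. The sketch below isolates that parsing step; extract_stream_sid is a hypothetical helper, and the sample messages are trimmed down from the real ones, which carry more fields.

```python
import json
from typing import Optional


def extract_stream_sid(message: str) -> Optional[str]:
    """Return the stream sid if `message` is a Twilio 'start' event, else None."""
    data = json.loads(message)
    if data.get("event") == "start":
        return data["start"]["streamSid"]
    # 'connected', 'media', 'mark', 'stop', ... carry nothing we need here
    return None


# Abbreviated example messages (hypothetical sid).
connected = json.dumps({"event": "connected", "protocol": "Call", "version": "1.0.0"})
start = json.dumps({"event": "start",
                    "start": {"streamSid": "MZ_example_sid"},
                    "streamSid": "MZ_example_sid"})

print(extract_stream_sid(connected))  # None
print(extract_stream_sid(start))      # MZ_example_sid
```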

Next we have the twilio_sender task:

    async def twilio_sender(twilio_ws):
        print('twilio_sender started')

        # wait to receive the streamsid for this connection from one of Twilio's messages
        streamsid = await streamsid_queue.get()
  • We first wait to receive the stream sid.

Then, still inside twilio_sender, we have:

        # make a Deepgram Aura TTS request specifying that we want raw mulaw audio as the output
        url = 'https://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=mulaw&sample_rate=8000&container=none'
        headers = {
            'Authorization': 'Token YOUR_DEEPGRAM_API_KEY',
            'Content-Type': 'application/json'
        }
        payload = {
            'text': 'Hello, how are you today?'
        }
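        # requests is synchronous, so run it in a worker thread (Python 3.9+)
        # to avoid blocking the event loop that serves the Twilio websocket
        tts_response = await asyncio.to_thread(requests.post, url, headers=headers, json=payload)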
  • Then we make a request to Deepgram Aura TTS to say "Hello, how are you today?", asking for raw (container=none), 8000 Hz mu-law audio, which is the format Twilio Media Streams use for call audio.

🚧

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
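All of the output-format choices live in the query string. As a sketch, assuming you want to vary the model or sample rate, you could build the URL with urllib.parse instead of hand-editing it; build_speak_url and mulaw_duration_seconds are hypothetical helper names. Because mu-law uses one byte per sample, the length of the response also tells you how long the clip will play.

```python
from urllib.parse import urlencode

DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_url(model: str = "aura-asteria-en", sample_rate: int = 8000) -> str:
    """Build an Aura /v1/speak URL that returns headerless mu-law audio."""
    params = {
        "model": model,
        "encoding": "mulaw",         # the encoding Twilio Media Streams carry
        "sample_rate": sample_rate,  # phone audio on Twilio is 8000 Hz
        "container": "none",         # raw bytes, no WAV header
    }
    return f"{DEEPGRAM_SPEAK_URL}?{urlencode(params)}"


def mulaw_duration_seconds(raw_mulaw: bytes, sample_rate: int = 8000) -> float:
    """mu-law is one byte per sample, so duration = bytes / samples per second."""
    return len(raw_mulaw) / sample_rate


print(build_speak_url())
# https://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=mulaw&sample_rate=8000&container=none
```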

Next we have:

        if tts_response.status_code == 200:
            raw_mulaw = tts_response.content

            # construct a Twilio media message with the raw mulaw (see https://www.twilio.com/docs/voice/twiml/stream#websocket-messages---to-twilio)
            media_message = {
                'event': 'media',
                'streamSid': streamsid,
                'media': {
                    'payload': base64.b64encode(raw_mulaw).decode('ascii')
                }
            }

            # send the TTS audio to the attached phonecall
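            await twilio_ws.send(json.dumps(media_message))
        else:
            # surface Deepgram errors (for example, an invalid API key) instead of failing silently
            print(f'Deepgram TTS request failed: {tts_response.status_code} {tts_response.text}')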
  • Here we package up the Deepgram Aura TTS audio in the format Twilio expects.
  • We specify the stream sid.
  • We send that audio back to Twilio via the websocket connection.
  • For details on the media message format, refer to the Twilio Media Streams documentation on WebSocket messages sent to Twilio.
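The guide sends the whole clip in a single media message. If you would rather send audio in smaller pieces, for example to forward it as it arrives, a sketch like the following splits the bytes into several media messages for the same stream. The media_messages name and the 160-byte default (20 ms at 8000 one-byte samples per second) are assumptions for illustration, not Twilio requirements.

```python
import base64
import json


def media_messages(streamsid: str, raw_mulaw: bytes, chunk_size: int = 160) -> list[str]:
    """Split raw mu-law audio into Twilio 'media' messages (JSON strings).

    160 bytes = 20 ms of 8000 Hz mu-law (one byte per sample); the size is an
    illustrative choice, and the guide's code simply sends one message.
    """
    messages = []
    for offset in range(0, len(raw_mulaw), chunk_size):
        chunk = raw_mulaw[offset:offset + chunk_size]
        messages.append(json.dumps({
            "event": "media",
            "streamSid": streamsid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        }))
    return messages


messages = media_messages("MZ_example_sid", b"\xff" * 800)
print(len(messages))  # 5
```

Each string would then be sent with await twilio_ws.send(message), in order.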

Additionally, if your application needs the bot to stop speaking at any point (for example, when the caller interrupts), send a "clear" message to Twilio. Twilio then discards any audio it has buffered for the stream but not yet played:

await twilio_ws.send(json.dumps({"event": "clear", "streamSid": streamsid}))
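To exercise the clear message without a live call, you can send it through any object with an async send method. In this sketch, RecordingSocket and stop_speaking are hypothetical names; RecordingSocket simply stands in for the websocket connection and records what would be sent.

```python
import asyncio
import json


class RecordingSocket:
    """Hypothetical stand-in for the websocket connection: records sent messages."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, message: str) -> None:
        self.sent.append(message)


async def stop_speaking(twilio_ws, streamsid: str) -> None:
    """Ask Twilio to drop any buffered, not-yet-played audio for this stream."""
    await twilio_ws.send(json.dumps({"event": "clear", "streamSid": streamsid}))


ws = RecordingSocket()
asyncio.run(stop_speaking(ws, "MZ_example_sid"))
print(ws.sent[0])  # {"event": "clear", "streamSid": "MZ_example_sid"}
```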

To finish the websocket handler, we run the two tasks concurrently, wait for both to complete, and then close the connection:

    await asyncio.wait([
        asyncio.ensure_future(twilio_receiver(twilio_ws)),
        asyncio.ensure_future(twilio_sender(twilio_ws))
    ])

    await twilio_ws.close()
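To see how the two tasks cooperate without a phone call or a Deepgram key, you can drive a stripped-down copy of the handler with scripted Twilio messages. This is a test-style sketch, not the repository's code: FakeTwilioSocket and fake_tts are hypothetical stand-ins for the websocket connection and the Deepgram request, and the scripted messages are abbreviated.

```python
import asyncio
import base64
import json


class FakeTwilioSocket:
    """Hypothetical websocket stand-in: yields scripted messages, records sends."""

    def __init__(self, incoming: list[str]) -> None:
        self._incoming = incoming
        self.sent: list[dict] = []
        self.closed = False

    def __aiter__(self):
        return self._messages()

    async def _messages(self):
        for message in self._incoming:
            await asyncio.sleep(0)  # yield control like a real network read
            yield message

    async def send(self, message: str) -> None:
        self.sent.append(json.loads(message))

    async def close(self) -> None:
        self.closed = True


async def fake_tts(text: str) -> bytes:
    """Hypothetical replacement for the Deepgram call: 0.1 s of mu-law silence (0xFF)."""
    return b"\xff" * 800


async def handler(twilio_ws, tts=fake_tts) -> None:
    streamsid_queue = asyncio.Queue()

    async def receiver() -> None:
        async for message in twilio_ws:
            data = json.loads(message)
            if data["event"] == "start":
                streamsid_queue.put_nowait(data["start"]["streamSid"])

    async def sender() -> None:
        streamsid = await streamsid_queue.get()
        audio = await tts("Hello, how are you today?")
        await twilio_ws.send(json.dumps({
            "event": "media",
            "streamSid": streamsid,
            "media": {"payload": base64.b64encode(audio).decode("ascii")},
        }))

    # Same shape as the guide: run both tasks and wait for both to finish.
    await asyncio.wait([asyncio.ensure_future(receiver()),
                        asyncio.ensure_future(sender())])
    await twilio_ws.close()


scripted = [
    json.dumps({"event": "connected", "protocol": "Call", "version": "1.0.0"}),
    json.dumps({"event": "start", "start": {"streamSid": "MZ_example_sid"},
                "streamSid": "MZ_example_sid"}),
    json.dumps({"event": "stop", "streamSid": "MZ_example_sid"}),
]
ws = FakeTwilioSocket(scripted)
asyncio.run(handler(ws))
print(ws.sent[0]["event"], ws.sent[0]["streamSid"], ws.closed)  # media MZ_example_sid True
```

Note that the sender waits on the queue until a start event arrives, so if a connection closed before sending one, asyncio.wait would never return; the same is true of the handler in the guide.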

Finally, some scaffolding starts the server and routes connections on the /twilio path to the handler above:

# websockets 14+ passes only the connection to the handler;
# the request path is available as websocket.request.path
async def router(websocket):
    if websocket.request.path == '/twilio':
        print('twilio connection incoming')
        await twilio_handler(websocket)

async def main():
    # use this if using ssl
    # ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # ssl_context.load_cert_chain('cert.pem', 'key.pem')
    # server = websockets.serve(router, '0.0.0.0', 443, ssl=ssl_context)

    # use this if not using ssl
    server = websockets.serve(router, 'localhost', 5000)

    # serve until the process is stopped (Ctrl+C)
    async with server:
        await asyncio.Future()

if __name__ == '__main__':
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        sys.exit(0)

📘

To learn more about sending Twilio phone call audio to Deepgram for Speech-to-Text (STT), see the following guide.