Text to Speech Streaming
An overview of the Deepgram JavaScript SDK and Deepgram text-to-speech streaming.
The Deepgram JavaScript SDK now works in both server and browser environments. A proxy configuration is required for browser environments (see the section below).
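For example, the SDK's README documents a proxy setup along these lines: in the browser you pass the literal string "proxy" instead of an API key and route requests through a server you host, which attaches the real key. The localhost URL below is a placeholder for your proxy endpoint:
import { createClient } from "@deepgram/sdk";
// "proxy" replaces the API key in the browser; your proxy server adds the
// real key before forwarding requests to Deepgram.
const deepgram = createClient("proxy", {
  global: { fetch: { options: { proxy: { url: "http://localhost:8080" } } } },
});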
Installing the SDK
# Install the Deepgram JS SDK
# https://github.com/deepgram/deepgram-js-sdk
npm install @deepgram/sdk
Initializing the SDK
import { createClient } from "@deepgram/sdk";
const deepgram = createClient("DEEPGRAM_API_KEY");
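If your project uses CommonJS rather than ES modules, the SDK can also be loaded with require, as the streaming example below does:
const { createClient } = require("@deepgram/sdk");
const deepgram = createClient("DEEPGRAM_API_KEY");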
Make a Deepgram Text-to-Speech Request
Once the SDK is initialized, you can make a request to convert text into speech.
const fs = require("fs");
const { createClient, LiveTTSEvents } = require("@deepgram/sdk");

const live = async () => {
  const text = "Hello, how can I help you today?";

  const deepgram = createClient(process.env.DEEPGRAM_API_KEY);
  const dgConnection = deepgram.speak.live({ model: "aura-asteria-en" });

  let audioBuffer = Buffer.alloc(0);

  dgConnection.on(LiveTTSEvents.Open, () => {
    console.log("Connection opened");

    // Send text data for TTS synthesis
    dgConnection.sendText(text);

    // Send a Flush message to the server after sending the text
    dgConnection.flush();

    dgConnection.on(LiveTTSEvents.Close, () => {
      console.log("Connection closed");
    });

    dgConnection.on(LiveTTSEvents.Metadata, (data) => {
      console.dir(data, { depth: null });
    });

    dgConnection.on(LiveTTSEvents.Audio, (data) => {
      console.log("Deepgram audio data received");
      // Concatenate the audio chunks into a single buffer
      audioBuffer = Buffer.concat([audioBuffer, Buffer.from(data)]);
    });

    dgConnection.on(LiveTTSEvents.Flushed, () => {
      console.log("Deepgram Flushed");
      // Write the buffered audio data to a file when the flush event is received
      writeFile();
    });

    dgConnection.on(LiveTTSEvents.Error, (err) => {
      console.error(err);
    });
  });

  const writeFile = () => {
    if (audioBuffer.length > 0) {
      fs.writeFile("output.mp3", audioBuffer, (err) => {
        if (err) {
          console.error("Error writing audio file:", err);
        } else {
          console.log("Audio file saved as output.mp3");
        }
      });
      audioBuffer = Buffer.alloc(0); // Reset the buffer after writing
    }
  };
};

live();
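Note that the example accumulates chunks in audioBuffer and only writes the file when the Flushed event fires: Flushed signals that the server has finished synthesizing everything sent before the flush() call, so at that point the buffer holds one complete utterance.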
Audio Output Streaming
The audio bytes representing the converted text are streamed to the client via the LiveTTSEvents.Audio event shown above, through the callback you register.
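If you would rather stream the audio than buffer it, you can forward each chunk as soon as it arrives. A minimal sketch, assuming a Node writable file stream as the destination (the output.raw filename is illustrative):
// Forward each chunk to a writable stream as it arrives instead of buffering.
const out = fs.createWriteStream("output.raw");
dgConnection.on(LiveTTSEvents.Audio, (data) => {
  out.write(Buffer.from(data));
});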
It should be noted that these audio bytes are:
- Container-less audio. Depending on the `encoding` value chosen, only the raw audio data is sent. For example, if you choose `linear16` as your `encoding`, a `WAV` header will not be sent. Please see these Tips and Tricks for more information, and the header sketch after this list.
- Not of standard size/length when received by the client. This is because the text is broken down into sounds representing the speech, and the sound fragments that chain together into spoken words vary in length and content.
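Because the audio is container-less, a player that expects a WAV file will not recognize raw linear16 output as-is. Below is a minimal sketch of prepending a standard 44-byte PCM WAV header yourself; the 48000 Hz mono 16-bit values and the helper names are assumptions for this example, and must match the encoding and sample rate you requested from Deepgram:
const fs = require("fs");

// Build a 44-byte PCM WAV header for raw linear16 audio.
const buildWavHeader = (dataLength, sampleRate = 48000, channels = 1, bitsPerSample = 16) => {
  const byteRate = (sampleRate * channels * bitsPerSample) / 8;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0); // ChunkID
  header.writeUInt32LE(36 + dataLength, 4); // ChunkSize
  header.write("WAVE", 8); // Format
  header.write("fmt ", 12); // Subchunk1ID
  header.writeUInt32LE(16, 16); // Subchunk1Size (16 for PCM)
  header.writeUInt16LE(1, 20); // AudioFormat (1 = PCM)
  header.writeUInt16LE(channels, 22); // NumChannels
  header.writeUInt32LE(sampleRate, 24); // SampleRate
  header.writeUInt32LE(byteRate, 28); // ByteRate
  header.writeUInt16LE(blockAlign, 32); // BlockAlign
  header.writeUInt16LE(bitsPerSample, 34); // BitsPerSample
  header.write("data", 36); // Subchunk2ID
  header.writeUInt32LE(dataLength, 40); // Subchunk2Size
  return header;
};

// Prepend the header to the buffered raw audio before writing to disk.
const writeWavFile = (rawAudio) => {
  const wav = Buffer.concat([buildWavHeader(rawAudio.length), rawAudio]);
  fs.writeFileSync("output.wav", wav);
};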
Depending on your use case for the generated audio bytes, one of these guides can help you put them to work:
- Sending LLM Outputs to a WebSocket
- Text Chunking for Streaming TTS Optimization (a small illustrative sketch follows this list)
- Handling Audio Issues in Text To Speech
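As a rough illustration of the chunking approach covered in the second guide, here is a minimal sketch that splits a block of text into sentences and streams each one over an open connection. The naive regex splitter and the function names are assumptions for illustration, not SDK APIs:
// Naive sentence splitter (illustrative only): it drops a trailing
// fragment that lacks end punctuation unless nothing matches at all.
const chunkBySentence = (text) => text.match(/[^.!?]+[.!?]+/g) ?? [text];

const speakChunks = (dgConnection, text) => {
  // Send each sentence as its own chunk so synthesis can begin
  // before the full text is available.
  for (const chunk of chunkBySentence(text)) {
    dgConnection.sendText(chunk.trim());
  }
  // Flush so any buffered text is synthesized immediately.
  dgConnection.flush();
};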
Where to Find Additional Examples
The SDK repository has a good collection of text-to-speech examples, and the README contains links to them. Each example demonstrates different options for synthesizing speech from text.
Example:
- Hello World - examples/node-speak-live