Text to Speech Streaming
An overview of the Deepgram .NET SDK and Deepgram streaming text-to-speech.
Installing the SDK
To begin using Deepgram’s Text-to-Speech functionality, you need to install the Deepgram .NET SDK in your existing project. You can do this using the following command:
dotnet add package Deepgram
Make a Deepgram Text-to-Speech Request
C#
Audio Output Streaming
The audio bytes for the converted text are streamed to the client through the AudioData event above, which invokes your callback function.
Note that these audio bytes are:
- Container-less audio. Depending on the encoding value chosen, only the raw audio data is sent. For example, if you choose linear16 as your encoding for audio, a WAV header will not be sent. Please see the Tips and Tricks for more information.
- Not of a fixed size when received by the client. The text is broken down into the sounds that make up the speech, and the chains of sounds that form fragments of spoken words differ in length and content.
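To make the two points above concrete, here is a minimal sketch of one way to collect chunks of varying size and wrap them in a WAV container yourself. The Linear16Buffer class is a hypothetical helper, not part of the Deepgram SDK. It assumes linear16 audio (16-bit PCM) and a sample rate that matches whatever you requested; treat it as an illustration rather than the SDK's recommended approach.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Deepgram SDK): collects raw linear16
// chunks of any size and wraps them in a WAV container on demand.
public sealed class Linear16Buffer
{
    private readonly MemoryStream _pcm = new MemoryStream();
    private readonly object _gate = new object();

    // Call this from your audio callback with each chunk as it arrives.
    // Chunks may be any length, so they are simply appended in order.
    public void Append(byte[] chunk)
    {
        if (chunk == null || chunk.Length == 0) return;
        lock (_gate) { _pcm.Write(chunk, 0, chunk.Length); }
    }

    // Returns a 44-byte RIFF/WAVE header followed by all audio received so far.
    // sampleRate is an assumption you must supply: it has to match the request.
    public byte[] ToWav(int sampleRate, short channels = 1)
    {
        byte[] pcm;
        lock (_gate) { pcm = _pcm.ToArray(); }

        const short bitsPerSample = 16;                        // linear16 = 16-bit PCM
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        using var output = new MemoryStream(44 + pcm.Length);
        using var writer = new BinaryWriter(output);           // writes little-endian

        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + pcm.Length);                         // bytes remaining after this field
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                                      // size of the PCM fmt chunk
        writer.Write((short)1);                                // audio format 1 = PCM
        writer.Write(channels);
        writer.Write(sampleRate);
        writer.Write(byteRate);
        writer.Write(blockAlign);
        writer.Write(bitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(pcm.Length);                              // size of the audio payload
        writer.Write(pcm);
        writer.Flush();

        return output.ToArray();
    }
}
```

In use, you would call Append from the audio callback for every chunk and, once the stream is complete, write the result to disk with something like File.WriteAllBytes("output.wav", buffer.ToWav(24000)). The value 24000 is only a placeholder here; use the sample rate of your own request.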
Depending on your use case for the generated audio bytes, one of these guides may help you make better use of them:
- Sending LLM Outputs to a WebSocket
- Text Chunking for Streaming TTS Optimization
- Handling Audio Issues in Text To Speech
Where To Find Additional Examples
The SDK repository contains a good collection of text-to-speech examples, and the README contains links to them.
Examples:
- Example “Hello World” - examples/text-to-speech/websocket/simple