Text to Speech Stream

When you are sending text to our TTS websocket from an LLM, you will need to send a Flush message whenever the LLM finishes a response to indicate the end of the conversation. This allows Deepgram to generate the audio from its existing text buffer without waiting for additional text.

In other cases in real-time text-to-speech processing, there are scenarios where you may need to force the server to process all (or flush) unprocessed speech-to-text data immediately. Deepgram supports a Flush message to handle such situations.

Very frequent flushes can affect audio output quality.

Sending `Flush`

To send the Flush message, you need to send the following JSON message to the server:

JSON

1 {
2     "type": "Flush"
3 }

`Flush` Confirmation

Upon receiving the Flush message, the server will process all remaining text data and return the final audio results.

Example Response

JSON

1 {
2  
3 "type": "Flushed",
4 "sequence_id": 0
5 }

Limits

The maximum number of times you can send the Flush message is 20 times every 60 seconds. After that, you will receive a warning message stating that we cannot process any more flush messages until the 60-second time window has passed.

Conclusion

Using the Flush message with Deepgram’s API allows for precise control over the finalization of speech-to-text processing. This feature is essential for scenarios requiring immediate processing of the remaining data, ensuring accurate and timely results.

Sending Flush

Flush Confirmation