Text Chunking for TTS | Deepgram's Docs

Why Text Chunking Matters

Text chunking significantly reduces perceived latency in TTS applications by allowing audio playback to begin sooner. This is especially important for conversational AI and voice agents where responsiveness is critical.

Instead of waiting for the entire audio to be generated, chunking lets you:

Begin audio playback much faster
Create more responsive voice experiences
Maintain natural-sounding speech

Basic Sentence Chunking

The simplest and most effective approach is to split text at sentence boundaries. This preserves natural speech patterns while enabling faster time-to-first-byte:

import re

def chunk_by_sentence(text):
    # Split text at sentence boundaries (periods, question marks, exclamation points)
    # while preserving the punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text)

    # Remove any empty chunks
    return [sentence for sentence in sentences if sentence]

# Example usage
text = "Hello, welcome to Deepgram. This is an example of text chunking. How does it sound?"
chunks = chunk_by_sentence(text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk}")

# Output:
# Chunk 1: Hello, welcome to Deepgram.
# Chunk 2: This is an example of text chunking.
# Chunk 3: How does it sound?

Processing Streaming Text with WebSockets

When working with streaming text (for example, from an LLM), you need to collect tokens until you have complete sentences. The simplified example below assumes the text arrives as short paragraphs, splits each paragraph into sentences, and sends every sentence to TTS as its own request:

import re
import asyncio
from deepgram import DeepgramClient

def chunk_by_sentence(text):
    # Split text at sentence boundaries (periods, question marks, exclamation points)
    # while preserving the punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text)

    # Remove any empty chunks
    return [sentence for sentence in sentences if sentence]

class SimpleTextChunker:
    def __init__(self, tts_client):
        self.queue = []  # Queue to store incoming paragraph chunks
        self.processed_sentences = set()  # Sentences already sent to TTS
        self.tts_client = tts_client

    async def process_text_stream(self, paragraph):
        """Process one paragraph chunk containing one or more sentences"""

        # Queue paragraph as it arrives (simulating fast reception)
        self.queue.append(paragraph)
        print(f"Received and queued paragraph: {paragraph}")

        # You could preprocess paragraphs here and split them by more than just sentence boundaries

        # Process the queue
        while self.queue:
            # Get the next paragraph from the queue
            paragraph = self.queue.pop(0)

            # Split paragraph into sentences using our chunk_by_sentence function
            sentences = chunk_by_sentence(paragraph)

            # Process each sentence
            for sentence in sentences:
                # Skip sentences that were already sent. Note that this also
                # skips a sentence the text legitimately repeats.
                if sentence and sentence not in self.processed_sentences:
                    # Send the sentence to TTS.
                    # NOTE: the method name and argument shape below are kept from
                    # the original example; verify them against your SDK version.
                    print(f"Sending sentence to TTS: {sentence}")
                    audio_response = await self.tts_client.sync_speak({
                        "text": sentence,
                        "model": "aura-2-thalia-en",
                        "sample_rate": 24000
                    })
                    # In a real app, you would play this audio immediately
                    self.processed_sentences.add(sentence)

# Example usage with an array of paragraph chunks
async def main():
    # This simulates text coming in as paragraph chunks from an LLM
    paragraph_chunks = [
        "Deepgram's TTS API offers low latency. It works great for voice agents.",
        "This approach simulates receiving chunks as paragraphs. Each paragraph may contain one or two sentences.",
        "Try it today! You'll be impressed with the results."
    ]

    # Set up TTS client
    deepgram = DeepgramClient()
    tts_client = deepgram.speak

    # Listeners for TTS events (audio data, connection status) would be
    # registered here; they are omitted in this simplified example

    chunker = SimpleTextChunker(tts_client)
    # Process each paragraph sequentially
    for paragraph in paragraph_chunks:
        await chunker.process_text_stream(paragraph)

# Run the example
if __name__ == "__main__":
    asyncio.run(main())
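
The example above receives whole paragraphs. When text arrives token by token instead, one possible approach is to buffer tokens and release a sentence only once a boundary has been seen. The sketch below is illustrative: `SentenceBuffer` and `sentences_from_tokens` are hypothetical helpers, not part of the Deepgram SDK, and the boundary rule is the same regex used by `chunk_by_sentence`.

```python
import re

# Hypothetical helper (not part of the Deepgram SDK): buffers streamed tokens
# and releases text only once a sentence boundary has been seen.
_SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

class SentenceBuffer:
    def __init__(self):
        self._buffer = ""

    def add_token(self, token):
        """Append a token and return any sentences that are now complete."""
        self._buffer += token
        parts = _SENTENCE_END.split(self._buffer)
        # The last part may still be growing, so keep it in the buffer
        self._buffer = parts.pop()
        return [part for part in parts if part]

    def flush(self):
        """Return whatever text remains once the stream has ended."""
        remainder = self._buffer.strip()
        self._buffer = ""
        return [remainder] if remainder else []

def sentences_from_tokens(tokens):
    """Collect all sentences produced by a finite sequence of tokens."""
    buffer = SentenceBuffer()
    sentences = []
    for token in tokens:
        sentences.extend(buffer.add_token(token))
    sentences.extend(buffer.flush())
    return sentences

# Example usage with tokens that cut across word and sentence boundaries
tokens = ["Hel", "lo the", "re. How", " are", " you? I am", " fine"]
print(sentences_from_tokens(tokens))
```

Because a sentence is released only after the whitespace that follows its punctuation has arrived, a token ending in "3." followed by "14" is not split in the middle of the number. The trade-off is that the final sentence is held until `flush()` is called at the end of the stream.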

For complete details on implementing the TTS WebSocket connection, see our guide on Real-Time TTS with WebSockets.

Processing Chunked Text

After creating chunks, you need to send them to TTS in a way that keeps the audio in the original order. The approach covered here is sequential processing.

Sequential Processing

Process the chunks one at a time, in order. Because the first chunk is synthesized first, its audio is ready before the rest of the text has been processed:

async def process_chunks_sequential(chunks, tts_function):
    # tts_function is an async callable that takes a text chunk and returns audio
    results = []
    for chunk in chunks:
        # Chunks are awaited in order, so the first chunk always completes first
        result = await tts_function(chunk)
        results.append(result)
    return results
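
Note that `process_chunks_sequential` returns only after every chunk has been synthesized. If playback should start as soon as the first chunk is ready, one option is an async generator that yields each result as it completes. This is a minimal sketch: `stream_chunks_sequential` is a hypothetical name, and `fake_tts` is a stand-in that returns placeholder bytes rather than calling a real TTS API.

```python
import asyncio

async def stream_chunks_sequential(chunks, tts_function):
    """Yield (chunk, audio) pairs in order, as soon as each one is ready."""
    for chunk in chunks:
        audio = await tts_function(chunk)
        yield chunk, audio

# Hypothetical stand-in for a real TTS call; returns placeholder bytes
async def fake_tts(text):
    await asyncio.sleep(0)
    return text.encode("utf-8")

async def demo():
    played = []
    chunks = ["Hello there.", "How are you?"]
    async for chunk, audio in stream_chunks_sequential(chunks, fake_tts):
        # In a real app, hand `audio` to the audio player here
        played.append((chunk, len(audio)))
    return played

print(asyncio.run(demo()))
```

The caller can begin playing the first chunk while later chunks are still waiting to be synthesized, which is where the latency benefit of chunking comes from.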

Setting Chunk Size

For most applications, sentences work well as chunks. If you need finer control:

Voice assistants: Aim for 50-100 character chunks
Call center bots: Use complete sentences (most natural)
Long-form content: Larger chunks (200-400 characters) preserve intonation
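
One way to apply a target size like those above is to group whole sentences until adding another would exceed a character limit. The following is a sketch under assumptions: `chunk_by_size` and its `max_chars` parameter are hypothetical names, and a sentence longer than the limit is kept whole rather than cut mid-sentence.

```python
import re

def chunk_by_size(text, max_chars=100):
    """Group whole sentences into chunks of at most max_chars where possible."""
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            # The sentence still fits, so extend the current chunk
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars is kept whole
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Example usage
text = "Short one. Another short one. This sentence is a little bit longer than the others."
for chunk in chunk_by_size(text, max_chars=40):
    print(len(chunk), chunk)
```

Keeping sentences intact means some chunks may exceed the limit. Whether that is acceptable depends on how strict the latency target is.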

Other Chunking Strategies

If sentence-level chunking is not enough for your application, consider these techniques:

Clause-based chunking: Splits long sentences at commas and semicolons
NLP-based chunking: Uses natural language processing to find semantic boundaries
Adaptive chunking: Adjusts chunk size based on content complexity
First-chunk optimization: Specially optimizes the first chunk for minimal latency
SSML chunking: Handles Speech Synthesis Markup Language tags when chunking
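
As an illustration of the first technique in the list, clause-based chunking can be sketched as a second pass that splits only sentences exceeding a length limit. `chunk_by_clause` and `max_chars` are hypothetical names chosen for this example, and the rule used (split after a comma or semicolon that is followed by whitespace) is a simplifying assumption.

```python
import re

def chunk_by_clause(sentence, max_chars=80):
    """Split a sentence at commas and semicolons only if it exceeds max_chars."""
    if len(sentence) <= max_chars:
        return [sentence]
    # Split after a comma or semicolon, keeping the punctuation with its clause
    clauses = re.split(r'(?<=[,;])\s+', sentence)
    return [clause for clause in clauses if clause]

# Example usage
sentence = (
    "When the model finishes its first clause, playback can begin; "
    "the remaining clauses follow as they are synthesized."
)
for clause in chunk_by_clause(sentence, max_chars=60):
    print(clause)
```

Because the split requires whitespace after the punctuation, a number such as "1,000" stays intact. Splitting at clause boundaries may affect intonation, so it is probably best reserved for sentences that are too long to send whole.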

For WebSocket implementation details to stream the chunked audio, see our guide on Real-Time TTS with WebSockets.