Text Chunking for TTS REST Optimization

Text chunking is the process of breaking down text inputs into smaller, manageable chunks before processing.

When dealing with lengthy text inputs, latency can become an issue because processing time increases with the length of the text. One effective strategy for addressing this is text chunking: the input is split into smaller pieces and each piece is sent for synthesis separately, so the first audio does not have to wait for the entire text to be processed.

Code Examples

The first three code examples below build on one another in complexity to present strategies for using text chunking to optimize your Text-to-Speech applications. A fourth example combines chunking with immediate audio playback.

These examples rely on Deepgram’s Python SDK. Learn how to get started with Aura and Deepgram’s SDKs by reading the Aura Text-to-Speech guide.

Chunk By Maximum Number of Characters

This example breaks down lengthy text inputs into chunks determined by a maximum number of characters, splitting only between words so that no word is cut in half.

This is a straightforward example which does not take into consideration characteristics of the text structure, such as clause and sentence boundaries. For some types of text, this is acceptable and will not have a negative effect on the quality of speech.

SDK - Python
from deepgram import (
    DeepgramClient,
    SpeakOptions,
)

# Define the maximum chunk size (in characters) for text chunking
MAX_CHUNK_SIZE = 200

input_text = "Our story begins in a peaceful woodland kingdom where a lively squirrel named Frolic made his abode high up within a cedar tree's embrace. He was not a usual woodland creature, for he was blessed with an insatiable curiosity and a heart for adventure. Nearby, a glistening river snaked through the landscape, home to a wonder named Splash - a silver-scaled flying fish whose ability to break free from his water-haven intrigued the woodland onlookers. This magical world moved on a rhythm of its own until an unforeseen circumstance brought Frolic and Splash together. One radiant morning, while Frolic was on his regular excursion, and Splash was making his aerial tours, an unpredictable wave playfully tossed and misplaced Splash onto the riverbank. Despite his initial astonishment, Frolic hurriedly and kindly assisted his new friend back to his watery abode. Touched by Frolic's compassion, Splash expressed his gratitude by inviting his friend to share his world. As Splash perched on Frolic's back, he tasted of the forest's bounty, felt the sun’s rays filter through the colors of the trees, experienced the conversations amidst the woods, and while at it, taught the woodland how to blur the lines between earth and water."

def chunk_text(text, chunk_size):
    # Pack whole words into chunks of at most chunk_size characters.
    # A single word longer than chunk_size becomes its own chunk.
    chunks = []
    current_chunk = ''
    for word in text.split():
        candidate = f"{current_chunk} {word}" if current_chunk else word
        if len(candidate) <= chunk_size:
            current_chunk = candidate
        else:
            # The chunk is full: store it and start a new one with this word
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = word
    if current_chunk:
        chunks.append(current_chunk)
    return chunks

def main():
    try:
        # Create a Deepgram client using the API key
        deepgram = DeepgramClient(api_key="DEEPGRAM_API_KEY")

        # Choose a model to use for synthesis
        options = SpeakOptions(
            model="aura-helios-en",
        )

        # Chunk the text into smaller parts
        text_chunks = chunk_text(input_text, MAX_CHUNK_SIZE)

        # Synthesize audio for each chunk
        for i, chunk in enumerate(text_chunks):
            print(f"\nProcessing chunk {i + 1}...{chunk}\n")
            filename = f"chunk_{i + 1}.mp3"

            speak_options = {"text": chunk}

            response = deepgram.speak.v("1").save(filename, speak_options, options)
            print(response.to_json(indent=4))

    except Exception as e:
        print(f"Exception: {e}")

if __name__ == "__main__":
    main()

Chunk By Clauses and Sentence Boundaries

In this example, the aim is to preserve naturalness of speech by chunking the text based on clause and sentence boundaries. When people speak, they tend to pause at the end of a clause or a sentence, so this strategy is helpful when working with texts that contain complex sentences in a narrative style.

The regular expression in the code tells the program to break the text into chunks based on the following:

  • The punctuation marks of period ., question mark ?, exclamation mark !, or semicolon ;
  • A comma , + single whitespace + coordinating conjunctions and, but, or, nor, for, yet, so

These are two grammatical rules for identifying clauses, but you may decide to include more.

SDK - Python
import re
from deepgram import (
    DeepgramClient,
    SpeakOptions,
)

input_text = "Our story begins in a peaceful woodland kingdom where a lively squirrel named Frolic made his abode high up within a cedar tree's embrace. He was not a usual woodland creature, for he was blessed with an insatiable curiosity and a heart for adventure. Nearby, a glistening river snaked through the landscape, home to a wonder named Splash - a silver-scaled flying fish whose ability to break free from his water-haven intrigued the woodland onlookers. This magical world moved on a rhythm of its own until an unforeseen circumstance brought Frolic and Splash together. One radiant morning, while Frolic was on his regular excursion, and Splash was making his aerial tours, an unpredictable wave playfully tossed and misplaced Splash onto the riverbank. Despite his initial astonishment, Frolic hurriedly and kindly assisted his new friend back to his watery abode. Touched by Frolic's compassion, Splash expressed his gratitude by inviting his friend to share his world. As Splash perched on Frolic's back, he tasted of the forest's bounty, felt the sun’s rays filter through the colors of the trees, experienced the conversations amidst the woods, and while at it, taught the woodland how to blur the lines between earth and water."

# Sentence-ending punctuation, or a comma followed by a coordinating conjunction.
# The trailing \b stops ", for" from matching the start of a word such as ", forest".
CLAUSE_BOUNDARIES = r'[.?!;]|, (?:and|but|or|nor|for|yet|so)\b'

def chunk_text_by_clause(text):
    # Find clause boundaries using regular expression
    clause_boundaries = re.finditer(CLAUSE_BOUNDARIES, text)
    boundaries_indices = [boundary.start() for boundary in clause_boundaries]

    chunks = []
    start = 0
    for boundary_index in boundaries_indices:
        # Keep the punctuation mark or comma with the chunk that precedes it
        chunk = text[start:boundary_index + 1].strip()
        if chunk:
            chunks.append(chunk)
        start = boundary_index + 1
    # Append the remaining part of the text, skipping it if it is empty
    remaining_text = text[start:].strip()
    if remaining_text:
        chunks.append(remaining_text)

    return chunks

def main():
    try:
        # Create a Deepgram client using the API key
        deepgram = DeepgramClient(api_key="DEEPGRAM_API_KEY")

        # Choose a model to use for synthesis
        options = SpeakOptions(
            model="aura-helios-en",
        )

        # Chunk the text into smaller parts
        text_chunks = chunk_text_by_clause(input_text)

        # Synthesize audio for each chunk
        for i, chunk in enumerate(text_chunks):
            print(f"\nProcessing chunk {i + 1}...{chunk}\n")
            filename = f"chunk_{i + 1}.mp3"

            speak_options = {"text": chunk}

            response = deepgram.speak.v("1").save(filename, speak_options, options)
            print(response.to_json(indent=4))

    except Exception as e:
        print(f"Exception: {e}")

if __name__ == "__main__":
    main()
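
As one illustration of including more rules, the sketch below extends the pattern with a colon and with a few subordinating words (while, because, although, which) after a comma. These extra rules, and the names EXTENDED_CLAUSE_BOUNDARIES and split_on_boundaries, are assumptions made for this illustration and are not part of the Deepgram SDK. The pattern also ends with a word boundary \b so that ", for" does not match the start of ", forest". The sketch runs without an API key, so you can inspect the chunks before synthesizing them.

```python
import re

# Hypothetical extension of the clause rules: adds a colon and a few
# subordinating words after a comma. \b keeps ", for" from matching ", forest".
EXTENDED_CLAUSE_BOUNDARIES = re.compile(
    r'[.?!;:]|, (?:and|but|or|nor|for|yet|so|while|because|although|which)\b'
)

def split_on_boundaries(text, pattern=EXTENDED_CLAUSE_BOUNDARIES):
    chunks = []
    start = 0
    for match in pattern.finditer(text):
        # Keep the punctuation mark or comma with the chunk that precedes it
        end = match.start() + 1
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    # Keep any text that follows the last boundary
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks

sample = (
    "He packed three things: rope, a lantern, and bread. "
    "Splash waited by the river, while Frolic searched the forest, "
    "for the path was hidden."
)
for chunk in split_on_boundaries(sample):
    print(chunk)
```

Each added rule produces shorter chunks, which also means more requests and more joins between audio segments, so it is worth listening to the result before adopting a longer list.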

Dynamic Chunking

The goal of dynamic chunking is to adjust chunk sizes based on the input itself, using adaptive rules or algorithms to decide where to split the text.

This next example implements a more flexible chunking strategy that adjusts chunk sizes dynamically based on the length and structure of the input text. It retains the rule from the previous example, chunking on clause and sentence boundaries, but it then checks the character count of each chunk. If the count exceeds a maximum character length, the chunk is split further into subchunks at commas, where each subchunk must contain at least three words.

SDK - Python
import re
from deepgram import (
    DeepgramClient,
    SpeakOptions,
)

input_text = "Our story begins in a peaceful woodland kingdom where a lively squirrel named Frolic made his abode high up within a cedar tree's embrace. He was not a usual woodland creature, for he was blessed with an insatiable curiosity and a heart for adventure. Nearby, a glistening river snaked through the landscape, home to a wonder named Splash - a silver-scaled flying fish whose ability to break free from his water-haven intrigued the woodland onlookers. This magical world moved on a rhythm of its own until an unforeseen circumstance brought Frolic and Splash together. One radiant morning, while Frolic was on his regular excursion, and Splash was making his aerial tours, an unpredictable wave playfully tossed and misplaced Splash onto the riverbank. Despite his initial astonishment, Frolic hurriedly and kindly assisted his new friend back to his watery abode. Touched by Frolic's compassion, Splash expressed his gratitude by inviting his friend to share his world. As Splash perched on Frolic's back, he tasted of the forest's bounty, felt the sun’s rays filter through the colors of the trees, experienced the conversations amidst the woods, and while at it, taught the woodland how to blur the lines between earth and water."

# The trailing \b stops ", for" from matching the start of a word such as ", forest"
CLAUSE_BOUNDARIES = r'[.?!;]|, (?:and|but|or|nor|for|yet|so)\b'
MAX_CHUNK_LENGTH = 100
MIN_SUBCHUNK_WORDS = 3

def split_at_commas(chunk):
    # Split an over-long chunk at commas, keeping each comma with the text before it
    parts = [part.strip() for part in re.findall(r'[^,]+,?', chunk) if part.strip()]
    subchunks = []
    current = ''
    for part in parts:
        candidate = f"{current} {part}" if current else part
        # Keep growing while the limit allows it, or while the current piece
        # is still shorter than three words
        if len(candidate) <= MAX_CHUNK_LENGTH or len(current.split()) < MIN_SUBCHUNK_WORDS:
            current = candidate
        else:
            subchunks.append(current)
            current = part
    if current:
        # Merge a short trailing piece into the previous subchunk rather than dropping it
        if subchunks and len(current.split()) < MIN_SUBCHUNK_WORDS:
            subchunks[-1] += ' ' + current
        else:
            subchunks.append(current)
    return subchunks

def chunk_text_dynamically(text):
    chunks = []
    start = 0
    # Walk through the clause boundaries found by the regular expression
    for boundary in re.finditer(CLAUSE_BOUNDARIES, text):
        end = boundary.start() + 1
        chunk = text[start:end].strip()
        start = end
        if not chunk:
            continue
        if len(chunk) <= MAX_CHUNK_LENGTH:
            chunks.append(chunk)
        else:
            # Too long: split further at commas
            chunks.extend(split_at_commas(chunk))

    # Handle any text after the last clause boundary
    remaining_text = text[start:].strip()
    if remaining_text:
        chunks.extend(split_at_commas(remaining_text))

    return chunks

def main():
    try:
        # Create a Deepgram client using the API key
        deepgram = DeepgramClient(api_key="DEEPGRAM_API_KEY")

        # Choose a model to use for synthesis
        options = SpeakOptions(
            model="aura-helios-en",
        )

        # Chunk the text into smaller parts
        text_chunks = chunk_text_dynamically(input_text)

        # Synthesize audio for each chunk
        for i, chunk in enumerate(text_chunks):
            print(f"\nProcessing chunk {i + 1}...{chunk}\n")
            filename = f"chunk_{i + 1}.mp3"

            speak_options = {"text": chunk}

            response = deepgram.speak.v("1").save(filename, speak_options, options)
            print(response.to_json(indent=4))

    except Exception as e:
        print(f"Exception: {e}")

if __name__ == "__main__":
    main()
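
The example above applies one fixed character limit to every chunk. Another adaptive rule, offered here as an assumption rather than as a Deepgram recommendation, is to keep the first chunk short so that the first request is small, and to let later chunks grow. The function name chunk_with_growing_limit and the parameters first_limit, growth, and max_limit are hypothetical. The sketch packs whole sentences and runs without the SDK.

```python
import re

def chunk_with_growing_limit(text, first_limit=80, growth=2.0, max_limit=400):
    """Pack whole sentences into chunks whose character limit grows per chunk."""
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    chunks = []
    limit = first_limit
    current = ''
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > limit:
            # The current chunk is full: emit it and raise the limit
            chunks.append(current)
            limit = min(int(limit * growth), max_limit)
            current = sentence
        else:
            # A single sentence longer than the limit is kept whole
            current = candidate
    if current:
        chunks.append(current)
    return chunks

story = (
    "Frolic woke early. He climbed down the cedar. "
    "The river was bright, and Splash was already leaping above it. "
    "They spent the whole morning racing along the bank together."
)
for chunk in chunk_with_growing_limit(story, first_limit=40, growth=3.0, max_limit=300):
    print(len(chunk), chunk)
```

Suitable values for the limits depend on the voice, the text, and how much delay your users tolerate, so treat the numbers here as placeholders to tune.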

Chunking + Streaming Audio

Instead of turning each chunk into an audio file such as an MP3 file, you might prefer to stream the audio as it comes in. It is possible to start streaming your text-to-speech audio as soon as the first byte arrives.

In this example, the text is chunked by sentence boundaries. Each chunk is sent to Deepgram in turn, and its audio is played as soon as the response for that chunk has been received, rather than being saved to a file first. The chunks are played consecutively, in the order in which they appear in the text.

SDK - Python
import re
from deepgram import (
    DeepgramClient,
    SpeakOptions,
)
from pydub import AudioSegment
from pydub.playback import play

input_text = "Our story begins in a peaceful woodland kingdom where a lively squirrel named Frolic made his abode high up within a cedar tree's embrace. He was not a usual woodland creature, for he was blessed with an insatiable curiosity and a heart for adventure. Nearby, a glistening river snaked through the landscape, home to a wonder named Splash - a silver-scaled flying fish whose ability to break free from his water-haven intrigued the woodland onlookers. This magical world moved on a rhythm of its own until an unforeseen circumstance brought Frolic and Splash together. One radiant morning, while Frolic was on his regular excursion, and Splash was making his aerial tours, an unpredictable wave playfully tossed and misplaced Splash onto the riverbank. Despite his initial astonishment, Frolic hurriedly and kindly assisted his new friend back to his watery abode. Touched by Frolic's compassion, Splash expressed his gratitude by inviting his friend to share his world. As Splash perched on Frolic's back, he tasted of the forest's bounty, felt the sun’s rays filter through the colors of the trees, experienced the conversations amidst the woods, and while at it, taught the woodland how to blur the lines between earth and water."

def chunk_text_by_sentence(text):
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty strings so that no empty chunk is sent for synthesis
    return [sentence for sentence in sentences if sentence]

def synthesize_audio(deepgram, text):
    # Choose a model to use for synthesis
    options = SpeakOptions(
        model="aura-helios-en",
    )
    speak_options = {"text": text}
    # Synthesize audio and stream the response
    response = deepgram.speak.v("1").stream(speak_options, options)
    # Get the audio stream from the response
    audio_buffer = response.stream
    audio_buffer.seek(0)
    # Load audio from buffer using pydub
    audio = AudioSegment.from_mp3(audio_buffer)

    return audio

def main():
    try:
        # Create a Deepgram client once and reuse it for every chunk
        deepgram = DeepgramClient(api_key="DEEPGRAM_API_KEY")

        # Chunk the text into smaller parts
        chunks = chunk_text_by_sentence(input_text)

        # Synthesize each chunk into audio and play the audio
        for chunk in chunks:
            audio = synthesize_audio(deepgram, chunk)
            play(audio)

    except Exception as e:
        print(f"Exception: {e}")

if __name__ == "__main__":
    main()
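
In the example above, each chunk is synthesized only after the previous one has finished playing, so there can be a gap between sentences. One possible refinement, sketched here under assumptions, is to request the next chunk's audio on a background thread while the current chunk plays. The helper play_with_prefetch is hypothetical, and synthesize and play are passed in as parameters: synthesize is any callable that takes one chunk of text and returns audio (for instance, one built around synthesize_audio), and play is any callable that plays that audio (for instance, pydub's play). The stand-in functions at the bottom exist only so the sketch runs without an API key or an audio device.

```python
from concurrent.futures import ThreadPoolExecutor

def play_with_prefetch(chunks, synthesize, play):
    """Play chunks in order while synthesizing the next one in the background."""
    if not chunks:
        return
    with ThreadPoolExecutor(max_workers=1) as executor:
        # Start synthesizing the first chunk right away
        pending = executor.submit(synthesize, chunks[0])
        for next_chunk in chunks[1:]:
            # Wait for the current chunk's audio
            audio = pending.result()
            # Request the next chunk before starting playback
            pending = executor.submit(synthesize, next_chunk)
            # Playback blocks here while the background request runs
            play(audio)
        # Play the final chunk
        play(pending.result())

# Stand-ins for illustration only: no network call, no audio output
played = []

def fake_synthesize(text):
    return f"<audio for: {text}>"

def fake_play(audio):
    played.append(audio)

play_with_prefetch(["First sentence.", "Second sentence."], fake_synthesize, fake_play)
print(played)
```

Only one request is in flight at a time, which keeps playback order simple. Whether a single client object can safely be shared across threads is not covered by this guide, so check that before reusing one in the background worker.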

Read more about streaming the TTS audio output in the guide Streaming Audio Outputs.

Considerations

When using text chunking as a strategy to minimize latency, some factors to keep in mind are the following:

  • preserving naturalness of speech - Maintain proper pronunciation, intonation, and rhythm to enhance the user experience.
  • contextual understanding - Analyze the structure and meaning of the text to identify natural breakpoints, such as sentence or clause boundaries, for dividing the text.
  • dynamic chunking - Implement a flexible chunking strategy that adjusts chunk sizes dynamically based on the length and structure of the input text.
  • user expectations - Consider the preferences and needs of users, such as their tolerance for latency, the desired quality of synthesized speech, and their overall satisfaction with the application’s performance.
