Text-to-Speech
An overview of the Deepgram Go SDK and Deepgram text-to-speech.
Installing the SDK
To begin using Deepgram's Text-to-Speech functionality, you need to install the Deepgram Go SDK in your existing project. You can do this using the following command:
go get github.com/deepgram/deepgram-go-sdk
Make a Deepgram Text-to-Speech Request
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	api "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1/rest"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, World!"
	filePath     string = "./test.mp3"
)

func main() {
	// init library
	client.InitWithDefault()

	// Go context
	ctx := context.Background()

	// set the Speak (text-to-speech) options
	options := &interfaces.SpeakOptions{
		Model: "aura-asteria-en",
	}

	// create a Deepgram client
	c := client.NewRESTWithDefaults()
	dg := api.New(c)

	// send the text to Deepgram and save the resulting audio to filePath
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("ToSave failed. Err: %v\n", err)
		os.Exit(1)
	}

	// serialize the result struct to JSON
	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// make the JSON pretty
	prettyJSON, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Format failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJSON)
}
Audio Output Streaming
Deepgram's TTS API allows you to start playing the audio as soon as the first byte is received. This section provides examples to help you stream the audio output efficiently.
Single Text Source Payload
The following example demonstrates how to stream the audio as soon as the first byte arrives for a single text source:
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, World!"
	filePath     string = "./test.mp3"
)

func main() {
	// STEP 1: Initialize the library
	client.InitWithDefault()

	// Go context
	ctx := context.Background()

	// STEP 2: Create a Deepgram client.
	// By default, the DEEPGRAM_API_KEY environment variable will be used for the API Key
	c := client.NewWithDefaults()
	dg := speak.New(c)

	// STEP 3: Configure the options (such as model choice, audio configuration, etc.)
	options := &interfaces.SpeakOptions{
		Model: "aura-asteria-en",
	}

	// STEP 4: Send the desired text to Deepgram to convert it to speech
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("ToSave failed. Err: %v\n", err)
		os.Exit(1)
	}

	// STEP 5: Your result struct/JSON
	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// make the JSON pretty
	prettyJSON, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Format failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJSON)
}
Chunk Text Source Payload
This example shows how to chunk the text source by sentence boundaries and stream the audio for each chunk consecutively:
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, World!"
	filePath     string = "./test.mp3"
)

func main() {
	// STEP 1: Initialize the library
	client.InitWithDefault()

	// Go context
	ctx := context.Background()

	// STEP 2: Create a Deepgram client.
	// By default, the DEEPGRAM_API_KEY environment variable will be used for the API Key
	c := client.NewWithDefaults()
	dg := speak.New(c)

	// STEP 3: Configure the options (such as model choice, audio configuration, etc.)
	options := &interfaces.SpeakOptions{
		Model: "aura-asteria-en",
	}

	// STEP 4: Send the desired text to Deepgram to convert it to speech
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("ToSave failed. Err: %v\n", err)
		os.Exit(1)
	}

	// STEP 5: Your result struct/JSON
	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// make the JSON pretty
	prettyJSON, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Format failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJSON)
}
Where to Find Additional Examples
The SDK repository has a good collection of text-to-speech examples. You can find the links to the examples in the README.
Each example demonstrates a different way to convert a text source to speech.