Text-to-Speech

An overview of the Deepgram Go SDK and Deepgram text-to-speech.

Installing the SDK

To begin using Deepgram's Text-to-Speech functionality, you need to install the Deepgram Go SDK in your existing project. You can do this using the following command:

go get github.com/deepgram/deepgram-go-sdk

Make a Deepgram Text-to-Speech Request

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	api "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1/rest"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, World!"
	filePath     string = "./test.mp3"
)

func main() {
	// init library
	client.InitWithDefault()

	// Go context
	ctx := context.Background()

	// set the Transcription options
	options := &interfaces.SpeakOptions{
		Model: "aura-asteria-en",
	}

	// create a Deepgram client
	c := client.NewRESTWithDefaults()
	dg := api.New(c)

	// send/process file to Deepgram
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("FromStream failed. Err: %v\n", err)
		os.Exit(1)
	}

	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// make the JSON pretty
	prettyJSON, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJSON)
}

Audio Output Streaming

Deepgram's TTS API allows you to start playing the audio as soon as the first byte is received. This section provides examples to help you stream the audio output efficiently.

Single Text Source Payload

The following example demonstrates how to stream the audio as soon as the first byte arrives for a single text source:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, World!"
	filePath     string = "./test.mp3"
)

func main() {
  // STEP 1: Initialize the library
  client.InitWithDefault()

	// Go context
	ctx := context.Background()

  // STEP 2: Create a Deepgram client.
  // By default, the DEEPGRAM_API_KEY environment variable will be used for the API Key
	c := client.NewWithDefaults()
	dg := speak.New(c)
  
 	// STEP 3: Configure the options (such as model choice, audio configuration, etc.)
	options := &interfaces.SpeakOptions{
		Model: "aura-asteria-en",
	}

  // STEP 4: send/process the desired text to Deepgram to convert to Speech
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("FromStream failed. Err: %v\n", err)
		os.Exit(1)
	}

  // STEP 5: Your result struct/JSON
	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// make the JSON pretty
	prettyJSON, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJSON)
}

Chunk Text Source Payload

This example shows how to chunk the text source by sentence boundaries and stream the audio for each chunk consecutively:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	prettyjson "github.com/hokaccha/go-prettyjson"

	speak "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1"
	interfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces"
	client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak"
)

const (
	textToSpeech string = "Hello, World!"
	filePath     string = "./test.mp3"
)

func main() {
  // STEP 1: Initialize the library
  client.InitWithDefault()

	// Go context
	ctx := context.Background()

  // STEP 2: Create a Deepgram client.
  // By default, the DEEPGRAM_API_KEY environment variable will be used for the API Key
	c := client.NewWithDefaults()
	dg := speak.New(c)
  
 	// STEP 3: Configure the options (such as model choice, audio configuration, etc.)
	options := &interfaces.SpeakOptions{
		Model: "aura-asteria-en",
	}

  // STEP 4: send/process the desired text to Deepgram to convert to Speech
	res, err := dg.ToSave(ctx, filePath, textToSpeech, options)
	if err != nil {
		fmt.Printf("FromStream failed. Err: %v\n", err)
		os.Exit(1)
	}

  // STEP 5: Your result struct/JSON
	data, err := json.Marshal(res)
	if err != nil {
		fmt.Printf("json.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}

	// make the JSON pretty
	prettyJSON, err := prettyjson.Format(data)
	if err != nil {
		fmt.Printf("prettyjson.Marshal failed. Err: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("\n\nResult:\n%s\n\n", prettyJSON)
}

Where to Find Additional Examples

The SDK repository has a good collection of text-to-speech examples. You can find the links to the examples in the README.

Each example will attempt to provide different options on how you might transcribe a text-to-speech source.