Container

Container specifies the file format wrapper for the output audio.

container string

Available for: Text to Speech Request, Text to Speech Stream. English only.

The Container feature allows users to specify the desired file format wrapper for the output audio generated through text-to-speech synthesis.

Choosing the appropriate container format for the audio output is essential for compatibility with different playback devices and applications. Container formats wrap the audio data along with metadata, enabling efficient storage and transmission.

The container value must be one of the combinations listed in the Audio Format Combinations table. Choose a value based on the encoding and your use case. Depending on the encoding, the default container is either wav or ogg.

Enable Feature

To enable the Container feature, include the container parameter in the query string with the desired container format value.

```text
https://api.deepgram.com/v1/speak?encoding=linear16&container=wav
```
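If you build request URLs in a script, a small helper keeps the encoding and container values together in one place. The sketch below is hypothetical: the build_speak_url name and its argument order are illustrative, and it only assembles the query string. It does not check the pair against the Audio Format Combinations table.

```shell
# Hypothetical helper: build_speak_url MODEL ENCODING CONTAINER
# Prints a /v1/speak URL with the container set explicitly next to the encoding.
build_speak_url() {
  model="$1"
  encoding="$2"
  container="$3"
  printf 'https://api.deepgram.com/v1/speak?model=%s&encoding=%s&container=%s\n' \
    "$model" "$encoding" "$container"
}

# Example: linear16 audio wrapped in a WAV container.
build_speak_url aura-2-thalia-en linear16 wav
```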

If you are using the TTS WebSocket, you can include container=none in your request, but no other container formats are currently supported.
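A script that targets both the REST and WebSocket interfaces can guard against sending an unsupported container over the WebSocket. This is a minimal sketch with a hypothetical ws_container_ok helper; the wss://api.deepgram.com/v1/speak URL in the usage lines is an assumption about the WebSocket endpoint, not something stated on this page.

```shell
# Hypothetical helper: ws_container_ok VALUE succeeds only for "none",
# the one container value the TTS WebSocket currently supports.
ws_container_ok() {
  if [ "$1" = "none" ]; then
    return 0
  fi
  printf 'container=%s is not supported on the TTS WebSocket; use container=none\n' "$1" >&2
  return 1
}

# Usage: only print the (assumed) WebSocket URL when the container is valid.
container=none
if ws_container_ok "$container"; then
  echo "wss://api.deepgram.com/v1/speak?model=aura-2-thalia-en&encoding=linear16&container=$container"
fi
```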

CURL Example

You can use the following cURL command in a terminal or your favorite API client to synthesize text into speech with a specific container.

WAV container format:

```shell
curl --request POST \
  --url "https://api.deepgram.com/v1/speak?model=aura-2-thalia-en&encoding=linear16&container=wav" \
  --header "Authorization: Token YOUR_DEEPGRAM_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{"text": "Hello, how can I help you today?"}' \
  --output container_wav_output.wav \
  --write-out "Time-to-First-Byte: %{time_starttransfer}s Time-to-Last-Byte: %{time_total}s\n" \
  --fail-with-body \
  --silent || echo "Request failed"
```

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.

Query Parameters

| Parameter | Value | Type | Description |
| --- | --- | --- | --- |
| container | See the Audio Format Combinations table for supported values. | string | The desired file format wrapper for the output audio. |

When using VoIP (Voice over Internet Protocol), we recommend adding container=none to your request. Otherwise the container's header bytes (for example, the WAV header) can be misinterpreted as audio, which can result in static or clicking sounds.
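Because a container such as wav prepends header bytes to the audio, one way to check what you received is to look for the RIFF/WAVE signature at the start of the file. This sketch assumes the standard WAV layout (the ASCII bytes RIFF at offset 0 and WAVE at offset 8); is_wav is a hypothetical helper name. A response requested with container=none carries raw audio with no header, so it is expected to fail this check.

```shell
# Hypothetical helper: is_wav FILE returns 0 if FILE starts with a RIFF/WAVE
# header (the container header a wav response carries), 1 otherwise.
is_wav() {
  [ -f "$1" ] || return 1
  # Read bytes 0-3 and 8-11; drop NUL bytes so raw audio doesn't upset the shell.
  riff=$(dd if="$1" bs=1 count=4 2>/dev/null | tr -d '\000')
  wave=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null | tr -d '\000')
  [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}

# Example, after running the cURL request above:
# is_wav container_wav_output.wav && echo "WAV container" || echo "no WAV header"
```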

Analyze Response

Upon successful processing of the request, you will receive an audio file containing the synthesized text-to-speech output, along with response headers providing additional information.

The audio file is streamed back to you, so you can begin playback as soon as the first byte arrives. See the Streaming Audio Outputs guide to learn how to start playing the stream immediately instead of waiting for the entire file to arrive.

Response Headers Example

```http
HTTP/1.1 200 OK
content-type: audio/mpeg
dg-model-name: aura-2-thalia-en
dg-model-uuid: e4979ab0-8475-4901-9d66-0a562a4949bb
dg-char-count: 32
dg-request-id: bf6fc5c7-8f84-479f-b70a-602cf5bf18f3
transfer-encoding: chunked
date: Thu, 29 Feb 2024 19:20:48 GMT
```

To see these response headers when making a cURL request, add -v or --verbose to the command.

These headers include:

  • content-type: The media type of the returned audio (audio/mpeg in this example), indicating the format of the audio file.
  • dg-request-id: A unique identifier for the request, useful for debugging and tracking purposes.
  • dg-model-uuid: The unique identifier of the model that processed the request.
  • dg-char-count: The number of characters in the input text that was synthesized.
  • dg-model-name: The name of the model used to process the request.
  • transfer-encoding: Specifies the form of encoding used to safely transfer the payload to the recipient.
  • date: The date and time the response was sent.
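Instead of reading headers from verbose output, you can have curl write them to a file with --dump-header and pull out individual values, such as dg-request-id when reporting an issue. The dg_header helper below is a hypothetical sketch; it assumes one response per header file (no redirects) and prints the first matching header, matching names case-insensitively.

```shell
# Hypothetical helper: dg_header NAME FILE prints the value of header NAME from
# a file written by `curl --dump-header FILE`.
dg_header() {
  name="$1"
  file="$2"
  # HTTP header lines end in CRLF: strip the CR, then match "name:" case-insensitively.
  tr -d '\r' < "$file" | awk -v n="$name" '
    BEGIN { n = tolower(n) }
    {
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == n) {
        v = substr($0, i + 1)
        sub(/^[ \t]+/, "", v)   # drop the space after the colon
        print v
        exit
      }
    }'
}

# Example: add --dump-header headers.txt to the cURL request above, then:
# dg_header dg-request-id headers.txt
```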