Media Inputs & Outputs | Deepgram's Docs

Deepgram’s APIs support both media input and media output settings, so you can customize how audio is processed and generated for a variety of Voice Agent applications.

Speech to Text: Media Input Settings

Media input settings allow you to define the parameters for audio data submitted for processing. These settings help optimize the transcription process by specifying the characteristics of the audio data. Below is a summary of the available options for media input settings:

Feature	Description
Channels	Specifies the number of audio channels in the input.
Encoding	Defines the audio encoding format.
Multichannel	Allows for the processing of multi-channel audio inputs.
Sample Rate	Indicates the sample rate of the audio data.
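
As a rough illustration of how the four input settings above might travel with a request, the sketch below assembles them into a query string. It assumes the settings are passed as the query parameters `encoding`, `sample_rate`, `channels`, and `multichannel` on a `/v1/listen` endpoint. The helper name `build_listen_url` and the example values are hypothetical, and no request is sent. Confirm the parameter names and accepted values in the Speech to Text Media Input Settings documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Speech to Text API reference.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(encoding, sample_rate, channels, multichannel=False):
    """Hypothetical helper: attach the media input settings as query parameters."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if channels < 1:
        raise ValueError("channels must be at least 1")
    params = {
        "encoding": encoding,          # audio encoding format of the input
        "sample_rate": sample_rate,    # sample rate of the input, in Hz
        "channels": channels,          # number of audio channels in the input
        "multichannel": "true" if multichannel else "false",
    }
    return f"{LISTEN_URL}?{urlencode(params)}"


# Example values are illustrative: 16 kHz, two-channel raw audio.
url = build_listen_url("linear16", 16000, 2, multichannel=True)
print(url)
```

Keeping the settings in one helper makes it easier to validate them before any audio is submitted.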

Text to Speech: Media Output Settings

Deepgram also provides options for generating speech output tailored to your voice agent’s requirements. These settings let you customize the synthesized audio for downstream use.

Feature	Description
Encoding	Specifies the desired format of the resulting text-to-speech audio output.
Bit Rate	Specifies the desired bit rate of the resulting text-to-speech audio output.
Container	Specifies the desired file format wrapper for the output audio generated through text-to-speech synthesis.
Sample Rate	Specifies the desired sample rate of the resulting text-to-speech audio output.
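
The output settings above can be sketched the same way. This example assumes they are sent as the query parameters `encoding`, `bit_rate`, `container`, and `sample_rate` on a `/v1/speak` endpoint. The helper name `build_speak_url` and the example values are hypothetical, and no request is sent. Not every combination of settings is necessarily valid, so check the Text to Speech Media Output Settings documentation before using specific values.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Text to Speech API reference.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_url(encoding=None, bit_rate=None, container=None, sample_rate=None):
    """Hypothetical helper: include only the output settings that were set."""
    settings = {
        "encoding": encoding,        # format of the synthesized audio
        "bit_rate": bit_rate,        # bit rate of the synthesized audio
        "container": container,      # file format wrapper for the audio
        "sample_rate": sample_rate,  # sample rate of the synthesized audio, in Hz
    }
    # Drop unset values so the service can fall back to its own defaults.
    params = {name: value for name, value in settings.items() if value is not None}
    return f"{SPEAK_URL}?{urlencode(params)}" if params else SPEAK_URL


# Example values are illustrative only.
speak_url = build_speak_url(encoding="linear16", container="wav", sample_rate=24000)
print(speak_url)
```

Omitting unset parameters, instead of sending empty values, leaves the choice of defaults to the API.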

Learn More

API	Supported Inputs & Outputs
Speech to Text	Please refer to the Speech to Text Media Input Settings documentation for more information.
Text to Speech	Please refer to the Text to Speech Media Output Settings documentation for more information.
Text to Speech	Please refer to the Text to Speech Supported Audio Formats documentation for more information.