Agent Media Inputs & Outputs
Configure different media inputs and outputs for the Voice Agent API.
Deepgram's APIs provide support for both media input and media output settings, so you can customize how audio is processed and how output is generated for a variety of Voice Agent applications.
Speech to Text: Media Input Settings
Media input settings allow you to define the parameters for audio data submitted for processing. These settings help optimize the transcription process by specifying the characteristics of the audio data. Below is a summary of the available options for media input settings:
Feature | Description |
---|---|
Channels | Specifies the number of audio channels in the input. |
Encoding | Defines the audio encoding format. |
Multichannel | Allows for the processing of multi-channel audio inputs. |
Sample Rate | Indicates the sample rate of the audio data. |
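To make the relationship between these settings concrete, the following minimal sketch groups the four input settings into one object and computes the raw data rate they imply. It is an illustration only: the `MediaInput` class, its field names, and its default values are hypothetical and are not part of any Deepgram SDK, and the arithmetic assumes uncompressed 16-bit PCM (2 bytes per sample).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaInput:
    """Hypothetical container for the four input settings in the table above."""

    encoding: str = "linear16"   # assumed label for 16-bit linear PCM
    sample_rate: int = 16000     # samples per second, per channel
    channels: int = 1            # number of audio channels in the input
    multichannel: bool = False   # whether multi-channel processing is requested

    def bytes_per_second(self, bytes_per_sample: int = 2) -> int:
        """Raw PCM data rate: sample_rate * channels * bytes_per_sample."""
        return self.sample_rate * self.channels * bytes_per_sample

    def chunk_size(self, chunk_ms: int = 20) -> int:
        """Number of bytes in one audio chunk of the given duration."""
        return self.bytes_per_second() * chunk_ms // 1000


# Example values chosen for illustration only.
mono = MediaInput()
stereo = MediaInput(sample_rate=48000, channels=2, multichannel=True)

print(mono.bytes_per_second(), mono.chunk_size())
print(stereo.bytes_per_second(), stereo.chunk_size())
```

Declaring the sample rate and channel count explicitly matters most for raw audio, which carries no header describing its own format.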
Text to Speech: Media Output Settings
Once the input audio is processed, Deepgram provides options for generating speech output tailored to your voice agent's requirements. These settings let you customize the synthesized audio for downstream use.
Feature | Description |
---|---|
Encoding | Specifies the desired format of the resulting text-to-speech audio output. |
Bit Rate | Specifies the desired bit rate of the resulting text-to-speech audio output. |
Container | Specifies the desired file format wrapper for the output audio generated through text-to-speech synthesis. |
Sample Rate | Specifies the desired sample rate of the resulting text-to-speech audio output. |
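As a rough sketch of how the four output settings might be assembled, the helper below builds a settings dictionary and leaves out anything that was not specified. The function names and the dictionary keys (`encoding`, `sample_rate`, `bit_rate`, `container`) are assumptions made for illustration; check the linked Media Output Settings documentation for the exact parameter names and the values each encoding accepts. The second helper uses the general relationship between bit rate and size (bits per second times seconds, divided by 8) to estimate how large a compressed response might be.

```python
from typing import Optional


def build_media_output(
    encoding: str,
    sample_rate: Optional[int] = None,
    bit_rate: Optional[int] = None,
    container: Optional[str] = None,
) -> dict:
    """Collect output settings into a dict, dropping any that were not set.

    The key names are hypothetical and mirror the table above.
    """
    settings = {
        "encoding": encoding,
        "sample_rate": sample_rate,
        "bit_rate": bit_rate,
        "container": container,
    }
    return {key: value for key, value in settings.items() if value is not None}


def estimated_audio_bytes(bit_rate: int, seconds: float) -> int:
    """Approximate size of compressed audio: bit_rate * seconds / 8."""
    return int(bit_rate * seconds / 8)


# Example values chosen for illustration only.
mp3_output = build_media_output("mp3", bit_rate=48000)
print(mp3_output)
print(estimated_audio_bytes(mp3_output["bit_rate"], seconds=10))
```

Omitting unset values rather than sending placeholders lets the service apply its own defaults, which is usually the safer choice when a setting does not apply to the chosen encoding.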
Learn More
API | Supported Inputs & Outputs |
---|---|
Speech to Text | Please refer to the Speech to Text Media Input Settings documentation for more information. |
Text to Speech | Please refer to the Text to Speech Media Output Settings documentation for more information. |
Text to Speech | Please refer to the Text to Speech Supported Audio Formats documentation for more information. |