
Media Inputs & Outputs

Use different media inputs and outputs when using the Voice Agent API.

Deepgram’s APIs support both media input and media output settings, so you can customize how audio is processed and how output is generated for a variety of Voice Agent applications.

Speech to Text: Media Input Settings

Media input settings define the parameters of the audio you submit for processing. Specifying the characteristics of the audio up front helps optimize transcription. The available media input settings are summarized below:

Feature | Description
--- | ---
Channels | Specifies the number of audio channels in the input.
Encoding | Defines the audio encoding format.
Multichannel | Allows for the processing of multi-channel audio inputs.
Sample Rate | Indicates the sample rate of the audio data.
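
As a rough illustration of how the four input options fit together, the sketch below groups them in a small Python data class and checks basic sanity before serializing. The class name `MediaInputSettings`, the field names, and the helper `pcm16_bytes_per_second` are hypothetical and not taken from Deepgram's API; the value `"linear16"` is used only as an example encoding. Consult the Speech to Text Media Input Settings documentation for the actual parameter names and accepted values.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MediaInputSettings:
    """Hypothetical container for the four input options in the table above."""
    encoding: str               # audio encoding format (illustrative: "linear16")
    sample_rate: int            # samples per second, in Hz
    channels: int = 1           # number of audio channels in the input
    multichannel: bool = False  # whether channels are processed separately

    def __post_init__(self):
        # Basic sanity checks only; real limits are defined by the API.
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")
        if self.channels < 1:
            raise ValueError("channels must be at least 1")

    def to_params(self) -> dict:
        """Return the settings as a flat dict, ready to serialize."""
        return asdict(self)


def pcm16_bytes_per_second(settings: MediaInputSettings) -> int:
    """Data rate of raw 16-bit PCM: 2 bytes per sample, per channel."""
    return settings.sample_rate * settings.channels * 2
```

For example, `MediaInputSettings(encoding="linear16", sample_rate=16000, channels=2, multichannel=True)` describes two-channel audio at 16 kHz, and `pcm16_bytes_per_second` reports 64,000 bytes per second for that stream, which can help when sizing audio chunks.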

Text to Speech: Media Output Settings

Once the input audio is processed, Deepgram provides options for generating speech output tailored to your voice agent’s requirements. These settings let you customize the synthesized audio for downstream use.

Feature | Description
--- | ---
Encoding | Specifies the desired format of the resulting text-to-speech audio output.
Bit Rate | Specifies the desired bitrate of the resulting text-to-speech audio output.
Container | Specifies the desired file format wrapper for the output audio generated through text-to-speech synthesis.
Sample Rate | Specifies the desired sample rate of the resulting text-to-speech audio output.
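
The output options can be sketched the same way. The example below is a minimal illustration, assuming that only the encoding is always supplied and that the other options are left out when unset. The class name `MediaOutputSettings` and its field names are hypothetical, and the values shown (`"mp3"`, `48000`) are placeholders; which combinations of encoding, bit rate, container, and sample rate are valid is defined in the Text to Speech Media Output Settings documentation.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class MediaOutputSettings:
    """Hypothetical container for the four output options in the table above."""
    encoding: str                      # format of the synthesized audio
    sample_rate: Optional[int] = None  # output sample rate, in Hz
    bit_rate: Optional[int] = None     # output bitrate, in bits per second
    container: Optional[str] = None    # file format wrapper for the audio

    def to_params(self) -> dict:
        """Return only the options that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Calling `MediaOutputSettings(encoding="mp3", bit_rate=48000).to_params()` yields a dict with just `encoding` and `bit_rate`, so unset options are not sent and fall back to whatever defaults the service applies.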

Learn More

API | Supported Inputs & Outputs
--- | ---
Speech to Text | Please refer to the Speech to Text Media Input Settings documentation for more information.
Text to Speech | Please refer to the Text to Speech Media Output Settings documentation for more information.
Text to Speech | Please refer to the Text to Speech Supported Audio Formats documentation for more information.