Continuous Text Stream
Convert text into natural-sounding speech using Deepgram’s TTS WebSocket
Handshake
WSS
/v1/speak
Headers
Authorization
Use your API key for authentication, or generate a temporary token and pass it via the token query parameter.
Example: token %DEEPGRAM_API_KEY% or bearer %DEEPGRAM_TOKEN%
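The two example forms above can be sketched as a small helper that builds the handshake header. The function name `build_auth_header` and its parameters are hypothetical, introduced only for illustration; the sketch assumes the header value is the scheme word followed by a space and the credential, as the examples suggest.

```python
def build_auth_header(credential: str, scheme: str = "token") -> dict[str, str]:
    """Return an Authorization header for the WebSocket handshake.

    scheme is "token" for an API key or "bearer" for a temporary token,
    mirroring the two example forms in this section.
    """
    if scheme not in ("token", "bearer"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not credential:
        raise ValueError("credential must not be empty")
    return {"Authorization": f"{scheme} {credential}"}
```

A caller would pass the resulting dictionary as extra handshake headers to whichever WebSocket client it uses.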
Query parameters
encoding
Specifies the encoding of the audio output for streaming TTS. Only streaming-compatible encodings are supported.
Allowed values:
mip_opt_out
Opts requests out of the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true: https://dpgr.am/deepgram-mip
model
AI model used to process submitted text
sample_rate
Specifies the sample rate of the output audio. Depending on the encoding, the default is either 8000 or 24000. For some encodings the sample rate is not configurable.
Allowed values:
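As a rough illustration of how the query parameters above attach to the /v1/speak path, the following sketch assembles a connection URL. The helper `build_speak_url` is hypothetical, the host is supplied by the caller because this section does not name one, and rendering mip_opt_out as the strings "true"/"false" is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def build_speak_url(
    host: str,
    model: Optional[str] = None,
    encoding: Optional[str] = None,
    sample_rate: Optional[int] = None,
    mip_opt_out: Optional[bool] = None,
) -> str:
    """Build a wss:// URL for /v1/speak, including only the parameters given."""
    params: dict[str, str] = {}
    if model is not None:
        params["model"] = model
    if encoding is not None:
        params["encoding"] = encoding
    if sample_rate is not None:
        params["sample_rate"] = str(sample_rate)
    if mip_opt_out is not None:
        # Assumption: booleans are sent as lowercase "true"/"false".
        params["mip_opt_out"] = "true" if mip_opt_out else "false"
    url = f"wss://{host}/v1/speak"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

Omitted parameters are left out of the URL entirely, so the server-side defaults described above apply.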
Send
SpeakV1Text
Text to convert to audio
OR
SpeakV1Flush
Flush the buffer and receive the final audio for text sent so far
OR
SpeakV1Clear
Clear the buffer and start a new audio generation. This is potentially destructive: any text still in the buffer is discarded.
OR
SpeakV1Close
Flush the buffer and close the connection gracefully after all audio is generated
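The four client messages above could be serialized roughly as follows. This section names the messages but not their fields, so the JSON shape used here (a `type` field with the values Speak, Flush, Clear, and Close, plus a `text` field for Speak) is an assumption, and the helper names are hypothetical.

```python
import json

# Assumed "type" values for the control messages; not stated in this section.
CONTROL_TYPES = ("Flush", "Clear", "Close")


def speak_message(text: str) -> str:
    """Serialize text to be converted to audio (SpeakV1Text)."""
    return json.dumps({"type": "Speak", "text": text})


def control_message(kind: str) -> str:
    """Serialize a Flush, Clear, or Close control message."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message: {kind!r}")
    return json.dumps({"type": kind})


def utterance_messages(sentences: list[str]) -> list[str]:
    """One Speak message per sentence, followed by a Flush so that
    audio for all text sent so far is returned."""
    messages = [speak_message(sentence) for sentence in sentences]
    messages.append(control_message("Flush"))
    return messages
```

Each returned string would be sent as one text frame over the open connection. A Close message at the end of the session also flushes, so a separate Flush before it should not be needed.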
Receive
SpeakV1Audio
Receive audio chunks as they are generated
OR
SpeakV1Metadata
Receive metadata about the audio generation
OR
SpeakV1Flushed
Receive confirmation that the buffer was flushed
OR
SpeakV1Cleared
Receive confirmation that the buffer was cleared
OR
SpeakV1Warning
Receive a warning about the audio generation
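A client has to tell audio chunks apart from the other server messages listed above. The sketch below assumes that audio arrives as binary frames and that every other message is a JSON text frame carrying a `type` field; neither detail is stated in this section, and `handle_frame` is a hypothetical helper.

```python
import json
from typing import Union


def handle_frame(frame: Union[bytes, str], audio: bytearray, events: list) -> str:
    """Route one received frame and return a label for what it was.

    Binary frames are appended to the audio buffer. Text frames are parsed
    as JSON, recorded in events, and labeled by their "type" field.
    """
    if isinstance(frame, (bytes, bytearray)):
        audio.extend(frame)
        return "Audio"
    message = json.loads(frame)
    events.append(message)
    # Assumption: control messages identify themselves via a "type" field.
    return message.get("type", "Unknown")
```

A receive loop would call this for every incoming frame and, for example, stop waiting for more audio once the label for a flush confirmation comes back.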