Continuous Text Stream

Convert text into natural-sounding speech using Deepgram’s TTS WebSocket

Handshake

WSS
/v1/speak

Headers

Authorization: string, required

Authenticate with your API key, or generate a temporary token and pass it via the token query parameter.

Example: token %DEEPGRAM_API_KEY% or bearer %DEEPGRAM_TOKEN%
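
A minimal sketch of building the Authorization header in the two forms shown in the example above. The helper name `build_auth_header` and the `DEEPGRAM_API_KEY` environment variable are illustrative assumptions, not part of the API.

```python
import os


def build_auth_header(credential: str, scheme: str = "token") -> dict[str, str]:
    """Return the Authorization header for the WebSocket handshake.

    scheme is "token" for an API key or "bearer" for a temporary token,
    mirroring the two example forms in this reference.
    """
    if scheme not in ("token", "bearer"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not credential:
        raise ValueError("credential must not be empty")
    return {"Authorization": f"{scheme} {credential}"}


# DEEPGRAM_API_KEY is an assumed environment variable name.
headers = build_auth_header(os.environ.get("DEEPGRAM_API_KEY", "YOUR_API_KEY"))
```

Pass the resulting dictionary to whichever WebSocket client you use as its extra handshake headers; the name of that option varies between client libraries.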

Query parameters

encoding: enum, optional, defaults to linear16

Specifies the encoding of the audio output for streaming TTS. Only streaming-compatible encodings are supported.

Allowed values:

mip_opt_out: any, optional

Opts requests out of the Deepgram Model Improvement Program. Refer to the docs for pricing impacts before setting this to true: https://dpgr.am/deepgram-mip

model: enum, optional, defaults to aura-asteria-en

AI model used to process the submitted text.

sample_rate: enum, optional, defaults to 24000

Specifies the sample rate of the output audio. Depending on the encoding, the default is either 8000 or 24000. For some encodings the sample rate is not configurable.

Allowed values:

Send

SpeakV1Text: any, required
Text to convert to audio.
OR
SpeakV1Flush: any, required
Flush the buffer and receive the final audio for the text sent so far.
OR
SpeakV1Clear: any, required
Clear the buffer and start a new audio generation. This discards any text still in the buffer.
OR
SpeakV1Close: any, required
Flush the buffer and close the connection gracefully after all audio has been generated.
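
The four client messages above could be serialized along these lines. The JSON shapes used here (a `type` field with the values `Speak`, `Flush`, `Clear`, and `Close`, plus a `text` field for speech) are assumptions, since this section lists only the message names; check the message schemas before relying on them. The helper names are hypothetical.

```python
import json

# Assumed control message types; this section only names the messages.
CONTROL_TYPES = ("Flush", "Clear", "Close")


def speak_message(text: str) -> str:
    """Serialize a text chunk to be converted to audio (assumed shape)."""
    if not text:
        raise ValueError("text must not be empty")
    return json.dumps({"type": "Speak", "text": text})


def control_message(kind: str) -> str:
    """Serialize a Flush, Clear, or Close control message (assumed shape)."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message: {kind!r}")
    return json.dumps({"type": kind})


# A typical sequence: send text, flush to get its audio, then close.
outgoing = [
    speak_message("Hello, world."),
    control_message("Flush"),
    control_message("Close"),
]
```

Each string in `outgoing` would be sent as one text frame over the open connection, in order.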

Receive

SpeakV1Audio: any, required
Receive audio chunks as they are generated.
OR
SpeakV1Metadata: any, required
Receive metadata about the audio generation.
OR
SpeakV1Flushed: any, required
Receive confirmation that the buffer has been flushed.
OR
SpeakV1Cleared: any, required
Receive confirmation that the buffer has been cleared.
OR
SpeakV1Warning: any, required
Receive a warning about the audio generation