Use Deepgram's speech-to-text API to transcribe live-streaming audio.

Deepgram provides its customers with real-time, streaming transcription via its streaming endpoints. These endpoints are high-performance, full-duplex services running over the WebSocket protocol.

Connecting to the Endpoint

To use this endpoint, connect to wss://. TLS encryption will protect your connection and data. We support a minimum of TLS 1.2.

Sending Audio Data

All audio data is sent to the streaming endpoint as binary-type WebSocket messages whose payloads are the raw audio data. Because the protocol is full-duplex, you can stream in real time and still receive transcription responses while uploading data. Each streamed buffer should contain between 20 milliseconds and 250 milliseconds of audio.
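
As a rough illustration of the buffer-size guidance, the sketch below converts a duration in milliseconds into a byte count and slices a buffer accordingly. It assumes uncompressed PCM audio, where size follows directly from sample rate, channel count, and bytes per sample. The names chunk_size_bytes and iter_chunks are hypothetical helpers, not part of any Deepgram SDK.

```python
def chunk_size_bytes(sample_rate: int, channels: int, sample_width: int, chunk_ms: int) -> int:
    """Return how many bytes of raw PCM audio span chunk_ms milliseconds.

    sample_width is the number of bytes per sample (2 for 16-bit audio).
    """
    if not 20 <= chunk_ms <= 250:
        raise ValueError("chunk_ms should be between 20 and 250 milliseconds")
    frames = sample_rate * chunk_ms // 1000
    return frames * channels * sample_width


def iter_chunks(audio: bytes, size: int):
    """Yield successive slices of at most `size` bytes, one per binary message."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


# Example: 100 ms of 16 kHz, mono, 16-bit audio.
size = chunk_size_bytes(sample_rate=16000, channels=1, sample_width=2, chunk_ms=100)
chunks = list(iter_chunks(b"\x00" * 7000, size))
```

Each yielded slice would then be sent as one binary WebSocket message. The last slice may be shorter than the others.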

Streaming KeepAlive

By default, the Deepgram streaming connection will time out with a NET-0001 error code if no audio is sent by the client for 12 seconds. (See Error Handling below for more information.)

To keep the WebSocket connection open without sending audio data, send the following JSON string:

{ "type": "KeepAlive" }

This will keep the streaming connection open for an additional 12 seconds. If no audio or additional KeepAlive messages are sent within the 12-second window, the streaming connection will close with a NET-0001 error. To avoid this error and keep the connection open, continue sending KeepAlive messages 3-5 seconds before the 12-second timeout window expires until you are ready to resume sending audio.
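
One way to schedule KeepAlive messages is to track when the client last sent anything and to send a KeepAlive once the remaining time falls inside the suggested 3-5 second margin. The sketch below is a minimal illustration of that timing logic only. The KeepAliveTimer class and the 4-second margin are assumptions chosen for the example, and the actual send call depends on your WebSocket client.

```python
import json

TIMEOUT_SECONDS = 12.0        # inactivity timeout described above
SAFETY_MARGIN_SECONDS = 4.0   # assumed margin, inside the suggested 3-5 second range

KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


class KeepAliveTimer:
    """Tracks the last client activity and reports when a KeepAlive is due."""

    def __init__(self, now: float) -> None:
        self.last_activity = now

    def mark_activity(self, now: float) -> None:
        """Call after sending audio or a KeepAlive message."""
        self.last_activity = now

    def is_due(self, now: float) -> bool:
        """Return True once the safety margin before the timeout has been reached."""
        return now - self.last_activity >= TIMEOUT_SECONDS - SAFETY_MARGIN_SECONDS


# Example with explicit timestamps (seconds); real code might pass time.monotonic().
timer = KeepAliveTimer(now=0.0)
if timer.is_due(now=8.0):
    # A real client would send KEEPALIVE_MESSAGE over the WebSocket here.
    timer.mark_activity(now=8.0)
```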


Read more about KeepAlive in this comprehensive guide.

Finalizing Audio Processing

In addition to closing the stream, Deepgram supports a Finalize message to handle specific scenarios where you need to force the server to process all unprocessed audio data and immediately return the results. This is particularly useful in situations where interim results need to be treated as final due to the end of an utterance, or when transitioning to a keep-alive period and you want to ensure no previous transcripts reappear unexpectedly.

To use this feature, send the following JSON message to the server:

{ "type": "Finalize" }

The server will process all remaining audio data and return the final results. In cases where there is a non-negligible amount of audio buffered in the server, you will receive a response with from_finalize set to true to indicate that the finalization process has completed.
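
A client might detect the end of finalization by checking incoming text messages for the from_finalize flag. The sketch below assumes the flag appears as a top-level field of the JSON response, which the description above does not confirm. The helper name is_finalize_response is hypothetical.

```python
import json

FINALIZE_MESSAGE = json.dumps({"type": "Finalize"})


def is_finalize_response(raw_message: str) -> bool:
    """Return True when a server text message has from_finalize set to true.

    Assumes from_finalize is a top-level field of the JSON payload.
    """
    try:
        payload = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("from_finalize") is True
```

After sending FINALIZE_MESSAGE, a client could pass each received text message to is_finalize_response and treat the first True result as the signal that buffered audio has been processed. Because the flag is only described for cases with a non-negligible amount of buffered audio, a client should probably not wait for it indefinitely.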

Closing the Stream

When you are finished, send a JSON message to the server: { "type": "CloseStream" }. The server will interpret it as a shutdown command, which means it will finish processing whatever data it still has cached, send the response to the client, send a summary metadata object, and then terminate the WebSocket connection.
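
Because the server still sends responses after it receives CloseStream, a client should keep reading until the connection ends. The sketch below illustrates that shutdown sequence with two caller-supplied callables. Both send_text and receive_text are hypothetical stand-ins for your WebSocket client, and receive_text is assumed to return None once the server has terminated the connection.

```python
import json
from typing import Callable, List, Optional

CLOSE_STREAM_MESSAGE = json.dumps({"type": "CloseStream"})


def close_and_drain(
    send_text: Callable[[str], None],
    receive_text: Callable[[], Optional[str]],
) -> List[dict]:
    """Send CloseStream, then collect every remaining message until the connection ends."""
    send_text(CLOSE_STREAM_MESSAGE)
    remaining = []
    while True:
        message = receive_text()
        if message is None:  # assumed signal that the connection has closed
            return remaining
        remaining.append(json.loads(message))
```

The returned list would hold the final transcription responses followed by the summary metadata object, in the order the server sent them.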

To learn more about working with real-time streaming data and results, see Get Started with Streaming Audio.

Deepgram does not store transcriptions. Make sure to save output or return transcriptions to a callback URL for custom processing.

Query Params


Callback URL to provide if you would like your submitted audio to be processed asynchronously. Learn More


Enable a callback method. Default: false. Learn More


Number of independent audio channels contained in submitted streaming audio. Only read when a value is provided for encoding. Learn More


Dictation automatically formats spoken commands for punctuation into their respective punctuation marks. Learn More


Indicates whether to recognize speaker changes. When set to true, each word in the transcript will be assigned a speaker number starting at 0. Learn More


Indicates the version of the diarization feature to use. Only used when the diarization feature is enabled (diarize=true is passed to the API). Learn More


Expected encoding of the submitted streaming audio. If this parameter is set, sample_rate must also be specified. Learn More


Indicates how long Deepgram will wait to detect whether a speaker has finished speaking (or paused for a significant period of time, indicating the completion of an idea). When Deepgram detects an endpoint, it assumes that no additional data will improve its prediction, so it immediately finalizes the result for the processed time range and returns the transcript with a speech_final parameter set to true. Endpointing may be disabled by setting endpointing=false. Learn More


Adds an extra parameter to the query string, passing a key-value pair that you would like to include in the response. Learn More


Indicates whether to include filler words like "uh" and "um" in transcript output. When set to true, these words will be included. Defaults to false. Learn More


Indicates whether the streaming endpoint should send you updates to its transcription as more audio becomes available. When set to true, the streaming endpoint returns regular updates, which means transcription results will likely change for a period of time. By default, this flag is set to false. Learn More


Uncommon proper nouns or other words to transcribe that are not a part of the model's vocabulary. Can send multiple instances in query string (for example, keywords=snuffalupagus:10&keywords=systrom:5.5). Learn More


The BCP-47 language tag that hints at the primary spoken language. Learn More


AI model used to process submitted audio. Learn More


Indicates whether to transcribe each audio channel independently. Learn More


Indicates whether to convert numbers from written format (e.g., one) to numerical format (e.g., 1). Learn More


Indicates whether to remove profanity from the transcript. Learn More


Indicates whether to add punctuation and capitalization to the transcript. Learn More


Indicates whether to redact sensitive information, replacing redacted content with asterisks (*). Can send multiple instances in query string (for example, redact=pci&redact=numbers). Learn More


Terms or phrases to search for in the submitted audio and replace. Can send multiple instances in query string (for example, replace=this:that&replace=thisalso:thatalso). Learn More


Sample rate of submitted streaming audio. Required (and only read) when a value is provided for encoding. Learn More


Terms or phrases to search for in the submitted audio. Can send multiple instances in query string (for example, search=speech&search=Friday). Learn More


Indicates whether to apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability. Learn More


Tag to associate with the request. Learn More


Indicates how long Deepgram will wait to send a {"type": "UtteranceEnd"} message after a word has been transcribed. Learn More


Indicates whether to notify you when speech starts. When enabled, you will receive a {"type": "SpeechStarted"} message each time speech is detected. Learn More


Version of the model to use. Learn More
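
Several of the parameters above can be repeated in the query string, and encoding requires sample_rate. The sketch below shows one way to assemble such a URL with Python's standard library. The base URL is a placeholder (the real endpoint address is not reproduced here), the value linear16 is only an illustrative encoding name, and build_stream_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_stream_url(base_url: str, params: dict) -> str:
    """Append query parameters to base_url, repeating keys whose values are lists."""
    if "encoding" in params and "sample_rate" not in params:
        raise ValueError("sample_rate must be specified when encoding is set")
    # doseq=True turns {"redact": ["pci", "numbers"]} into redact=pci&redact=numbers.
    query = urlencode(params, doseq=True)
    return f"{base_url}?{query}" if query else base_url


# Placeholder address; substitute the actual streaming endpoint.
url = build_stream_url(
    "wss://example.invalid/stream",
    {"encoding": "linear16", "sample_rate": 16000, "redact": ["pci", "numbers"]},
)
```

Note that urlencode percent-encodes reserved characters, so a value such as snuffalupagus:10 would be sent with the colon escaped.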


Status       Description
200 Success  Audio submitted for transcription.

Response Schema

Error Handling

If Deepgram encounters an error during real-time streaming, we will return a WebSocket Close frame (WebSocket Protocol specification, section 5.5.1).

The body of the Close frame will indicate the reason for closing using one of the specification’s pre-defined status codes followed by a UTF-8-encoded payload that represents the reason for the error. Current codes and payloads in use include:

Code  Payload    Description
1008  DATA-0000  The payload cannot be decoded as audio. Either the encoding is incorrectly specified, the payload is not audio data, or the audio is in a format unsupported by Deepgram.
1011  NET-0000   The service has not transmitted a Text frame to the client within the timeout window. This may indicate an issue internally in Deepgram's systems or could be due to Deepgram not receiving enough audio data to transcribe a frame.
1011  NET-0001   The service has not received a Binary or Text frame from the client within the timeout window. This may indicate an internal issue in Deepgram's systems, the client's systems, or the network connecting them.
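
A client can turn the close code and payload into a readable diagnostic when the connection ends. The sketch below is one possible lookup built from the table above. It assumes the close reason begins with the payload identifier (for example NET-0001), which may not hold if the server formats the reason differently. The name describe_close is hypothetical.

```python
# Payload identifier -> (expected close code, short description from the table above).
CLOSE_REASONS = {
    "DATA-0000": (1008, "The payload cannot be decoded as audio."),
    "NET-0000": (1011, "The service did not transmit a Text frame to the client within the timeout window."),
    "NET-0001": (1011, "The service did not receive a Binary or Text frame from the client within the timeout window."),
}


def describe_close(code: int, reason: str) -> str:
    """Map a WebSocket close code and reason payload to a human-readable message."""
    for identifier, (expected_code, description) in CLOSE_REASONS.items():
        # Assumption: the reason payload starts with the identifier.
        if code == expected_code and reason.startswith(identifier):
            return f"{identifier}: {description}"
    return f"Unrecognized close frame (code={code}, reason={reason!r})"
```

Most WebSocket clients expose the close code and reason when the connection ends, and those two values can be passed directly to describe_close.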

To learn about debugging WebSocket errors, see Troubleshooting WebSocket DATA and NET Errors When Live Streaming Audio.

After sending a Close message, the endpoint considers the WebSocket connection closed and will close the underlying TCP connection.