detect_entities boolean. Default: false
When Entity Detection is enabled, the Punctuation feature will be enabled by default.
Entity Detection is available for both pre-recorded and streaming speech-to-text.
Streaming: Entity Detection for streaming is supported on Nova, Nova-2, Nova-3, and Enhanced models. It is not available for Base models or Flux.
Pre-recorded: Entity Detection for pre-recorded audio is available on all models.
To enable Entity Detection, add the detect_entities parameter, set to true, to the query string when you call Deepgram’s API:
detect_entities=true
To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
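As a rough Python counterpart to such a request, the sketch below builds (but does not send) a POST request with detect_entities=true in the query string. The endpoint URL, the Token authorization scheme, and the helper name build_prerecorded_request are assumptions made for illustration; check them against the API reference before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for pre-recorded transcription (not stated in this guide).
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_prerecorded_request(
    audio_bytes: bytes, api_key: str, content_type: str = "audio/wav"
) -> Request:
    """Build a POST request that asks for Entity Detection (hypothetical helper)."""
    query = urlencode({"detect_entities": "true"})
    return Request(
        url=f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio_bytes,
        headers={
            # Assumed header format: "Token <key>".
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )


request = build_prerecorded_request(b"", "YOUR_DEEPGRAM_API_KEY")
print(request.full_url)
```

To actually send it, read the audio file into audio_bytes and pass the request to urllib.request.urlopen, then parse the JSON body of the response.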
To enable Entity Detection for streaming audio, establish a WebSocket connection with the detect_entities=true parameter. Remember that streaming Entity Detection is supported on Nova, Nova-2, Nova-3, and Enhanced models.
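A minimal sketch of building the WebSocket URL for such a connection follows. The wss:// endpoint, the lowercase model identifiers, and the helper name build_streaming_url are assumptions; the exact-match check on the model name is deliberately simple and would reject variant model names.

```python
from urllib.parse import urlencode

# Assumed identifiers for the model families that support streaming Entity Detection.
STREAMING_ENTITY_MODELS = {"nova", "nova-2", "nova-3", "enhanced"}

# Assumed streaming endpoint (not stated in this guide).
DEEPGRAM_STREAMING_URL = "wss://api.deepgram.com/v1/listen"


def build_streaming_url(model: str) -> str:
    """Return a WebSocket URL with Entity Detection enabled (hypothetical helper)."""
    if model not in STREAMING_ENTITY_MODELS:
        raise ValueError(
            f"Streaming Entity Detection is not supported on model {model!r}"
        )
    query = urlencode({"model": model, "detect_entities": "true"})
    return f"{DEEPGRAM_STREAMING_URL}?{query}"


print(build_streaming_url("nova-3"))
```

The resulting URL can be handed to whichever WebSocket client you already use, together with your API key.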
The response structure differs between pre-recorded and streaming transcription.
When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:
Let’s look more closely at the alternatives object:
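As a hedged illustration of reading entities out of a parsed pre-recorded response, the sketch below walks every alternative and groups entity values by label. The results → channels → alternatives nesting and the helper name entities_by_label are assumptions, and the sample response is invented for the example.

```python
from collections import defaultdict


def entities_by_label(response: dict) -> dict:
    """Group entity values by label across all channels and alternatives."""
    grouped = defaultdict(list)
    for channel in response.get("results", {}).get("channels", []):
        for alternative in channel.get("alternatives", []):
            # A missing key is treated the same as an empty array.
            for entity in alternative.get("entities", []):
                grouped[entity["label"]].append(entity["value"])
    return dict(grouped)


# Hypothetical response fragment, for illustration only.
sample_response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "My name is Jane Doe.",
                        "entities": [
                            {"label": "NAME", "value": "Jane Doe",
                             "start_word": 3, "end_word": 5}
                        ],
                    }
                ]
            }
        ]
    }
}

print(entities_by_label(sample_response))
```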
For streaming transcription, entities are included in final results only (when is_final: true). Interim results do not contain the entities array.
Here’s an example of a streaming response with Entity Detection enabled:
Streaming Behavior:
The entities array is only present in final results (is_final: true).
If detect_entities is enabled but no entities are detected, an empty array is returned: "entities": [].
When using Entity Detection with streaming audio, Deepgram will attempt to detect and format entities as they are spoken. For entities that seem like they may be incomplete, our system will:
This approach ensures transcripts are returned promptly while maintaining entity detection precision.
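The streaming rules above (entities only on final results, and possibly an empty array) can be sketched as a small message handler. The channel → alternatives layout of a streaming message and the helper name entities_from_message are assumptions; the sample messages are invented for the example.

```python
import json
from typing import Optional


def entities_from_message(raw_message: str) -> Optional[list]:
    """Return the entities of a final result, or None for interim results."""
    message = json.loads(raw_message)
    if not message.get("is_final"):
        # Interim results do not carry an entities array.
        return None
    alternatives = message.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return []
    # An empty list means detection ran but found nothing.
    return alternatives[0].get("entities", [])


# Hypothetical messages, for illustration only.
interim = json.dumps(
    {"is_final": False, "channel": {"alternatives": [{"transcript": "call"}]}}
)
final = json.dumps(
    {
        "is_final": True,
        "channel": {
            "alternatives": [
                {
                    "transcript": "call Jane",
                    "entities": [{"label": "NAME", "value": "Jane"}],
                }
            ]
        },
    }
)

print(entities_from_message(interim))
print(entities_from_message(final))
```

Returning None for interim results and a list for final results keeps "not final yet" distinct from "final, but nothing detected".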
Setting no_delay=true forces immediate finalization of streaming transcripts without waiting for entity completion.
This will result in entities being missed or incomplete in many cases. Only use no_delay=true if low latency is more important than entity detection accuracy.
To use no_delay with Entity Detection:
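As a minimal sketch, both parameters can be combined in the same query string. The helper name streaming_query is hypothetical; only the detect_entities and no_delay parameter names come from this guide.

```python
from urllib.parse import urlencode


def streaming_query(no_delay: bool = False) -> str:
    """Build a query string with Entity Detection and optional no_delay."""
    params = {"detect_entities": "true"}
    if no_delay:
        # Trades entity completeness for lower latency.
        params["no_delay"] = "true"
    return urlencode(params)


print(streaming_query(no_delay=True))
```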
Each entity object in the entities array contains the following fields:
label: Type of entity identified (e.g., NAME, PHONE_NUMBER, EMAIL, ADDRESS).
value: The formatted text of the entity. When Smart Formatting is enabled, this field reflects the formatted output.
raw_value: (When formatting is enabled) The original, non-formatted text as spoken. This field is included in both pre-recorded and streaming responses when formatting features (such as Smart Formatting) are enabled.
confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
start_word: Index of the first word, inclusive, of the entity in the transcript.
end_word: Index of the last word, exclusive, of the entity in the transcript.
Key Differences Between Pre-recorded and Streaming:
View all options here: Supported Entity Types
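Because start_word is inclusive and end_word is exclusive, the two indices map directly onto a Python slice. The sketch below assumes you already have the transcript's words as a list of strings; the helper name entity_words and the sample data are hypothetical.

```python
def entity_words(words: list, entity: dict) -> list:
    """Return the transcript words covered by an entity.

    start_word is inclusive and end_word is exclusive, which matches
    Python's slice semantics.
    """
    return words[entity["start_word"]:entity["end_word"]]


# Hypothetical data, for illustration only.
words = ["my", "name", "is", "jane", "doe"]
entity = {"label": "NAME", "value": "jane doe", "start_word": 3, "end_word": 5}

print(entity_words(words, entity))  # ['jane', 'doe']
```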
Some examples of uses for Entity Detection include: