Entity Detection

Deepgram’s Entity Detection feature identifies and extracts key entities from content in submitted audio and returns these entities in the JSON response. When Entity Detection is enabled, the Punctuation feature will be enabled by default.

This feature is available for English only (all available regions).

Use Cases

Some examples of uses for Entity Detection include:

  • Customers who want to improve conversational AI and voice assistants by triggering particular workflows and responses based on identified names, addresses, locations, and other key entities.
  • Customers who want to enhance customer service and user experience by extracting relevant information about key entities such as people, organizations, email addresses, and phone numbers.
  • Customers who want to derive actionable insights from audio data based on the entities identified in conversations.

Enable Feature

To enable Entity Detection, add the detect_entities parameter, set to true, to the query string when you call Deepgram’s API:

detect_entities=true&punctuate=true

When Entity Detection is enabled, Punctuation will also be enabled by default.

To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.

Be sure to replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true'
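
If you would rather make the same call from code, here is a minimal Python sketch of the curl command above, using only the standard library. The helper names build_request and transcribe_file are hypothetical (they are not part of any Deepgram SDK); the URL, headers, and query parameters are copied from the curl example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.deepgram.com/v1/listen"


def build_request(api_key: str, audio: bytes,
                  content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST request with Entity Detection enabled.

    Hypothetical helper: it mirrors the curl command, nothing more.
    """
    query = urllib.parse.urlencode({"detect_entities": "true", "punctuate": "true"})
    return urllib.request.Request(
        url=f"{API_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


def transcribe_file(api_key: str, path: str) -> dict:
    """Send a local audio file and return the parsed JSON response."""
    with open(path, "rb") as audio_file:
        request = build_request(api_key, audio_file.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (requires a real API key and a local file):
# result = transcribe_file("YOUR_DEEPGRAM_API_KEY", "youraudio.wav")
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any audio leaves your machine.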

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives": []
      }
    ]
  }
}

Let's look more closely at the alternatives array:

"alternatives":[
  {
    "transcript":"Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
    "confidence":0.9816771,
    "words": [...],
    "entities":[
      {
        "label":"NAME",
        "value":" Scott Stephenson",
        "confidence":0.9999924,
        "start_word":6,
        "end_word":8
      },
      {
        "label":"ORG",
        "value":" Deepgram",
        "confidence":0.9999757,
        "start_word":10,
        "end_word":11
      },
      {
        "label": "CARDINAL_NUM",
        "value": "one",
        "confidence": 1,
        "start_word": 186,
        "end_word": 187
      },
      ...
    ]
  }
]

In this response, we see that each alternative contains:

  • transcript: Transcript for the audio being processed.
  • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: Array with one object per word in the transcript, each giving the word's start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
  • entities: Array with one object per entity identified in the audio being processed.
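
To make the nesting concrete, here is a small Python sketch that walks a parsed response and yields every entity it finds. The function name iter_entities is hypothetical, and the sample data is trimmed from the example response above. Missing keys are treated as empty, which is a defensive choice in this sketch and not documented API behavior.

```python
from typing import Any, Iterator


def iter_entities(response: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield each entity from every alternative of every channel."""
    for channel in response.get("results", {}).get("channels", []):
        for alternative in channel.get("alternatives", []):
            # Treat a missing "entities" key as "no entities" (an assumption).
            yield from alternative.get("entities", [])


# Trimmed copy of the example response shown above.
sample_response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
                        "confidence": 0.9816771,
                        "entities": [
                            {"label": "NAME", "value": " Scott Stephenson",
                             "confidence": 0.9999924, "start_word": 6, "end_word": 8},
                            {"label": "ORG", "value": " Deepgram",
                             "confidence": 0.9999757, "start_word": 10, "end_word": 11},
                        ],
                    }
                ]
            }
        ]
    }
}

for entity in iter_entities(sample_response):
    # Values in the example carry a leading space, so strip before display.
    print(entity["label"], entity["value"].strip(), entity["confidence"])
```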

And we see that each object in entities contains:

  • label: Type of entity identified.
  • value: Text of entity identified.
  • confidence: Floating point value between 0 and 1 that indicates entity reliability. Larger values indicate higher confidence.
  • start_word: Position of the entity's first word in the words array. In the example above, the values are consistent with zero-based indexing (6 for "Scott").
  • end_word: Position marking the end of the entity in the words array. In the example above, it is one past the entity's last word (8 for the two-word name starting at 6).
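
One practical use of start_word and end_word is finding when an entity was spoken. The sketch below rests on two assumptions inferred from the example response and not confirmed by the text: that start_word and end_word are zero-based indices into words with end_word exclusive, and that each word object stores its timing under "start" and "end" keys. The function name and the timings in the sample are made up for illustration.

```python
from typing import Any


def entity_time_span(words: list[dict[str, Any]],
                     entity: dict[str, Any]) -> tuple[float, float]:
    """Return (start_seconds, end_seconds) for the words an entity covers.

    Assumes zero-based word indices with an exclusive end_word, and
    "start"/"end" keys on each word object.
    """
    covered = words[entity["start_word"]:entity["end_word"]]
    if not covered:
        raise ValueError("entity does not cover any words")
    return covered[0]["start"], covered[-1]["end"]


# Illustrative words array: the timings are invented, one word per second.
tokens = ["Welcome", "to", "the", "Ai", "show.", "I'm", "Scott", "Stephenson,", "cofounder"]
sample_words = [
    {"word": token, "start": float(index), "end": index + 0.5}
    for index, token in enumerate(tokens)
]

name_entity = {"label": "NAME", "value": " Scott Stephenson",
               "start_word": 6, "end_word": 8}
span = entity_time_span(sample_words, name_entity)
print(span)
```

If your responses show a different indexing convention, adjust the slice accordingly.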

By default, Deepgram applies its base tier, general AI model, which is a good, general-purpose model for everyday situations. To learn more about the customization possible with Deepgram's API, check out the Deepgram API Reference.

All entities are available in English.

Identifiable Entities

Entities that our API can identify include the following:

Label | Description | Spoken Example | Written Example
----- | ----------- | -------------- | ---------------
NAME | Person names, including fictional characters | Henry Cavill | Henry Cavill
ORG | Organizations, including companies, government, and non-profit organizations | Bank of Switzerland | Bank of Switzerland
ORG_PROD | Products of organizations | iPhone | iPhone
LOCATION | Geographic locations, including famous buildings and places | Statue of Liberty | Statue of Liberty
LANGUAGE | Human-spoken languages | Spanish | Spanish
GENDER | Genders | female | female
EMAIL | Email strings | duygu at deepgram dot com | duygu@deepgram.com
URL | URL strings | www dot deepgram dot com | www.deepgram.com
CARDINAL_NUM | Cardinal numbers, including floating points | twenty five | 25
ORDINAL_NUM | Ordinal numbers | twenty fifth | 25th
DATE | Dates, including weekdays, months, and years | after fifth May | after 5th May
TIME | Times of the day | two pm | 14:00 pm
MONEY | Money entities | five thousand euro | 5000 Euro
QUANTITY | Amounts with units | five grams | 5 g
PERCENT | Percentages | twenty five percent | 25%
US_PHONE_NUM | US phone numbers, both international and domestic | five five five three oh seven two | 555-3072
PHONE_NUM | General phone numbers | oh five three oh seven two one | 0 530 721
LOCATION_ADDRESS | Open address strings | one hundred chestnut street, Chicago, Illinois six oh six one one | 100 Chestnut Street, Chicago, IL 60611
LOCATION_CITY | City names | Chicago | Chicago
LOCATION_STATE | State names | Illinois | Illinois
LOCATION_TOWN | Town names | Worcester | Worcester
LOCATION_ZIP | ZIP strings | six six one one one | 66111
LOCATION_COUNTRY | Country names | Cambodia | Cambodia
ACCOUNT_NUM | Account numbers | one two eight oh eight six | 128086
CARD_NUM | Credit card numbers | four one one one one one one one one one one one one one one one | 4111111111111111
CVV_NUM | Credit card verification values | five five five | 555
CARD_EXPIRY_DATE | Credit card expiration dates | eleven twenty five | 11/25
SSN_NUM | US social security numbers | one one one two two oh oh oh oh | 111-22-0000
COMMS_CHAN | Radio channel names | channel eleven | channel 11
NATO_ALPHA | Spelling descriptions with letters and words | t as in Texas | t as in Texas
MULTIPLICATIVE | Multiplicative expressions | three times | three times
SEQUENCE | Sequences of numbers that don't fall into other numeric categories | five five seven | 5 5 7
EMOTION | Names of emotions | happiest customer ever | happiest customer ever
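
Once you know which labels to expect, a common next step is to bucket entities by label so downstream code can look up, say, every NAME or ORG at once. Here is a minimal Python sketch. The function group_entities and its min_confidence argument are hypothetical client-side conveniences, not API parameters, and the sample entities are taken from the example response earlier on this page.

```python
from collections import defaultdict
from typing import Any


def group_entities(entities: list[dict[str, Any]],
                   min_confidence: float = 0.0) -> dict[str, list[str]]:
    """Group entity values by label, dropping low-confidence entities.

    min_confidence is a client-side filter invented for this sketch.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        if entity["confidence"] >= min_confidence:
            # Example values carry a leading space, so normalize them.
            grouped[entity["label"]].append(entity["value"].strip())
    return dict(grouped)


sample_entities = [
    {"label": "NAME", "value": " Scott Stephenson", "confidence": 0.9999924},
    {"label": "ORG", "value": " Deepgram", "confidence": 0.9999757},
    {"label": "CARDINAL_NUM", "value": "one", "confidence": 1},
]

by_label = group_entities(sample_entities)
print(by_label.get("NAME", []))
```

Looking up a label with .get and a default keeps calling code simple when a conversation contains no entities of that type.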