Keywords
Keywords can boost or suppress specialized terminology.
keywords
string
Try this feature out in our API Playground!
When using Deepgram’s API to transcribe audio with specialized terminology or uncommon proper nouns, you can provide those words to the model for it to incorporate as possible predictions. This is known as keyword boosting.
Keywords will not increase the likelihood that common words or proper nouns are predicted. Keywords are designed as a quick way to provide our models with new vocabulary they have not previously encountered.
Training a custom model is always the most effective and accurate way to recognize keywords and context in your transcripts. If you require more than 100 keywords, please contact us to discuss custom model training which is available on our Enterprise Plan.
Enable Feature
To enable Keywords, when you call Deepgram’s API, add a keywords
parameter in the query string and set it to your chosen keyword and intensifier:
keywords=KEYWORD:INTENSIFIER
To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.
curl \
--request POST \
--header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
--header 'Content-Type: audio/wav' \
--data-binary @youraudio.wav \
--url 'https://api.deepgram.com/v1/listen?keywords=KEYWORD:INTENSIFIER'
Replace
YOUR_DEEPGRAM_API_KEY
with your Deepgram API Key.
Keyword Limits
Keywords are limited to 100 keywords per request.
Intensifiers
Intensifiers are exponential factors by which the attached keyword can be boosted or suppressed. The default intensifier is one (1). An intensifier of two (2) equates to two boosts multiplied in a row, whereas zero (0) is equivalent to not specifying a keywords
parameter at all. The intensifier value can be a decimal.
There is no upper limit on intensifiers. However, the higher the intensifier is set, the more likely it is that the model will inaccurately predict the keyword when it has not been said. Intensifiers should be set at the lowest possible value. Higher values may occasionally need to be set to get Deepgram to successfully predict keywords.
Boost a Keyword
To use keyword boosting, send one or more instances of the keywords
parameter in the query string when calling the API and append a positive intensifier, which will boost the recognition of the specified word:
keywords=snuffleupagus:5
Truth | Before boosting | After boosting |
---|---|---|
and then big bird said to snuffleupagus why aren’t you eating that banana | and then big bird said to sniff why aren’t you eating that banana | and then big bird said to snuffleupagus why aren’t you eating that banana |
Boost Multiple Keywords
You can boost multiple keywords individually:
keywords=systrom&keywords=krieger
Source | Before boosting | After boosting |
---|---|---|
instagram is a photo and video sharing social networking service founded by kevin systrom and mike krieger | instagram is a photo and video sharing social networking service founded by kevin system and mike k | instagram is a photo and video sharing social networking service founded by kevin systrom and mike krieger |
Suppress a Keyword
Keyword suppression only works on Base models.
To use keyword suppression, send one or more instances of the keywords
parameter in the query string and append a negative intensifier, which will suppress the recognition of the specified word:
keywords=kansas:-10
Truth | Before suppression | After suppression |
---|---|---|
i was born in nineteen twenty one in wichita kansas | i was born in nineteen twenty one in wichita kansas | i was born in nineteen twenty one in wichita |
Best Practices
Deepgram's keyword boosting works when you have an uncommon word in your audio that is not able to be detected by Deepgram's models out of the box, and don’t have a custom model that has been trained to include that word.
While training a custom model is always the most effective way to achieve accuracy in your transcripts, keyword boosting can help your model recognize words it has yet to be trained on. Proper nouns, product names, and industry-specific terminology are great candidates for keyword boosting.
Send only individual keywords you want to see better represented in transcripts
For keyword boosting to work best:
- Send uncommon keywords that the model has not been able to successfully transcribe. Do not send common words, or words the model has been able to successfully transcribe.
- Send keywords once. Avoid sending the same word multiple times.
- Send individual keywords rather than phrases. Send only the principal uncommon words within phrases of interest.
- Send actual words. Avoid sending strings of numbers or alphanumerics.
Be moderate when choosing intensifiers
Remember that intensifiers are exponential factors and that intensifier values can include decimals. Although there is no maximum or minimum value, start with small increments and adjust as necessary. High intensifiers can cause false positives to appear in your transcript.
Spell proper nouns as you’d like them to appear in your transcript
The technology powering the keywords feature uses the same logic found in our search, so you don’t need to provide any “sounds like” spelling hints. If you want your keyword to be spelled a certain way in your transcript, spell it that way in the keywords
parameter. If you want it capitalized in your transcript, capitalize it in the keywords
parameter.
Avoid transferring over a list of keywords that has been used with a previous vendor
Every system is different. Whereas keyword boosting may have been necessary with another vendor, Deepgram's system may already perform well without the aid of keywords. Additional boosting may negatively affect results. Start without any keywords, add them as you see fit, and cautiously increase boosts using decimal values until you find the best fit for your data.
Updated about 2 months ago