Search | Deepgram's Docs


search (string)

Supported for: Pre-recorded, Streaming, all available languages

Deepgram’s Search feature finds terms or phrases by matching acoustic patterns in the audio, which we have found to be more accurate than matching text patterns in transcripts, and returns the results in the response JSON object. Rather than looking for sufficiently close matches in the text transcript, Deepgram “hears” whether the phrase was uttered in the submitted audio, so you can accurately identify whether it was spoken.

Because the search feature is looking for phonetic matches, it works best on longer, multisyllabic terms, or even on short to medium-length phrases.

Enable Feature

To enable Search, add a search parameter to the query string when you call Deepgram’s API, and set it to your chosen search term or phrase:

search=TERM_OR_PHRASE

You can include up to 50 search terms per request.

To transcribe audio from a file on your computer and search it for a term or phrase, run the following cURL command in a terminal or your favorite API client.

cURL

$ curl \
>   --request POST \
>   --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
>   --header 'Content-Type: audio/wav' \
>   --data-binary @youraudio.wav \
>   --url 'https://api.deepgram.com/v1/listen?search=TERM_OR_PHRASE'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
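
For readers who prefer Python to cURL, the same request can be sketched with the standard library alone. The helper name build_listen_request is hypothetical and not part of any Deepgram SDK; the endpoint, headers, and search parameter are taken from the cURL command above. The function only constructs the request, so nothing is sent until the returned object is passed to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the cURL example in this guide.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(api_key, audio_bytes, search_term):
    """Build (but do not send) a POST request equivalent to the cURL command.

    Hypothetical helper for illustration; not part of a Deepgram SDK.
    """
    # Percent-encode the term so that spaces become %20.
    query = urllib.parse.urlencode(
        {"search": search_term}, quote_via=urllib.parse.quote
    )
    return urllib.request.Request(
        url=f"{LISTEN_URL}?{query}",
        data=audio_bytes,  # raw bytes of the WAV file
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
```

To send the request, read the audio file in binary mode, pass its bytes to the helper, and hand the result to urllib.request.urlopen; the response body is a JSON document.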

Search for a Term

To search for a single term, send one instance of the search parameter in the query string when calling the API:

search=epistemology

Search for Multiple Terms

To search for multiple terms individually, repeat the search parameter once per term:

search=epistemology&search=warwick

Search for a Phrase

To search for a phrase, URL-encode it (for example, replace each space with %20) before submitting it:

search=social%20epistemology
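
The three query-string forms above (single term, repeated terms, URL-encoded phrase) can be produced by one small function. This is a minimal sketch using only Python's standard library; build_search_url and MAX_SEARCH_TERMS are illustrative names, and the limit of 50 is the one stated earlier in this guide.

```python
from urllib.parse import quote, urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"
MAX_SEARCH_TERMS = 50  # per-request limit stated in this guide


def build_search_url(terms):
    """Return the listen URL with one search parameter per term.

    Illustrative helper; the name is an assumption, not a Deepgram API.
    """
    terms = list(terms)
    if not terms:
        raise ValueError("at least one search term is required")
    if len(terms) > MAX_SEARCH_TERMS:
        raise ValueError(f"at most {MAX_SEARCH_TERMS} search terms per request")
    # A list of pairs lets the same key repeat; quote_via=quote encodes
    # spaces as %20 rather than '+'.
    pairs = [("search", term) for term in terms]
    return f"{LISTEN_URL}?{urlencode(pairs, quote_via=quote)}"


print(build_search_url(["epistemology", "warwick"]))
# → https://api.deepgram.com/v1/listen?search=epistemology&search=warwick
print(build_search_url(["social epistemology"]))
# → https://api.deepgram.com/v1/listen?search=social%20epistemology
```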

Analyze Response

The term “epistemology” in this audio file is sufficiently technical that our model may not transcribe it accurately, but our phonetic search will be able to find it.

In our terminal, we run the following cURL command:

cURL

$ curl \
>   --request POST \
>   --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
>   --header 'Content-Type: audio/wav' \
>   --data-binary @epistemology.wav \
>   --url 'https://api.deepgram.com/v1/listen?search=epistemology'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

JSON

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "search": [],
        "alternatives": []
      }
    ]
  }
}
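
Given this structure, the search results for each channel sit at results.channels[i].search. The sketch below walks a response that has already been parsed into a Python dict (for example with json.loads); iter_search_results is a hypothetical helper name, and the function assumes only the keys shown in the structure above.

```python
def iter_search_results(response):
    """Yield (channel_index, query, hits) for every search entry.

    `response` is the parsed JSON response as a dict. Missing keys are
    treated as empty so that a response without search results yields nothing.
    """
    channels = response.get("results", {}).get("channels", [])
    for index, channel in enumerate(channels):
        for entry in channel.get("search", []):
            yield index, entry.get("query"), entry.get("hits", [])
```

Each yielded tuple identifies the channel, the requested term, and the list of hits for that term.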

Let’s look more closely at the alternatives object:

JSON

...
"alternatives":[
  {
    "transcript": "hello this is steve fuller i'm a professor of social epi at the university of warwick and the question before us today is what is a and why is it important epi is the branch philosophy that is concerned with the nature of knowledge",
    "confidence": 0.9773828,
    "words":[
      {"word":"hello","start":1.2560788,"end":1.3358299,"confidence":0.9822957},
      ...
    ]
  }
]

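The audio file contains multiple occurrences of the word “epistemology”, and Nova 2 transcribes each instance correctly. The sample transcript above, by contrast, renders the word as “epi” or drops it entirely, which is the kind of transcription error that phonetic search is meant to work around.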

Now, let’s take a look at the search object:

JSON

...
"search": [
  {
    "query": "epistemology",
    "hits": [
      {"confidence":1,"start":3.9676142,"end":4.7651243,"snippet":"social epistemology"},
      {"confidence":1,"start":13.805,"end":14.645,"snippet":"epistemology"},
      {"confidence":0.9782986,"start":10.605,"end":11.565,"snippet":"is epistemology"},
      {"confidence":0.21875,"start":8.513423,"end":9.625,"snippet":"before us today"},
      {"confidence":0.074074,"start":15.245,"end":16.005,"snippet":"branch of philosophy"},
      {"confidence":0.045138836,"start":5.1240044,"end":5.6822615,"snippet":"university of"},
      {"confidence":0.0234375,"start":17.165,"end":18.115,"snippet":"nature of knowledge"},
      {"confidence":0,"start":1.1763278,"end":1.8940871,"snippet":"hello this is steve"},
      {"confidence":0,"start":5.6025105,"end":7.3969088,"snippet":"of warwick and"}
    ]
  }
]
...

In this response, we see that each entry in search contains:

query: The term or phrase that was requested.
hits: Array of candidate matches found in the audio. Each hit has a confidence value, a start time and end time (in seconds from the beginning of the audio stream), and a snippet of the text at that location.

In this part of the response, notice:

We have received multiple hits for the word, listed in descending order of confidence. The first two have a confidence of 1, and the third has a confidence of about 0.978 (97.8%).
The results contain the start and end times for each place in the audio where the model heard the word.
After the first three hits, the model’s confidence that it heard the requested word declines steeply (to about 0.22 and below), and none of the remaining hits actually corresponds to the requested word.
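
Because low-confidence hits are returned alongside genuine matches, a client will usually want to filter them. The sketch below keeps only hits at or above a cutoff and reorders them by start time; the function name confident_hits and the default cutoff of 0.9 are assumptions chosen to suit this sample, not values recommended by Deepgram. The sample data is a subset of the hits shown above.

```python
# Subset of the hits from the sample response above.
SAMPLE_HITS = [
    {"confidence": 1, "start": 3.9676142, "end": 4.7651243, "snippet": "social epistemology"},
    {"confidence": 1, "start": 13.805, "end": 14.645, "snippet": "epistemology"},
    {"confidence": 0.9782986, "start": 10.605, "end": 11.565, "snippet": "is epistemology"},
    {"confidence": 0.21875, "start": 8.513423, "end": 9.625, "snippet": "before us today"},
    {"confidence": 0, "start": 1.1763278, "end": 1.8940871, "snippet": "hello this is steve"},
]


def confident_hits(hits, min_confidence=0.9):
    """Return hits with confidence >= min_confidence, ordered by start time.

    The 0.9 default is an illustrative assumption; tune it for your audio.
    """
    kept = [hit for hit in hits if hit["confidence"] >= min_confidence]
    return sorted(kept, key=lambda hit: hit["start"])


for hit in confident_hits(SAMPLE_HITS):
    print(f'{hit["start"]:.2f}s to {hit["end"]:.2f}s: {hit["snippet"]}')
# → 3.97s to 4.77s: social epistemology
# → 10.61s to 11.56s: is epistemology
# → 13.80s to 14.64s: epistemology
```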