JavaScript SDK

Framework-agnostic client for connecting to Deepgram’s Voice Agent API from any browser JavaScript environment.

This page covers the core JavaScript SDK, which works with vanilla JS, Vue, Svelte, Angular, and any other framework. If you are using React, see React Hooks and Provider. For a drop-in embeddable solution, see Widget.

Installation

```shell
npm install @deepgram/agents
```

Usage

Connect to a Deepgram voice agent, capture microphone audio, and play back the agent’s speech:

```javascript
import { AgentSession, AgentMicrophone, AgentPlayer } from "@deepgram/agents";

const session = new AgentSession({
  auth: {
    tokenFactory: () => fetch("/api/deepgram-token").then((r) => r.text()),
  },
  agent: "YOUR_AGENT_ID",
});

const player = new AgentPlayer();
const mic = new AgentMicrophone((data) => session.sendAudio(data));

// Play agent audio
session.on("audio", (chunk) => player.queue(chunk));

// Interrupt playback when the user speaks (barge-in)
session.on("user-started-speaking", () => player.interrupt());

// Log conversation turns
session.on("conversation-text", (msg) => {
  console.log(`${msg.role}: ${msg.content}`);
});

// Top-level await: run this as an ES module (e.g. <script type="module">)
await session.connect();
await mic.start();
```

Replace YOUR_AGENT_ID with a Reusable Agent Configuration UUID, or pass an inline agent config object instead. See Agent Configuration for both patterns.

AgentSession

The core class that manages the WebSocket connection to the Voice Agent API. It handles authentication, automatic reconnection with exponential backoff, keep-alive pings, and audio buffering before the server acknowledges settings.

```javascript
import { AgentSession } from "@deepgram/agents";

const session = new AgentSession(config);
```

Configuration

```javascript
const session = new AgentSession({
  // Required: authentication
  auth: {
    tokenFactory: () => fetch("/api/token").then((r) => r.text()),
  },

  // Required: agent ID or inline settings
  agent: "YOUR_AGENT_ID",

  // Optional: audio format
  audio: {
    input: { encoding: "linear16", sampleRate: 16_000 },
    output: { encoding: "linear16", sampleRate: 24_000 },
  },

  // Optional: reconnection behavior
  reconnect: {
    enabled: true,
    maxAttempts: 8,
    baseDelay: 500,
    maxDelay: 30_000,
    jitter: true,
  },

  // Optional: keep-alive ping interval (ms)
  keepAliveInterval: 10_000,

  // Optional: custom WebSocket URL (for proxies)
  url: undefined,
});
```

Auth options:

| Option | Type | Description |
| --- | --- | --- |
| `tokenFactory` | `() => Promise<string>` | Returns a short-lived bearer token from your backend. Called when connecting (unless a cached token is still valid) and before every reconnection attempt. Tokens are cached internally until near expiry; see Token Caching. |
| `apiKey` | `string` | Raw Deepgram API key. For server-side use or local development only; never expose it in production browser bundles. |
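The inline tokenFactory examples on this page hand whatever the endpoint returns straight to the SDK, including an HTML error page if the request fails. A slightly more defensive factory might look like the sketch below. createTokenFactory is a hypothetical helper (not part of @deepgram/agents), and it assumes your backend route returns the bare token as a plain-text body.

```javascript
// Hypothetical helper (not part of @deepgram/agents): builds a tokenFactory
// that rejects on HTTP errors or empty bodies instead of returning them as a token.
function createTokenFactory(url, fetchImpl = (...args) => fetch(...args)) {
  return async () => {
    const res = await fetchImpl(url);
    if (!res.ok) {
      throw new Error(`Token request failed: ${res.status} ${res.statusText}`);
    }
    // Assumes the endpoint responds with the token as plain text
    const token = (await res.text()).trim();
    if (token === "") {
      throw new Error("Token endpoint returned an empty body");
    }
    return token;
  };
}
```

With this helper, the auth block becomes tokenFactory: createTokenFactory("/api/deepgram-token").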

Agent options:

The agent field accepts either a string agent ID from the Deepgram Console or an inline AgentSettingsObject with listen, think, and speak configuration.

Audio options:

| Option | Default | Description |
| --- | --- | --- |
| `input.encoding` | `"linear16"` | Audio encoding sent to the server. Supported: `linear16`, `linear32`, `flac`, `alaw`, `mulaw`, `amr-nb`, `amr-wb`, `opus`, `ogg-opus`, `speex`, `g729`. |
| `input.sampleRate` | `16000` | Input sample rate in Hz. |
| `output.encoding` | `"linear16"` | Audio encoding received from the server. Supported: `linear16`, `mulaw`, `alaw`. |
| `output.sampleRate` | `24000` | Output sample rate in Hz. |
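The default input encoding, linear16, is raw 16-bit signed PCM. AgentMicrophone captures PCM frames for you; if you feed session.sendAudio from another source, the sketch below shows one way to turn Web Audio float samples into linear16 bytes and how frame size relates to sample rate. It assumes mono audio and little-endian byte order, and floatTo16BitPCM and linear16FrameBytes are hypothetical helpers, not SDK exports.

```javascript
// Sketch: convert Web Audio float samples (-1..1) into linear16 PCM,
// i.e. 16-bit signed little-endian integers packed into an ArrayBuffer.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    // Scale negative and positive halves separately to use the full int16 range
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}

// Bytes needed for `ms` milliseconds of mono linear16 audio at `sampleRate` Hz.
function linear16FrameBytes(sampleRate, ms) {
  return Math.round((sampleRate * ms) / 1000) * 2;
}
```

At the default 16,000 Hz input rate, a 20 ms frame holds 320 samples, or 640 bytes; at the 24,000 Hz output rate the same 20 ms is 960 bytes.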

Reconnection options:

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `true` | Automatically reconnect on unexpected disconnections. |
| `maxAttempts` | `8` | Maximum number of reconnection attempts before giving up. |
| `baseDelay` | `500` | Initial delay in ms before the first retry. |
| `maxDelay` | `30000` | Upper bound in ms on the delay between retries. Delays grow exponentially from `baseDelay` until they reach this cap. |
| `jitter` | `true` | Randomize each delay by up to ±20% so that many clients do not retry in lockstep (the "thundering herd" problem). |
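Together, the reconnection options describe a capped exponential backoff. The sketch below is one plausible reading of the table, not the SDK's actual code: it assumes attempts are counted from 1, that each delay doubles from baseDelay, that the cap is applied before jitter, and that jitter scales the delay by a random factor between 0.8 and 1.2. reconnectDelay is a hypothetical name.

```javascript
// Sketch of capped exponential backoff with optional ±20% jitter.
// Assumptions: attempt is 1-based and maxDelay is applied before jitter.
function reconnectDelay(
  attempt,
  { baseDelay = 500, maxDelay = 30_000, jitter = true } = {},
  random = Math.random
) {
  // baseDelay, 2 * baseDelay, 4 * baseDelay, ... capped at maxDelay
  const exponential = Math.min(maxDelay, baseDelay * 2 ** (attempt - 1));
  if (!jitter) return exponential;
  // Scale by a random factor in [0.8, 1.2)
  const factor = 0.8 + 0.4 * random();
  return Math.round(exponential * factor);
}
```

Under these assumptions and without jitter, the defaults give 500, 1000, 2000, 4000, 8000, 16000, 30000, and 30000 ms across the eight attempts, about 91.5 seconds of waiting in total.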

Connection States

The state property reflects the current connection lifecycle:

```javascript
session.state; // "idle" | "connecting" | "connected" | "reconnecting" | "disconnected"
```

idle — session created but connect() has not been called.
connecting — WebSocket connection attempt in progress.
connected — WebSocket is open and the session is active.
reconnecting — connection lost; attempting to reconnect with backoff.
disconnected — connection closed and no further reconnection attempts will be made.

Methods

connect(): Promise<void> — establish the WebSocket connection to the agent. Resolves when the socket is open. The SDK automatically sends a Settings message after receiving the Welcome event.

```javascript
await session.connect();
```

disconnect(): void — close the connection and cancel any pending reconnection attempts.

```javascript
session.disconnect();
```

sendAudio(data: ArrayBuffer): void — send a PCM audio frame to the agent. Frames sent before the server fires settings-applied are queued internally and flushed automatically once the agent is ready. This prevents dropped frames during connection setup.

```javascript
session.sendAudio(audioBuffer);
```
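The buffering that sendAudio performs before settings-applied follows a common queue-until-ready pattern. The class below is a hypothetical illustration of that pattern, not the SDK's internal implementation; transmit stands in for whatever actually writes a frame to the WebSocket.

```javascript
// Hypothetical illustration of "hold frames until ready, then flush in order".
class PendingAudioQueue {
  constructor(transmit) {
    this.transmit = transmit; // writes one frame to the connection
    this.ready = false; // becomes true once settings are acknowledged
    this.pending = []; // frames received before the agent was ready
  }

  send(frame) {
    if (this.ready) {
      this.transmit(frame);
    } else {
      this.pending.push(frame); // hold the frame instead of dropping it
    }
  }

  markReady() {
    this.ready = true;
    // Flush in arrival order so no audio is reordered or lost
    for (const frame of this.pending) this.transmit(frame);
    this.pending = [];
  }
}
```

This sketch buffers without limit; a production version would likely cap the queue so that a stalled handshake cannot grow memory indefinitely.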

injectUserMessage(content: string): void — inject a text message into the conversation as the user.

```javascript
session.injectUserMessage("What's the weather like?");
```

injectAgentMessage(message: string): void — inject a text message into the conversation as the agent.

```javascript
session.injectAgentMessage("Let me look that up for you.");
```

updatePrompt(prompt: string): void — change the agent’s system prompt mid-session.

```javascript
session.updatePrompt("You are now a customer support agent.");
```

updateSpeak(settings): void — change TTS settings mid-session. Accepts a SpeakSettings object or an array.

```javascript
session.updateSpeak({
  provider: { type: "deepgram", model: "aura-2-orion-en" },
});
```

updateThink(settings): void — change LLM settings mid-session. Accepts a ThinkSettings object or an array.

```javascript
session.updateThink({
  provider: { type: "open_ai", model: "gpt-4o" },
});
```

sendFunctionCallResponse(id, name, content): void — respond to a function call request from the agent. See Function Calls for a complete example.

```javascript
session.sendFunctionCallResponse(fn.id, fn.name, JSON.stringify(result));
```

getId(): string | null — returns the session ID assigned by the server. Available after the welcome event fires. Returns null before connection.

```javascript
const sessionId = session.getId();
```

Events

Subscribe with session.on(event, callback) and unsubscribe with session.off(event, callback).

Connection lifecycle:

| Event | Callback signature | Description |
| --- | --- | --- |
| `connecting` | `() => void` | WebSocket connection attempt started. |
| `connected` | `() => void` | WebSocket connection established. |
| `reconnecting` | `(attempt: number, delayMs: number) => void` | Attempting reconnection. Includes the attempt number and delay before the next retry. |
| `disconnected` | `(reason: string) => void` | Connection closed. The reason string describes why. |

Agent protocol:

| Event | Callback signature | Description |
| --- | --- | --- |
| `welcome` | `(msg) => void` | Session initialized by the server. Payload includes `session_id`. |
| `settings-applied` | `(msg) => void` | Server acknowledged settings. The agent is ready to receive audio. |
| `conversation-text` | `(msg) => void` | A conversation turn was transcribed. Payload: `{ role: "user" \| "assistant", content: string }`. |
| `user-started-speaking` | `(msg) => void` | Server detected user speech. |
| `agent-thinking` | `(msg) => void` | Agent is processing a response. |
| `agent-started-speaking` | `(msg) => void` | Agent began sending TTS audio. |
| `agent-audio-done` | `(msg) => void` | Agent finished sending audio. Note: browser playback may still be in progress. |
| `function-call-request` | `(msg) => void` | Agent requests one or more client-side function calls. Payload includes a `functions` array. |
| `function-call-response` | `(msg) => void` | Server acknowledged a function call response. |
| `prompt-updated` | `(msg) => void` | System prompt was changed successfully. |
| `speak-updated` | `(msg) => void` | TTS settings were changed successfully. |
| `think-updated` | `(msg) => void` | LLM settings were changed successfully. |
| `injection-refused` | `(msg) => void` | A message injection was rejected by the server. |
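The protocol events above are enough to drive a simple status indicator. The sketch below folds event names into a coarse status; the status labels and the nextAgentStatus helper are hypothetical, and the mapping is one reasonable choice rather than anything the SDK defines.

```javascript
// Hypothetical UI helper: map protocol event names to a coarse agent status.
const STATUS_BY_EVENT = {
  "settings-applied": "listening",
  "user-started-speaking": "listening",
  "agent-thinking": "thinking",
  "agent-started-speaking": "speaking",
  "agent-audio-done": "listening",
};

function nextAgentStatus(current, eventName) {
  // Events not in the table (e.g. "conversation-text") leave the status unchanged
  return STATUS_BY_EVENT[eventName] ?? current;
}
```

To wire it up, register one listener per event name with session.on and pass that name to nextAgentStatus. Because agent-audio-done can fire while audio is still playing, you may prefer to show "listening" only once AgentPlayer.getRemainingPlaybackTime() returns 0.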

Audio:

| Event | Callback signature | Description |
| --- | --- | --- |
| `audio` | `(chunk: ArrayBuffer) => void` | Raw PCM audio buffer from the agent. Pass to `AgentPlayer.queue()`. |

Errors:

| Event | Callback signature | Description |
| --- | --- | --- |
| `error` | `(msg) => void` | Server-side agent error. |
| `warning` | `(msg) => void` | Server-side warning. |
| `sdk-error` | `(err: Error) => void` | Client-side SDK error (connection failures, timeouts, etc.). |

AgentMicrophone

Captures PCM audio from the user’s microphone using the Web Audio API. Optionally integrates Silero VAD for voice activity detection so audio is only transmitted during speech.

```javascript
import { AgentMicrophone } from "@deepgram/agents";

const mic = new AgentMicrophone(
  (data) => session.sendAudio(data),
  {
    sampleRate: 16_000,
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  }
);

await mic.start();
```

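The first argument is a callback invoked with each captured audio frame as an ArrayBuffer. To stream audio directly to the agent, forward each frame to session.sendAudio, wrapping the call in an arrow function as shown above so the method keeps its this binding.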

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `sampleRate` | `number` | `16000` | Target sample rate in Hz for PCM capture. |
| `echoCancellation` | `boolean` | `true` | Enable browser echo cancellation via `getUserMedia`. |
| `noiseSuppression` | `boolean` | `true` | Enable browser noise suppression via `getUserMedia`. |
| `autoGainControl` | `boolean` | `true` | Enable browser auto gain control via `getUserMedia`. |
| `vad` | `boolean \| VadOptions` | `false` | Enable Silero voice activity detection. Pass `true` for defaults, or an options object. |

VAD options (when vad is an object):

| Option | Default | Description |
| --- | --- | --- |
| `speechThreshold` | `0.5` | Probability threshold (0 to 1) above which audio is classified as speech. |
| `silenceThreshold` | `0.35` | Probability threshold (0 to 1) below which audio is classified as silence. |

VAD requires the optional peer dependencies @ricky0123/vad-web and onnxruntime-web. Install them separately: npm install @ricky0123/vad-web onnxruntime-web.
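The two VAD thresholds form a hysteresis band: a frame must rise above speechThreshold to start speech and fall below silenceThreshold to end it, so probabilities between 0.35 and 0.5 keep the current state and brief dips do not toggle speech-start and speech-end. The sketch below shows only that threshold logic over a list of per-frame speech probabilities. detectSpeechSegments is a hypothetical helper; real VAD integrations usually also require several consecutive frames before changing state.

```javascript
// Sketch of two-threshold (hysteresis) speech detection over per-frame
// speech probabilities, mirroring speechThreshold / silenceThreshold.
function detectSpeechSegments(probabilities, { speechThreshold = 0.5, silenceThreshold = 0.35 } = {}) {
  const events = [];
  let speaking = false;
  probabilities.forEach((p, frame) => {
    if (!speaking && p > speechThreshold) {
      speaking = true; // crossed the upper threshold: speech begins
      events.push({ type: "speech-start", frame });
    } else if (speaking && p < silenceThreshold) {
      speaking = false; // dropped below the lower threshold: speech ends
      events.push({ type: "speech-end", frame });
    }
    // Otherwise the probability is inside the band: keep the current state
  });
  return events;
}
```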

Methods

start(): Promise<void> — request microphone permission via getUserMedia and begin capturing audio frames.

```javascript
await mic.start();
```

stop(): void — stop capturing audio, disconnect the audio worklet, and release the microphone device.

```javascript
mic.stop();
```

mute(): void — pause audio transmission without releasing the microphone. The device remains active but frames are not forwarded.

```javascript
mic.mute();
```

unmute(): void — resume audio transmission after a mute() call.

```javascript
mic.unmute();
```

muted: boolean — read-only property indicating whether the microphone is currently muted.

getInputVolume(): number — returns the current RMS input volume level as a value between 0 and 1. Call per animation frame for real-time visualizations.

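```javascript
function animate() {
  const volume = mic.getInputVolume(); // RMS level between 0 and 1
  drawMeter(volume); // your own rendering function
  requestAnimationFrame(animate);
}

requestAnimationFrame(animate); // start the loop
```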
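getInputVolume is described as returning an RMS level between 0 and 1. For reference, the sketch below computes RMS over a block of time-domain samples in the range -1 to 1, such as those an AnalyserNode fills in through getFloatTimeDomainData. rmsLevel is a hypothetical helper; the SDK's own calculation may differ, for example by smoothing over time.

```javascript
// Sketch: root-mean-square level of time-domain samples in [-1, 1].
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  // Clamp so out-of-range input still yields a value in [0, 1]
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}
```

A full-scale sine wave measures about 0.707 (1/√2), so a meter driven by this value rarely reaches 1 for speech; scaling the level before drawing is common.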

getInputByteFrequencyData(): Uint8Array — returns frequency domain data as a Uint8Array with values 0—255. Use for spectrum or waveform visualizations.
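A common use of getInputByteFrequencyData (and of getOutputByteFrequencyData on AgentPlayer) is a bar visualizer. The sketch below averages the 0 to 255 bin values into a fixed number of bars normalized to 0 to 1. toBars is a hypothetical helper, and it simply ignores trailing bins when the bin count does not divide evenly.

```javascript
// Sketch: collapse byte frequency data (0-255 per bin) into `barCount`
// bars normalized to [0, 1] for a simple spectrum visualizer.
function toBars(frequencyData, barCount) {
  const bars = new Array(barCount).fill(0);
  const binsPerBar = Math.floor(frequencyData.length / barCount);
  if (binsPerBar === 0) return bars; // not enough bins for one per bar
  for (let bar = 0; bar < barCount; bar++) {
    let sum = 0;
    for (let i = 0; i < binsPerBar; i++) sum += frequencyData[bar * binsPerBar + i];
    bars[bar] = sum / binsPerBar / 255; // average bin value scaled to [0, 1]
  }
  return bars;
}
```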

Events

Subscribe with mic.on(event, callback) and unsubscribe with mic.off(event, callback).

| Event | Callback signature | Description |
| --- | --- | --- |
| `speech-start` | `() => void` | VAD detected the user started speaking. Only fires when VAD is enabled. |
| `speech-end` | `() => void` | VAD detected the user stopped speaking. Only fires when VAD is enabled. |
| `audio-frame` | `(data: ArrayBuffer) => void` | A raw audio frame was captured. |
| `error` | `(err: Error) => void` | Microphone error (permission denied, device lost, etc.). |

AgentPlayer

Decodes and plays PCM audio received from the agent. Maintains a playback queue with interrupt support for barge-in, and exposes volume and frequency analysis APIs for building custom visualizations.

```javascript
import { AgentPlayer } from "@deepgram/agents";

const player = new AgentPlayer({ sampleRate: 24_000 });

session.on("audio", (chunk) => player.queue(chunk));
session.on("user-started-speaking", () => player.interrupt());
```

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `sampleRate` | `number` | `24000` | Expected sample rate of audio received from the agent. Must match `audio.output.sampleRate` in the session configuration. |
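The player's sampleRate must match the session's output rate because raw PCM bytes carry no rate information of their own: 16 kHz audio interpreted as 24 kHz would play 1.5 times too fast and at a higher pitch. The sketch below decodes linear16 bytes into the float samples Web Audio works with and computes how long they last. It assumes mono, little-endian data, and decodeLinear16 and durationSeconds are hypothetical helpers rather than AgentPlayer internals.

```javascript
// Sketch: decode linear16 bytes (16-bit signed, little-endian, mono)
// into Float32 samples in [-1, 1).
function decodeLinear16(buffer) {
  const view = new DataView(buffer);
  const samples = new Float32Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 0x8000; // true = little-endian
  }
  return samples;
}

// Playback length in seconds; only correct if sampleRate matches the source.
function durationSeconds(sampleCount, sampleRate) {
  return sampleCount / sampleRate;
}
```

In a browser, the resulting Float32Array could be copied with copyToChannel into an AudioBuffer created by AudioContext.createBuffer(1, samples.length, sampleRate).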

Methods

queue(data: ArrayBuffer): void — decode a PCM audio buffer and add it to the playback queue. Buffers are played in order with sample-accurate scheduling.

interrupt(): void — immediately stop playback and discard all queued audio. Use when the user starts speaking to enable barge-in.

getRemainingPlaybackTime(): number — returns the number of seconds of audio still queued for playback. Returns 0 when idle. Useful for delaying mode transitions until the agent finishes speaking.

mute(): void — stop playback and discard queued audio. Subsequent calls to queue() are silently dropped until unmute() is called.

unmute(): void — resume accepting and playing queued audio.

muted: boolean — read-only property indicating whether the player is currently muted.

setVolume(volume: number): void — set the playback volume as a value between 0 and 1. Default: 1.

```javascript
player.setVolume(0.5);
```

volume: number — read-only property returning the current playback volume (0—1).

getOutputVolume(): number — returns the current RMS output volume level as a value between 0 and 1. Call per animation frame for real-time visualizations.

getOutputByteFrequencyData(): Uint8Array — returns frequency domain data as a Uint8Array with values 0—255. Use for spectrum visualizations of the agent’s speech.

dispose(): void — close the underlying AudioContext and free resources. Call when the session is no longer needed.
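The queue(), interrupt(), and getRemainingPlaybackTime() descriptions fit the usual Web Audio approach to gapless playback, where each buffer is started with AudioBufferSourceNode.start(when) at the moment the previous one ends. The class below sketches only the timing arithmetic; now() stands in for AudioContext.currentTime, and PlaybackScheduler is a hypothetical name, not part of the SDK.

```javascript
// Sketch of gapless scheduling arithmetic. `now()` stands in for
// AudioContext.currentTime (seconds).
class PlaybackScheduler {
  constructor(now) {
    this.now = now;
    this.nextStartTime = 0; // when the next buffer should begin
  }

  // Returns the start time for a buffer lasting `durationSec` seconds.
  schedule(durationSec) {
    // Never schedule in the past: if the queue drained, start immediately
    const start = Math.max(this.now(), this.nextStartTime);
    this.nextStartTime = start + durationSec;
    return start;
  }

  // Seconds of audio still queued; 0 when idle.
  remainingTime() {
    return Math.max(0, this.nextStartTime - this.now());
  }

  interrupt() {
    // A real player would also stop() every scheduled source node here.
    this.nextStartTime = 0;
  }
}
```

Taking the later of now() and the previous end time is what lets audio that arrives after a pause start right away instead of at a time that has already passed.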

Function Calls

The agent can request client-side function calls during a conversation. Listen for the function-call-request event, execute the requested function, and respond with the result using sendFunctionCallResponse.

```javascript
session.on("function-call-request", async (msg) => {
  for (const fn of msg.functions) {
    let result;

    try {
      // fn.input is a JSON-encoded string of arguments
      const args = JSON.parse(fn.input);

      switch (fn.name) {
        case "get_weather":
          result = await fetchWeather(args); // your own implementation
          break;
        case "book_appointment":
          result = await bookAppointment(args); // your own implementation
          break;
        default:
          result = { error: `Unknown function: ${fn.name}` };
      }
    } catch (err) {
      // Still respond on failure: the agent waits for a reply to every call
      result = { error: String(err) };
    }

    session.sendFunctionCallResponse(fn.id, fn.name, JSON.stringify(result));
  }
});
```

Each item in the functions array contains:

| Field | Type | Description |
| --- | --- | --- |
| `id` | `string` | Unique identifier for this function call. Pass back in the response. |
| `name` | `string` | The function name the agent wants to invoke. |
| `input` | `string` | JSON-encoded arguments for the function. |

The agent pauses until it receives a response for each requested function call.
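Because the agent waits on every call, a slow or failing handler stalls the conversation. One alternative to the switch statement is a handler registry that runs all requested calls concurrently and always answers each one. handleFunctionCalls, the handlers Map, and the respond callback are hypothetical names; whether concurrent execution is safe depends on your own functions.

```javascript
// Hypothetical helper (not an SDK export): answers every requested call,
// running handlers concurrently and converting failures into error results.
async function handleFunctionCalls(functions, handlers, respond) {
  await Promise.all(
    functions.map(async (fn) => {
      let result;
      try {
        const handler = handlers.get(fn.name); // handlers: Map of name -> async function
        if (!handler) throw new Error(`Unknown function: ${fn.name}`);
        result = await handler(JSON.parse(fn.input)); // fn.input is a JSON string
      } catch (err) {
        result = { error: err instanceof Error ? err.message : String(err) };
      }
      // Exactly one response per call id, whether the handler succeeded or not
      respond(fn.id, fn.name, JSON.stringify(result));
    })
  );
}
```

In the session, this could be wired as session.on("function-call-request", (msg) => handleFunctionCalls(msg.functions, handlers, (id, name, content) => session.sendFunctionCallResponse(id, name, content))).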

Token Caching

The SDK caches authentication tokens internally. When you provide a tokenFactory in the auth configuration, the AgentSession wraps it in a caching layer so that your token endpoint is only called when no valid token is cached. Tokens are cached for 4 minutes by default, which leaves a safety margin within the 5-minute lifetime of Deepgram’s short-lived keys. The cache is automatically invalidated before each reconnection attempt, so every reconnect uses a freshly issued token.

```javascript
const session = new AgentSession({
  auth: {
    // This function is only called when the cache is empty or expired
    tokenFactory: async () => {
      const res = await fetch("/api/deepgram-token");
      return res.text();
    },
  },
  agent: "YOUR_AGENT_ID",
});
```

No additional setup is required. The caching behavior is automatic and transparent.
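The caching behavior can be approximated with a small wrapper. The sketch below illustrates it under stated assumptions and is not the SDK's implementation: createCachedTokenFactory is a hypothetical name, the 4-minute TTL mirrors the default mentioned above, invalidate() models the reset before each reconnection attempt, and now is injectable so the behavior can be tested without waiting.

```javascript
// Hypothetical sketch of a TTL cache around a tokenFactory (not the SDK's code).
function createCachedTokenFactory(tokenFactory, { ttlMs = 4 * 60 * 1000, now = () => Date.now() } = {}) {
  let cachedToken = null; // last token returned by tokenFactory
  let expiresAt = 0; // clock time (ms) after which the cached token is stale

  async function getToken() {
    if (cachedToken !== null && now() < expiresAt) {
      return cachedToken; // cache hit: no request to your backend
    }
    const token = await tokenFactory();
    cachedToken = token;
    expiresAt = now() + ttlMs;
    return token;
  }

  // Drop the cached token, e.g. before a reconnection attempt
  function invalidate() {
    cachedToken = null;
    expiresAt = 0;
  }

  return { getToken, invalidate };
}
```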
