JavaScript SDK

Framework-agnostic client for connecting to Deepgram’s Voice Agent API from any browser JavaScript environment.

This page covers the core JavaScript SDK, which works with vanilla JS, Vue, Svelte, Angular, and any other framework. If you are using React, see React Hooks and Provider. For a drop-in embeddable solution, see Widget.

Installation

npm install @deepgram/agents

Usage

Connect to a Deepgram voice agent, capture microphone audio, and play back the agent’s speech:

import { AgentSession, AgentMicrophone, AgentPlayer } from "@deepgram/agents";

const session = new AgentSession({
  auth: {
    tokenFactory: () => fetch("/api/deepgram-token").then((r) => r.text()),
  },
  agent: "YOUR_AGENT_ID",
});

const player = new AgentPlayer();
const mic = new AgentMicrophone((data) => session.sendAudio(data));

// Play agent audio
session.on("audio", (chunk) => player.queue(chunk));

// Interrupt playback when the user speaks (barge-in)
session.on("user-started-speaking", () => player.interrupt());

// Log conversation turns
session.on("conversation-text", (msg) => {
  console.log(`${msg.role}: ${msg.content}`);
});

await session.connect();
await mic.start();

Replace YOUR_AGENT_ID with a Reusable Agent Configuration UUID, or pass an inline agent config object instead. See Agent Configuration for both patterns.

AgentSession

The core class that manages the WebSocket connection to the Voice Agent API. It handles authentication, automatic reconnection with exponential backoff, keep-alive pings, and audio buffering before the server acknowledges settings.

import { AgentSession } from "@deepgram/agents";

const session = new AgentSession(config);

Configuration

const session = new AgentSession({
  // Required — authentication
  auth: {
    tokenFactory: () => fetch("/api/token").then((r) => r.text()),
  },

  // Required — agent ID or inline settings
  agent: "YOUR_AGENT_ID",

  // Optional — audio format
  audio: {
    input: { encoding: "linear16", sampleRate: 16_000 },
    output: { encoding: "linear16", sampleRate: 24_000 },
  },

  // Optional — reconnection behavior
  reconnect: {
    enabled: true,
    maxAttempts: 8,
    baseDelay: 500,
    maxDelay: 30_000,
    jitter: true,
  },

  // Optional — keep-alive ping interval (ms)
  keepAliveInterval: 10_000,

  // Optional — custom WebSocket URL (for proxies)
  url: undefined,
});

Auth options:

  • tokenFactory (() => Promise<string>) — Returns a short-lived bearer token from your backend. Called at connect time and before every reconnect. Tokens are cached internally until near expiry.
  • apiKey (string) — Raw Deepgram API key. Server-side or local development only. Never expose it in production browser bundles.

Agent options:

The agent field accepts either a string agent ID from the Deepgram Console or an inline AgentSettingsObject with listen, think, and speak configuration.
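For illustration, an inline settings object might look like the sketch below. The think and speak provider shapes mirror the updateThink/updateSpeak examples later on this page; the listen block (model "nova-3") is an assumption, so confirm exact field names in Agent Configuration.

```javascript
// Hypothetical inline settings object — field shapes are illustrative,
// not authoritative. The listen block is an assumption.
const inlineAgent = {
  listen: { provider: { type: "deepgram", model: "nova-3" } },
  think: {
    provider: { type: "open_ai", model: "gpt-4o" },
    prompt: "You are a friendly booking assistant.",
  },
  speak: { provider: { type: "deepgram", model: "aura-2-orion-en" } },
};

// Then: new AgentSession({ auth, agent: inlineAgent })
```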

Audio options:

  • input.encoding (default "linear16") — Audio encoding sent to the server. Supported: linear16, linear32, flac, alaw, mulaw, amr-nb, amr-wb, opus, ogg-opus, speex, g729.
  • input.sampleRate (default 16000) — Input sample rate in Hz.
  • output.encoding (default "linear16") — Audio encoding received from the server. Supported: linear16, mulaw, alaw.
  • output.sampleRate (default 24000) — Output sample rate in Hz.

Reconnection options:

  • enabled (default true) — Automatically reconnect on unexpected disconnections.
  • maxAttempts (default 8) — Maximum number of reconnection attempts before giving up.
  • baseDelay (default 500) — Initial delay in ms before the first retry.
  • maxDelay (default 30000) — Maximum delay in ms between retries. Delays grow via exponential backoff.
  • jitter (default true) — Add randomization (plus or minus 20%) to each delay to avoid a thundering herd.
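With the defaults, these options imply a doubling delay schedule capped at maxDelay. The helper below is an illustrative reimplementation of that arithmetic, not the SDK's internal code, and it omits jitter for clarity:

```javascript
// Illustrative backoff schedule: exponential growth from baseDelay,
// capped at maxDelay. attempt is 1-based; jitter omitted for clarity.
function backoffDelay(attempt, { baseDelay = 500, maxDelay = 30_000 } = {}) {
  return Math.min(baseDelay * 2 ** (attempt - 1), maxDelay);
}

// Attempts 1..8 → 500, 1000, 2000, 4000, 8000, 16000, 30000, 30000 ms
```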

Connection States

The state property reflects the current connection lifecycle:

session.state; // "idle" | "connecting" | "connected" | "reconnecting" | "disconnected"
  • idle — session created but connect() has not been called.
  • connecting — WebSocket connection attempt in progress.
  • connected — WebSocket is open and the session is active.
  • reconnecting — connection lost; attempting to reconnect with backoff.
  • disconnected — connection closed and no further reconnection attempts will be made.
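One practical use of state is gating audio capture. A minimal sketch follows; treating "reconnecting" as a sendable state is an assumption worth verifying against the SDK's buffering behavior:

```javascript
// Guard mic frames on connection state. Whether frames sent while
// "reconnecting" are buffered or dropped is an assumption to verify.
function shouldSendAudio(state) {
  return state === "connected" || state === "reconnecting";
}

// Usage:
//   const mic = new AgentMicrophone((data) => {
//     if (shouldSendAudio(session.state)) session.sendAudio(data);
//   });
```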

Methods

connect(): Promise<void> — establish the WebSocket connection to the agent. Resolves when the socket is open. The SDK automatically sends a Settings message after receiving the Welcome event.

await session.connect();

disconnect(): void — close the connection and cancel any pending reconnection attempts.

session.disconnect();

sendAudio(data: ArrayBuffer): void — send a PCM audio frame to the agent. Frames sent before the server fires settings-applied are queued internally and flushed automatically once the agent is ready. This prevents dropped frames during connection setup.

session.sendAudio(audioBuffer);

injectUserMessage(content: string): void — inject a text message into the conversation as the user.

session.injectUserMessage("What's the weather like?");

injectAgentMessage(message: string): void — inject a text message into the conversation as the agent.

session.injectAgentMessage("Let me look that up for you.");

updatePrompt(prompt: string): void — change the agent’s system prompt mid-session.

session.updatePrompt("You are now a customer support agent.");

updateSpeak(settings): void — change TTS settings mid-session. Accepts a SpeakSettings object or an array.

session.updateSpeak({
  provider: { type: "deepgram", model: "aura-2-orion-en" },
});

updateThink(settings): void — change LLM settings mid-session. Accepts a ThinkSettings object or an array.

session.updateThink({
  provider: { type: "open_ai", model: "gpt-4o" },
});

sendFunctionCallResponse(id, name, content): void — respond to a function call request from the agent. See Function Calls for a complete example.

session.sendFunctionCallResponse(fn.id, fn.name, JSON.stringify(result));

getId(): string | null — returns the session ID assigned by the server. Available after the welcome event fires. Returns null before connection.

const sessionId = session.getId();

Events

Subscribe with session.on(event, callback) and unsubscribe with session.off(event, callback).

Connection lifecycle:

  • connecting (() => void) — WebSocket connection attempt started.
  • connected (() => void) — WebSocket connection established.
  • reconnecting ((attempt: number, delayMs: number) => void) — Attempting reconnection. Includes the attempt number and the delay before the next retry.
  • disconnected ((reason: string) => void) — Connection closed. The reason string describes why.

Agent protocol:

  • welcome ((msg) => void) — Session initialized by the server. Payload includes session_id.
  • settings-applied ((msg) => void) — Server acknowledged settings. The agent is ready to receive audio.
  • conversation-text ((msg) => void) — A conversation turn was transcribed. Payload: { role: "user" | "assistant", content: string }.
  • user-started-speaking ((msg) => void) — Server detected user speech.
  • agent-thinking ((msg) => void) — Agent is processing a response.
  • agent-started-speaking ((msg) => void) — Agent began sending TTS audio.
  • agent-audio-done ((msg) => void) — Agent finished sending audio. Note: browser playback may still be in progress.
  • function-call-request ((msg) => void) — Agent requests one or more client-side function calls. Payload includes a functions array.
  • function-call-response ((msg) => void) — Server acknowledged a function call response.
  • prompt-updated ((msg) => void) — System prompt was changed successfully.
  • speak-updated ((msg) => void) — TTS settings were changed successfully.
  • think-updated ((msg) => void) — LLM settings were changed successfully.
  • injection-refused ((msg) => void) — A message injection was rejected by the server.

Audio:

  • audio ((chunk: ArrayBuffer) => void) — Raw PCM audio buffer from the agent. Pass it to AgentPlayer.queue().

Errors:

  • error ((msg) => void) — Server-side agent error.
  • warning ((msg) => void) — Server-side warning.
  • sdk-error ((err: Error) => void) — Client-side SDK error (connection failures, timeouts, etc.).

AgentMicrophone

Captures PCM audio from the user’s microphone using the Web Audio API. Optionally integrates Silero VAD for voice activity detection so audio is only transmitted during speech.

import { AgentMicrophone } from "@deepgram/agents";

const mic = new AgentMicrophone(
  (data) => session.sendAudio(data),
  {
    sampleRate: 16_000,
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  }
);

await mic.start();

The first argument is a callback invoked with each captured audio frame as an ArrayBuffer. Pass session.sendAudio to stream audio directly to the agent.

Options

  • sampleRate (number, default 16000) — Target sample rate in Hz for PCM capture.
  • echoCancellation (boolean, default true) — Enable browser echo cancellation via getUserMedia.
  • noiseSuppression (boolean, default true) — Enable browser noise suppression via getUserMedia.
  • autoGainControl (boolean, default true) — Enable browser auto gain control via getUserMedia.
  • vad (boolean | VadOptions, default false) — Enable Silero voice activity detection. Pass true for defaults, or an options object.

VAD options (when vad is an object):

  • speechThreshold (default 0.5) — Probability threshold (0–1) above which audio is classified as speech.
  • silenceThreshold (default 0.35) — Probability threshold (0–1) below which audio is classified as silence.

VAD requires the optional peer dependencies @ricky0123/vad-web and onnxruntime-web. Install them separately: npm install @ricky0123/vad-web onnxruntime-web.
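With the peer dependencies installed, VAD is enabled through the options object. The threshold values below are illustrative, not tuned recommendations:

```javascript
// Example mic options enabling Silero VAD with explicit thresholds.
// speechThreshold should sit above silenceThreshold; both values here
// are illustrative.
const micOptions = {
  sampleRate: 16_000,
  vad: { speechThreshold: 0.6, silenceThreshold: 0.3 },
};

// const mic = new AgentMicrophone((data) => session.sendAudio(data), micOptions);
```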

Methods

start(): Promise<void> — request microphone permission via getUserMedia and begin capturing audio frames.

await mic.start();

stop(): void — stop capturing audio, disconnect the audio worklet, and release the microphone device.

mic.stop();

mute(): void — pause audio transmission without releasing the microphone. The device remains active but frames are not forwarded.

mic.mute();

unmute(): void — resume audio transmission after a mute() call.

mic.unmute();

muted: boolean — read-only property indicating whether the microphone is currently muted.

getInputVolume(): number — returns the current RMS input volume level as a value between 0 and 1. Call it once per animation frame for real-time visualizations.

function animate() {
  const volume = mic.getInputVolume();
  drawMeter(volume);
  requestAnimationFrame(animate);
}

getInputByteFrequencyData(): Uint8Array — returns frequency-domain data as a Uint8Array with values 0–255. Use for spectrum or waveform visualizations.

Events

Subscribe with mic.on(event, callback) and unsubscribe with mic.off(event, callback).

  • speech-start (() => void) — VAD detected that the user started speaking. Only fires when VAD is enabled.
  • speech-end (() => void) — VAD detected that the user stopped speaking. Only fires when VAD is enabled.
  • audio-frame ((data: ArrayBuffer) => void) — A raw audio frame was captured.
  • error ((err: Error) => void) — Microphone error (permission denied, device lost, etc.).
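A small sketch of wiring the VAD events to a UI flag. setSpeaking is a hypothetical callback you supply, for example one that toggles a CSS class:

```javascript
// Translate speech-start/speech-end into a boolean for UI code.
// Requires VAD to be enabled on the microphone; setSpeaking is a
// hypothetical callback, not part of the SDK.
function bindSpeakingIndicator(mic, setSpeaking) {
  mic.on("speech-start", () => setSpeaking(true));
  mic.on("speech-end", () => setSpeaking(false));
}
```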

AgentPlayer

Decodes and plays PCM audio received from the agent. Maintains a playback queue with interrupt support for barge-in, and exposes volume and frequency analysis APIs for building custom visualizations.

import { AgentPlayer } from "@deepgram/agents";

const player = new AgentPlayer({ sampleRate: 24_000 });

session.on("audio", (chunk) => player.queue(chunk));
session.on("user-started-speaking", () => player.interrupt());

Options

  • sampleRate (number, default 24000) — Expected sample rate of audio received from the agent. Must match audio.output.sampleRate in the session configuration.
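Since the two values must agree, one pattern is to share a single constant between the session's audio config and the player. A sketch:

```javascript
// Keep the session's output format and the player's decoder in sync;
// a mismatch plays the agent's speech at the wrong pitch and speed.
const OUTPUT_SAMPLE_RATE = 24_000;

const audioConfig = {
  input: { encoding: "linear16", sampleRate: 16_000 },
  output: { encoding: "linear16", sampleRate: OUTPUT_SAMPLE_RATE },
};

// const session = new AgentSession({ auth, agent, audio: audioConfig });
// const player = new AgentPlayer({ sampleRate: OUTPUT_SAMPLE_RATE });
```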

Methods

queue(data: ArrayBuffer): void — decode a PCM audio buffer and add it to the playback queue. Buffers are played in order with sample-accurate scheduling.

interrupt(): void — immediately stop playback and discard all queued audio. Use when the user starts speaking to enable barge-in.

getRemainingPlaybackTime(): number — returns the number of seconds of audio still queued for playback. Returns 0 when idle. Useful for delaying mode transitions until the agent finishes speaking.
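For example, to let the agent finish speaking before tearing the session down, you might poll until the queue drains. A sketch with a hypothetical helper; the poll interval is arbitrary:

```javascript
// Resolve once the player's queue is empty. waitForPlaybackEnd is a
// hypothetical helper, not part of the SDK; pollMs is arbitrary.
function waitForPlaybackEnd(player, pollMs = 100) {
  return new Promise((resolve) => {
    const timer = setInterval(() => {
      if (player.getRemainingPlaybackTime() === 0) {
        clearInterval(timer);
        resolve();
      }
    }, pollMs);
  });
}

// await waitForPlaybackEnd(player);
// session.disconnect();
```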

mute(): void — stop playback and discard queued audio. Subsequent calls to queue() are silently dropped until unmute() is called.

unmute(): void — resume accepting and playing queued audio.

muted: boolean — read-only property indicating whether the player is currently muted.

setVolume(volume: number): void — set the playback volume as a value between 0 and 1. Default: 1.

player.setVolume(0.5);

volume: number — read-only property returning the current playback volume (0–1).

getOutputVolume(): number — returns the current RMS output volume level as a value between 0 and 1. Call it once per animation frame for real-time visualizations.

getOutputByteFrequencyData(): Uint8Array — returns frequency-domain data as a Uint8Array with values 0–255. Use for spectrum visualizations of the agent’s speech.

dispose(): void — close the underlying AudioContext and free resources. Call when the session is no longer needed.

Function Calls

The agent can request client-side function calls during a conversation. Listen for the function-call-request event, execute the requested function, and respond with the result using sendFunctionCallResponse.

session.on("function-call-request", async (msg) => {
  for (const fn of msg.functions) {
    let result;

    switch (fn.name) {
      case "get_weather":
        result = await fetchWeather(JSON.parse(fn.input));
        break;
      case "book_appointment":
        result = await bookAppointment(JSON.parse(fn.input));
        break;
      default:
        result = { error: `Unknown function: ${fn.name}` };
    }

    session.sendFunctionCallResponse(fn.id, fn.name, JSON.stringify(result));
  }
});

Each item in the functions array contains:

  • id (string) — Unique identifier for this function call. Pass it back in the response.
  • name (string) — The function name the agent wants to invoke.
  • input (string) — JSON-encoded arguments for the function.

The agent pauses until it receives a response for each requested function call.

Token Caching

The SDK caches authentication tokens internally. When you provide a tokenFactory in the auth configuration, the AgentSession wraps it in a caching layer that avoids requesting a new token on every reconnect. Tokens are cached for 4 minutes by default, which is safe for Deepgram’s 5-minute short-lived keys. The cache is automatically invalidated before each reconnection attempt to ensure a fresh token is available.

const session = new AgentSession({
  auth: {
    // This function is only called when the cache is empty or expired
    tokenFactory: async () => {
      const res = await fetch("/api/deepgram-token");
      return res.text();
    },
  },
  agent: "YOUR_AGENT_ID",
});

No additional setup is required. The caching behavior is automatic and transparent.