Browser Agent SDK

Voice AI for the browser — from a single script tag to a fully custom React application.

The demo above requires a Deepgram account. Sign up free to try it, or keep reading to understand the architecture first.

Choose Your Approach

Start from how much control you need, not which package to install.

“I want a voice agent on my site in five minutes.” Use the Widget. One install, one function call. Pick from six layouts — sidebar, floating, inline, button, embedded, or orb. No framework required.

```shell
npm install @deepgram/agents-widget
```

```javascript
import { init } from "@deepgram/agents-widget";

init({
  tokenFactory: () => fetch("/api/deepgram-token").then((r) => r.text()),
  agent: "YOUR_AGENT_ID",
  layout: "floating",
});
```

“I’m building a React app and want pre-built components.” Use React UI Components. Conversation view, animated orb, mic/speaker buttons, text input, and a waveform visualizer — all styled through CSS custom properties that work alongside your existing design system.

“I’m building a React app but want full control over the UI.” Use React Hooks. Provider context and focused hooks for state, conversation history, microphone control, audio playback, and client-side function calling. You build every pixel; the hooks manage every connection.

“I’m using Vue, Svelte, Angular, or vanilla JS.” Use the JavaScript SDK. The core AgentSession class is framework-agnostic. Pair it with AgentMicrophone for capture and AgentPlayer for playback. Wire the events into whatever UI layer you prefer.

Architecture

Four packages, each building on the one below it. Install only the layer you need — everything below it comes with it.

@deepgram/agents-widget Drop-in widget (UMD + ESM, bundles Preact)
@deepgram/ui Pre-built React components + CSS
@deepgram/react React provider + hooks
@deepgram/agents Core WebSocket client, microphone, player

Every layer shares the same connection logic, audio pipeline, and event model. The difference is how much UI you want handled for you.

Each layer pulls in the layer below as a dependency and re-exports the parts you need, so you import everything from the single package you installed. Installing @deepgram/ui brings in @deepgram/react and @deepgram/agents automatically and re-exports the hooks, provider, and SDK types — you import everything you need from @deepgram/ui alone. The same pattern applies one layer down: @deepgram/react brings in @deepgram/agents and re-exports its types.

What You Get at Every Layer

  • Reconnection with exponential backoff and jitter. Configurable max attempts, base delay, and ceiling. Connections recover without user intervention.
  • Playback-aware mode tracking. The SDK knows when audio has actually finished playing in the browser — not just when the server finished sending it. Mode transitions from “speaking” to “listening” wait for the audio queue to drain.
  • Audio buffering before settings are applied. Microphone frames captured before the server acknowledges your configuration are queued and flushed automatically.
  • Voice Activity Detection. Optional Silero VAD runs client-side for precise speech endpoint detection.
  • KeepAlive pings. Automatic heartbeat prevents idle WebSocket disconnects. Interval is configurable.
  • Typed event emitter. Every server message — Welcome, ConversationText, AgentAudioDone, FunctionCallRequest, and more — has a typed event. Subscribe to exactly what you need.
  • Custom WebSocket URL support. Connect to proxied or self-hosted endpoints by overriding the default Deepgram URL.
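The reconnection behavior in the first bullet can be sketched as a pure delay function. This is an illustration of exponential backoff with "full jitter", not the SDK's implementation, and the option names (`baseDelayMs`, `maxDelayMs`) are assumptions rather than real configuration keys:

```javascript
// Deterministic ceiling for a given attempt: base * 2^attempt, capped.
// Option names are illustrative, not the SDK's actual config keys.
function backoffCeiling(attempt, { baseDelayMs = 1000, maxDelayMs = 30000 } = {}) {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// "Full jitter": wait a random duration between 0 and the ceiling, so
// many clients reconnecting at once do not retry in lockstep.
function backoffDelay(attempt, opts) {
  return Math.random() * backoffCeiling(attempt, opts);
}
```

Jitter matters because, after a shared outage, every open session reconnects at the same moment; randomizing within the exponential window spreads that load out.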

What the React Layer Adds

  • Client-side function calling scoped to React component lifecycle. Register tool handlers with useAgentClientTool — they mount and unmount with the component. Dynamic tools are checked first, then the provider falls back to onFunctionCall.
  • Conversation state management. The useAgentConversation hook accumulates transcript events into a structured message array with roles, content, and IDs.
  • Mode awareness. The useAgentMode hook tracks whether the agent is idle, listening, or speaking — with the playback-aware delay described above.
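The conversation-state bullet can be sketched as a small reducer. The event shape (`role`, `content`) mirrors the ConversationText server message, but the merge-consecutive-fragments behavior and the id scheme here are assumptions for illustration, not the hook's documented semantics:

```javascript
// Sketch: fold ConversationText-style events into a structured message
// array, the kind of state useAgentConversation manages for you.
function appendMessage(messages, event) {
  const last = messages[messages.length - 1];
  // Merge consecutive fragments from the same speaker into one message.
  if (last && last.role === event.role) {
    return [
      ...messages.slice(0, -1),
      { ...last, content: `${last.content} ${event.content}` },
    ];
  }
  return [
    ...messages,
    { id: `msg-${messages.length}`, role: event.role, content: event.content },
  ];
}
```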

What the UI Layer Adds

  • 26 CSS custom properties with light-dark() adaptive defaults. Works with Tailwind, CSS Modules, plain CSS — anything that can set a custom property.
  • Canvas 2D visualizations. The orb and waveform components use Canvas 2D, not WebGL. They render well on low-power devices and do not require GPU acceleration.
  • data-dg-* attribute selectors instead of class names. Your existing CSS framework cannot collide with component styles.
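Theming then reduces to setting properties from the host page. The property names below are hypothetical placeholders; see the UI package guide for the actual list of 26:

```css
/* Hypothetical property names for illustration only — the real list
   is documented in the React UI Components guide. */
:root {
  --dg-accent: #0ea5e9;
  --dg-surface: light-dark(#ffffff, #111827);
  --dg-text: light-dark(#111827, #f9fafb);
}
```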

What the Widget Adds

  • Six layouts: sidebar, floating, inline, button, embedded, orb. Each adapts to its container and responds to the host page’s color scheme.
  • Preact runtime bundled internally. The UMD build is self-contained — no React dependency, no build tooling, no framework assumptions. The ESM build is also available for projects with a bundler.
  • Single init() call returns a teardown function. Mount and unmount cleanly in single-page applications without leaking event listeners or audio contexts.
  • Full theming support. Pass a theme object to init() or use CSS custom properties from the host page. The widget inherits light-dark() behavior automatically.
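The teardown contract can be sketched with a stand-in: `init` here is a stub that registers a listener and hands back a cleanup function, mirroring the mount/unmount pattern rather than the widget's actual internals:

```javascript
// Stand-in for the widget's init(): mount side effects, return a
// teardown function that undoes them. In an SPA, call the returned
// function when the hosting view unmounts.
function makeInit(target) {
  return function init(config) {
    const onClick = () => config.onOpen?.();
    target.addEventListener("click", onClick);
    return () => target.removeEventListener("click", onClick); // teardown
  };
}
```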

Authentication

Browser applications must never expose API keys to the client. Use a token factory — a function that returns a short-lived token from your server.

```javascript
const config = {
  auth: {
    tokenFactory: async () => {
      const res = await fetch("/api/deepgram-token");
      return res.text();
    },
  },
  agent: "YOUR_AGENT_ID",
};
```

The browser WebSocket constructor does not support custom headers. The SDK works around this by passing the token as a Sec-WebSocket-Protocol value — the only header browsers allow on a WebSocket handshake. This is handled internally; you just return a token string from your factory function.

The token factory is called before every connection and reconnection attempt. Tokens stay fresh even across network interruptions.
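If your token endpoint is rate-limited, you can optionally cache on your side of the factory. This is an application-level pattern, not an SDK feature; the `ttlMs` parameter and fetcher are illustrative:

```javascript
// Optional pattern: cache the short-lived token and refetch only when
// it is close to expiring. The SDK still calls the factory before every
// connection attempt, so the cache never outlives a reconnect cycle.
function makeCachedTokenFactory(fetchToken, ttlMs = 25_000) {
  let token = null;
  let expiresAt = 0;
  return async () => {
    if (!token || Date.now() >= expiresAt) {
      token = await fetchToken(); // e.g. fetch("/api/deepgram-token").then((r) => r.text())
      expiresAt = Date.now() + ttlMs;
    }
    return token;
  };
}
```

Keep `ttlMs` comfortably shorter than the token's real lifetime so a cached token is never handed to the SDK moments before it expires.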

The apiKey option exists for local development only. Never ship it in client-side code.

Agent Configuration

Configure the agent by referencing one you created in the Deepgram Console, or define the full configuration inline:

```javascript
const config = {
  agent: "YOUR_AGENT_ID",
};
```

Agent IDs let you change behavior from the console without redeploying your application. Inline configuration gives you version control and the ability to construct prompts dynamically at runtime.

You can combine both approaches — reference an agent ID for base configuration and override specific settings inline. See the individual package guides for details.
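A combined configuration might look like the following; the `settings` key and its fields are hypothetical placeholders, since the real override schema lives in the individual package guides:

```javascript
// Hypothetical shape for illustration only — consult the package
// guides for the actual inline override schema.
const config = {
  agent: "YOUR_AGENT_ID", // base configuration from the Deepgram Console
  settings: {
    greeting: "Hi! How can I help?", // field name assumed
  },
};
```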

Next Steps

  • JavaScript SDK: AgentSession, AgentMicrophone, and AgentPlayer — the core primitives for any framework.
  • React Hooks: AgentProvider, state hooks, conversation hooks, and client-side function calling.
  • React UI Components: Pre-built conversation view, orb visualizer, mic/speaker buttons, and CSS theming.
  • Widget: Embed a complete voice agent with one function call. Six layouts, full theming, no build step.