Session Observability

Log client and server WebSocket messages for per-session, turn-by-turn observability into transcripts, agent behavior, function calls, errors, and latency.

The Voice Agent dashboard surfaces high-level usage, but it does not show what happened inside an individual session, turn by turn. Everything you need is already flowing across the WebSocket. This guide explains what to capture and how to structure it so you can inspect transcripts, agent behavior, function calls, errors, and latency.

The Core Idea

The Agent API runs over a single WebSocket connection (wss://agent.deepgram.com/v1/agent/converse). All control and event traffic moves across that socket as JSON messages in both directions. There is no separate logging API, so the recommended pattern is to tap the WebSocket and persist every non-audio frame, in both directions, keyed to the connection’s request_id.

That gives you a complete, replayable record of each session that you can join back to the high-level usage in the dashboard.
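As a minimal sketch of the tap, the function below separates JSON control frames from binary audio. It assumes your WebSocket client delivers text frames as `str` and binary frames as `bytes` (the Python `websockets` library behaves this way). The name `non_audio_payload` and the `"Unparsed"` marker are hypothetical and not part of any Deepgram SDK.

```python
import json
from typing import Optional, Union


def non_audio_payload(frame: Union[str, bytes]) -> Optional[dict]:
    """Return the parsed JSON message for a text frame, or None for binary audio."""
    if isinstance(frame, (bytes, bytearray)):
        # Binary frames carry raw audio; they are not part of the event log.
        return None
    try:
        message = json.loads(frame)
    except json.JSONDecodeError:
        # Hypothetical fallback: keep unparseable text instead of dropping it.
        return {"type": "Unparsed", "raw": frame}
    if not isinstance(message, dict):
        return {"type": "Unparsed", "raw": frame}
    return message
```

Call it on every frame you send and every frame you receive, and persist whatever is not `None`.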

What to Capture

Connection and Metadata

  • request_id from the Welcome message. This is the unique ID for the session and the key you use to correlate your logs with dashboard usage. The server sends Welcome as soon as the socket opens, so capture it first.
  • The full Settings message you send. This is your record of how the agent was configured: the listen / think / speak providers and models, the prompt, the functions, context_length, the tags array (tags also flow into usage records, so they become your filtering dimension), and options such as flags.history, experimental, and mip_opt_out.
  • SettingsApplied confirmation from the server. See Settings Applied.
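The three items above can be tracked with a small per-connection object. This is a sketch only: `SessionMeta` and `observe` are hypothetical names, and it assumes each message is already parsed into a dict whose `type` field matches the message names used in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionMeta:
    """Connection-level metadata gathered from the first few frames."""
    request_id: Optional[str] = None
    settings: Optional[dict] = None
    settings_applied: bool = False

    def observe(self, direction: str, message: dict) -> None:
        kind = message.get("type")
        if direction == "server_to_client" and kind == "Welcome":
            # Correlation key for joining logs with dashboard usage.
            self.request_id = message.get("request_id")
        elif direction == "client_to_server" and kind == "Settings":
            # Keep the whole message as the record of agent configuration.
            self.settings = message
        elif direction == "server_to_client" and kind == "SettingsApplied":
            self.settings_applied = True
```

If `settings_applied` is still `False` after you have sent Settings and audio is flowing, that is worth flagging in your logs.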

Server Events to Store

Message | Why It Matters
ConversationText (role, content) | The transcript of every user and agent turn. The core record. Includes languages / languages_hinted when using flux-general-multi.
AgentThinking (content) | What the agent was processing before it responded.
FunctionCallRequest | Tool-call requests with id, name, arguments, and the client_side flag. See Function Calling for details.
LatencyReport | Detailed STT / LLM / TTS latency breakdown. Requires experimental: true. See section below.
UserStartedSpeaking / AgentAudioDone | Turn boundaries and barge-in timing.
Error / Warning (code, description) | Failure and degradation tracking.
Acknowledgements (ListenUpdated, ThinkUpdated, SpeakUpdated, PromptUpdated) | Confirms the exact moment an Update* message was applied. If an acknowledgement is missing, the update may not have landed, which is useful for troubleshooting mid-call configuration changes.
History | Full conversation plus function-call records, useful for session reconstruction and resume. Requires flags.history: true (default on).

For a complete reference of all server events, see Outputs: Server Events.
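To illustrate how the stored events become useful, here is a sketch that pulls a transcript and a problem list out of a sequence of server messages in arrival order. The function name `summarize` is hypothetical, and the role values in the usage note are sample data rather than a statement of the API's exact values.

```python
from typing import Iterable, List, Tuple


def summarize(messages: Iterable[dict]) -> Tuple[List[tuple], List[tuple]]:
    """Split server messages into (transcript turns, errors and warnings)."""
    transcript, problems = [], []
    for message in messages:
        kind = message.get("type")
        if kind == "ConversationText":
            transcript.append((message.get("role"), message.get("content")))
        elif kind in ("Error", "Warning"):
            problems.append((kind, message.get("code"), message.get("description")))
    return transcript, problems
```

For example, feeding it a ConversationText with role `"user"` and content `"hi"` followed by a Warning yields one transcript turn and one problem entry; every other message type is ignored here but should still be stored.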

Client Messages to Store

For a complete reference of all client messages, see Inputs: Client Messages.

Skip the raw binary audio frames unless you specifically need call recordings. If you do, store them separately.

LatencyReport: Detailed Latency Telemetry

Set experimental: true at the top level of your Settings message and the server will emit a LatencyReport event. This is the richest latency signal available. Capture it the same way as every other frame.

It breaks latency down across the full STT → LLM → TTS pipeline. All fields are floats in seconds, and each is optional (omitted when not applicable to that turn):

Field | What It Measures
stt_latency | Speech-to-text: audio received to transcript produced
ttt_token_latency | Time to first token of any type (text, tool call, or thinking)
ttt_text_latency | Time to first text token from the LLM
ttt_tool_call_latency | Time to first tool-call token from the LLM
ttt_thinking_latency | Time to first thinking token from the LLM
tts_latency | Text-to-speech: first text token to first audio byte
total_latency | End-to-end: user utterance end to first audio byte

This lets you attribute latency to the right stage — for example, separating LLM time-to-first-token from TTS time, and isolating tool-call and thinking overhead.

LatencyReport is gated behind experimental: true, so treat the schema as subject to change. Because the fields are optional, log defensively rather than assuming every field is present on every report.
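One way to log defensively is to read only the fields that are actually present and numeric, and to set aside anything unrecognized so schema changes are visible rather than lost. The sketch below uses the field names from the table above; `split_latency_report` is a hypothetical helper name.

```python
from typing import Dict, Tuple

LATENCY_FIELDS = (
    "stt_latency",
    "ttt_token_latency",
    "ttt_text_latency",
    "ttt_tool_call_latency",
    "ttt_thinking_latency",
    "tts_latency",
    "total_latency",
)


def split_latency_report(report: dict) -> Tuple[Dict[str, float], dict]:
    """Return (known numeric latency fields in seconds, unrecognized fields)."""
    known = {}
    for name in LATENCY_FIELDS:
        value = report.get(name)
        # Skip fields that are missing, null, or not numeric.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            known[name] = float(value)
    extras = {
        key: value
        for key, value in report.items()
        if key not in LATENCY_FIELDS and key != "type"
    }
    return known, extras
```

Store the original payload as well; the split is only a convenience for metrics and alerting.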

Wrap every captured frame in your own envelope. Most Agent messages do not carry their own timestamps, so stamp them yourself on send or receive.

{
  "request_id": "fc553ec9-...",
  "session_id": "your-internal-id",
  "customer_id": "...",
  "seq": 42,
  "ts": "2025-06-26T15:04:05.123Z",
  "direction": "server_to_client",
  "payload": { /* the full original JSON message, including its "type" field */ }
}

  • request_id: from the Welcome message — the correlation key for joining your logs with dashboard usage.
  • seq: a monotonic counter you assign for ordering.
  • ts: wall-clock time when you sent or received the frame.
  • direction: server_to_client or client_to_server.
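The envelope can be produced by a small helper that owns the sequence counter and the clock. This is a sketch under assumptions: the class name `Enveloper` is hypothetical, the caller sets `request_id` once Welcome arrives, and timestamps are UTC with millisecond precision to match the example above.

```python
import itertools
from datetime import datetime, timezone

DIRECTIONS = ("server_to_client", "client_to_server")


class Enveloper:
    """Wraps each non-audio frame in a log envelope for one session."""

    def __init__(self, session_id, customer_id=None):
        self.session_id = session_id
        self.customer_id = customer_id
        self.request_id = None  # set by the caller after the Welcome message
        self._seq = itertools.count(1)  # monotonic counter for ordering

    def wrap(self, direction, payload):
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        now = datetime.now(timezone.utc)
        return {
            "request_id": self.request_id,
            "session_id": self.session_id,
            "customer_id": self.customer_id,
            "seq": next(self._seq),
            # Wall-clock time, e.g. 2025-06-26T15:04:05.123Z
            "ts": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "direction": direction,
            "payload": payload,
        }
```

Call `wrap` at the moment of send or receive, not later in a batch, so `ts` reflects when the frame actually crossed the socket.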

Write these append-only, one record per frame (JSONL per session, or a table keyed on request_id + seq). From that store you can replay any session in order and join it back to dashboard usage by request_id.
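A JSONL-per-session store can be sketched with the standard library alone. The helper names `append_record` and `load_session` and the `<request_id>.jsonl` file naming are assumptions for illustration; a database table keyed on request_id + seq would serve the same purpose.

```python
import json
from pathlib import Path


def append_record(log_dir, record):
    """Append one envelope as a single JSON line to the session's file."""
    path = Path(log_dir) / f"{record['request_id']}.jsonl"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def load_session(log_dir, request_id):
    """Read a session back, ordered by the seq counter."""
    path = Path(log_dir) / f"{request_id}.jsonl"
    with path.open(encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    return sorted(records, key=lambda record: record["seq"])
```

Sorting on `seq` at read time means ordering survives even if records from concurrent send and receive paths are written slightly out of order.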

Things to Keep in Mind

  • Timestamps are yours to add. The protocol does not stamp messages, so record wall-clock time on send and receive, and assign a monotonic sequence number for ordering.
  • Keep flags.history enabled if you want History events. It is on by default.
  • Set experimental: true to receive LatencyReport, and treat that schema as subject to change.
  • Set mip_opt_out in Settings if the data should not be used for model improvement.
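Some of these points can be checked before a Settings message is sent. The sketch below is a hypothetical pre-flight check, not part of the API. It assumes `experimental` and `tags` sit at the top level of Settings and that history lives under `flags.history`, as described in this guide.

```python
from typing import List


def observability_warnings(settings: dict) -> List[str]:
    """Return human-readable warnings about settings that reduce observability."""
    warnings = []
    if settings.get("experimental") is not True:
        warnings.append("experimental is not true: no LatencyReport events")
    flags = settings.get("flags") or {}
    # History is on by default, so only an explicit false disables it.
    if flags.get("history", True) is False:
        warnings.append("flags.history is false: no History events")
    if not settings.get("tags"):
        warnings.append("tags is empty: usage records cannot be filtered by tag")
    return warnings
```

Logging the returned warnings alongside the stored Settings message makes it easier to explain later why a session has no latency or history data.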