Stream analysis

Analyze a live camera stream: connect to Interhuman over WebSocket, capture media, and send binary chunks as they are recorded. In this codealong, you will:

Connect to wss://api.interhuman.ai/v1/stream/analyze
Get camera (and microphone where your stack supports it)
Send each recorded segment to Interhuman and read typed server events

Wire the three steps together in your app. Use JavaScript in the browser (getUserMedia + MediaRecorder) or Python on the desktop (opencv-python for video; see notes below for audio).

You’ll need an API key. Follow the API key guide for details.

1) Connect to the WebSocket

Open a TLS WebSocket to the stream endpoint. On connect, send session config as a text frame (UTF-8 JSON), then listen for text replies and parse JSON. Branch on type (signal.detected, engagement.updated, conversation_quality.updated, error).

const WS_URL = "wss://api.interhuman.ai/v1/stream/analyze";
const apiKey = "YOUR_API_KEY"; // In production, do not hardcode—use your app's auth flow.

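// Browsers cannot set custom headers on a WebSocket, so the API key is
// passed as the subprotocol (the second constructor argument).
const ws = new WebSocket(WS_URL, apiKey);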
ws.binaryType = "arraybuffer";

ws.addEventListener("open", () => {
  const sessionConfig = {
    include: [
      "conversation_quality_overall",
      "conversation_quality_timeline",
    ],
  };
  ws.send(JSON.stringify(sessionConfig));
});

ws.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return;
  const payload = JSON.parse(event.data);
  console.log(payload.type, payload);
});
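
The listener above only logs each payload. As a minimal sketch of branching on type, the helper below routes each parsed message to a handler keyed by its type and ignores binary frames and malformed JSON. `createDispatcher` and the handler bodies are hypothetical names chosen for illustration, not part of the Interhuman API.

```javascript
// Hypothetical helper: routes one raw WebSocket message to a handler keyed by `type`.
function createDispatcher(handlers) {
  // A Map avoids accidentally matching inherited keys such as "toString".
  const table = new Map(Object.entries(handlers));
  return function dispatch(raw) {
    if (typeof raw !== "string") return null; // ignore binary frames
    let payload;
    try {
      payload = JSON.parse(raw);
    } catch {
      return null; // malformed JSON: skip instead of crashing the listener
    }
    if (payload === null || typeof payload !== "object") return null;
    const handler = table.get(payload.type);
    if (typeof handler === "function") handler(payload);
    return payload.type ?? null;
  };
}

// Illustrative handlers that record what they received.
const seen = [];
const dispatch = createDispatcher({
  "signal.detected": (msg) => seen.push(["signals", msg.data.signals.length]),
  "engagement.updated": (msg) => seen.push(["engagement", msg.data.state]),
  error: (msg) => seen.push(["error", msg.data.code]),
});
```

In the browser this would be attached with `ws.addEventListener("message", (event) => dispatch(event.data))`. Unknown types are returned but not handled, so new event types do not break the listener.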

Reference: Stream & analyze

2) Get camera and microphone

const mediaStream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true,
});

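const preview = document.querySelector("#preview"); // a <video> element on the page
preview.srcObject = mediaStream;
preview.muted = true; // avoid echoing the microphone back through the speakers
await preview.play();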
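
Not every device has a microphone. The sketch below, under the assumption that a video-only stream is acceptable for your use case, first requests camera and microphone and falls back to camera only when the browser reports `NotFoundError`. `getMediaWithFallback` is a hypothetical helper; the media devices object is passed in so the function can be exercised outside a browser.

```javascript
// Hypothetical helper. In the browser, pass navigator.mediaDevices.
async function getMediaWithFallback(mediaDevices) {
  try {
    const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
    return { stream, hasAudio: true };
  } catch (err) {
    // NotFoundError means no device satisfied the request (for example, no microphone).
    if (err && err.name === "NotFoundError") {
      const stream = await mediaDevices.getUserMedia({ video: true, audio: false });
      return { stream, hasAudio: false };
    }
    // Permission denials and other failures are surfaced to the caller.
    throw err;
  }
}
```

Usage in the browser would look like `const { stream, hasAudio } = await getMediaWithFallback(navigator.mediaDevices);`. If the camera is also missing, the second request rejects and the error propagates.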

3) Send segments to Interhuman

Send each non-empty recording as a binary WebSocket frame. Start recording after the WebSocket is open and session config is sent.

const SEGMENT_MS = 3000;

const mimeType =
  ["video/webm;codecs=vp9,opus", "video/webm;codecs=vp8,opus", "video/webm"].find(
    (m) => MediaRecorder.isTypeSupported(m)
  ) || "";

const recorder = new MediaRecorder(
  mediaStream,
  mimeType ? { mimeType } : undefined
);

recorder.addEventListener("dataavailable", async (event) => {
  if (!event.data || event.data.size === 0) return;
  if (ws.readyState !== WebSocket.OPEN) return;

  const buffer = await event.data.arrayBuffer();
  ws.send(buffer);
});

// Call once the WebSocket is open and session config is sent:
recorder.start(SEGMENT_MS);
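
One way to enforce that ordering is sketched below: a single helper that sends the session config as a text frame and only then starts the recorder, whether the socket is already open or still connecting. `startWhenOpen` is a hypothetical name; if you use it, drop the separate "open" listener from step 1 so the config is not sent twice.

```javascript
// Hypothetical helper: send session config first, then start recording segments.
function startWhenOpen(ws, recorder, sessionConfig, segmentMs) {
  const begin = () => {
    ws.send(JSON.stringify(sessionConfig)); // text frame with the session config
    recorder.start(segmentMs); // binary segments follow via "dataavailable"
  };
  const OPEN = 1; // numeric value of WebSocket.OPEN
  if (ws.readyState === OPEN) {
    begin();
  } else {
    // { once: true } removes the listener after the first "open" event.
    ws.addEventListener("open", begin, { once: true });
  }
}
```

For example: `startWhenOpen(ws, recorder, { include: ["conversation_quality_overall"] }, SEGMENT_MS);`.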

When the user stops, release the camera and close the connection:

if (recorder && recorder.state !== "inactive") {
  recorder.stop();
}
ws.close();
mediaStream.getTracks().forEach((track) => track.stop());

4) Read server envelopes

Every server message shares the same outer shape: type, timestamp, correlation_id, and data. Narrow on type before reading fields inside data.

`signal.detected`

{
  "type": "signal.detected",
  "timestamp": "2025-01-01T00:00:00.000000Z",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "data": {
    "signals": [
      {
        "type": "agreement",
        "start": 3.0,
        "end": 11.0,
        "probability": "high",
        "rationale": "Subject nodded repeatedly while maintaining eye contact."
      }
    ]
  }
}

Each entry in data.signals[] uses the same shape as upload responses: type, start, end, probability, and rationale.
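
As a small illustration of reading those fields, the sketch below turns a `signal.detected` message into one display line per signal. `summarizeSignals` is a hypothetical helper, and the output format is only a suggestion.

```javascript
// Hypothetical helper: one human-readable line per entry in data.signals[].
function summarizeSignals(message) {
  if (!message || message.type !== "signal.detected") return [];
  const signals = Array.isArray(message.data?.signals) ? message.data.signals : [];
  return signals.map((s) => {
    const start = Number(s.start).toFixed(1);
    const end = Number(s.end).toFixed(1);
    return `${s.type} ${start}s-${end}s (${s.probability}): ${s.rationale}`;
  });
}
```

Messages of any other type yield an empty array, so the helper can be called without narrowing first.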

`engagement.updated`

{
  "type": "engagement.updated",
  "timestamp": "2025-01-01T00:00:00.000000Z",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "data": {
    "state": "engaged",
    "start": 3.0,
    "end": 11.0
  }
}
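
A common use of this event is to keep a running total of seconds per state. The sketch below assumes `start` and `end` are numbers in seconds, as in the example above; `addEngagement` is a hypothetical helper, and `"engaged"` is the only state value shown in this guide.

```javascript
// Hypothetical helper: returns a new totals object with this window's duration added.
function addEngagement(totals, message) {
  if (!message || message.type !== "engagement.updated") return totals;
  const { state, start, end } = message.data;
  const seconds = Math.max(0, end - start); // guard against inverted windows
  return { ...totals, [state]: (totals[state] ?? 0) + seconds };
}
```

Because the function returns a new object instead of mutating `totals`, it fits a reducer-style state update such as `totals = addEngagement(totals, payload)`.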

`conversation_quality.updated` (when opted in)

When the include array in your session config lists conversation_quality_overall, conversation_quality_timeline, or both, you may receive conversation_quality.updated with data.overall and/or data.timeline for the window that was just processed. See Conversation quality.
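
Since either field may be absent depending on what you opted into, a defensive reader is useful. The sketch below makes no assumption about the inner shape of `data.overall` or `data.timeline` and passes both through untouched; `readConversationQuality` is a hypothetical helper.

```javascript
// Hypothetical helper: returns whichever quality fields are present, or null for
// messages of another type. Inner shapes are not inspected.
function readConversationQuality(message) {
  if (!message || message.type !== "conversation_quality.updated") return null;
  const data =
    message.data !== null && typeof message.data === "object" ? message.data : {};
  return {
    overall: "overall" in data ? data.overall : null,
    timeline: "timeline" in data ? data.timeline : null,
  };
}
```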

`error`

Errors use the same envelope with type: "error" and structured fields under data (for example code, message, link, and segment when applicable). See Error handling.
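
For logging, the error envelope can be flattened into a single line. The sketch below reads the fields named above (`code`, `message`, `link`, `segment`) and tolerates any of them being missing; `describeError` is a hypothetical helper, and the type of `segment` is not assumed.

```javascript
// Hypothetical helper: formats an error envelope as one log line.
function describeError(message) {
  if (!message || message.type !== "error") return null;
  const { code, message: text, link, segment } = message.data ?? {};
  const parts = [`[${code ?? "unknown"}] ${text ?? "No message provided"}`];
  // JSON.stringify keeps this safe whether segment is a number or an object.
  if (segment !== undefined) parts.push(`segment: ${JSON.stringify(segment)}`);
  if (link) parts.push(`see: ${link}`);
  return parts.join(" | ");
}
```

A typical call site is `const line = describeError(payload); if (line) console.error(line);`.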

How to interpret it quickly

signal.detected: data.signals[] lists moment-level social signals for the segment; each rationale explains that detection.
engagement.updated: attention level for a time window within the segment (start / end are seconds within that segment).
conversation_quality.updated: optional overall and per-window quality metrics when requested.

Next steps

Stream & analyze — full AsyncAPI channel, headers, and message schemas.
Authentication — API key usage; browser WebSockets use the subprotocol as above.
Error handling — structured error codes and recovery.
Video upload quickstart — one-shot POST /v1/upload/analyze flow.
Agent Skills — installable skills that wrap upload and stream calls.
Social signals and Conversation quality — meaning of outputs.
