Stream analysis from a live camera: connect to Interhuman over WebSocket, capture media, and send binary chunks as they are recorded. In this codealong, you will:
  1. Connect to wss://api.interhuman.ai/v1/stream/analyze
  2. Capture the camera (and the microphone, where your stack supports it)
  3. Send each recorded segment to Interhuman and read typed server events
Wire the three steps together in your app. Use JavaScript in the browser (getUserMedia + MediaRecorder) or Python on the desktop (opencv-python for video; see notes below for audio).
You’ll need an API key. Follow the API key guide for details.

1) Connect to the WebSocket

Open a TLS WebSocket to the stream endpoint. As soon as the connection opens, send the session config as a text frame (UTF-8 JSON). Server replies arrive as text frames: parse each one as JSON and branch on its type (signal.detected, engagement.updated, conversation_quality.updated, error).
const WS_URL = "wss://api.interhuman.ai/v1/stream/analyze";
const apiKey = "YOUR_API_KEY"; // In production, do not hardcode—use your app's auth flow.

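// The second constructor argument is the subprotocol value; this example passes the API key there.
const ws = new WebSocket(WS_URL, apiKey);
ws.binaryType = "arraybuffer";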

ws.addEventListener("open", () => {
  const sessionConfig = {
    include: [
      "conversation_quality_overall",
      "conversation_quality_timeline",
    ],
  };
  ws.send(JSON.stringify(sessionConfig));
});

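ws.addEventListener("message", (event) => {
  // Server events arrive as text frames; skip anything binary.
  if (typeof event.data !== "string") return;
  const payload = JSON.parse(event.data);
  // Every envelope carries a type; log it alongside the full payload.
  console.log(payload.type, payload);
});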
Reference: Stream & analyze
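The branching on type described above can be pulled into a small router. This is a minimal sketch, not part of any Interhuman SDK: routeServerMessage and the handlers map are hypothetical names, and the returned status strings exist only to make the outcome easy to inspect.

```javascript
// Sketch: route one WebSocket frame to a handler keyed by the envelope's `type`.
// `routeServerMessage` and `handlers` are illustrative names, not Interhuman APIs.
function routeServerMessage(raw, handlers) {
  // Server events are text frames; anything else is not an envelope.
  if (typeof raw !== "string") return "ignored";

  let payload;
  try {
    payload = JSON.parse(raw);
  } catch {
    return "invalid-json";
  }
  if (payload === null || typeof payload !== "object") return "invalid-json";

  // Only call handlers the caller registered explicitly.
  const known = Object.prototype.hasOwnProperty.call(handlers, payload.type);
  const handler = known ? handlers[payload.type] : undefined;
  if (typeof handler !== "function") return "unhandled";

  handler(payload);
  return "handled";
}

// Possible wiring in the browser (assumes the `ws` object from the snippet above):
// ws.addEventListener("message", (event) =>
//   routeServerMessage(event.data, {
//     "signal.detected": (p) => console.log(p.data.signals),
//     "error": (p) => console.error(p.data),
//   })
// );
```

Keeping the router free of browser objects makes it possible to exercise it with recorded JSON strings before connecting a live camera.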

2) Get camera and microphone

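// Prompts the user for camera and microphone permission.
const mediaStream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true,
});

const preview = document.querySelector("#preview");
preview.srcObject = mediaStream;
preview.muted = true; // Keep the local preview from echoing the microphone.
await preview.play(); // play() returns a promise; await it so failures surface here.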

3) Send segments to Interhuman

Send each non-empty recording as a binary WebSocket frame. Start recording after the WebSocket is open and session config is sent.
const SEGMENT_MS = 3000;

const mimeType =
  ["video/webm;codecs=vp9,opus", "video/webm;codecs=vp8,opus", "video/webm"].find(
    (m) => MediaRecorder.isTypeSupported(m)
  ) || "";

const recorder = new MediaRecorder(
  mediaStream,
  mimeType ? { mimeType } : undefined
);

recorder.addEventListener("dataavailable", async (event) => {
  if (!event.data || event.data.size === 0) return;
  if (ws.readyState !== WebSocket.OPEN) return;

  const buffer = await event.data.arrayBuffer();
  ws.send(buffer);
});

// Call once the WebSocket is open and session config is sent:
recorder.start(SEGMENT_MS);
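The ordering requirement above (socket open, config sent, then recording) can be enforced in one place. The helper below is a sketch under assumptions: startStreaming is a hypothetical name, and it takes over sending the session config, so if you adopt it, remove the ws.send(JSON.stringify(sessionConfig)) call from the open listener in step 1 to avoid sending the config twice.

```javascript
// Sketch: send the session config first, then start the recorder.
// `startStreaming` is an illustrative helper, not an Interhuman API.
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

function startStreaming(ws, recorder, sessionConfig, segmentMs) {
  const begin = () => {
    // The config goes out as a text frame before any binary segment.
    ws.send(JSON.stringify(sessionConfig));
    // Guard against starting a recorder that is already running.
    if (recorder.state === "inactive") recorder.start(segmentMs);
  };

  if (ws.readyState === WS_OPEN) {
    begin();
  } else {
    // Not connected yet: defer until the socket reports that it is open.
    ws.addEventListener("open", begin, { once: true });
  }
}
```

With the objects defined earlier, the call would look like startStreaming(ws, recorder, sessionConfig, SEGMENT_MS), where sessionConfig is the same object built in step 1.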
When the user stops, release the camera and close the connection:
if (recorder && recorder.state !== "inactive") {
  recorder.stop();
}
ws.close();
mediaStream.getTracks().forEach((track) => track.stop());

4) Read server envelopes

Every server message shares the same outer shape: type, timestamp, correlation_id, and data. Narrow on type before reading fields inside data.
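A guard for that outer shape keeps malformed messages away from the per-type code. This is a sketch that assumes the field types seen in the examples on this page (three strings and an object); isServerEnvelope is a hypothetical name.

```javascript
// Sketch: check the four outer envelope fields before narrowing on `type`.
// Field types are inferred from the examples on this page, not from a schema.
function isServerEnvelope(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.type === "string" &&
    typeof value.timestamp === "string" &&
    typeof value.correlation_id === "string" &&
    value.data !== null &&
    typeof value.data === "object"
  );
}
```

Run the guard on the parsed JSON first, then switch on type; fields inside data are only read after both checks pass.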

signal.detected

{
  "type": "signal.detected",
  "timestamp": "2025-01-01T00:00:00.000000Z",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "data": {
    "signals": [
      {
        "type": "agreement",
        "start": 3.0,
        "end": 11.0,
        "probability": "high",
        "rationale": "Subject nodded repeatedly while maintaining eye contact."
      }
    ]
  }
}
Each entry in data.signals[] uses the same shape as upload responses: type, start, end, probability, and rationale.
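To act only on some detections, filter data.signals[] by probability. The sketch below assumes nothing about which probability labels exist beyond the "high" shown above; the caller passes the labels it cares about. pickSignals is a hypothetical helper name.

```javascript
// Sketch: keep signals whose probability label is in the caller's list,
// and compute each signal's duration from its start and end values.
function pickSignals(envelope, wantedProbabilities) {
  if (!envelope || envelope.type !== "signal.detected") return [];
  const signals = Array.isArray(envelope.data?.signals)
    ? envelope.data.signals
    : [];
  return signals
    .filter((signal) => wantedProbabilities.includes(signal.probability))
    .map((signal) => ({
      type: signal.type,
      seconds: signal.end - signal.start,
      rationale: signal.rationale,
    }));
}
```

Applied to the example envelope with ["high"], this yields a single entry of type agreement lasting 8 seconds.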

engagement.updated

{
  "type": "engagement.updated",
  "timestamp": "2025-01-01T00:00:00.000000Z",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "data": {
    "state": "engaged",
    "start": 3.0,
    "end": 11.0
  }
}
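One way to use these events is to total the time reported for each state. The reducer below is a sketch: addEngagement is a hypothetical name, it treats data.state as an opaque label (only "engaged" appears on this page), and it assumes the reported windows do not overlap.

```javascript
// Sketch: accumulate seconds per engagement state across events.
// Returns a new totals object and leaves the input untouched.
function addEngagement(totals, envelope) {
  if (!envelope || envelope.type !== "engagement.updated") return totals;
  const { state, start, end } = envelope.data ?? {};
  // Skip events without a usable state or a positive-length window.
  if (typeof state !== "string" || !(end > start)) return totals;
  return { ...totals, [state]: (totals[state] ?? 0) + (end - start) };
}
```

Starting from an empty object and feeding the example event once gives { engaged: 8 }.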

conversation_quality.updated (when opted in)

When the include array in your session config lists conversation_quality_overall, conversation_quality_timeline, or both, you may receive conversation_quality.updated events. Each carries data.overall and/or data.timeline for the window that was just processed. See Conversation quality.

error

Errors use the same envelope with type: "error" and structured fields under data (for example code, message, link, and segment when applicable). See Error handling.
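For logging, an error envelope can be flattened into one line. The sketch below treats code, message, link, and segment as optional, since the text above lists them only as examples; describeError is a hypothetical helper, and the type of segment is not specified here, so it is serialized as-is.

```javascript
// Sketch: turn an error envelope into a single log line.
// Every field under `data` is treated as optional.
function describeError(envelope) {
  if (!envelope || envelope.type !== "error") return null;
  const { code, message, link, segment } = envelope.data ?? {};
  const parts = [];
  if (code !== undefined) parts.push(`[${code}]`);
  parts.push(message ?? "Unknown error");
  if (segment !== undefined) parts.push(`(segment: ${JSON.stringify(segment)})`);
  if (link !== undefined) parts.push(`See ${link}`);
  return parts.join(" ");
}
```

The function returns null for any envelope that is not an error, so it can be called on every message and its result logged only when present.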

How to interpret it quickly

  • signal.detected: data.signals[] lists moment-level social signals for the segment; each rationale explains that detection.
  • engagement.updated: attention level for a time window within the segment (start / end are seconds within that segment).
  • conversation_quality.updated: optional overall and per-window quality metrics when requested.

Next steps