Interhuman analyzes live video by keeping a WebSocket open, receiving short WebM segments from your client, and returning structured social-intelligence signals as each segment is processed. By the end of this guide, you will:
  • Connect to the Interhuman video streaming API over WebSocket
  • Send continuous video segments from a client
  • Receive structured, segment-by-segment social-intelligence signals in real time
You’ll need an API key; follow the API key guide for details. You’ll also need a client that can send binary video data, such as a browser MediaRecorder producing WebM segments.

1. Connect to the WebSocket endpoint

Use a persistent WebSocket connection to send segments and receive results as soon as each one is processed. Connect to:
wss://api.interhuman.ai/v0/stream/analyze

Connect to the API

Open a WebSocket and log server messages as they arrive.
const socket = new WebSocket("wss://api.interhuman.ai/v0/stream/analyze");

socket.onopen = () => console.log("WebSocket connected");

socket.onmessage = (event) => {
  const message = JSON.parse(event.data);
  console.log("Server message:", message);
  // handle result in your own app
};

socket.onerror = (error) => console.error("WebSocket error:", error);

socket.onclose = () => console.log("WebSocket closed");

2. Send video segments

Send each message as a binary-encoded WebM segment. The endpoint expects short video chunks recorded by the browser’s MediaRecorder and rejects segments smaller than 10 KB or larger than 20 MB. A common approach:
  • Use MediaRecorder.
  • Set a fixed timeslice (e.g. 5000 ms) so each segment is ~5 seconds.
  • Ensure each chunk is at least 10 KB and no more than 20 MB.
  • Convert each event into an ArrayBuffer and send it as binary.

Send segments from your client

Capture short WebM chunks on the client and push each one over the open WebSocket.
async function startStreaming(socket, stream) {
  const recorder = new MediaRecorder(stream, {
    mimeType: "video/webm;codecs=vp8,opus"
  });

  recorder.ondataavailable = async (event) => {
    const size = event.data?.size ?? 0;
    if (
      event.data &&
      size >= 10_000 && // server minimum (10 KB)
      size <= 20 * 1024 * 1024 && // server maximum (20 MB)
      socket.readyState === WebSocket.OPEN
    ) {
      const buffer = await event.data.arrayBuffer();
      socket.send(buffer);
    }
  };

  recorder.start(5000); // produce ~5s segments

  return recorder;
}

// Example usage:
// const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
// const recorder = await startStreaming(socket, stream);
Each binary WebM chunk becomes one “segment” on the server. The server validates the video, applies the model, and streams back structured results.

3. Receive real-time results

The server sends three message types per segment:
  • {"status": "processing", "segment": <number>} as soon as a chunk is received.
  • A result payload with detected social signals for that segment.
  • {"status": "completed", "segment": <number>} when processing finishes.
On errors, you’ll receive {"status": "error", "segment": <number>, "error": "<message>"}. Signal type values you may receive: Agreement, Confidence, Confusion, Disagreement, Disengagement, Engagement, Frustration, Hesitation, Interest, Skepticism, Stress, Uncertainty.
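The message types above can be handled with a single dispatch function keyed on `status`. The sketch below assumes the field names shown in this guide (`status`, `segment`, `data.signal`, `error`); `handleServerMessage` is a hypothetical helper, not part of the Interhuman API. It returns a short description per message so you can log it or route it into your own app.

```javascript
// Dispatch one parsed server message based on its status.
// Returns a short human-readable summary for each message type.
function handleServerMessage(message) {
  switch (message.status) {
    case "processing":
      // Sent as soon as the server receives a chunk
      return `segment ${message.segment}: processing`;
    case "result":
      // data.signal is the array of detected social signals
      return message.data.signal
        .map((s) => `segment ${message.segment}: ${s.type} (${s.start}s-${s.end}s)`)
        .join("; ");
    case "completed":
      // Sent when processing for this segment finishes
      return `segment ${message.segment}: completed`;
    case "error":
      return `segment ${message.segment}: error - ${message.error}`;
    default:
      return `segment ${message.segment}: unknown status "${message.status}"`;
  }
}

// Wire it into the open socket:
// socket.onmessage = (event) => {
//   console.log(handleServerMessage(JSON.parse(event.data)));
// };
```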

Result message

{
  "status": "result",
  "segment": 1,
  "data": {
    "signal": [
      {
        "type": "Agreement",
        "start": 0.0,
        "end": 5.0,
        "reasoning": "Subject maintained eye contact and nodded during the prompt.",
        "feedback": "Positive engagement detected.",
        "confidence": "High",
        "intensity": "Strong"
      }
    ]
  }
}
You can read more in the API reference for streaming analysis.

Summary

  • Open a WebSocket connection to the streaming endpoint.
  • Continuously send video segments as binary WebM chunks.
  • Handle processing, result, and completed messages for each segment; inspect data.signal for detected social signals.
  • Configuration (prompts, sampling, thinking) is applied server-side; no query parameters are required.