Use this quickstart to get your first successful analysis response in a few minutes. In this guide, you will:
  1. Upload a local video file to POST /v1/upload/analyze
  2. Read engagement_state, signals[] (including per-signal rationale), and optional conversation_quality outputs
You’ll need an API key. Follow the API key guide for details. You’ll also need a video file. You can download an example from here.

1) Upload and analyze a video

Use one of the requests below to send a local video file (mp4, avi, mov, mkv, mpeg-ts, webm; minimum 10 KB, maximum 32 MB) to POST /v1/upload/analyze.
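
Checking the file locally before uploading can save a failed request. The sketch below applies the format and size limits listed above. The helper name `check_video_file`, the extension set (in particular `.ts` for mpeg-ts), and the reading of KB and MB as multiples of 1024 are assumptions for illustration, not documented API behavior.

```python
from pathlib import Path

# Limits taken from the paragraph above. Interpreting "10 KB" and "32 MB"
# as binary units (multiples of 1024) is an assumption.
MIN_BYTES = 10 * 1024
MAX_BYTES = 32 * 1024 * 1024

# Extensions assumed to match the listed formats. The docs name formats,
# not extensions, so ".ts" for mpeg-ts is a guess.
ALLOWED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".ts", ".webm"}


def check_video_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    if not p.is_file():
        return [f"{path} is not a file"]

    problems = []
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unexpected extension {p.suffix!r}")

    size = p.stat().st_size
    if size < MIN_BYTES:
        problems.append(f"file is {size} bytes, below the {MIN_BYTES}-byte minimum")
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_BYTES}-byte maximum")
    return problems
```

A check like this only covers size and container type. It cannot tell whether the video is black or the audio is silent.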

Video and audio content

Include both video and audio. Black video, silent audio, and muted screen captures may upload successfully but return unreliable results.

The API returns core analysis by default. You can optionally request conversation-quality sections by passing include[] flags (shown below).
export API_KEY="YOUR_API_KEY"
export VIDEO_PATH="path_to_your_video.mp4"

curl -X POST https://api.interhuman.ai/v1/upload/analyze \
  -H "Authorization: Bearer ${API_KEY}" \
  -F "file=@${VIDEO_PATH};type=video/mp4" \
  -F "include[]=conversation_quality_overall" \
  -F "include[]=conversation_quality_timeline"
If you want only core outputs, remove the include[] lines.

Reference: Upload & Analyze API
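
For readers working in Python, the same request can be sketched with the third-party `requests` library. The endpoint, header, form field names, and include[] values come from the curl command above. The function names, the 300-second timeout, and the assumption that failures surface as HTTP error statuses are illustrative choices, not documented behavior.

```python
import os

API_URL = "https://api.interhuman.ai/v1/upload/analyze"


def build_include_fields(overall: bool = False, timeline: bool = False) -> list[tuple[str, str]]:
    """Build the repeated include[] form fields, mirroring the -F flags in the curl command."""
    fields = []
    if overall:
        fields.append(("include[]", "conversation_quality_overall"))
    if timeline:
        fields.append(("include[]", "conversation_quality_timeline"))
    return fields


def upload_and_analyze(video_path: str, api_key: str,
                       overall: bool = False, timeline: bool = False) -> dict:
    """POST the video as multipart/form-data and return the decoded JSON body."""
    import requests  # third-party dependency: pip install requests

    with open(video_path, "rb") as fh:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            # Same content type as the curl example; adjust for other formats.
            files={"file": (os.path.basename(video_path), fh, "video/mp4")},
            # A list of tuples lets the include[] key repeat.
            data=build_include_fields(overall, timeline),
            timeout=300,  # arbitrary value chosen for this sketch
        )
    response.raise_for_status()
    return response.json()
```

Calling `upload_and_analyze(os.environ["VIDEO_PATH"], os.environ["API_KEY"], overall=True, timeline=True)` would mirror the curl command. Leaving both flags at their defaults sends no include[] fields and requests only the core outputs.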

2) Read the response

After your upload is processed, the API returns a structured response with three complementary outputs:
  • engagement_state: Time-bounded labels such as engaged, disengaged, or neutral.
  • signals[]: Time-bounded social signals, each with type, probability, and rationale.
  • conversation_quality (optional): Overall and per-window quality scores. Both overall and each timeline window’s values object use the conversation_quality_values shape.
Time fields (start, end) are expressed in seconds from the start of the uploaded video.

The conversation_quality_values shape (reused by conversation_quality.overall and conversation_quality.timeline[].values):
{
  "quality_index": 45,
  "energy": 53,
  "rapport": 50,
  "authority": 49,
  "learning": 50,
  "clarity": 48
}
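
Because the same six keys appear in both places, client code can model the shape once and reuse it. The sketch below is one way to do that in Python. The class name `ConversationQualityValues` and its `from_dict` helper are hypothetical, and typing the fields as integers is an assumption based only on the sample values shown here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConversationQualityValues:
    # Field names copied from the JSON shape above; int is assumed from the sample.
    quality_index: int
    energy: int
    rapport: int
    authority: int
    learning: int
    clarity: int

    @classmethod
    def from_dict(cls, raw: dict) -> "ConversationQualityValues":
        """Build from a decoded JSON object, failing loudly on missing keys."""
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in raw]
        if missing:
            raise ValueError(f"missing keys: {missing}")
        # Extra keys in raw are ignored so newer responses do not break parsing.
        return cls(**{name: raw[name] for name in names})


sample = {"quality_index": 45, "energy": 53, "rapport": 50,
          "authority": 49, "learning": 50, "clarity": 48}
values = ConversationQualityValues.from_dict(sample)
print(values.quality_index, values.clarity)
```
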
Here’s an example of what the API returns:
{
  "engagement_state": [
    {
      "start": 0,
      "end": 10,
      "state": "engaged"
    },
    {
      "start": 10,
      "end": 20,
      "state": "disengaged"
    }
  ],
  "signals": [
    {
      "start": 0,
      "end": 10,
      "type": "agreement",
      "probability": "high",
      "rationale": "The speaker provides a quick affiliative nod while the partner is speaking."
    },
    {
      "start": 5,
      "end": 15,
      "type": "confidence",
      "probability": "medium",
      "rationale": "The speaker maintains upright posture and responds with steady, fluent delivery."
    }
  ],
  "conversation_quality": {
    "overall": {
      "quality_index": 45,
      "energy": 53,
      "rapport": 50,
      "authority": 49,
      "learning": 50,
      "clarity": 48
    },
    "timeline": [
      {
        "start": 0,
        "end": 10,
        "values": {
          "quality_index": 72,
          "energy": 80,
          "rapport": 75,
          "authority": 68,
          "learning": 70,
          "clarity": 67
        }
      }
    ]
  }
}
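
Since signals and engagement windows share the same time axis, a common first step is to line them up. The sketch below groups signal types under each engagement window they overlap, using a trimmed copy of the example response. The function name `signals_by_window` is hypothetical, and treating windows as half-open intervals (a signal ending exactly where a window starts does not count) is an assumption of this sketch.

```python
# Trimmed copy of the example response above (rationale text omitted).
EXAMPLE = {
    "engagement_state": [
        {"start": 0, "end": 10, "state": "engaged"},
        {"start": 10, "end": 20, "state": "disengaged"},
    ],
    "signals": [
        {"start": 0, "end": 10, "type": "agreement", "probability": "high"},
        {"start": 5, "end": 15, "type": "confidence", "probability": "medium"},
    ],
}


def signals_by_window(response: dict) -> list[dict]:
    """Attach to each engagement window the types of the signals that overlap it."""
    rows = []
    for window in response.get("engagement_state", []):
        overlapping = [
            signal["type"]
            for signal in response.get("signals", [])
            # Half-open overlap test: touching endpoints are not an overlap.
            if signal["start"] < window["end"] and signal["end"] > window["start"]
        ]
        rows.append({
            "start": window["start"],
            "end": window["end"],
            "state": window["state"],
            "signal_types": overlapping,
        })
    return rows


for row in signals_by_window(EXAMPLE):
    print(row)
```

With the example data, the engaged window (0 to 10 seconds) picks up both agreement and confidence, while the disengaged window (10 to 20 seconds) picks up only confidence.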

How to interpret it quickly

  • signals[] gives moment-level events; rationale explains why each signal was inferred.
  • engagement_state shows attention level over contiguous windows.
  • conversation_quality.overall is a single interaction summary.
  • conversation_quality.timeline[] shows how quality changes over time.
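
The points above can be turned into a small summary. The sketch below totals the seconds spent in each engagement state and picks the timeline window with the lowest value for a chosen metric. Both function names are hypothetical, and reading a lower quality_index as a weaker window is an assumption; the text here does not define the scale.

```python
from collections import defaultdict

# Trimmed copy of the example response shown earlier.
EXAMPLE = {
    "engagement_state": [
        {"start": 0, "end": 10, "state": "engaged"},
        {"start": 10, "end": 20, "state": "disengaged"},
    ],
    "conversation_quality": {
        "timeline": [
            {"start": 0, "end": 10,
             "values": {"quality_index": 72, "energy": 80, "rapport": 75,
                        "authority": 68, "learning": 70, "clarity": 67}},
        ],
    },
}


def seconds_per_state(engagement_state: list[dict]) -> dict:
    """Sum window durations (end - start, in seconds) for each state label."""
    totals = defaultdict(float)
    for window in engagement_state:
        totals[window["state"]] += window["end"] - window["start"]
    return dict(totals)


def lowest_scoring_window(timeline: list[dict], metric: str = "quality_index"):
    """Return the timeline window with the lowest value for metric, or None if empty."""
    if not timeline:
        return None
    return min(timeline, key=lambda window: window["values"][metric])


print(seconds_per_state(EXAMPLE["engagement_state"]))
# conversation_quality is optional, so fall back to an empty timeline.
timeline = EXAMPLE.get("conversation_quality", {}).get("timeline", [])
print(lowest_scoring_window(timeline))
```
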

Next steps