Remote OpenClaw
Menu
SkillsMarketplaceGuideAgentsAdvertise
Remote OpenClaw
SkillsMarketplaceGuideAgentsAdvertise
Skills/starchild-ai-agent/official-skills/video-analysis

video-analysis

starchild-ai-agent/official-skills
589 installs17 stars

Installation

npx skills add https://github.com/starchild-ai-agent/official-skills --skill video-analysis

Summary

|

SKILL.md

Video Analysis

Analyze video files using either native model understanding or frame extraction + transcription.

How It Works

analyze_video(path, question)
      │
      ├─ file_size ≤ threshold (default 20MB)
      │     → Send video to a supports_video model (default Gemini 3.1 Flash Lite)
      │     → Model sees full video natively (best quality)
      │
      └─ file_size > threshold
            → ffmpeg extracts keyframes (scene detection for long videos)
            → Whisper transcribes audio track
            → Returns frame image paths + transcript text
            → Agent feeds these to the current chat model

Quick Start

⚠️ Invocation — do NOT use dotted imports. The directory name contains a hyphen (video-analysis), so from skills.video-analysis.exports import ... is a Python syntax error (- is parsed as minus). This is true for every hyphenated skill, not just this one. Use one of the two patterns below.

Pattern A — from workspace root (recommended for scripts):

cd /data/workspace/skills/video-analysis && \
  python3 -c "from exports import analyze_video; \
    import json; \
    print(json.dumps(analyze_video('output/videos/clip.mp4', \
      question='What happens in this video?'), ensure_ascii=False))"

Note: pass the video path workspace-relative (analyze.py resolves it against WORKSPACE_DIR), even though you cd into the skill dir.

Pattern B — inside a starchild-clawd script:

from core.skill_tools import video_analysis
result = video_analysis.analyze_video("output/videos/clip.mp4",
                                      question="What happens in this video?")

❌ Do NOT exec(open('skills/video-analysis/analyze.py').read()) — analyze.py uses __file__ at import time, which is undefined under exec, so it crashes. Load it by file path with importlib.util.spec_from_file_location if you must avoid both patterns above.

# result keys (same for both patterns):
# Analyze a video — auto-selects native or extraction mode
# result = analyze_video("output/videos/clip.mp4", question="What happens in this video?")

# result keys:
#   success: bool
#   mode: "native" | "extraction"
#
# If mode == "native":
#   analysis: str (model's text response)
#   model: str (which model was used)
#   tokens: {input, output, video, audio}
#
# If mode == "extraction":
#   frame_paths: list[str] (workspace-relative paths to keyframe JPEGs)
#   transcript: str | None (Whisper transcription text)
#   frame_count: int
#   duration_sec: float

Using the Exports

from core.skill_tools import video_analysis

# Full analysis (auto-selects mode)
result = video_analysis.analyze_video("output/videos/my_video.mp4", question="Describe this video")

# Check current config
config = video_analysis.get_config()

# Get video metadata without analyzing
info = video_analysis.get_video_info("output/videos/my_video.mp4")
# → {"duration": 45.2, "size": 12345678, "width": 1920, "height": 1080, "has_audio": true}

Native Mode (small videos)

For videos under the size threshold, the skill sends the full video to a model that supports native video input. The model sees every frame and hears the audio.

Default model: google/gemini-3.1-flash-lite — best price/quality for video.

Model benchmark (6MB clip, vs gemini-3.1-pro-preview baseline):

ModelTierCostTimeAccuracyNotes
google/gemini-3.1-flash-litebudget~$0.00148.1s~88%⭐ Default — cheapest + fastest
google/gemini-3.5-flashstd~$0.015211.8s~85%More detail, higher cost
qwen/qwen3.6-plusbudget~$0.005844.2s~95%Accurate but slow
qwen/qwen3.6-flashbudget~$0.002716.6s~80%Misreads subjects sometimes
google/gemini-3.1-pro-previewstd~$0.019919.7s100%Baseline (best, most expensive)

flash-lite identifies the full scene, action sequence, and transitions correctly at ~14x lower cost than the Pro baseline. For maximum accuracy (exact character names, fine detail), switch default_model to gemini-3.1-pro-preview or gemini-3.5-flash in config/video-analysis.yaml.

Extraction Mode (large videos)

For videos over the size threshold, the skill extracts keyframes and transcribes audio:

  • Short videos (≤60s): One frame every N seconds (default: 2s)
  • Long videos (>60s): Scene-change detection picks visually distinct frames
  • Audio: Extracted and sent to Whisper for transcription
  • Max frames: Capped at 30 (configurable) to control cost

The agent receives frame image paths and transcript text, then feeds them to the current chat model as image attachments + context text.

Configuration

Edit config/video-analysis.yaml (in the workspace) to customize. This file is created automatically on first use, only needs the keys you want to override, and survives skill updates.

Do NOT edit skills/video-analysis/config.yaml — that's the factory default and is overwritten on every skill auto-update. The user file overlays it.

Both the standalone skill and the chat "send a video" flow read this same config, so one edit changes the model everywhere. Available keys:

# Model for native video understanding
default_model: google/gemini-3.1-flash-lite

# Size threshold: native (≤) vs extraction (>)
# Set to 0 → always extraction. Set to 100 → always native.
native_size_limit_mb: 20

# Frame extraction settings
extraction:
  max_frames: 30                  # Max keyframes to extract
  short_video_interval_sec: 2     # Frame interval for ≤60s videos
  scene_threshold: 0.3            # Scene detection sensitivity (0.0-1.0)
  transcribe_audio: true          # Whether to Whisper-transcribe audio

Available Video Models

ModelAliasTierNotes
google/gemini-3.1-flash-liteflash31budget⭐ Default, best price/quality
google/gemini-3.5-flashgemini35standardMore detail, higher cost
google/gemini-3.1-flash-liteflash31budgetCheapest option
google/gemini-3.1-pro-previewgeministandardHighest quality
qwen/qwen3.6-flashqwenfbudgetGood alternative
qwen/qwen3.6-plusqwenbudget—
minimax/minimax-m3mm3standard—
meta-llama/llama-4-maverickmaverickstandard—
meta-llama/llama-4-scoutscoutbudget—
xiaomi/mimo-v2.5mimostandard—
z-ai/glm-5v-turboglm5vstandard—
minimax/minimax-m2.7mm27budgetAudio-only, no image

Agent Behavior

When the user provides a video file (via upload or file path) and the current chat model does NOT support video:

  1. Call analyze_video(path, question).
  2. If result mode is "native" → return result["analysis"] directly.
  3. If result mode is "extraction" → use result["frame_paths"] as image

references and result["transcript"] as context, then ask the current model to analyze based on the frames + transcript.

When the current model DOES support video, the backend handles it natively via Phase 1 (base64 content block injection) — no need for this skill.

Troubleshooting

ProblemFix
"File not found"Check path is workspace-relative (e.g. output/videos/x.mp4)
Native mode returns errorCheck default_model in config/video-analysis.yaml is valid
No audio transcriptionVideo may have no audio track; check has_audio in result
Too few frames extractedLower scene_threshold in config/video-analysis.yaml (e.g. 0.15)
Too many frames / high costReduce max_frames or raise scene_threshold

Featured

SetupClaw: done-for-you OpenClaw for founders & exec teams logoSetupClaw: done-for-you OpenClaw for founders & exec teams

White-glove OpenClaw for founders and exec teams (4–50+ employees): we install, harden, integrate your tools, and maintain it — secured from day one.

Get it set up for you →
MoltAwards - Agent internet for government contracts + jobs. logoMoltAwards - Agent internet for government contracts + jobs.

MoltAwards is an agent-native social layer for matchawards.com.

Learn more →
CLN.Work — Stop prompting, start hiring AI employees logoCLN.Work — Stop prompting, start hiring AI employees

Turn your Claude agents into a real team — onboard them, assign tasks, and manage them like staff.

Hire AI employees →
Deploy your own AI agent logoDeploy your own AI agent

Launch OpenClaw or Hermes on Hostinger in about 60 seconds, keep your agent live 24/7, earn 20%-40% on your next referral up to $25-$45, and give your friend 20% off.

Launch on Hostinger →
AdvertiseGet your AI tool in front of 67,000+ AI enthusiastsSee placements & pricing →

Categories

Command ExecutionData ExfiltrationPrompt Injection
View on GitHub

Recommended skills

Browse all →

video-edit

agentspace-so/runcomfy-agent-skills

291K installsInstall

image-to-video

agentspace-so/runcomfy-agent-skills

290K installsInstall

ai-video-generation

agentspace-so/runcomfy-agent-skills

234K installsInstall

ai-avatar-video

agentspace-so/runcomfy-agent-skills

232K installsInstall

video-inpainting

agentspace-so/runcomfy-agent-skills

231K installsInstall

video-extend

agentspace-so/runcomfy-agent-skills

230K installsInstall

Browse

Skills by category

Frontend250Git198Data154Testing120Design105Docs103Security96Automation87Backend76Devops37Productivity29Mcp23

Advertise on Remote OpenClaw

Get your AI tool in front of 67,000+ AI enthusiasts a month

See placements & pricing →

Remote OpenClaw

AI agent skills directory, marketplace, and workflow hub for OpenClaw, Hermes Agent, Claude Code, Codex, and MCP-powered operator stacks.

Explore

  • Home
  • Skills Directory
  • Claude Code Skills
  • Codex Skills
  • Marketplace
  • Hermes Ecosystem
  • Agents
  • Guide
  • Learn
  • Blog

More

  • Playbook
  • Free Tools
  • Shipping
  • Contact
  • Terms
  • Privacy
© 2026 Remote OpenClaw