Hermes Agent · Optional

llava

LLaVA (Large Language and Vision Assistant) is a vision-language model built with visual instruction tuning for image-based conversations. It combines a CLIP vision encoder with a Vicuna/LLaMA language model and supports multi-turn image chat, visual question answering, and instruction following. Use it for vision-language chatbots or image understanding tasks; it is best suited to conversational image analysis.
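To make the multi-turn image chat idea concrete, here is a minimal sketch of how a conversation might be flattened into a single text prompt before it is passed to the model. It assumes a Vicuna-style `USER:` / `ASSISTANT:` template with an `<image>` placeholder on the first user turn, as commonly used with LLaVA-1.5 checkpoints; the exact template depends on the checkpoint, and `build_prompt` is a hypothetical helper, not part of the skill or of any LLaVA library.

```python
from typing import List, Optional, Tuple

# Assumed placeholder marking where image features are spliced into the prompt.
IMAGE_TOKEN = "<image>"


def build_prompt(
    turns: List[Tuple[str, Optional[str]]],
    image_token: str = IMAGE_TOKEN,
) -> str:
    """Flatten (user_message, assistant_reply) pairs into one prompt string.

    The last turn may have assistant_reply=None, which leaves the prompt
    open for the model to generate the next answer.
    """
    parts = []
    for i, (user, assistant) in enumerate(turns):
        # Attach the image placeholder to the first user turn only.
        user_text = f"{image_token}\n{user}" if i == 0 else user
        if assistant is None:
            parts.append(f"USER: {user_text} ASSISTANT:")
        else:
            parts.append(f"USER: {user_text} ASSISTANT: {assistant}")
    return " ".join(parts)


prompt = build_prompt([
    ("What is in this picture?", "A dog on a beach."),
    ("What color is the dog?", None),
])
print(prompt)
```

Keeping earlier turns in the prompt is what lets a follow-up question such as "What color is the dog?" refer back to the same image without attaching it again.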

MLOps · Optional · v1.0.0 · MIT

What this skill is

This directory page tracks a Hermes-compatible skill reference and links back to the original source for install instructions, files, and updates.

Tags and platforms

LLaVA, Vision-Language, Multimodal, Visual Question Answering, Image Chat, CLIP, Vicuna, Conversational AI, Instruction Tuning, VQA
