Hermes Agent · Optional

simpo-training

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO that reports gains of up to 6.4 points over DPO on AlpacaEval 2.0. Because no reference model is needed, training is more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.
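As a rough illustration of what "reference-free" means here, the sketch below computes the SimPO objective for a single preference pair in plain Python. The implicit reward is the length-normalized log-probability of a response under the policy itself, scaled by beta, and the loss asks the chosen response to beat the rejected one by a target margin gamma. The function name `simpo_loss`, its argument names, and the default values of `beta` and `gamma` are assumptions made for this example, not part of the skill's files or any library API; a real training run would compute this over batches with an autodiff framework.

```python
import math


def simpo_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """SimPO loss for one (chosen, rejected) pair.

    Inputs are per-token log-probabilities of each response under the policy.
    beta and gamma defaults are illustrative placeholders, not tuned values.
    """
    # Implicit reward: beta times the average per-token log-probability.
    # No reference model appears anywhere in this computation.
    reward_chosen = beta * sum(chosen_token_logps) / len(chosen_token_logps)
    reward_rejected = beta * sum(rejected_token_logps) / len(rejected_token_logps)

    # The chosen response should exceed the rejected one by at least gamma.
    margin = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical token log-probabilities for two responses of different length.
chosen = [-0.5, -0.5]
rejected = [-1.0, -1.0, -1.0]
loss = simpo_loss(chosen, rejected, beta=2.0, gamma=0.5)
```

Averaging over tokens rather than summing keeps longer responses from being penalized or favored purely for their length, which is one of the design choices that distinguishes SimPO from DPO.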

MLOps · Optional · v1.0.0 · MIT

What this skill is

This directory page tracks a Hermes-compatible skill reference and links back to the original source for install instructions, files, and updates.

Tags and platforms

Post-Training · SimPO · Preference Optimization · Alignment · DPO Alternative · Reference-Free · LLM Alignment · Efficient Training
