Hermes Agent · Optional

optimizing-attention-flash

Optimizes transformer attention with Flash Attention for a 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), when hitting GPU memory limits in attention, or when faster inference is needed. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.
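As a rough illustration of the PyTorch-native SDPA path mentioned above, the sketch below shows a single fused attention call; the tensor shapes, dtype, and device handling are illustrative assumptions, not values prescribed by the skill itself.

```python
# Minimal sketch of the PyTorch-native SDPA path (assumed example, not the skill's code).
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, heads, seq_len, head_dim = 2, 8, 2048, 64  # assumed example sizes

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# scaled_dot_product_attention dispatches to a fused Flash Attention kernel
# on supported GPUs and dtypes, avoiding materializing the full
# seq_len x seq_len attention matrix in memory.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

In practice, this one call replaces the explicit softmax(QK^T)V computation; the skill's documentation covers when PyTorch actually selects the Flash Attention backend and the flash-attn, FP8, and sliding-window variants.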

Mlops · Optional · v1.0.0 · MIT

What this skill is

This directory page tracks a Hermes-compatible skill reference and links back to the original source for install instructions, files, and updates.

Tags and platforms

Optimization · Flash Attention · Attention Optimization · Memory Efficiency · Speed Optimization · Long Context · PyTorch · SDPA · H100 · FP8 · Transformers
