Hermes Agent · Built-in

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
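Since the skill targets vLLM's OpenAI-compatible endpoints, a minimal sketch of querying such a server may help. The model name, prompt, and launch flags shown are illustrative assumptions, not part of the skill itself; only a running vLLM instance on its default port (8000) would actually answer the request.

```python
# Sketch: querying a locally running vLLM OpenAI-compatible server.
# Assumes the server was launched with something like:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size 2
# (model name and flags here are illustrative).
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM's default serving port


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }


def send(payload: dict) -> dict:
    """POST the payload to the server (requires a running vLLM instance)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request(
    "meta-llama/Llama-3.1-8B-Instruct", "What is PagedAttention?"
)
```

Because the endpoint follows the OpenAI schema, existing OpenAI client libraries can also be pointed at `BASE_URL` unchanged.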

MLOps · Built-in · v1.0.0 · MIT

What this skill is

This directory page tracks a Hermes-compatible skill reference and links back to the original source for install instructions, files, and updates.

Tags and platforms

vLLM, Inference Serving, PagedAttention, Continuous Batching, High Throughput, Production, OpenAI API, Quantization, Tensor Parallelism
