Hermes Agent · Optional

tensorrt-llm

Optimizes LLM inference with NVIDIA TensorRT for high throughput and low latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need order-of-magnitude faster inference than stock PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
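As a sketch of what this serving path looks like, the high-level TensorRT-LLM Python API (`tensorrt_llm.LLM`) can load a checkpoint with FP8 quantization and tensor parallelism in a few lines. This is a sketch, not the skill's own code: the model name and parallelism degree below are illustrative assumptions, and running it requires the `tensorrt_llm` package plus supported NVIDIA GPUs.

```python
# Sketch of the TensorRT-LLM high-level (LLM) API; requires the
# tensorrt_llm package and supported NVIDIA GPUs (e.g. A100/H100).
# Model name and parallelism settings are illustrative assumptions.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any supported HF checkpoint
    tensor_parallel_size=2,                    # shard weights across 2 GPUs
    quant_config=QuantConfig(quant_algo=QuantAlgo.FP8),  # FP8 quantization
)

params = SamplingParams(max_tokens=128, temperature=0.7)

# generate() accepts a batch of prompts; the runtime schedules requests
# with in-flight (continuous) batching rather than static batches.
for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```

On hardware without FP8 support (pre-Hopper), `QuantAlgo.W4A16_AWQ` is one INT4 option the same config accepts.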

MLOps · Optional · v1.0.0 · MIT

What this skill is

This directory page tracks a Hermes-compatible skill reference and links back to the original source for install instructions, files, and updates.

Tags and platforms

Inference Serving · TensorRT-LLM · NVIDIA · Inference Optimization · High Throughput · Low Latency · Production · FP8 · INT4 · In-Flight Batching · Multi-GPU
