
distributed-llm-pretraining-torchtitan

Provides PyTorch-native distributed LLM pretraining with torchtitan's 4D parallelism (FSDP2, tensor, pipeline, and context parallelism). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scales from 8 to 512+ GPUs, with Float8 training, torch.compile, and distributed checkpointing.
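As a rough illustration of the FSDP2 entry point this kind of setup builds on, here is a minimal sketch of sharding a model over a 2D device mesh with PyTorch's fully_shard API. This is not code from the skill itself: the mesh shape, module, and dimension names are hypothetical, and it assumes PyTorch 2.6+, where fully_shard is exported from torch.distributed.fsdp.

```python
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard  # FSDP2 API (PyTorch 2.6+)

# Hypothetical 2D mesh: 4-way data parallel x 2-way tensor parallel
# (8 GPUs total). torchtitan composes FSDP2/TP/PP/CP over meshes like this.
mesh = init_device_mesh("cuda", (4, 2), mesh_dim_names=("dp", "tp"))

model = nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True)

# Shard the module's parameters across the data-parallel mesh dimension;
# a full 4D setup would also apply tensor parallelism on mesh["tp"].
fully_shard(model, mesh=mesh["dp"])
```

Such a script needs a distributed launch (e.g. torchrun --nproc_per_node=8 train.py) so the process group and device mesh can initialize.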

MLOps · Optional · v1.0.0 · MIT

What this skill is

This directory page tracks a Hermes-compatible skill reference and links back to the original source for install instructions, files, and updates.

Tags and platforms

Model Architecture · Distributed Training · TorchTitan · FSDP2 · Tensor Parallel · Pipeline Parallel · Context Parallel · Float8 · Llama · Pretraining
