<img src="docs/logo.svg" width="30" align="top" alt=""> HomeLab Monitor

[GitHub stars](https://github.com/SikamikanikoBG/homelab-monitor/stargazers) · [Docker pulls](https://hub.docker.com/r/sikamikaniko123/homelab-monitor) · [Discord](https://discord.gg/tpKWKEdSQN) · [version](CHANGELOG.md) · [docs](https://sikamikanikobg.github.io/homelab-monitor/)

One page for your whole home lab & AI rig — GPU truth (any vendor), tokens/sec, power cost by the hour, uptime, training runs, containers, disks. No agents, no separate metrics stack, no cloud.

▶ Watch the 1-minute tour on YouTube

Your home lab grew into a couple of machines, a Pi, and a GPU that's mysteriously always busy — and lately it's running models too. HomeLab Monitor gives you one self-hosted page that answers the real questions: what's that GPU actually doing, which model is holding it, what's it costing you to run, which container is eating RAM, what's filling your disks, and is anything down — across every box over SSH: Linux, a Pi, even Windows. Readable from your phone over the VPN.

Get started

# Grab the compose file and go. No GPU required — the GPU panels just light up when one's present.
curl -fsSLO https://raw.githubusercontent.com/SikamikanikoBG/homelab-monitor/main/docker-compose.yml
docker compose up -d

Open http://<your-host>:9800 and you're done. Full options (from source, GPU toolkit, Windows/WSL2) → Install docs.

🆕 v0.21 — see your power bill by the hour, and which GPU is burning it. A 7×24 busy-hours cost heatmap that shows when your lab actually costs you money, any-vendor GPU support (AMD on Linux with no ROCm; AMD/Intel on Windows — no vendor tools), and built-in uptime monitoring for any HTTP/TCP endpoint. Release notes · changelog.

What you get

Screenshot: the Overview, a mission-control cockpit with every host in the fleet at a glance, GPU/CPU/RAM gauges for any box (or the whole homelab), live power-to-money costs and an insight feed.

One page, every box, the questions you actually have. The classics are all here — and a whole AI cockpit builds on top of them.

Your GPU, demystified, and honest about it. A card pinned at "100% util" can still be throttling, memory-bandwidth-bound, or quietly drooping its clocks. The GPU tab decodes nvidia-smi's throttle reasons (a red banner appears the moment the card is power-capped or too hot) and shows memory-bandwidth utilisation, core and memory clocks, power versus limit, and P-state, plus which container is holding the card. It's no longer NVIDIA-only, either: on Linux, AMD GPUs are read straight from the kernel's amdgpu interface (no ROCm), and on Windows hosts both AMD and Intel GPUs are supported, so your card shows up with its name, utilisation and VRAM, no vendor tools required.

Screenshot: the GPU tab, showing throttle reasons, memory bandwidth, clocks and power headroom.
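As a rough illustration of what "decoding throttle reasons" involves, here is a minimal Python sketch. It assumes the raw value comes from something like `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,noheader`, which prints a hexadecimal bitmask; the bit values follow NVML's `nvmlClocksThrottleReason*` constants. The function names and the choice of which reasons count as banner-worthy are assumptions of this sketch, not the project's actual code.

```python
# Bit values as defined by NVML's nvmlClocksThrottleReason* constants.
THROTTLE_REASONS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}

# Which reasons deserve a red banner is a guess made for this sketch.
ALARMING = {
    "sw_power_cap",
    "hw_slowdown",
    "sw_thermal_slowdown",
    "hw_thermal_slowdown",
    "hw_power_brake_slowdown",
}


def decode_throttle_mask(raw: str) -> list[str]:
    """Turn a hex bitmask such as '0x0000000000000044' into reason names."""
    mask = int(raw.strip(), 16)
    return [name for bit, name in THROTTLE_REASONS.items() if mask & bit]


def needs_banner(raw: str) -> bool:
    """True when at least one active reason is power- or heat-related."""
    return any(reason in ALARMING for reason in decode_throttle_mask(raw))
```

An idle card typically reports only the `gpu_idle` bit, which is why idle alone does not trigger the banner here.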

What it costs — down to the process. Power becomes money: per machine, then per component (GPU measured via nvidia-smi, CPU/DRAM via RAPL), then per process, container or model — click any row to see what it drew and what it cost over any window. Day & night tariffs (Economy 7, Heures Creuses, …), or just pick your country for a sensible estimate. Every watt is measured or a baseline you set; wall power is never guessed. And a busy-hours heatmap turns months of samples into one picture of when your lab costs you money — a 7×24 day-of-week × hour grid that shows which hour of the week is priciest at a glance.

Screenshot: the Costs page, showing per-component and per-process power and money.
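The power-to-money arithmetic behind a busy-hours grid can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: samples are `(datetime, watts)` pairs taken at a fixed interval, the night tariff window (00:00 to 07:00 here) is a made-up example, and `price_per_kwh` and `busy_hours_grid` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, time


def price_per_kwh(ts, day_rate, night_rate,
                  night_start=time(0, 0), night_end=time(7, 0)):
    """Pick the tariff for a timestamp.

    The night window is an example and must not cross midnight in this sketch.
    """
    return night_rate if night_start <= ts.time() < night_end else day_rate


def busy_hours_grid(samples, interval_s, day_rate, night_rate):
    """Sum cost into (weekday, hour) cells; weekday 0 is Monday."""
    grid = defaultdict(float)
    for ts, watts in samples:
        # watts * seconds = joules; 3.6 million joules make one kWh.
        kwh = watts * interval_s / 3_600_000
        grid[(ts.weekday(), ts.hour)] += kwh * price_per_kwh(ts, day_rate, night_rate)
    return dict(grid)
```

Each cell holds the total cost for that hour of the week across all samples, so the priciest hour is simply the cell with the largest value.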

Your training runs, priced. Push a run from Jupyter, Colab or Kaggle with a one-file client (or mirror it from MLflow), and it comes back with the loss curve and the real GPU energy it burned, on the same timeline. Create, name, expire and revoke API keys yourself.

Screenshot: a run pushed from a notebook, with its loss curve and the GPU power it actually used.
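To make "the real GPU energy it burned" concrete: given power samples and a run's start and end time, the energy is the area under the power curve. The Python sketch below uses a trapezoidal sum over `(unix_seconds, watts)` samples. The function name and the sample layout are assumptions for illustration; the project may compute this differently.

```python
def energy_wh(samples, start, end):
    """Trapezoidal integral of (unix_seconds, watts) samples within [start, end].

    Returns watt-hours. Samples outside the window are ignored.
    """
    inside = [(t, w) for t, w in sorted(samples) if start <= t <= end]
    joules = sum(
        (t1 - t0) * (w0 + w1) / 2
        for (t0, w0), (t1, w1) in zip(inside, inside[1:])
    )
    return joules / 3600  # 3600 joules per watt-hour
```

For example, 100 W for ten seconds, a ramp to 200 W over the next ten, then 200 W for ten more adds up to 4500 J, or 1.25 Wh.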

And the rest of the lab, the way it always was:

- Containers, honestly: health plus RAM and VRAM in separate columns (real resident RAM, not page cache); click one to tail its logs in a side drawer.
- systemd services: local or remote, your own units highlighted, failures first.
- WizTree-style disk treemaps, network I/O with per-container top talkers, and a mini-htop for who's eating CPU and RAM.
- Multi-machine over SSH: paste one key per box; Linux, a Pi, even Windows. No agents, no installs.
- Uptime monitoring, in the box: watch any HTTP endpoint or TCP port (your services, a NAS, a remote site) straight from the container, with a heartbeat strip, 24h/7d uptime %, latency, and smart per-check alerts (anti-flap confirm, recovery with downtime, and an optional slow-response warning). No extra uptime service to self-host.
- Push alerts: Discord, ntfy.sh and Telegram, edge-triggered so they don't spam.
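The "anti-flap confirm", "recovery with downtime" and "edge-triggered" ideas from the list above fit into one small state machine. The Python sketch below is a guess at the general technique, not the project's code: the class name, the `confirm_failures` parameter and the event tuples are all hypothetical.

```python
class UptimeCheck:
    """Tracks one endpoint and emits an event only when its confirmed state changes."""

    def __init__(self, confirm_failures=3):
        self.confirm_failures = confirm_failures  # failed probes in a row before alerting
        self.fail_streak = 0
        self.first_failure_at = None
        self.alerted_down = False

    def observe(self, ok, now):
        """Feed one probe result; return ("down", None), ("recovered", seconds) or None."""
        if ok:
            was_down, started = self.alerted_down, self.first_failure_at
            self.fail_streak, self.first_failure_at, self.alerted_down = 0, None, False
            # Only a confirmed outage produces a recovery event.
            return ("recovered", now - started) if was_down else None
        if self.fail_streak == 0:
            self.first_failure_at = now
        self.fail_streak += 1
        if self.fail_streak >= self.confirm_failures and not self.alerted_down:
            self.alerted_down = True
            return ("down", None)  # fires once per outage, not once per failed probe
        return None
```

A single failed probe followed by a success produces no events at all, which is the anti-flap part; a long outage produces exactly one "down" and one "recovered", which is the edge-triggered part.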

Full tab-by-tab tour → Features.

Multi-machine, in two sentences

Open the Hosts tab, paste the hub's auto-generated SSH key onto each remote, and the hub starts polling it — no agents, just SSH + Python 3 (PowerShell on Windows). The hub pipes a small self-contained probe over SSH; nothing persists on the remote.
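For readers curious how a probe can run remotely without leaving anything behind, here is a minimal Python sketch of the general pattern: `python3 -` on the remote reads its program from standard input, so the script travels over the SSH channel and is never written to disk. The probe body, function names and SSH options are assumptions for illustration; the hub's real probe collects far more and may be invoked differently.

```python
import json
import subprocess

# A tiny stand-in probe; it only needs the Python standard library on the remote.
PROBE_SOURCE = """
import json, os
load1, _, _ = os.getloadavg()
print(json.dumps({"load1": load1}))
"""


def build_probe_command(user, host, key_path):
    """argv for ssh; 'python3 -' makes the remote interpreter read code from stdin."""
    return [
        "ssh", "-i", key_path,
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        f"{user}@{host}",
        "python3", "-",
    ]


def run_probe(user, host, key_path, timeout=15):
    """Pipe the probe over SSH and parse the JSON it prints."""
    result = subprocess.run(
        build_probe_command(user, host, key_path),
        input=PROBE_SOURCE,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(result.stdout)
```

A call such as `run_probe("pi", "10.0.0.5", "/path/to/key")` (placeholder values) would return a small dictionary, and nothing from the probe remains on the remote once the SSH session ends.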

Onboarding, Windows setup, and the security model → Multi-machine docs.

Configuration

Set these under environment: in docker-compose.yml (all optional):

| Variable | Default | Meaning |
|---|---|---|
| SAMPLE_INTERVAL | 10 | Seconds between samples |
| RETENTION_DAYS | 180 | How long history is kept |
| PRESSURE_FREE_MB | 2048 | Free VRAM below this counts as "pressure" |
| PORT | 9800 | Dashboard port |
| MCP_PORT | 9810 | Port for the built-in read-only MCP server |
| ENABLE_MCP | 1 | Set 0 to run the dashboard without the MCP server |
| WATCH_CONTAINERS | (none) | Extra containers to scan for OOM (comma-separated) |
| WATCH_SERVICES | (none) | systemd units to always show, even vendor ones (comma-separated) |
| CHECK_UPDATES | true | Set false to disable the daily GitHub-releases check (no outbound calls) |

History lives in ./data/gpu.db (a bind mount), so it survives restarts and upgrades. Alerts, the systemd D-Bus mount, and per-server tuning → Configuration docs.
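To show how the table's defaults and value formats fit together, here is a small Python sketch that reads the same variables from an environment mapping. It is illustrative only: the `load_settings` helper is hypothetical, and the project's own parsing may treat edge cases (for example `ENABLE_MCP=false`) differently.

```python
import os

# Integer settings and their documented defaults.
INT_DEFAULTS = {
    "SAMPLE_INTERVAL": 10,
    "RETENTION_DAYS": 180,
    "PRESSURE_FREE_MB": 2048,
    "PORT": 9800,
    "MCP_PORT": 9810,
}


def load_settings(env=os.environ):
    """Read settings from an environment-like mapping, applying documented defaults."""
    settings = {name: int(env.get(name, default)) for name, default in INT_DEFAULTS.items()}
    settings["ENABLE_MCP"] = env.get("ENABLE_MCP", "1") != "0"
    settings["CHECK_UPDATES"] = env.get("CHECK_UPDATES", "true").lower() != "false"
    for name in ("WATCH_CONTAINERS", "WATCH_SERVICES"):
        # Comma-separated lists; blanks and stray spaces are dropped.
        settings[name] = [item.strip() for item in env.get(name, "").split(",") if item.strip()]
    return settings
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out combinations without touching the real environment.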

Under the hood

The hub stitches nvidia-smi (plus AMD GPUs via the in-kernel amdgpu sysfs interface, and AMD/Intel on Windows hosts via the built-in GPU perf counters), the Docker API, model-server APIs (Ollama, vLLM, llama.cpp, A1111, …), systemd D-Bus, and /proc + /sys into one sampled view, persisted to SQLite and downsampled on read so a six-month range loads as fast as the last hour. Single page, vendored Chart.js, no build step.
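"Downsampled on read" can be pictured as bucket averaging: however long the requested range is, the samples are folded into a fixed number of buckets before they reach the chart. The Python sketch below shows one simple way to do that. It is an assumption about the general approach, not the project's actual query; in practice this kind of aggregation would more likely happen inside the SQLite query itself.

```python
def downsample(samples, start, end, max_points=300):
    """Average (unix_seconds, value) samples into at most max_points equal-width buckets.

    Returns (bucket_midpoint, mean_value) pairs; empty buckets are omitted.
    """
    width = (end - start) / max_points
    if width <= 0:
        return []
    buckets = {}
    for t, value in samples:
        if start <= t < end:
            idx = min(int((t - start) / width), max_points - 1)
            total, count = buckets.get(idx, (0.0, 0))
            buckets[idx] = (total + value, count + 1)
    return [
        (start + (idx + 0.5) * width, total / count)
        for idx, (total, count) in sorted(buckets.items())
    ]
```

Because the output size is capped by `max_points`, a six-month range and a one-hour range hand roughly the same number of points to the chart.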

- 30+ recognised model servers → Model servers
- Standard /metrics endpoint to scrape into whatever dashboards you already run → Metrics export
- The full data pipeline + caller attribution → How it works
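If the /metrics endpoint follows the Prometheus text exposition format (an assumption here; the Metrics export docs are the authority), its output can be read with a few lines of Python. The metric names in `SAMPLE` are invented for illustration and are not the names the project exports. The parser is deliberately minimal and ignores optional timestamps.

```python
import re

# name, optional {labels}, then the value.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical sample payload; real metric names will differ.
SAMPLE = """\
# HELP example_gpu_utilization_percent Hypothetical gauge for illustration
# TYPE example_gpu_utilization_percent gauge
example_gpu_utilization_percent{host="rig",gpu="0"} 97
example_power_watts{host="rig"} 312.5
"""


def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match:
            name, labels, value = match.groups()
            rows.append((name, dict(LABEL.findall(labels or "")), float(value)))
    return rows
```

In a real setup a scraper such as Prometheus would do this parsing for you; the sketch only shows what the payload looks like once it is broken apart.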

Connect an AI agent (MCP)

Your homelab is now legible to AI agents — point a client at one URL and it can see every host, container, GPU and disk. Read-only, no extra setup.

HomeLab Monitor isn't just a dashboard for you anymore; it's context for your AI agent too. A read-only MCP server is built into the same container (served on :9810) — so Claude, Claude Code, or any MCP client connects in one line and explores your whole lab through 12 named tools, with the same coverage you see on the dashboard: hosts, containers, systemd services, GPU and who's driving it, per-process RAM, AI model servers, disk treemaps, history and alerts.

Any MCP client works, whether that is Claude, ChatGPT, or an agent running on your own local Ollama models, and it reads your homelab's live state. The exchange is strictly question and answer: the agent asks, the server replies, and nothing on your hosts is changed.

# the dashboard is on :9800; the MCP server rides along on :9810
claude mcp add --transport http homelab http://YOUR-HUB:9810/mcp

Once connected, skip the tab-hunting and just ask — the agent picks the right tools:

"My GPU's been pinned for an hour — which model server is loaded, and who's actually calling it?"
"What's eating /backup? Give me the biggest folders and flag anything that looks like runaway logs."
"Which host is lowest on RAM right now, and what's the top process holding it?"
"I want to reboot and run an OS upgrade this weekend — which box needs it most, and what's a safe order given what's running on each?"

Read-only by design — there are no write tools, so an agent can look but never touch your fleet. Turn it off anytime with ENABLE_MCP=0. Full tool list & setup → MCP docs.
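For clients that speak MCP directly rather than through the `claude mcp add` helper, the conversation starts with a short JSON-RPC 2.0 handshake. The Python sketch below only builds the messages a client would POST to the `/mcp` URL; it does not send them. The method names and headers come from the MCP specification's lifecycle and Streamable HTTP transport, while the protocol version string and client name are assumptions, and the server may negotiate a different version or require a session header on later requests.

```python
import json


def jsonrpc(method, params=None, msg_id=None):
    """Build a JSON-RPC 2.0 message; omitting msg_id makes it a notification."""
    message = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    if msg_id is not None:
        message["id"] = msg_id
    return message


# Headers an MCP client sends over the Streamable HTTP transport.
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

HANDSHAKE = [
    jsonrpc(
        "initialize",
        {
            "protocolVersion": "2025-03-26",  # assumed; negotiated with the server
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
        },
        msg_id=1,
    ),
    jsonrpc("notifications/initialized"),
    jsonrpc("tools/list", msg_id=2),  # asks the server which tools it offers
]

if __name__ == "__main__":
    for message in HANDSHAKE:
        print(json.dumps(message))
```

The reply to `tools/list` is where a client would discover the server's named tools before calling any of them with `tools/call`.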

Security

This is a host monitor: it runs with host access and a read-only Docker socket, root mount, and D-Bus socket — a broad footprint by design. Keep it behind your LAN/VPN/firewall and don't expose it to the public internet. Details → docs.

⭐ Support the project

If HomeLab Monitor saves you a browser tab or two, a ⭐ on GitHub genuinely helps other home-labbers find it. Thank you!

💬 Community

Building this is more fun together. Join the HomeLab Monitor Discord — say hi, show off your rig, swap ideas, ask for help, or just hang out. It's where the roadmap chatter, “should we build X?” questions, and quick help happen — and where new contributors get a warm welcome.

[Join the Discord](https://discord.gg/tpKWKEdSQN)

Bring a friend, post an idea, open an issue — let's grow a friendly, healthy homelab community. 💛

Contributing

Issues and PRs are very welcome — especially new model-server probes, new monitors, and GPU back-ends. This is a hobby tool meant to help fellow home-labbers, so be kind. See CONTRIBUTING.md.

License

MIT — see LICENSE.
