Remote OpenClaw Blog
Why Your OpenClaw Agent Keeps Breaking (And How to Fix It for Good)
6 min read
You set up OpenClaw, it works great for two days, and then it stops responding. You SSH into the server, restart it, it works again. Two days later, same thing.
This is the most common complaint from operators who deployed OpenClaw without a proper production setup. The good news: every one of these failure patterns is fixable. Here's what's actually happening and how to stop it.
The symptom: You SSH in and run openclaw status — it's not running. No error, it just stopped.
What's happening: Node processes can exit silently for a lot of reasons — unhandled promise rejections, OOM (out of memory) kills, network timeouts that bubble up to the process level. If you started OpenClaw with openclaw start in a terminal, it dies when that terminal session ends or when the first unhandled error hits.
The fix: Run OpenClaw as a systemd service with Restart=always. This is non-negotiable for production:
sudo nano /etc/systemd/system/openclaw.service
[Unit]
Description=OpenClaw
After=network.target
[Service]
User=YOUR_USERNAME
ExecStart=/usr/bin/openclaw
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable openclaw
sudo systemctl start openclaw
With this in place, if OpenClaw crashes, systemd restarts it within 10 seconds automatically.
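As a quick sanity check (a sketch, assuming the unit file above is installed), systemctl cat prints the unit systemd is actually using, so you can confirm the restart policy took effect:

```shell
# Show the effective restart settings from the installed unit.
# "|| true" keeps this harmless on machines where the unit isn't installed yet.
systemctl cat openclaw 2>/dev/null | grep -E 'Restart(Sec)?=' || true
# Expected on a configured server:
#   Restart=always
#   RestartSec=10
```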
The symptom: The bot stops responding on Telegram even though the process is still running. Checking logs shows something like ETELEGRAM: 409 Conflict, or just silence.
What's happening: Telegram's polling connection can stale out, especially after network interruptions on your VPS. The process is alive but the bot's message stream is in a bad state.
The fix: A few approaches, in order of preference:
First, switch to webhook mode if you have a domain with HTTPS. Webhooks are push-based — Telegram sends messages to your server directly, no persistent polling connection to maintain:
channels:
  telegram:
    webhook:
      enabled: true
      url: "https://yourdomain.com/telegram/webhook"
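After enabling webhook mode, you can ask Telegram what it has registered. getWebhookInfo is a standard Bot API method; the token below is a placeholder:

```shell
#!/bin/bash
BOT_TOKEN="YOUR_BOT_TOKEN"   # placeholder, use your real token
# Ask Telegram which webhook URL it has on file for this bot.
# "|| true" tolerates offline runs in scripts that use `set -e`.
curl -s "https://api.telegram.org/bot${BOT_TOKEN}/getWebhookInfo" || true
# In the JSON reply, "url" should match your webhook URL and
# "pending_update_count" should drop to 0 once delivery is working.
```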
If you're sticking with polling, add a watchdog. Create a simple script that checks if your bot is responding and restarts OpenClaw if not:
#!/bin/bash
# /home/openclaw/watchdog.sh
BOT_TOKEN="YOUR_BOT_TOKEN"
# Probe getMe rather than getUpdates: a second getUpdates consumer would
# itself cause the 409 Conflict with the bot's own polling loop.
# (This verifies the token and Telegram reachability from the VPS.)
response=$(curl -s "https://api.telegram.org/bot${BOT_TOKEN}/getMe")
if echo "$response" | grep -q '"ok":true'; then
    echo "Bot OK"
else
    echo "Bot not responding, restarting OpenClaw"
    sudo systemctl restart openclaw
fi
Make the script executable (chmod +x /home/openclaw/watchdog.sh), then run it every 5 minutes via cron: */5 * * * * /home/openclaw/watchdog.sh
The symptom: The disk fills up, then the whole VPS becomes unresponsive. df -h shows disk at 100%.
What's happening: OpenClaw is verbose by default. Logs accumulate. Memory (the SQLite-based conversation history) can grow significantly if you're running active workflows. On a small VPS, this eventually becomes a problem.
The fix:
Set up log rotation so logs don't grow unbounded:
sudo nano /etc/logrotate.d/openclaw
/home/openclaw/.openclaw/logs/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
In OpenClaw config, tune memory retention:
memory:
  retention: 90d # Don't keep everything forever
  maxTokens: 50000
Set a disk usage alert so you catch it before things blow up:
# Add to crontab
0 8 * * * df -h / | awk 'NR==2{if($5+0>80) print "Disk at "$5}' | mail -s "Disk warning" you@email.com
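To see the threshold logic in isolation, here is the same awk test run against a captured df data row (the numbers, including 92%, are made up for illustration):

```shell
#!/bin/bash
# A sample df -h data row (hypothetical values for illustration).
sample='/dev/vda1        25G   22G  1.9G  92% /'
pct=$(echo "$sample" | awk '{print $5+0}')   # "$5+0" strips the trailing %
if [ "$pct" -gt 80 ]; then
  echo "Disk at ${pct}%"
fi
```

With this sample line it prints "Disk at 92%"; below 80% usage it prints nothing, so the cron job only emails when the threshold is crossed.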
The symptom: Responses come back wrong, truncated, or the bot says things like "I'm unable to help with that" when it should be fine.
What's happening: Usually one of three things — your API key hit a rate limit or ran out of credits, a skill is broken and causing the agent to fall back to a degraded state, or your context window is overflowing.
Diagnosing and fixing:
Check API credit status directly with your provider (Anthropic console or OpenAI dashboard). Set up spending alerts so you know before you hit $0.
Test your provider connection:
openclaw test provider
If a skill is causing issues, disable it and test:
openclaw skills list
openclaw skills disable SKILL_NAME
For context overflow, trim your memory settings or start a fresh session: send /new to your bot.
The symptom: The WhatsApp connection keeps dropping. Re-scanning the QR code fixes it temporarily.
What's happening: WhatsApp Web sessions expire, especially when the server is idle or there are long gaps between messages. Unlike Telegram, WhatsApp's web bridge was designed for human-operated browsers, not persistent server connections.
What actually fixes it: This one is harder to fully solve without switching to the WhatsApp Business API (which requires Meta approval). Short-term mitigations:
Keep the session alive with periodic pings:
channels:
  whatsapp:
    keepAlive:
      enabled: true
      intervalSeconds: 300
Set up session auto-reconnect in your monitoring (restart OpenClaw when WhatsApp connectivity drops). This doesn't prevent the expiry, but it minimizes downtime.
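A hedged sketch of that monitor, assuming openclaw status mentions per-channel connectivity in its output (the path and the grep pattern are assumptions; check the exact wording your version prints and adjust):

```shell
#!/bin/bash
# /home/openclaw/wa-watchdog.sh  (hypothetical path; run from cron)
needs_restart() {
  # Restart when the status text reports WhatsApp as disconnected.
  echo "$1" | grep -qi 'whatsapp.*disconnected'
}

# "|| true" keeps the script alive if openclaw isn't on PATH yet.
status="$(openclaw status 2>/dev/null || true)"
if needs_restart "$status"; then
  echo "WhatsApp session dropped, restarting OpenClaw"
  sudo systemctl restart openclaw
fi
```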
For operators who need reliable messaging, Telegram is more robust than WhatsApp for VPS deployments. If you're hitting this problem repeatedly, the practical fix is switching channels.
The symptom: You schedule cron jobs (morning briefings, reminders, reports) and they just don't run.
What's happening: Prior to the 2026.2.12 release, there were six known bugs in OpenClaw's cron scheduler — skipped jobs when the schedule advanced, duplicate fires, and other timing failures. If you're on an older version, these bugs are real.
The fix: Update:
npm install -g openclaw@latest
sudo systemctl restart openclaw
openclaw --version # Confirm you're on 2026.x
After updating, test a cron job manually:
openclaw cron test "YOUR_CRON_EXPRESSION"
Remote OpenClaw sees the same root cause behind almost every persistent reliability problem: OpenClaw was set up for a demo, not for production.
Running it as a bare background process instead of a supervised system service. Not hardening the configuration. Installing it as root. Skipping log rotation and disk monitoring. Not restricting Telegram access. Using a VPS plan too small for the workload.
These are all fixable. But they're also all avoidable if the setup is done right the first time.
Want a deployment that does not break? Follow the Security Hardening Guide and grab the free Security Hardener skill. Or use a marketplace persona with pre-configured reliability settings.