Files
awesome-openclaw-usecases/usecases/self-healing-home-server.md
2026-02-11 14:08:44 +01:00

183 lines
7.7 KiB
Markdown

# Self-Healing Home Server & Infrastructure Management
Running a home server means being on-call 24/7 for your own infrastructure. Services go down at 3 AM, certificates expire silently, disk fills up, and pods crash-loop — all while you're asleep or away.
This use case turns OpenClaw into a persistent infrastructure agent with SSH access, automated cron jobs, and the ability to detect, diagnose, and fix issues before you know there's a problem.
## Pain Point
Home lab operators and self-hosters face a constant maintenance burden:
- Health checks, log monitoring, and alerting require manual setup and attention
- When something breaks, you have to SSH in, diagnose, and fix — often from your phone
- Infrastructure-as-code (Terraform, Ansible, Kubernetes manifests) needs regular updates
- Knowledge about your setup lives in your head, not in searchable documentation
- Routine tasks (email triage, deployment checks, security audits) eat hours every week
## What It Does
- **Automated health monitoring**: Cron-based checks on services, deployments, and system resources
- **Self-healing**: Detects issues via health checks and applies fixes autonomously (restart pods, scale resources, fix configs)
- **Infrastructure management**: Writes and applies Terraform, Ansible, and Kubernetes manifests
- **Morning briefings**: Daily summary of system health, calendar, weather, and task board status
- **Email triage**: Scans inbox, labels actionable items, archives noise
- **Knowledge extraction**: Processes notes and conversation exports into a structured, searchable knowledge base
- **Blog publishing pipeline**: Draft → generate banner → publish to CMS → deploy to hosting — fully automated
- **Security auditing**: Regular scans for hardcoded secrets, privileged containers, and overly permissive access
## Skills You Need
- `ssh` access to home network machines
- `kubectl` for Kubernetes cluster management
- `terraform` and `ansible` for infrastructure-as-code
- `1password` CLI for secrets management
- `gog` CLI for email access
- Calendar API access
- Obsidian vault or notes directory (for knowledge base)
- `openclaw doctor` for self-diagnostics
## How to Set It Up
### 1. Core Agent Configuration
Name your agent and define its access scope in AGENTS.md:
```text
## Infrastructure Agent
You are Reef, an infrastructure management agent.
Access:
- SSH to all machines on the home network (192.168.1.0/24)
- kubectl for the K3s cluster
- 1Password vault (read-only for credentials, dedicated AI vault)
- Gmail via gog CLI
- Calendar (yours + partner's)
- Obsidian vault at ~/Documents/Obsidian/
Rules:
- NEVER hardcode secrets — always use 1Password CLI or environment variables
- NEVER push directly to main — always create a PR
- Run `openclaw doctor` as part of self-health checks
- Log all infrastructure changes to ~/logs/infra-changes.md
```
### 2. Automated Cron Job System
The power of this setup is the scheduled job system. Configure in HEARTBEAT.md:
```text
## Cron Schedule
Every 15 minutes:
- Check kanban board for in-progress tasks → continue work
Every hour:
- Monitor health checks (Gatus, ArgoCD, service endpoints)
- Triage Gmail (label actionable items, archive noise)
- Check for unanswered alerts or notifications
Every 6 hours:
- Knowledge base data entry (process new Obsidian notes)
- Self health check (openclaw doctor, disk usage, memory, logs)
Every 12 hours:
- Code quality and documentation audit
- Log analysis via Loki/monitoring stack
Daily:
- 4:00 AM: Nightly brainstorm (explore connections between notes)
- 8:00 AM: Morning briefing (weather, calendars, system stats, task board)
- 1:00 AM: Velocity assessment (process improvements)
Weekly:
- Knowledge base QA review
- Infrastructure security audit
```
### 3. Security Setup (Critical)
This is non-negotiable. Before giving your agent SSH access:
```text
## Security Checklist
1. Pre-push hooks:
- Install TruffleHog or similar secret scanner on ALL repositories
- Block any commit containing hardcoded API keys, tokens, or passwords
2. Local-first Git workflow:
- Use Gitea (self-hosted) for private code before pushing to public GitHub
- CI scanning pipeline (Woodpecker or similar) runs before any public push
- Human review required before main branch merges
3. Defense in depth:
- Dedicated 1Password vault for AI agent (limited scope)
- Network segmentation for sensitive services
- Daily automated security audits checking for:
* Privileged containers
* Hardcoded secrets in code or configs
* Overly permissive file/network access
* Known vulnerabilities in deployed images
4. Agent constraints:
- Branch protection: PR required for main, agent cannot override
- Read-only access where write isn't needed
- All changes logged and auditable via git
```
### 4. Morning Briefing Template
```text
## Daily Briefing Format
Generate and deliver at 8:00 AM:
### Weather
- Current conditions and forecast for [your location]
### Calendars
- Your events today
- Partner's events today
- Conflicts or overlaps flagged
### System Health
- CPU / RAM / Storage across all machines
- Services: UP/DOWN status
- Recent deployments (ArgoCD)
- Any alerts in last 24h
### Task Board
- Cards completed yesterday
- Cards in progress
- Blocked items needing attention
### Highlights
- Notable items from nightly brainstorm
- Emails requiring action
- Upcoming deadlines this week
```
## Key Insights
- **"I can't believe I have a self-healing server now"**: The agent can run SSH, Terraform, Ansible, and kubectl commands to fix infrastructure issues before you even know there's a problem
- **AI will hardcode secrets**: This is the #1 security risk. The agent will happily put an API key inline in code if you don't enforce guardrails. Pre-push hooks and secret scanning are mandatory
- **Local-first Git is essential**: Never let the agent push directly to public repositories. Use a private Gitea instance as a staging area with CI scanning
- **Cron jobs are the real product**: The scheduled automation (health checks, email triage, briefings) provides more daily value than ad-hoc commands
- **Knowledge extraction compounds**: Processing notes, conversation exports, and emails into a structured knowledge base gets more valuable over time — one user extracted 49,079 atomic facts from their ChatGPT history alone
## Inspired By
This use case is based on Nathan's detailed writeup ["Everything I've Done with OpenClaw (So Far)"](https://madebynathan.com/2026/02/03/everything-ive-done-with-openclaw-so-far/), where he describes his OpenClaw agent "Reef" running on a home server with SSH access to all machines, a Kubernetes cluster, 1Password integration, and an Obsidian vault with 5,000+ notes. Reef runs 15 active cron jobs, 24 custom scripts, and has autonomously built and deployed applications including a task management UI. Nathan's hard-won lesson after a Day 1 API key exposure: "AI assistants will happily hardcode secrets. They sometimes don't have the same instincts humans do." His defense-in-depth security setup (TruffleHog pre-push hooks, local Gitea, CI scanning, daily audits) is essential reading for anyone attempting this pattern.
Also referenced on the [OpenClaw Showcase](https://openclaw.ai/showcase), where `@georgedagg_` described a similar pattern: deployment monitoring, log review, configuration fixes, and PR submissions — all while walking the dog.
## Related Links
- [Nathan's Full Writeup](https://madebynathan.com/2026/02/03/everything-ive-done-with-openclaw-so-far/)
- [OpenClaw Documentation](https://github.com/openclaw/openclaw)
- [TruffleHog (Secret Scanning)](https://github.com/trufflesecurity/trufflehog)
- [K3s (Lightweight Kubernetes)](https://k3s.io/)
- [Gitea (Self-hosted Git)](https://gitea.io/)
- [n8n (Workflow Automation)](https://n8n.io/)