# Self-Healing Home Server & Infrastructure Management Running a home server means being on-call 24/7 for your own infrastructure. Services go down at 3 AM, certificates expire silently, disk fills up, and pods crash-loop — all while you're asleep or away. This use case turns OpenClaw into a persistent infrastructure agent with SSH access, automated cron jobs, and the ability to detect, diagnose, and fix issues before you know there's a problem. ## Pain Point Home lab operators and self-hosters face a constant maintenance burden: - Health checks, log monitoring, and alerting require manual setup and attention - When something breaks, you have to SSH in, diagnose, and fix — often from your phone - Infrastructure-as-code (Terraform, Ansible, Kubernetes manifests) needs regular updates - Knowledge about your setup lives in your head, not in searchable documentation - Routine tasks (email triage, deployment checks, security audits) eat hours every week ## What It Does - **Automated health monitoring**: Cron-based checks on services, deployments, and system resources - **Self-healing**: Detects issues via health checks and applies fixes autonomously (restart pods, scale resources, fix configs) - **Infrastructure management**: Writes and applies Terraform, Ansible, and Kubernetes manifests - **Morning briefings**: Daily summary of system health, calendar, weather, and task board status - **Email triage**: Scans inbox, labels actionable items, archives noise - **Knowledge extraction**: Processes notes and conversation exports into a structured, searchable knowledge base - **Blog publishing pipeline**: Draft → generate banner → publish to CMS → deploy to hosting — fully automated - **Security auditing**: Regular scans for hardcoded secrets, privileged containers, and overly permissive access ## Skills You Need - `ssh` access to home network machines - `kubectl` for Kubernetes cluster management - `terraform` and `ansible` for infrastructure-as-code - `1password` CLI for secrets management - `gog` CLI for email access - Calendar API access - Obsidian vault or notes directory (for knowledge base) - `openclaw doctor` for self-diagnostics ## How to Set It Up ### 1. Core Agent Configuration Name your agent and define its access scope in AGENTS.md: ```text ## Infrastructure Agent You are Reef, an infrastructure management agent. Access: - SSH to all machines on the home network (192.168.1.0/24) - kubectl for the K3s cluster - 1Password vault (read-only for credentials, dedicated AI vault) - Gmail via gog CLI - Calendar (yours + partner's) - Obsidian vault at ~/Documents/Obsidian/ Rules: - NEVER hardcode secrets — always use 1Password CLI or environment variables - NEVER push directly to main — always create a PR - Run `openclaw doctor` as part of self-health checks - Log all infrastructure changes to ~/logs/infra-changes.md ``` ### 2. Automated Cron Job System The power of this setup is the scheduled job system. Configure in HEARTBEAT.md: ```text ## Cron Schedule Every 15 minutes: - Check kanban board for in-progress tasks → continue work Every hour: - Monitor health checks (Gatus, ArgoCD, service endpoints) - Triage Gmail (label actionable items, archive noise) - Check for unanswered alerts or notifications Every 6 hours: - Knowledge base data entry (process new Obsidian notes) - Self health check (openclaw doctor, disk usage, memory, logs) Every 12 hours: - Code quality and documentation audit - Log analysis via Loki/monitoring stack Daily: - 4:00 AM: Nightly brainstorm (explore connections between notes) - 8:00 AM: Morning briefing (weather, calendars, system stats, task board) - 1:00 AM: Velocity assessment (process improvements) Weekly: - Knowledge base QA review - Infrastructure security audit ``` ### 3. Security Setup (Critical) This is non-negotiable. Before giving your agent SSH access: ```text ## Security Checklist 1. Pre-push hooks: - Install TruffleHog or similar secret scanner on ALL repositories - Block any commit containing hardcoded API keys, tokens, or passwords 2. Local-first Git workflow: - Use Gitea (self-hosted) for private code before pushing to public GitHub - CI scanning pipeline (Woodpecker or similar) runs before any public push - Human review required before main branch merges 3. Defense in depth: - Dedicated 1Password vault for AI agent (limited scope) - Network segmentation for sensitive services - Daily automated security audits checking for: * Privileged containers * Hardcoded secrets in code or configs * Overly permissive file/network access * Known vulnerabilities in deployed images 4. Agent constraints: - Branch protection: PR required for main, agent cannot override - Read-only access where write isn't needed - All changes logged and auditable via git ``` ### 4. Morning Briefing Template ```text ## Daily Briefing Format Generate and deliver at 8:00 AM: ### Weather - Current conditions and forecast for [your location] ### Calendars - Your events today - Partner's events today - Conflicts or overlaps flagged ### System Health - CPU / RAM / Storage across all machines - Services: UP/DOWN status - Recent deployments (ArgoCD) - Any alerts in last 24h ### Task Board - Cards completed yesterday - Cards in progress - Blocked items needing attention ### Highlights - Notable items from nightly brainstorm - Emails requiring action - Upcoming deadlines this week ``` ## Key Insights - **"I can't believe I have a self-healing server now"**: The agent can run SSH, Terraform, Ansible, and kubectl commands to fix infrastructure issues before you even know there's a problem - **AI will hardcode secrets**: This is the #1 security risk. The agent will happily put an API key inline in code if you don't enforce guardrails. Pre-push hooks and secret scanning are mandatory - **Local-first Git is essential**: Never let the agent push directly to public repositories. Use a private Gitea instance as a staging area with CI scanning - **Cron jobs are the real product**: The scheduled automation (health checks, email triage, briefings) provides more daily value than ad-hoc commands - **Knowledge extraction compounds**: Processing notes, conversation exports, and emails into a structured knowledge base gets more valuable over time — one user extracted 49,079 atomic facts from their ChatGPT history alone ## Inspired By This use case is based on Nathan's detailed writeup ["Everything I've Done with OpenClaw (So Far)"](https://madebynathan.com/2026/02/03/everything-ive-done-with-openclaw-so-far/), where he describes his OpenClaw agent "Reef" running on a home server with SSH access to all machines, a Kubernetes cluster, 1Password integration, and an Obsidian vault with 5,000+ notes. Reef runs 15 active cron jobs, 24 custom scripts, and has autonomously built and deployed applications including a task management UI. Nathan's hard-won lesson after a Day 1 API key exposure: "AI assistants will happily hardcode secrets. They sometimes don't have the same instincts humans do." His defense-in-depth security setup (TruffleHog pre-push hooks, local Gitea, CI scanning, daily audits) is essential reading for anyone attempting this pattern. Also referenced on the [OpenClaw Showcase](https://openclaw.ai/showcase), where `@georgedagg_` described a similar pattern: deployment monitoring, log review, configuration fixes, and PR submissions — all while walking the dog. ## Related Links - [Nathan's Full Writeup](https://madebynathan.com/2026/02/03/everything-ive-done-with-openclaw-so-far/) - [OpenClaw Documentation](https://github.com/openclaw/openclaw) - [TruffleHog (Secret Scanning)](https://github.com/trufflesecurity/trufflehog) - [K3s (Lightweight Kubernetes)](https://k3s.io/) - [Gitea (Self-hosted Git)](https://gitea.io/) - [n8n (Workflow Automation)](https://n8n.io/)