Add concise DevOps resources (agents, instructions, prompt) (#1) (#513)

* Initial plan
* Add DevOps resources: agents, instructions, and prompt
* Replace redundant GitHub Actions instructions with expert agent
* Make DevOps resources more generic for easier maintenance
* Remove optional model field to align with repository conventions
* Reduce code examples to focus on principles and guidance
* Add DevOps Expert agent following infinity loop principle

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: benjisho-aidome <218995725+benjisho-aidome@users.noreply.github.com>
Co-authored-by: Matt Soucoup <masoucou@microsoft.com>
2026-02-20 02:15:12 +00:00 · 2026-01-09 18:41:01 +02:00
parent e496ef1b9b
commit 57473945b0
9 changed files with 921 additions and 0 deletions
--- a/prompts/devops-rollout-plan.prompt.md
+++ b/prompts/devops-rollout-plan.prompt.md
@@ -0,0 +1,118 @@
+---
+agent: 'agent'
+description: 'Generate comprehensive rollout plans with preflight checks, step-by-step deployment, verification signals, rollback procedures, and communication plans for infrastructure and application changes'
+tools: ['codebase', 'terminalCommand', 'search', 'githubRepo']
+---
+
+# DevOps Rollout Plan Generator
+
+Your goal is to create a comprehensive, production-ready rollout plan for infrastructure or application changes.
+
+## Input Requirements
+
+Gather these details before generating the plan:
+
+### Change Description
+- What's changing (infrastructure, application, configuration)
+- Version or state transition (from/to)
+- Problem solved or feature added
+
+### Environment Details
+- Target environment (dev, staging, production, all)
+- Infrastructure type (Kubernetes, VMs, serverless, containers)
+- Affected services and dependencies
+- Current capacity and scale
+
+### Constraints & Requirements
+- Acceptable downtime window
+- Change window restrictions
+- Approval requirements
+- Regulatory or compliance considerations
+
+### Risk Assessment
+- Blast radius of change
+- Data migrations or schema changes
+- Rollback complexity and safety
+- Known risks
+
+## Output Format
+
+Generate a structured rollout plan with these sections:
+
+### 1. Executive Summary
+- What, why, when, duration
+- Risk level and rollback time
+- Affected systems and user impact
+- Expected downtime
+
+### 2. Prerequisites & Approvals
+- Required approvals (technical lead, security, compliance, business)
+- Required resources (capacity, backups, monitoring, rollback automation)
+- Pre-deployment backups
+
+### 3. Preflight Checks
+- Infrastructure health validation
+- Application health baseline
+- Dependency availability
+- Monitoring baseline metrics
+- Go/no-go decision checklist
+
+### 4. Step-by-Step Rollout Procedure
+**Phases**: Pre-deployment, deployment, progressive verification
+- Specific commands for each step
+- Validation after each step
+- Duration estimates
+
+### 5. Verification Signals
+- **Immediate** (0-2 min): Deployment success, pods/containers started, health checks passing
+- **Short-term** (2-5 min): Application responding, error rates acceptable, latency normal
+- **Medium-term** (5-15 min): Sustained metrics, stable connections, integrations working
+- **Long-term** (15+ min): No degradation, capacity healthy, business metrics normal
+
+### 6. Rollback Procedure
+- **Decision Criteria**: When to initiate rollback
+- **Rollback Steps**: Automated, infrastructure revert, or full restore
+- **Post-Rollback Verification**: Confirm system health restored
+- **Communication**: Stakeholder notification
+
+### 7. Communication Plan
+- Pre-deployment (T-24h): Schedule and impact notice
+- Deployment start: Commencement notice
+- Progress updates: Status every X minutes
+- Completion: Success confirmation
+- Rollback (if needed): Issue notification
+
+**Stakeholder Matrix**: Who to notify, when, via what method, with what content
+
+### 8. Post-Deployment Tasks
+- Immediate (1h): Verify criteria met, review logs
+- Short-term (24h): Monitor metrics, review errors
+- Medium-term (1 week): Post-deployment review, lessons learned
+
+### 9. Contingency Plans
+- Scenarios: Partial failure, performance degradation, data inconsistency, dependency failure
+- For each: Symptoms, response, timeline
+
+### 10. Contact Information
+- Primary and secondary on-call
+- Escalation path
+- Emergency contacts (infrastructure, security, database, networking)
+
+## Plan Customization
+
+Adapt based on:
+- **Infrastructure Type**: Kubernetes, VMs, serverless, databases
+- **Risk Level**: Low (simplified), medium (standard), high (additional gates)
+- **Change Type**: Code deployment, infrastructure, configuration, data migration
+- **Environment**: Production (full plan), staging (simplified), development (minimal)
+
+## Remember
+
+- Always have a tested rollback plan
+- Communicate early and often
+- Monitor metrics, not just logs
+- Document everything
+- Learn from each deployment
+- Never deploy on Friday afternoon (unless critical)
+- Never skip verification steps
+- Never assume "it should work"
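
The time windows under "5. Verification Signals" in the prompt above could be applied during a rollout roughly as follows. This is only a sketch: it assumes half-open intervals, so a boundary minute belongs to the later window, because the prompt does not say which window owns minute 2, 5, or 15. The names `WINDOWS` and `verification_window` are hypothetical and not part of the committed file.

```python
from typing import Optional

# (name, end_minute) pairs in order; end_minute is exclusive.
# Boundaries come from the "Verification Signals" section of the prompt.
# ASSUMPTION: half-open intervals [start, end), so minute 2 is "short-term".
WINDOWS: list[tuple[str, Optional[float]]] = [
    ("immediate", 2),      # deployment success, containers started
    ("short-term", 5),     # application responding, error rates acceptable
    ("medium-term", 15),   # sustained metrics, integrations working
    ("long-term", None),   # no degradation, business metrics normal
]


def verification_window(elapsed_minutes: float) -> str:
    """Return the verification window that applies at the given elapsed time."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    for name, end in WINDOWS:
        # The first window whose exclusive end lies beyond the elapsed time wins;
        # the last window has no end and catches everything else.
        if end is None or elapsed_minutes < end:
            return name
    raise AssertionError("unreachable: the last window is open-ended")
```

A rollout script might call `verification_window(minutes_since_deploy)` to decide which set of signals to check next. Whether the boundaries should be inclusive or exclusive is a choice for the team adopting the plan.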