chore: publish from staged

2026-06-13 11:33:32 +00:00 · 2026-06-10 04:43:53 +00:00
parent d45fb99396
commit bbf8f7bccd
22 changed files with 3005 additions and 0 deletions
@@ -0,0 +1,194 @@
+---
+name: aws-cost-optimize
+description: 'Analyze AWS resources used in the app (IaC files and/or resources in a target account/region) and optimize costs - creating GitHub issues for identified optimizations.'
+---
+
+# AWS Cost Optimize
+
+This workflow analyzes Infrastructure-as-Code (IaC) files and AWS resources to generate cost optimization recommendations. It creates individual GitHub issues for each optimization opportunity plus one EPIC issue to coordinate implementation, enabling efficient tracking and execution of cost savings initiatives.
+
+## Prerequisites
+- AWS CLI configured and authenticated (`aws sts get-caller-identity` succeeds)
+- GitHub MCP server configured and authenticated
+- Target GitHub repository identified
+- AWS resources deployed (IaC files optional but helpful)
+
+## Workflow Steps
+
+### Step 1: Get AWS Cost Optimization Best Practices
+**Action**: Retrieve cost optimization best practices before analysis
+**Tools**: `fetch` to retrieve AWS documentation
+**Process**:
+1. **Load Best Practices**:
+   - Fetch `https://docs.aws.amazon.com/cost-management/latest/userguide/cost-optimization-best-practices.html`
+   - Fetch the AWS Well-Architected Cost Optimization pillar summary
+   - Use these practices to inform subsequent analysis and recommendations
+
+### Step 2: Discover AWS Infrastructure
+**Action**: Dynamically discover and analyze AWS resources and configurations
+**Tools**: AWS CLI + Local file system access
+**Process**:
+1. **Account & Region Discovery**:
+   - Execute `aws sts get-caller-identity` to confirm account
+   - Execute `aws configure get region` to determine default region
+
+2. **Resource Discovery** (per region):
+   - EC2 instances: `aws ec2 describe-instances --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name,Tags]'`
+   - RDS instances: `aws rds describe-db-instances --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceClass,Engine,MultiAZ]'`
+   - Lambda functions: `aws lambda list-functions --query 'Functions[].[FunctionName,Runtime,MemorySize,Architectures]'`
+   - ECS clusters/services: `aws ecs list-clusters`, then `aws ecs list-services --cluster <cluster>` and `aws ecs describe-services --cluster <cluster> --services <service>`
+   - S3 buckets: `aws s3api list-buckets --query 'Buckets[].Name'`
+   - ElastiCache clusters: `aws elasticache describe-cache-clusters`
+   - NAT Gateways: `aws ec2 describe-nat-gateways`
+   - Load Balancers: `aws elbv2 describe-load-balancers`
+
+3. **IaC Detection**:
+   - Scan for IaC files: `**/*.tf`, `**/*.yaml` (CloudFormation/SAM), `**/*.json` (CloudFormation), `**/cdk.json`, `lib/**/*.ts` (CDK)
+   - Parse resource definitions to understand intended configurations
+   - Do NOT use application code files — only IaC files as the source of truth
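+   - If no IaC files are found: continue with the deployed resources discovered above, and tell the user that recommendations will use AWS CLI commands instead of IaC changes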
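
The IaC detection step could be sketched roughly as follows. This is a minimal sketch, not the workflow's actual implementation: `find_iac_files` is a hypothetical helper name, the excluded directories are assumptions, and treating a YAML/JSON file as a template only when it mentions `AWSTemplateFormatVersion` or `AWS::Serverless` is a heuristic that can miss templates omitting both strings. CDK sources under `lib/**/*.ts` are left out here and would need their own check.

```bash
# Hypothetical helper: list candidate IaC files under a directory (default: .).
find_iac_files() {
  local root="${1:-.}"

  # Terraform files and the CDK project marker are matched by name alone.
  find "$root" -type f \( -name '*.tf' -o -name 'cdk.json' \) \
    ! -path '*/node_modules/*' ! -path '*/.terraform/*'

  # YAML/JSON files count only if they look like CloudFormation/SAM templates.
  # Assumption: such templates contain one of the marker strings below.
  find "$root" -type f \( -name '*.yaml' -o -name '*.yml' -o -name '*.json' \) \
    ! -name 'cdk.json' ! -path '*/node_modules/*' ! -path '*/cdk.out/*' \
    -exec grep -l -E 'AWSTemplateFormatVersion|AWS::Serverless' {} + || true
}
```

An empty result means no IaC files were detected under the given directory.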
+
+### Step 3: Collect Usage Metrics & Validate Current Costs
+**Action**: Gather utilization data and verify actual resource costs
+**Tools**: AWS CLI (CloudWatch, Cost Explorer)
+**Process**:
+1. **CloudWatch Metrics** (last 7 days):
+   ```bash
+   # EC2 CPU utilization (date -d is GNU syntax; on macOS/BSD use e.g. date -u -v-7d)
+   aws cloudwatch get-metric-statistics \
+     --namespace AWS/EC2 --metric-name CPUUtilization \
+     --dimensions Name=InstanceId,Value=<id> \
+     --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
+     --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
+     --period 3600 --statistics Average
+
+   # Lambda duration
+   aws cloudwatch get-metric-statistics \
+     --namespace AWS/Lambda --metric-name Duration \
+     --dimensions Name=FunctionName,Value=<name> \
+     --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
+     --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
+     --period 86400 --statistics Average,Maximum
+   ```
+
+2. **AWS Cost Explorer**:
+   ```bash
+   aws ce get-cost-and-usage \
+     --time-period Start=$(date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
+     --granularity MONTHLY --metrics BlendedCost \
+     --group-by Type=DIMENSION,Key=SERVICE
+   ```
+
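+3. **Calculate Baseline Metrics**: CPU averages (plus memory averages where the CloudWatch agent publishes them, since EC2 does not report memory by default), Lambda invocation rates, data transfer patterns, and a realistic current monthly total.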
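
One way to turn raw datapoints into a baseline figure is sketched below. It assumes the values arrive as whitespace-separated numbers, which is what `--query 'Datapoints[].Average' --output text` typically prints. `mean_of` and `below_threshold` are hypothetical helper names, and the default limit of 20 mirrors the "<20% average" right-sizing rule used in Step 4.

```bash
# Hypothetical helper: mean of whitespace-separated numbers read from stdin.
mean_of() {
  awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } }
       END { if (n > 0) printf "%.2f\n", sum / n; else print "n/a" }'
}

# Hypothetical helper: prints "yes" if the average is under the limit (default 20).
below_threshold() {
  awk -v avg="$1" -v limit="${2:-20}" \
    'BEGIN { print ((avg + 0 < limit + 0) ? "yes" : "no") }'
}

# Example pipeline (needs AWS credentials, so it is left commented out):
# aws cloudwatch get-metric-statistics ... \
#   --query 'Datapoints[].Average' --output text | mean_of
```

`mean_of` prints `n/a` when it receives no datapoints, so check for that value before passing the result to `below_threshold`.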
+
+### Step 4: Generate Cost Optimization Recommendations
+**Action**: Analyze resources to identify optimization opportunities
+**Process**:
+1. **Apply Optimization Patterns**:
+
+   **Compute**:
+   - EC2: Right-size based on CPU/memory (<20% average → downsize), convert On-Demand to Savings Plans, migrate to Graviton/ARM (AWS cites up to 40% better price performance)
+   - Lambda: Reduce memory for over-provisioned functions, switch to `arm64` (roughly 20% lower duration pricing)
+   - ECS/EKS: Use Fargate Spot for dev/batch workloads
+
+   **Database**:
+   - RDS: Right-size instance class, convert to Single-AZ for dev, use Aurora Serverless v2 for variable load
+   - DynamoDB: Switch Provisioned → On-Demand for unpredictable traffic
+   - ElastiCache: Right-size node type based on memory utilization
+
+   **Storage**:
+   - S3: Lifecycle policies (Standard → Standard-IA after 30d → Glacier after 90d), enable Intelligent-Tiering
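+   - EBS: Delete unattached volumes, convert gp2 → gp3 (up to 20% cheaper per GB; confirm that the gp3 baseline of 3,000 IOPS and 125 MiB/s covers what the gp2 volume currently delivers)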
+
+   **Network**:
+   - Consolidate NAT Gateways for non-production environments
+   - Use VPC endpoints for S3/DynamoDB to avoid NAT Gateway charges
+
+2. **Calculate Priority Score**:
+   ```
+   Priority Score = (Value Score × Monthly Savings) / (Risk Score × Implementation Days)
+   High: Score > 20 | Medium: Score 5-20 | Low: Score < 5
+   ```
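
The formula and bands above can be sketched as a small shell function. This is an illustration only: `priority_score` is a hypothetical name, the argument order is an assumption, and the boundary values 5 and 20 are treated as Medium, following the "5-20" range.

```bash
# Hypothetical helper implementing the Priority Score formula above.
# Arguments: value score, monthly savings (USD), risk score, implementation days.
priority_score() {
  awk -v value="$1" -v savings="$2" -v risk="$3" -v days="$4" 'BEGIN {
    # Guard against division by zero in the denominator.
    if (risk <= 0 || days <= 0) { print "invalid input"; exit 1 }
    score = (value * savings) / (risk * days)
    if (score > 20)      band = "High"
    else if (score >= 5) band = "Medium"
    else                 band = "Low"
    printf "%.1f %s\n", score, band
  }'
}
```

For example, a value score of 8, $100 in monthly savings, a risk score of 2, and 5 implementation days give (8 × 100) / (2 × 5) = 80, which lands in the High band.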
+
+### Step 5: User Confirmation
+**Action**: Present summary and get approval before creating GitHub issues
+
+```
+🎯 AWS Cost Optimization Summary
+
+📊 Analysis Results:
+• Total Resources Analyzed: X
+• Current Monthly Cost: $X
+• Potential Monthly Savings: $Y
+• Optimization Opportunities: Z
+• High Priority Items: N
+
+🏆 Recommendations:
+1. [Resource]: [Current] → [Target] = $X/month savings - [Risk] | [Effort]
+...
+
+💡 This will create Z individual GitHub issues + 1 EPIC issue.
+
+❓ Proceed with creating GitHub issues? (y/n)
+```
+
+Wait for user confirmation before proceeding.
+
+### Step 6: Create Individual Optimization Issues
+**Action**: Create separate GitHub issues for each optimization. Label with "cost-optimization" (green) and "aws" (orange).
+
+**Title**: `[COST-OPT] [Resource Type] - [Brief Description] - $X/month savings`
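
A title in this format could be assembled with a small helper like the one below. `cost_opt_title` is a hypothetical name used only for illustration, and creating the issue itself is still left to the GitHub MCP server.

```bash
# Hypothetical helper: build an issue title in the [COST-OPT] format.
# Arguments: resource type, brief description, monthly savings in USD.
cost_opt_title() {
  local resource_type="$1" description="$2" monthly_savings="$3"
  printf '[COST-OPT] %s - %s - $%s/month savings\n' \
    "$resource_type" "$description" "$monthly_savings"
}
```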
+
+**Body**:
+```markdown
+## 💰 Cost Optimization: [Brief Title]
+
+**Monthly Savings**: $X | **Risk Level**: [Low/Medium/High] | **Effort**: X days
+
+### 📋 Description
+[Clear explanation of the optimization and why it's needed]
+
+### 🔧 Implementation
+
+**IaC Files Detected**: [Yes/No]
+
+~~~bash
+# IaC modification (preferred) or AWS CLI fallback
+~~~
+
+### 📊 Evidence
+- Current Configuration: [details]
+- Usage Pattern: [evidence from CloudWatch]
+- Cost Impact: $X/month → $Y/month
+
+### ✅ Validation Steps
+- [ ] Test in non-production environment
+- [ ] Verify no performance degradation via CloudWatch
+- [ ] Confirm cost reduction in AWS Cost Explorer
+
+### ⚠️ Risks & Considerations
+- [Risk and mitigation]
+
+**Priority Score**: X | **Value**: X/10 | **Risk**: X/10
+```
+
+### Step 7: Create EPIC Coordinating Issue
+**Action**: Create master tracking issue. Label with "cost-optimization" (green), "aws" (orange), "epic" (purple).
+
+**Title**: `[EPIC] AWS Cost Optimization Initiative - $X/month potential savings`
+
+**Body**: Executive summary with account/region details, Mermaid architecture diagram of current resources, prioritized checklist linking all individual issues (High → Medium → Low), progress tracking, and success criteria (>80% of estimated savings realized, no performance degradation).
+
+## Error Handling
+- **AWS Authentication Failure**: Guide through `aws configure`
+- **No Resources Found**: Create informational issue about AWS resource deployment
+- **Insufficient Permissions**: List required IAM read-only permissions
+- **GitHub Creation Failure**: Output formatted recommendations to console
+- **Cost Explorer Not Enabled**: Guide user to enable in AWS Console
+
+## Success Criteria
+- ✅ All cost estimates verified against actual configurations and AWS pricing
+- ✅ Individual GitHub issues created for each optimization
+- ✅ EPIC issue provides comprehensive coordination and tracking
+- ✅ All recommendations include specific AWS CLI or IaC commands
+- ✅ User confirmation obtained before creating issues
@@ -0,0 +1,179 @@
+---
+name: aws-resource-health-diagnose
+description: 'Analyze AWS resource health, diagnose issues from CloudWatch logs and metrics, and create a remediation plan for identified problems.'
+---
+
+# AWS Resource Health & Issue Diagnosis
+
+This workflow analyzes a specific AWS resource to assess its health status, diagnose potential issues using CloudWatch logs and metrics, and develop a comprehensive remediation plan for any problems discovered.
+
+## Prerequisites
+- AWS CLI configured and authenticated
+- Target AWS resource identified (name, type, and optionally region/account)
+- CloudWatch logging and metrics enabled on the target resource
+
+## Workflow Steps
+
+### Step 1: Get AWS Diagnostic Best Practices
+Fetch `https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/` for monitoring and troubleshooting guidance to inform the diagnostic approach.
+
+### Step 2: Resource Discovery & Identification
+Locate the target resource using the appropriate AWS CLI command for its type:
+
+```bash
+# EC2
+aws ec2 describe-instances --filters "Name=tag:Name,Values=<name>"
+# Lambda
+aws lambda get-function --function-name <name>
+# RDS
+aws rds describe-db-instances --db-instance-identifier <name>
+# ECS
+aws ecs describe-services --cluster <cluster> --services <name>
+# ALB
+aws elbv2 describe-load-balancers --names <name>
+# DynamoDB
+aws dynamodb describe-table --table-name <name>
+# SQS
+aws sqs get-queue-attributes --queue-url <url> --attribute-names All
+# API Gateway (HTTP/WebSocket APIs; for REST APIs use: aws apigateway get-rest-apis)
+aws apigatewayv2 get-apis
+```
+
+If multiple matches are found, prompt the user to specify region/account.
+
+### Step 3: Health Status Assessment
+Run service-specific health checks:
+
+```bash
+# EC2
+aws ec2 describe-instance-status --instance-ids <id>
+
+# RDS
+aws rds describe-db-instances --db-instance-identifier <name> \
+  --query 'DBInstances[0].DBInstanceStatus'
+
+# Lambda - error rate over 24h
+aws cloudwatch get-metric-statistics --namespace AWS/Lambda \
+  --metric-name Errors --dimensions Name=FunctionName,Value=<name> \
+  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
+  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
+  --period 3600 --statistics Sum
+
+# ECS
+aws ecs describe-services --cluster <cluster> --services <name> \
+  --query 'services[0].[status,runningCount,desiredCount,pendingCount]'
+```
+
+Key health indicators by service type:
+- **Lambda**: Error rate, throttle rate, duration P99, concurrent executions
+- **RDS**: CPU utilization, FreeStorageSpace, DatabaseConnections, ReadLatency/WriteLatency
+- **ECS**: Running vs desired task count, task stop reason
+- **ALB**: TargetResponseTime, HTTPCode_ELB_5XX_Count, UnHealthyHostCount
+- **SQS**: ApproximateNumberOfMessagesNotVisible, ApproximateAgeOfOldestMessage
+- **DynamoDB**: ConsumedReadCapacityUnits, ThrottledRequests, SuccessfulRequestLatency
+
+### Step 4: Log & Metrics Analysis
+Find log groups and run CloudWatch Logs Insights queries:
+
+```bash
+# Find log groups
+aws logs describe-log-groups --log-group-name-prefix /aws/<service>/<name>
+
+# Start a query (last 24h errors)
+aws logs start-query \
+  --log-group-name /aws/lambda/<name> \
+  --start-time $(date -u -d '24 hours ago' +%s) \
+  --end-time $(date -u +%s) \
+  --query-string 'filter @message like /ERROR/ | stats count(*) as errorCount by bin(1h)'
+
+# Get results
+aws logs get-query-results --query-id <id>
+
+# Lambda cold starts
+aws logs start-query \
+  --log-group-name /aws/lambda/<name> \
+  --start-time $(date -u -d '24 hours ago' +%s) \
+  --end-time $(date -u +%s) \
+  --query-string 'filter @type = "REPORT" | filter @initDuration > 0 | stats count() as coldStarts by bin(1h)'
+
+# RDS Performance Insights (if enabled); the identifier is the instance's DbiResourceId (db-...), not its name
+aws pi get-resource-metrics \
+  --service-type RDS --identifier <dbi-resource-id> \
+  --metric-queries '[{"Metric":"db.load.avg"}]' \
+  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
+  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
+  --period-in-seconds 3600
+```
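
The `date -u -d '24 hours ago'` expressions above use GNU `date` syntax, which BSD/macOS `date` does not accept. A portable wrapper might look like the sketch below. `hours_ago` is a hypothetical helper name, and the BSD branch assumes the `-v` adjustment flag is available.

```bash
# Hypothetical helper: print the UTC time N hours ago in a given date(1) format.
# Usage: hours_ago 24 '%Y-%m-%dT%H:%M:%SZ'   or   hours_ago 24 %s
hours_ago() {
  local hours="$1" fmt="$2"
  if date -u -d '1 hour ago' +%s >/dev/null 2>&1; then
    date -u -d "${hours} hours ago" "+${fmt}"   # GNU coreutils date
  else
    date -u -v"-${hours}H" "+${fmt}"            # BSD/macOS date (assumed)
  fi
}
```

With this in place, `--start-time "$(hours_ago 24 %s)"` would stand in for the epoch form used by `aws logs start-query`, and the ISO 8601 format string would cover the CloudWatch and Performance Insights calls.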
+
+Identify: recurring error patterns, correlation with deployments (CloudTrail), performance trends, dependency failures.
+
+### Step 5: Issue Classification & Root Cause Analysis
+**Severity**:
+- **Critical**: Service unavailable, data loss, security incidents
+- **High**: Performance degradation, error rates >5%, intermittent failures
+- **Medium**: Warnings, suboptimal configuration, minor performance issues
+- **Low**: Informational alerts, optimization opportunities
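
As a rough illustration of the "error rates >5%" criterion, the rate can be derived from the summed `Errors` and `Invocations` metrics. Both helper names below are hypothetical, and treating zero invocations as a 0% rate is an assumption.

```bash
# Hypothetical helper: error rate in percent from summed Errors and Invocations.
error_rate_pct() {
  awk -v errors="$1" -v invocations="$2" 'BEGIN {
    if (invocations <= 0) { print "0.00"; exit }   # assumption: no traffic = 0%
    printf "%.2f\n", (errors / invocations) * 100
  }'
}

# Prints "yes" when the rate is above the 5% High-severity threshold.
exceeds_high_threshold() {
  awk -v rate="$(error_rate_pct "$1" "$2")" \
    'BEGIN { print ((rate + 0 > 5) ? "yes" : "no") }'
}
```

A rate of exactly 5% is not flagged, since the criterion reads ">5%".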
+
+**Root Cause Categories**:
+- Configuration Issues: wrong settings, missing env vars, IAM permission denials
+- Resource Constraints: CPU/memory/disk limits, Lambda throttling, RDS connection exhaustion
+- Network Issues: security group rules, VPC routing, DNS, NACLs
+- Application Issues: code bugs, memory leaks, unhandled exceptions, slow queries
+- Dependency Issues: downstream timeouts, SQS/SNS failures, external API limits
+- Security Issues: KMS key issues, certificate expiration
+
+### Step 6: Generate Remediation Plan
+
+**Immediate Actions** (Critical):
+```bash
+# Lambda throttling: raise the function's reserved concurrency
+aws lambda put-function-concurrency \
+  --function-name <name> --reserved-concurrent-executions 100
+
+# RDS connection exhaustion: reboot to reset connections (causes a brief outage)
+aws rds reboot-db-instance --db-instance-identifier <name>
+```
+
+**Short-term Fixes** (High/Medium): Configuration adjustments, right-sizing, CloudWatch alarm improvements, IAM corrections.
+
+**Long-term Improvements**: Architectural changes for resilience, preventive monitoring, enable AWS Health Dashboard notifications via EventBridge.
+
+### Step 7: Report & User Confirmation
+
+Present findings:
+```
+🏥 AWS Resource Health Assessment
+
+📊 Resource Overview:
+• Resource: [Name] ([Type])
+• Status: [Healthy/Warning/Critical]
+• Region: [Region] | Account: [Account ID]
+
+🚨 Issues Identified:
+• Critical: X | High: Y | Medium: Z | Low: N
+
+🔍 Top Issues:
+1. [Issue]: [Description] — Impact: [High/Medium/Low]
+2. [Issue]: [Description] — Impact: [High/Medium/Low]
+
+🛠️ Remediation: X immediate, Y short-term, Z long-term actions
+
+❓ Proceed with detailed remediation plan? (y/n)
+```
+
+Then generate a full markdown report covering: health metrics, issues with root cause analysis, phased remediation steps with AWS CLI commands, CloudWatch alarm recommendations, and validation checklist.
+
+## Error Handling
+- **Resource Not Found**: Ask user to clarify name/region
+- **Authentication Issues**: Guide through `aws configure`
+- **Insufficient Permissions**: List required IAM actions (`logs:*`, `cloudwatch:*`, `pi:*`)
+- **No Logs Available**: Suggest enabling CloudWatch logging for the resource type
+- **Query Timeouts**: Use shorter time windows
+
+## Success Criteria
+- ✅ Resource health accurately assessed across all key metrics
+- ✅ All significant issues identified and classified by severity
+- ✅ Root cause analysis completed for major problems
+- ✅ Actionable remediation plan with AWS CLI commands
+- ✅ CloudWatch monitoring recommendations included
+- ✅ Implementation steps include validation and rollback procedures
@@ -0,0 +1,631 @@
+---
+name: aws-resource-query
+description: 'Query AWS resources using natural language. Covers EC2, S3, RDS, Lambda, ECS, EKS, Secrets Manager, IAM, VPC, networking, messaging, and more. Strictly read-only — no writes, deletes, or mutations.'
+---
+
+# AWS Resource Query
+
+Answer natural language questions about AWS resources by translating intent into read-only AWS CLI commands. This skill **never** runs commands that create, modify, or delete resources.
+
+## Safety Contract
+
+**STRICTLY READ-ONLY.** This skill exclusively uses:
+- `aws <service> describe-*`
+- `aws <service> list-*`
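+- `aws <service> get-*` (except calls that return secret material, such as `secretsmanager get-secret-value` and `ssm get-parameter`)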
+- `aws sts get-caller-identity`
+- `aws configure get`
+- `aws resourcegroupstaggingapi get-resources`
+- `aws ce get-*`
+- `aws support describe-*`
+
+**NEVER** run any of the following, regardless of what the user asks:
+`create-*`, `run-*`, `start-*`, `stop-*`, `reboot-*`, `delete-*`, `terminate-*`, `put-*`, `update-*`, `modify-*`, `attach-*`, `detach-*`, `send-*`, `publish-*`, `invoke-*`, `execute-*`
+
+If the user's query implies a write action, respond:
+> "This skill is read-only. I can show you the current state of [resource], but I cannot [create/modify/delete] it. Would you like to see what currently exists?"
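
The contract above could be enforced with a pre-flight check along these lines. This is a sketch under assumptions: `classify_aws_op` is a hypothetical helper, and blocking every `ssm get-parameter*` call (not only `get-parameter`) goes beyond what this document names explicitly.

```bash
# Hypothetical guard: print "allow" or "deny" for an AWS CLI service/operation pair.
classify_aws_op() {
  local service="$1" operation="$2"

  # Calls that return secret material stay blocked even though they match get-*.
  # Assumption: every ssm get-parameter* variant is treated as value-returning.
  case "$service $operation" in
    "secretsmanager get-secret-value"|"ssm get-parameter"*)
      echo "deny"; return ;;
  esac

  case "$operation" in
    describe-*|list-*|get-*) echo "allow" ;;
    # "aws configure get" is the one allowed command without a hyphenated verb.
    get) if [ "$service" = "configure" ]; then echo "allow"; else echo "deny"; fi ;;
    *) echo "deny" ;;
  esac
}
```

Anything not explicitly allowed is denied, so operations missing from the NEVER list (for example a newly added write verb) are still rejected.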
+
+## Workflow
+
+### Step 1: Parse Intent
+Identify: target service(s), scope (all / filtered / specific), detail level, and region.
+
+### Step 2: Confirm Account & Region
+```bash
+aws sts get-caller-identity --query '{Account:Account,UserId:UserId}'
+aws configure get region
+```
+Append `--region <region>` to all commands when the user specifies one.
+
+### Step 3: Execute & Format
+Run the matched read-only command(s) below and format results as a readable table. For large result sets show a count first and offer to filter further.
+
+---
+
+## Intent → Command Mapping
+
+### COMPUTE
+
+#### EC2 Instances
+```bash
+# "list EC2 instances" / "show my VMs" / "what instances are running"
+aws ec2 describe-instances \
+  --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name,Tags[?Key==`Name`].Value|[0],PrivateIpAddress,PublicIpAddress]' \
+  --output table
+
+# "running instances only"
+aws ec2 describe-instances --filters Name=instance-state-name,Values=running \
+  --query 'Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key==`Name`].Value|[0],PrivateIpAddress]' \
+  --output table
+
+# "stopped instances"
+aws ec2 describe-instances --filters Name=instance-state-name,Values=stopped \
+  --query 'Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key==`Name`].Value|[0]]' \
+  --output table
+
+# "instance types in use"
+aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceType' --output text | tr '\t' '\n' | sort | uniq -c | sort -rn
+
+# "auto scaling groups" / "ASGs"
+aws autoscaling describe-auto-scaling-groups \
+  --query 'AutoScalingGroups[].[AutoScalingGroupName,MinSize,MaxSize,DesiredCapacity]' --output table
+
+# "elastic IPs" / "EIPs"
+aws ec2 describe-addresses \
+  --query 'Addresses[].[PublicIp,InstanceId,AllocationId,AssociationId]' --output table
+
+# "key pairs"
+aws ec2 describe-key-pairs \
+  --query 'KeyPairs[].[KeyName,CreateTime]' --output table
+
+# "AMIs I own"
+aws ec2 describe-images --owners self \
+  --query 'Images[].[ImageId,Name,CreationDate,State]' --output table
+
+# "spot instances"
+aws ec2 describe-spot-instance-requests \
+  --query 'SpotInstanceRequests[].[SpotInstanceRequestId,State,InstanceId,LaunchSpecification.InstanceType]' --output table
+```
+
+#### Lambda Functions
+```bash
+# "list Lambda functions" / "show serverless functions"
+aws lambda list-functions \
+  --query 'Functions[].[FunctionName,Runtime,MemorySize,Timeout,LastModified]' --output table
+
+# "Lambda function details for <name>"
+aws lambda get-function-configuration --function-name <name>
+
+# "Lambda event source mappings" / "Lambda triggers"
+aws lambda list-event-source-mappings \
+  --query 'EventSourceMappings[].[FunctionArn,EventSourceArn,State,BatchSize]' --output table
+
+# "Lambda layers"
+aws lambda list-layers \
+  --query 'Layers[].[LayerName,LatestMatchingVersion.LayerVersionArn]' --output table
+
+# "Lambda concurrency for <name>"
+aws lambda get-function-concurrency --function-name <name>
+```
+
+#### ECS
+```bash
+# "ECS clusters"
+aws ecs list-clusters --query 'clusterArns' --output table
+
+# "ECS cluster details"
+aws ecs describe-clusters \
+  --clusters $(aws ecs list-clusters --query 'clusterArns[]' --output text) \
+  --query 'clusters[].[clusterName,status,runningTasksCount,activeServicesCount]' --output table
+
+# "ECS services in <cluster>" (describe-services accepts at most 10 services per call)
+aws ecs describe-services --cluster <cluster> \
+  --services $(aws ecs list-services --cluster <cluster> --query 'serviceArns[]' --output text) \
+  --query 'services[].[serviceName,status,runningCount,desiredCount]' --output table
+
+# "ECS task definitions"
+aws ecs list-task-definitions --query 'taskDefinitionArns' --output table
+```
+
+#### EKS
+```bash
+# "EKS clusters" / "Kubernetes clusters"
+aws eks list-clusters --query 'clusters' --output table
+
+# "EKS cluster details for <name>"
+aws eks describe-cluster --name <name> \
+  --query 'cluster.[name,status,version,endpoint]'
+
+# "EKS node groups for <cluster>"
+aws eks list-nodegroups --cluster-name <name> --query 'nodegroups' --output table
+
+# "EKS add-ons for <cluster>"
+aws eks list-addons --cluster-name <name> --query 'addons' --output table
+```
+
+#### Other Compute
+```bash
+# "Beanstalk environments"
+aws elasticbeanstalk describe-environments \
+  --query 'Environments[].[EnvironmentName,ApplicationName,Status,Health]' --output table
+
+# "Batch job queues"
+aws batch describe-job-queues \
+  --query 'jobQueues[].[jobQueueName,state,status,priority]' --output table
+
+# "Batch compute environments"
+aws batch describe-compute-environments \
+  --query 'computeEnvironments[].[computeEnvironmentName,type,state,status]' --output table
+```
+
+---
+
+### STORAGE
+
+#### S3
+```bash
+# "list S3 buckets" / "show my buckets"
+aws s3api list-buckets --query 'Buckets[].[Name,CreationDate]' --output table
+
+# "S3 bucket encryption for <name>"
+aws s3api get-bucket-encryption --bucket <name>
+
+# "S3 bucket versioning for <name>"
+aws s3api get-bucket-versioning --bucket <name>
+
+# "S3 public access settings for <name>"
+aws s3api get-public-access-block --bucket <name>
+
+# "S3 lifecycle rules for <name>"
+aws s3api get-bucket-lifecycle-configuration --bucket <name>
+
+# "S3 bucket policy for <name>"
+aws s3api get-bucket-policy --bucket <name>
+
+# "list objects in s3://<bucket>/<prefix>"
+aws s3api list-objects-v2 --bucket <bucket> --prefix <prefix> \
+  --query 'Contents[].[Key,Size,LastModified,StorageClass]' --output table
+```
+
+#### EBS & EFS
+```bash
+# "EBS volumes" / "list volumes"
+aws ec2 describe-volumes \
+  --query 'Volumes[].[VolumeId,Size,VolumeType,State,AvailabilityZone,Attachments[0].InstanceId]' --output table
+
+# "unattached EBS volumes" / "unused volumes"
+aws ec2 describe-volumes --filters Name=status,Values=available \
+  --query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' --output table
+
+# "EBS snapshots I own"
+aws ec2 describe-snapshots --owner-ids self \
+  --query 'Snapshots[].[SnapshotId,VolumeId,State,StartTime]' --output table
+
+# "EFS file systems"
+aws efs describe-file-systems \
+  --query 'FileSystems[].[FileSystemId,Name,LifeCycleState,SizeInBytes.Value,ThroughputMode]' --output table
+```
+
+---
+
+### DATABASES
+
+#### RDS
+```bash
+# "list RDS instances" / "show databases" / "what databases do I have"
+aws rds describe-db-instances \
+  --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceClass,Engine,EngineVersion,DBInstanceStatus,MultiAZ,Endpoint.Address]' \
+  --output table
+
+# "Aurora clusters" / "RDS clusters"
+aws rds describe-db-clusters \
+  --query 'DBClusters[].[DBClusterIdentifier,Engine,EngineVersion,Status,MultiAZ,Endpoint]' --output table
+
+# "RDS snapshots"
+aws rds describe-db-snapshots \
+  --query 'DBSnapshots[].[DBSnapshotIdentifier,DBInstanceIdentifier,Engine,Status,SnapshotCreateTime]' --output table
+
+# "RDS parameter groups"
+aws rds describe-db-parameter-groups \
+  --query 'DBParameterGroups[].[DBParameterGroupName,DBParameterGroupFamily]' --output table
+
+# "RDS subnet groups"
+aws rds describe-db-subnet-groups \
+  --query 'DBSubnetGroups[].[DBSubnetGroupName,VpcId]' --output table
+```
+
+#### DynamoDB
+```bash
+# "DynamoDB tables" / "list NoSQL tables"
+aws dynamodb list-tables --query 'TableNames' --output table
+
+# "DynamoDB table details for <name>"
+aws dynamodb describe-table --table-name <name> \
+  --query 'Table.[TableName,TableStatus,ItemCount,BillingModeSummary.BillingMode]'
+
+# "DynamoDB backups"
+aws dynamodb list-backups \
+  --query 'BackupSummaries[].[TableName,BackupName,BackupStatus,BackupCreationDateTime]' --output table
+
+# "DynamoDB global tables"
+aws dynamodb list-global-tables \
+  --query 'GlobalTables[].[GlobalTableName,ReplicationGroup[].RegionName]' --output table
+```
+
+#### ElastiCache & Redshift
+```bash
+# "ElastiCache clusters" / "Redis clusters"
+aws elasticache describe-cache-clusters \
+  --query 'CacheClusters[].[CacheClusterId,Engine,EngineVersion,CacheNodeType,CacheClusterStatus]' --output table
+
+# "ElastiCache replication groups"
+aws elasticache describe-replication-groups \
+  --query 'ReplicationGroups[].[ReplicationGroupId,Status,AutomaticFailover]' --output table
+
+# "Redshift clusters" / "data warehouse"
+aws redshift describe-clusters \
+  --query 'Clusters[].[ClusterIdentifier,ClusterStatus,NodeType,NumberOfNodes,Endpoint.Address]' --output table
+
+# "DocumentDB clusters"
+aws docdb describe-db-clusters \
+  --query 'DBClusters[].[DBClusterIdentifier,Status,Engine,Endpoint]' --output table
+
+# "Neptune clusters" / "graph databases"
+aws neptune describe-db-clusters \
+  --query 'DBClusters[].[DBClusterIdentifier,Status,Engine,Endpoint]' --output table
+```
+
+---
+
+### NETWORKING
+
+#### VPC & Subnets
+```bash
+# "list VPCs" / "show my VPCs"
+aws ec2 describe-vpcs \
+  --query 'Vpcs[].[VpcId,CidrBlock,IsDefault,Tags[?Key==`Name`].Value|[0],State]' --output table
+
+# "subnets" / "list subnets"
+aws ec2 describe-subnets \
+  --query 'Subnets[].[SubnetId,VpcId,CidrBlock,AvailabilityZone,MapPublicIpOnLaunch,Tags[?Key==`Name`].Value|[0]]' --output table
+
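+# "public subnets" (subnets that auto-assign public IPs; a truly public subnet also needs a route to an internet gateway)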
+aws ec2 describe-subnets --filters "Name=mapPublicIpOnLaunch,Values=true" \
+  --query 'Subnets[].[SubnetId,VpcId,CidrBlock,AvailabilityZone]' --output table
+
+# "security groups"
+aws ec2 describe-security-groups \
+  --query 'SecurityGroups[].[GroupId,GroupName,VpcId,Description]' --output table
+
+# "security group rules for <group-id>"
+aws ec2 describe-security-group-rules --filters "Name=group-id,Values=<id>" \
+  --query 'SecurityGroupRules[].[IsEgress,IpProtocol,FromPort,ToPort,CidrIpv4,Description]' --output table
+
+# "route tables"
+aws ec2 describe-route-tables \
+  --query 'RouteTables[].[RouteTableId,VpcId,Associations[0].SubnetId,Tags[?Key==`Name`].Value|[0]]' --output table
+
+# "internet gateways" / "IGWs"
+aws ec2 describe-internet-gateways \
+  --query 'InternetGateways[].[InternetGatewayId,Attachments[0].VpcId,Tags[?Key==`Name`].Value|[0]]' --output table
+
+# "NAT gateways"
+aws ec2 describe-nat-gateways \
+  --query 'NatGateways[].[NatGatewayId,VpcId,SubnetId,State,NatGatewayAddresses[0].PublicIp]' --output table
+
+# "VPC endpoints"
+aws ec2 describe-vpc-endpoints \
+  --query 'VpcEndpoints[].[VpcEndpointId,VpcId,ServiceName,State,VpcEndpointType]' --output table
+
+# "VPC peering connections"
+aws ec2 describe-vpc-peering-connections \
+  --query 'VpcPeeringConnections[].[VpcPeeringConnectionId,Status.Code,RequesterVpcInfo.VpcId,AccepterVpcInfo.VpcId]' --output table
+
+# "NACLs" / "network ACLs"
+aws ec2 describe-network-acls \
+  --query 'NetworkAcls[].[NetworkAclId,VpcId,IsDefault]' --output table
+
+# "Transit Gateways"
+aws ec2 describe-transit-gateways \
+  --query 'TransitGateways[].[TransitGatewayId,State,Description]' --output table
+```
+
+#### Load Balancers & DNS
+```bash
+# "load balancers" / "ALBs" / "NLBs"
+aws elbv2 describe-load-balancers \
+  --query 'LoadBalancers[].[LoadBalancerName,Type,Scheme,State.Code,DNSName]' --output table
+
+# "target groups"
+aws elbv2 describe-target-groups \
+  --query 'TargetGroups[].[TargetGroupName,Protocol,Port,TargetType,VpcId]' --output table
+
+# "target health for <target-group-arn>"
+aws elbv2 describe-target-health --target-group-arn <arn> \
+  --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Description]' --output table
+
+# "Route 53 hosted zones" / "DNS zones"
+aws route53 list-hosted-zones \
+  --query 'HostedZones[].[Id,Name,Config.PrivateZone,ResourceRecordSetCount]' --output table
+
+# "DNS records in zone <id>"
+aws route53 list-resource-record-sets --hosted-zone-id <id> \
+  --query 'ResourceRecordSets[].[Name,Type,TTL]' --output table
+
+# "CloudFront distributions"
+aws cloudfront list-distributions \
+  --query 'DistributionList.Items[].[Id,DomainName,Status,Origins.Items[0].DomainName]' --output table
+
+# "VPN connections"
+aws ec2 describe-vpn-connections \
+  --query 'VpnConnections[].[VpnConnectionId,State,Type,CustomerGatewayId]' --output table
+
+# "Direct Connect connections"
+aws directconnect describe-connections \
+  --query 'connections[].[connectionId,connectionName,connectionState,bandwidth]' --output table
+```
+
+---
+
+### SECURITY & IDENTITY
+
+#### IAM
+```bash
+# "IAM users" / "list users"
+aws iam list-users \
+  --query 'Users[].[UserName,UserId,CreateDate,PasswordLastUsed]' --output table
+
+# "IAM roles" / "list roles"
+aws iam list-roles \
+  --query 'Roles[].[RoleName,RoleId,CreateDate]' --output table
+
+# "IAM policies attached to role <name>"
+aws iam list-attached-role-policies --role-name <name> \
+  --query 'AttachedPolicies[].[PolicyName,PolicyArn]' --output table
+
+# "IAM groups"
+aws iam list-groups \
+  --query 'Groups[].[GroupName,GroupId,CreateDate]' --output table
+
+# "IAM policies (customer managed)"
+aws iam list-policies --scope Local \
+  --query 'Policies[].[PolicyName,AttachmentCount,CreateDate]' --output table
+
+# "who has MFA enabled" / "MFA devices"
+aws iam list-virtual-mfa-devices \
+  --query 'VirtualMFADevices[].[SerialNumber,User.UserName,EnableDate]' --output table
+
+# "IAM account password policy"
+aws iam get-account-password-policy
+
+# "IAM account summary"
+aws iam get-account-summary
+```
+
+#### Secrets Manager
+```bash
+# "list secrets" / "Secrets Manager secrets" / "show secrets"
+aws secretsmanager list-secrets \
+  --query 'SecretList[].[Name,ARN,LastChangedDate,LastAccessedDate,Description]' --output table
+
+# "secret metadata for <name>"
+aws secretsmanager describe-secret --secret-id <name> \
+  --query '{Name:Name,ARN:ARN,RotationEnabled:RotationEnabled,LastRotatedDate:LastRotatedDate,Tags:Tags}'
+
+# "secrets with rotation enabled"
+aws secretsmanager list-secrets \
+  --query 'SecretList[?RotationEnabled==`true`].[Name,LastRotatedDate]' --output table
+```
+
+> ⚠️ **Note**: Secret **values** are never retrieved (`get-secret-value` is excluded). Only metadata is shown.
+
+#### SSM Parameter Store
+```bash
+# "SSM parameters" / "Parameter Store"
+aws ssm describe-parameters \
+  --query 'Parameters[].[Name,Type,LastModifiedDate,Description]' --output table
+
+# "SSM parameters by path <path>"
+aws ssm describe-parameters \
+  --parameter-filters "Key=Path,Values=<path>" \
+  --query 'Parameters[].[Name,Type,LastModifiedDate]' --output table
+```
+
+> ⚠️ **Note**: Parameter **values** are never retrieved (`get-parameter` is excluded). Only metadata is shown.
+
+#### KMS & Certificates
+```bash
+# "KMS keys" / "encryption keys"
+aws kms list-keys --query 'Keys[].[KeyId,KeyArn]' --output table
+
+# "KMS key details for <id>"
+aws kms describe-key --key-id <id> \
+  --query 'KeyMetadata.[KeyId,Description,KeyState,KeyUsage,CreationDate,Enabled]'
+
+# "KMS aliases"
+aws kms list-aliases \
+  --query 'Aliases[].[AliasName,AliasArn,TargetKeyId]' --output table
+
+# "SSL certificates" / "ACM certificates"
+aws acm list-certificates \
+  --query 'CertificateSummaryList[].[CertificateArn,DomainName,Status,RenewalEligibility]' --output table
+
+# "certificate details for <arn>"
+aws acm describe-certificate --certificate-arn <arn> \
+  --query 'Certificate.[DomainName,Status,NotAfter,NotBefore,InUseBy]'
+```
+
+#### GuardDuty, Security Hub & Config
+```bash
+# "GuardDuty detectors"
+aws guardduty list-detectors --query 'DetectorIds' --output table
+
+# "GuardDuty findings"
+aws guardduty list-findings --detector-id <id> --query 'FindingIds' --output table
+
+# "Security Hub findings"
+aws securityhub get-findings \
+  --query 'Findings[].[Title,Severity.Label,Workflow.Status,UpdatedAt]' --output table
+
+# "AWS Config rules"
+aws configservice describe-config-rules \
+  --query 'ConfigRules[].[ConfigRuleName,ConfigRuleState,Source.SourceIdentifier]' --output table
+
+# "non-compliant resources" / "non-compliant Config rules"
+aws configservice describe-compliance-by-config-rule --compliance-types NON_COMPLIANT \
+  --query 'ComplianceByConfigRules[].[ConfigRuleName,Compliance.ComplianceType]' --output table
+```
+
+---
+
+### MESSAGING & EVENTS
+
+```bash
+# "SQS queues" / "list queues"
+aws sqs list-queues --query 'QueueUrls' --output table
+
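+# "SQS queue details / message count for <url>"
+# Attribute names are space-separated. Age of the oldest message is a CloudWatch metric
+# (AWS/SQS ApproximateAgeOfOldestMessage), not a queue attribute.
+aws sqs get-queue-attributes --queue-url <url> \
+  --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible ApproximateNumberOfMessagesDelayed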
+
+# "SNS topics"
+aws sns list-topics --query 'Topics[].TopicArn' --output table
+
+# "SNS subscriptions"
+aws sns list-subscriptions \
+  --query 'Subscriptions[].[SubscriptionArn,Protocol,Endpoint,TopicArn]' --output table
+
+# "EventBridge rules"
+aws events list-rules \
+  --query 'Rules[].[Name,State,ScheduleExpression,EventPattern]' --output table
+
+# "EventBridge event buses"
+aws events list-event-buses \
+  --query 'EventBuses[].[Name,Arn]' --output table
+
+# "Kinesis streams"
+aws kinesis list-streams --query 'StreamNames' --output table
+
+# "Kinesis Firehose delivery streams"
+aws firehose list-delivery-streams --query 'DeliveryStreamNames' --output table
+```
+
+---
+
+### API GATEWAY & SERVERLESS
+
+```bash
+# "API Gateway APIs" / "REST APIs"
+aws apigateway get-rest-apis \
+  --query 'items[].[id,name,description,createdDate]' --output table
+
+# "HTTP APIs" / "API Gateway v2"
+aws apigatewayv2 get-apis \
+  --query 'Items[].[ApiId,Name,ProtocolType,ApiEndpoint,CreatedDate]' --output table
+
+# "Step Functions state machines" / "workflows"
+aws stepfunctions list-state-machines \
+  --query 'stateMachines[].[name,stateMachineArn,type,creationDate]' --output table
+
+# "Step Functions executions for <arn>"
+aws stepfunctions list-executions --state-machine-arn <arn> \
+  --query 'executions[].[name,status,startDate,stopDate]' --output table
+```
+
+---
+
+### MONITORING & OBSERVABILITY
+
+```bash
+# "CloudWatch alarms" / "list alarms"
+aws cloudwatch describe-alarms \
+  --query 'MetricAlarms[].[AlarmName,StateValue,MetricName,Namespace,Threshold]' --output table
+
+# "alarms in ALARM state" / "triggered alarms"
+aws cloudwatch describe-alarms --state-value ALARM \
+  --query 'MetricAlarms[].[AlarmName,MetricName,StateReason]' --output table
+
+# "CloudWatch dashboards"
+aws cloudwatch list-dashboards \
+  --query 'DashboardEntries[].[DashboardName,LastModified,Size]' --output table
+
+# "CloudWatch log groups"
+aws logs describe-log-groups \
+  --query 'logGroups[].[logGroupName,retentionInDays,storedBytes]' --output table
+
+# "CloudTrail trails"
+aws cloudtrail describe-trails \
+  --query 'trailList[].[Name,S3BucketName,IsMultiRegionTrail,LogFileValidationEnabled]' --output table
+
+# "ECR repositories" / "container registries"
+aws ecr describe-repositories \
+  --query 'repositories[].[repositoryName,repositoryUri,createdAt]' --output table
+```
+
+---
+
+### COST & BILLING
+
+```bash
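+# "current month cost" / "how much am I spending"
+# End is exclusive: using tomorrow includes today and keeps Start < End on the 1st (GNU date syntax).
+aws ce get-cost-and-usage \
+  --time-period Start=$(date -u +%Y-%m-01),End=$(date -u -d tomorrow +%Y-%m-%d) \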
+  --granularity MONTHLY --metrics BlendedCost \
+  --query 'ResultsByTime[].[TimePeriod.Start,Total.BlendedCost.Amount,Total.BlendedCost.Unit]' \
+  --output table
+
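+# "cost by service" / "spending breakdown"
+# A 30-day window usually spans two calendar months, so a service can appear once per month.
+aws ce get-cost-and-usage \
+  --time-period Start=$(date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
+  --granularity MONTHLY --metrics BlendedCost \
+  --group-by Type=DIMENSION,Key=SERVICE \
+  --query 'ResultsByTime[].Groups[].[Keys[0],Metrics.BlendedCost.Amount,Metrics.BlendedCost.Unit]' \
+  --output table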
+
+# "AWS Budgets"
+aws budgets describe-budgets \
+  --account-id $(aws sts get-caller-identity --query Account --output text) \
+  --query 'Budgets[].[BudgetName,BudgetType,BudgetLimit.Amount,CalculatedSpend.ActualSpend.Amount]' \
+  --output table
+
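+# "Trusted Advisor recommendations"
+# The AWS Support API requires a Business or Enterprise support plan and is served from us-east-1.
+aws support describe-trusted-advisor-checks --language en --region us-east-1 \
+  --query 'checks[].[id,name,category]' --output table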
+```
+
+---
+
+### CROSS-SERVICE QUERIES
+
+```bash
+# "resources tagged Environment=production" / "all production resources"
+aws resourcegroupstaggingapi get-resources \
+  --tag-filters Key=Environment,Values=production \
+  --query 'ResourceTagMappingList[].[ResourceARN]' --output table
+
+# "all resources tagged <key>=<value>"
+aws resourcegroupstaggingapi get-resources \
+  --tag-filters Key=<key>,Values=<value> \
+  --query 'ResourceTagMappingList[].[ResourceARN,Tags]' --output table
+
+# "inventory of all resources" (AWS Config)
+aws configservice list-discovered-resources --resource-type <type> \
+  --query 'resourceIdentifiers[].[resourceType,resourceId,resourceName]' --output table
+```
+
+---
+
+## Output Formatting Rules
+
+1. Always use `--output table` for list results; use `--output json` only when deep detail is explicitly requested
+2. Always use `--query` to extract only relevant fields — never dump raw JSON
+3. For large result sets (>20 items), show a count first, then offer to filter
+4. When a command returns nothing, explain why (wrong region, no resources, insufficient permissions)
+5. Offer to drill into a specific resource: "Found 47 EC2 instances. Filter by state, type, or tag?"
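+
+Rule 3 can be sketched as follows, assuming EC2 instances as the resource type. `list_instances_or_count` and its `limit` argument are hypothetical names for illustration, not part of the skill; the count uses the built-in JMESPath `length()` function.
+
+```shell
+# Hypothetical helper: show a count first, list details only for small result sets.
+list_instances_or_count() {
+  local limit="${1:-20}" count
+  # JSON output is used for the count because, with text output, the CLI
+  # applies --query to each page of a paginated response separately.
+  count=$(aws ec2 describe-instances \
+    --query 'length(Reservations[].Instances[])' --output json) || return 1
+  if [ "$count" -gt "$limit" ]; then
+    echo "Found $count EC2 instances. Filter by state, type, or tag?"
+  else
+    aws ec2 describe-instances \
+      --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name]' --output table
+  fi
+}
+```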
+
+## Error Handling
+
+| Error | Response |
+|---|---|
+| `AccessDenied` | "You don't have permission to list [resource]. Required: `<service>:<Action>`." |
+| `Unable to locate credentials` | "Run `aws configure` or set `AWS_PROFILE`." |
+| Empty result | "No [resources] found in [region]. Check another region?" |
+| Invalid identifier | "Could not find '[name]'. Check the name or provide the resource ID." |
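+
+One way to apply this table is to capture the CLI's stderr and match on its text. The sketch below is illustrative only: `explain_aws_error` is a hypothetical helper, and the matched substrings are assumptions about common AWS CLI error messages rather than an exhaustive list.
+
+```shell
+# Hypothetical helper: map AWS CLI error text to a user-facing response.
+# $1 = captured stderr, $2 = human-readable resource name (optional).
+explain_aws_error() {
+  local stderr_text="$1" resource="${2:-resources}"
+  case "$stderr_text" in
+    *AccessDenied*|*UnauthorizedOperation*)
+      echo "You don't have permission to list $resource." ;;
+    *"Unable to locate credentials"*|*NoCredentialProviders*)
+      echo "Run \`aws configure\` or set AWS_PROFILE." ;;
+    *)
+      echo "Unrecognized error: $stderr_text" ;;
+  esac
+}
+```
+
+Typical use: `output=$(aws s3api list-buckets 2>&1) || explain_aws_error "$output" "S3 buckets"`.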
@@ -0,0 +1,184 @@
+---
+name: aws-well-architected-review
+description: 'Perform an AWS Well-Architected Framework review of the current workload IaC and architecture, generating findings and GitHub issues for improvements.'
+---
+
+# AWS Well-Architected Review
+
+This workflow performs a structured AWS Well-Architected Framework (WAF) review against your workload's IaC files and deployed infrastructure. It identifies risks across all 6 WAF pillars and creates GitHub issues to track remediation.
+
+## Prerequisites
+- AWS CLI configured and authenticated
+- IaC files present in the repository (Terraform, CloudFormation, CDK, or SAM)
+- GitHub MCP server configured and authenticated
+
+## Workflow Steps
+
+### Step 1: Load Well-Architected Framework Reference
+Fetch current AWS WAF best practices:
+- `https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html`
+- Pillar-specific lenses relevant to the workload type (Serverless, SaaS, etc.)
+
+### Step 2: Discover IaC & Architecture
+Scan the repository for IaC files:
+- Terraform: `**/*.tf`
+- CloudFormation/SAM: `**/*.yaml`, `**/*.yml`, `**/*.json` (CFn templates)
+- CDK: `lib/**/*.ts`, `bin/**/*.ts`, `cdk.json`
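+
+The scan above could be approximated with `find`, as in this minimal sketch. `find_iac_files` is a hypothetical helper, and the pruned directory names (`node_modules`, `.git`, `.terraform`, `cdk.out`) are assumptions about a typical repository layout. YAML and JSON matches are only candidates and still need a content check (for example, a `Resources` section) before being treated as CloudFormation templates.
+
+```shell
+# Hypothetical helper: list candidate IaC files under a directory (default: current).
+find_iac_files() {
+  local root="${1:-.}"
+  # Skip dependency and build directories, then match Terraform, YAML, and JSON files.
+  find "$root" \
+    \( -name node_modules -o -name .git -o -name .terraform -o -name cdk.out \) -prune -o \
+    -type f \( -name '*.tf' -o -name '*.yaml' -o -name '*.yml' -o -name '*.json' \) \
+    -print | sort
+}
+```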
+
+Identify key AWS services in use (compute, data, networking, security, observability) and generate a Mermaid architecture diagram.
+
+### Step 3: Pillar-by-Pillar Review
+
+#### Pillar 1: Operational Excellence
+- [ ] All infrastructure defined as IaC (no manual console changes)
+- [ ] Consistent tagging strategy applied across all resources
+- [ ] CloudWatch alarms defined for key metrics
+- [ ] Automated deployment pipeline present (no manual deployments)
+- [ ] CloudTrail enabled for audit logging
+- [ ] Runbooks or operational documentation present
+
+#### Pillar 2: Security
+- [ ] IAM roles use least-privilege policies (no `*` actions without justification)
+- [ ] No hardcoded credentials in IaC or code
+- [ ] Secrets managed via Secrets Manager or SSM Parameter Store
+- [ ] S3 buckets have public access blocked and server-side encryption enabled
+- [ ] Sensitive resources placed in private subnets
+- [ ] Security groups restrict inbound to minimum required ports/CIDRs
+- [ ] KMS encryption enabled for sensitive data stores (RDS, EBS, S3, SQS, DynamoDB)
+- [ ] SSL/TLS enforced on all endpoints (for example, CDK `enforceSSL: true` on S3 buckets and SQS queues)
+- [ ] GuardDuty enabled (`aws guardduty list-detectors`)
+- [ ] AWS WAF (Web Application Firewall, not the Well-Architected Framework) configured on public-facing APIs and CloudFront distributions
+- [ ] MFA delete enabled on critical S3 buckets
+
+#### Pillar 3: Reliability
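+- [ ] Multi-AZ deployments for production databases (RDS Multi-AZ, Aurora replicas in multiple AZs); DynamoDB is Multi-AZ by default, so use Global Tables only where multi-Region resilience is required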
+- [ ] Auto Scaling configured with appropriate policies for EC2/ECS
+- [ ] S3 versioning and lifecycle policies configured
+- [ ] RDS automated backups enabled with appropriate retention period
+- [ ] DynamoDB Point-in-Time Recovery (PITR) enabled
+- [ ] Dead Letter Queues (DLQ) configured for Lambda, SQS, SNS
+- [ ] Route 53 health checks configured for DNS failover
+- [ ] Lambda reserved concurrency set to prevent noisy-neighbor throttling
+
+#### Pillar 4: Performance Efficiency
+- [ ] Right-sized instance types (Lambda memory, EC2 type, RDS class)
+- [ ] Graviton/ARM instances used where available (Lambda `arm64`, EC2 Graviton)
+- [ ] Caching implemented (ElastiCache, DAX, CloudFront, API Gateway caching)
+- [ ] CloudFront used for global static content delivery
+- [ ] Aurora Serverless or DynamoDB On-Demand for variable load patterns
+- [ ] Lambda Provisioned Concurrency for latency-critical synchronous paths
+
+#### Pillar 5: Cost Optimization
+- [ ] EC2 Reserved Instances or Savings Plans for steady-state workloads
+- [ ] S3 lifecycle policies moving data to cheaper storage tiers
+- [ ] Lambda `arm64` architecture adopted (up to 20% lower duration cost than x86)
+- [ ] VPC Endpoints for S3/DynamoDB to avoid NAT Gateway charges
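+- [ ] gp2 EBS volumes migrated to gp3 (up to 20% lower per-GB price; gp3 baseline is 3,000 IOPS and 125 MiB/s, so provision extra IOPS/throughput for large gp2 volumes that exceed it)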
+- [ ] Development/test environments have auto-shutdown schedules
+- [ ] AWS Budgets and Cost Anomaly Detection configured
+- [ ] Unattached EBS volumes and idle EC2 instances identified
+
+#### Pillar 6: Sustainability
+- [ ] Graviton/ARM instances selected where available
+- [ ] Serverless/managed services preferred over always-on EC2
+- [ ] S3 lifecycle policies reduce unnecessary long-term data storage
+- [ ] Auto Scaling configured to avoid over-provisioning
+- [ ] Region selection considers AWS renewable energy commitments
+
+### Step 4: Risk Classification
+For each finding, classify:
+- **High Risk**: Security vulnerability, single point of failure, no backup/recovery
+- **Medium Risk**: Suboptimal reliability, cost inefficiency, performance concern
+- **Low Risk**: Best practice deviation, minor optimization opportunity
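+
+Once findings are classified, the per-level counts used in the Step 5 summary can be tallied mechanically. This is a sketch under an assumed input format: one finding per line as `Risk|Pillar|Title`, which is a hypothetical convention and not something the workflow prescribes. `summarize_findings` is likewise a hypothetical name.
+
+```shell
+# Hypothetical helper: count findings per risk level.
+# Input (stdin): one finding per line, formatted as "Risk|Pillar|Title".
+summarize_findings() {
+  awk -F'|' '
+    { count[$1]++ }
+    END {
+      printf "High Risk: %d\nMedium Risk: %d\nLow Risk: %d\n",
+             count["High"], count["Medium"], count["Low"]
+    }'
+}
+```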
+
+### Step 5: User Confirmation
+
+```
+🏗️ AWS Well-Architected Review Summary
+
+📊 Review Results:
+• IaC Files Analyzed: X
+• AWS Services Identified: Y
+• Total Findings: Z
+  • High Risk: A (immediate action required)
+  • Medium Risk: B (should address soon)
+  • Low Risk: C (nice to have)
+
+🔴 Top High Risk Findings:
+1. [Pillar]: [Finding] — [Why it matters]
+2. [Pillar]: [Finding] — [Why it matters]
+
+💡 This will create Z individual GitHub issues + 1 EPIC issue.
+
+❓ Proceed with creating GitHub issues? (y/n)
+```
+
+### Step 6: Create Individual Finding Issues
+Label with "well-architected" and the pillar name (e.g., "security", "reliability").
+
+**Title**: `[WAF-<PILLAR>] [Brief Finding] — [Risk Level]`
+
+**Body**:
+````markdown
+## 🏗️ Well-Architected Finding: [Brief Title]
+
+**Pillar**: [Name] | **Risk Level**: [High/Medium/Low] | **Effort**: [Low/Medium/High]
+
+### 📋 Description
+[Clear explanation of the finding and why it matters]
+
+### 🔧 Remediation
+
+**IaC Fix** (preferred):
+```hcl
+# Terraform example
+resource "aws_s3_bucket_server_side_encryption_configuration" "example" {
+  bucket = aws_s3_bucket.example.id
+  rule {
+    apply_server_side_encryption_by_default {
+      sse_algorithm = "aws:kms"
+    }
+  }
+}
+```
+
+**AWS CLI fallback**:
+```bash
+aws s3api put-bucket-encryption --bucket <name> \
+  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'
+```
+
+### 📚 AWS Reference
+- [WAF Best Practice Link]
+- [AWS Documentation Link]
+
+### ✅ Validation
+- [ ] Change implemented in IaC and deployed
+- [ ] AWS Config rule passes (if applicable)
+- [ ] Security Hub finding resolved (if applicable)
+
+**Well-Architected Question**: [WAF question this maps to]
+````
+
+### Step 7: Create EPIC Tracking Issue
+Label with "well-architected" and "epic".
+
+**Title**: `[EPIC] AWS Well-Architected Review — X findings across 6 pillars`
+
+**Body**: Executive summary with pillar breakdown table (finding counts by pillar and risk level), Mermaid architecture diagram, prioritized checklist linking all individual issues (High → Medium → Low), and success criteria:
+- All High-risk findings resolved
+- Medium findings have accepted mitigation plans
+- No regression in existing CloudWatch alarms or Config rules
+
+## Error Handling
+- **No IaC Files Found**: Limit review to live resource discovery via AWS CLI and note the gap
+- **Insufficient AWS Permissions**: List required read-only permissions for the review
+- **GitHub Creation Failure**: Output all findings as formatted markdown to console
+
+## Success Criteria
+- ✅ All 6 WAF pillars reviewed against IaC and live infrastructure
+- ✅ All findings classified by risk level and pillar
+- ✅ Actionable remediation steps with IaC examples for each finding
+- ✅ GitHub issues created for team tracking
+- ✅ Architecture diagram generated for EPIC context
+- ✅ AWS documentation references included