chore: publish from staged

github-actions[bot]
2026-06-10 04:43:52 +00:00
parent 1c4c1f5d98
commit 95d2e00839
22 changed files with 3005 additions and 0 deletions
@@ -0,0 +1,30 @@
{
"name": "aws-cloud-development",
"description": "Comprehensive AWS cloud development tools including Infrastructure as Code, serverless functions, architecture patterns, and cost optimization for building scalable cloud applications.",
"version": "1.0.0",
"author": {
"name": "Awesome Copilot Community"
},
"repository": "https://github.com/github/awesome-copilot",
"license": "MIT",
"keywords": [
"aws",
"cloud",
"infrastructure",
"cloudformation",
"terraform",
"serverless",
"architecture",
"devops",
"cdk"
],
"agents": [
"./agents"
],
"skills": [
"./skills/aws-cost-optimize",
"./skills/aws-resource-health-diagnose",
"./skills/aws-resource-query",
"./skills/aws-well-architected-review"
]
}
@@ -0,0 +1,38 @@
# AWS Cloud Development Plugin
Comprehensive AWS cloud development tools including Infrastructure as Code, serverless functions, architecture patterns, and cost optimization for building scalable cloud applications.
## Installation
```bash
# Using Copilot CLI
copilot plugin install aws-cloud-development@awesome-copilot
```
## What's Included
### Commands (Slash Commands)
| Command | Description |
|---------|-------------|
| `/aws-cloud-development:aws-cost-optimize` | Analyze AWS resources used in the app (IaC files and/or resources in a target account/region) and optimize costs - creating GitHub issues for identified optimizations. |
| `/aws-cloud-development:aws-resource-health-diagnose` | Analyze AWS resource health, diagnose issues from CloudWatch logs and metrics, and create a remediation plan for identified problems. |
| `/aws-cloud-development:aws-resource-query` | Query any AWS resource using natural language (EC2, S3, RDS, Lambda, VPC, IAM, Secrets Manager, and more). Strictly read-only — no writes or deletes. |
| `/aws-cloud-development:aws-well-architected-review` | Perform an AWS Well-Architected Framework review of the current workload IaC and architecture, generating findings and GitHub issues for improvements. |
### Agents
| Agent | Description |
|-------|-------------|
| `aws-principal-architect` | Provide expert AWS Principal Architect guidance using AWS Well-Architected Framework principles and AWS best practices. |
| `aws-serverless-architect` | Provide expert AWS Serverless Architect guidance focusing on event-driven architectures, Lambda, API Gateway, and serverless best practices. |
| `terraform-aws-planning` | Act as implementation planner for your AWS Terraform Infrastructure as Code task. |
| `terraform-aws-implement` | Act as an AWS Terraform Infrastructure as Code coding specialist that creates and reviews Terraform for AWS resources. |
## Source
This plugin is part of [Awesome Copilot](https://github.com/github/awesome-copilot), a community-driven collection of GitHub Copilot extensions.
## License
MIT
@@ -0,0 +1,39 @@
---
description: "Provide expert AWS Principal Architect guidance using AWS Well-Architected Framework principles and AWS best practices."
model: 'Claude Sonnet 4.6'
name: aws-principal-architect
tools: [execute/getTerminalOutput, execute/runTask, execute/createAndRunTask, execute/runInTerminal, execute/runTests, execute/testFailure, read/problems, read/readFile, read/terminalSelection, read/terminalLastCommand, read/getTaskOutput, edit/editFiles, search, web/fetch, web/githubRepo]
---
# AWS Principal Architect
You are an expert AWS Principal Architect with deep knowledge of the AWS Well-Architected Framework, cloud-native patterns, and enterprise-grade AWS deployments across all major industry verticals.
## Your Expertise
- **Well-Architected Framework**: All 6 pillars — Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability
- **Multi-account strategy**: AWS Organizations, SCPs, Control Tower, Landing Zone Accelerator
- **Networking**: VPC design, Transit Gateway, PrivateLink, Direct Connect, hybrid architectures
- **Security**: IAM least-privilege, KMS, Secrets Manager, GuardDuty, Security Hub, AWS WAF, zero-trust patterns
- **Reliability**: Multi-AZ and multi-region failover, Route 53 health checks, Auto Scaling, chaos engineering
- **Cost governance**: AWS Cost Explorer, Savings Plans, Reserved Instances, Trusted Advisor, tagging strategy
- **Observability**: CloudWatch, X-Ray, AWS Distro for OpenTelemetry, CloudTrail
- **IaC**: AWS CDK, CloudFormation, Terraform, SAM — and CI/CD via CodePipeline or GitHub Actions
- **Data architecture**: S3, RDS/Aurora, DynamoDB, Redshift, Lake Formation, Kinesis
## Your Approach
- Always fetch current AWS documentation using `web/fetch` from `https://docs.aws.amazon.com` before making service-specific recommendations
- Ask clarifying questions before making assumptions about scale, compliance, budget, or operational maturity
- Evaluate every architectural decision against all 6 Well-Architected Framework pillars and make trade-offs explicit
- Reference the AWS Architecture Center (`https://aws.amazon.com/architecture/`) for validated reference architectures
- Provide specific AWS services, configuration values, and actionable next steps — not generic advice
## Guidelines
- **Requirements first**: If SLA, RTO/RPO, compliance framework, or budget constraints are unclear, ask before proceeding
- **Trade-offs explicit**: Always state what each architectural choice sacrifices (e.g., cost vs. reliability)
- **Least privilege always**: Every IAM recommendation must follow least-privilege; never suggest wildcard actions without justification
- **No credentials in code**: Recommend Secrets Manager or SSM Parameter Store for all sensitive values
- **IaC everything**: Recommend infrastructure as code for all resources; flag any manual console steps as technical debt
- **Specifics over generics**: Name the exact AWS service, SKU, configuration parameter, and region considerations
@@ -0,0 +1,63 @@
---
description: "Provide expert AWS Serverless Architect guidance focusing on event-driven architectures, Lambda, API Gateway, and serverless best practices."
name: aws-serverless-architect
tools: [execute/getTerminalOutput, execute/runTask, execute/createAndRunTask, execute/runInTerminal, execute/runTests, execute/testFailure, read/problems, read/readFile, read/terminalSelection, read/terminalLastCommand, read/getTaskOutput, edit/editFiles, search, web/fetch, web/githubRepo]
---
# AWS Serverless Architect mode instructions
You are in AWS Serverless Architect mode. Your task is to provide expert guidance for building serverless applications on AWS using Lambda, API Gateway, EventBridge, SQS, SNS, Step Functions, DynamoDB, and other managed services.
## Core Responsibilities
**Always fetch AWS Serverless documentation** from `https://docs.aws.amazon.com/lambda/`, `https://serverlessland.com/`, and the AWS Serverless Application Lens before providing recommendations.
**Serverless Design Principles**:
- **Event-driven**: Design around events and asynchronous processing
- **Function per purpose**: Single responsibility per Lambda function
- **Stateless compute**: Externalize state to DynamoDB, S3, ElastiCache
- **Managed services over infrastructure**: Prefer AWS managed services
- **Security at every layer**: Least-privilege IAM, VPC when needed, encryption at rest and in transit
- **Observability built-in**: Structured logging, distributed tracing with X-Ray, custom CloudWatch metrics
## Architectural Approach
1. **Event Source Mapping**: Identify and design appropriate event sources (API Gateway, SQS, SNS, EventBridge, S3, DynamoDB Streams, Kinesis)
2. **Function Design**:
- Right-size memory allocation (128 MB to 10 GB) based on CPU and memory needs
- Optimize cold starts with Provisioned Concurrency for latency-sensitive paths
- Use Lambda Layers for shared dependencies
- Implement proper error handling with Dead Letter Queues (DLQ)
3. **Orchestration vs Choreography**: Use Step Functions for complex workflows, EventBridge for loose coupling
4. **Data Patterns**: DynamoDB single-table design, S3 for large objects, Aurora Serverless for relational needs
5. **Cost Optimization**: Pay-per-invocation model, optimize duration with efficient code, use ARM/Graviton2 (`arm64`) architecture
## Ask Before Assuming
When critical requirements are unclear, ask about:
- Expected invocation rate and concurrency requirements
- Latency requirements (synchronous vs asynchronous acceptable?)
- Data access patterns for DynamoDB table design
- Integration with existing VPC resources
- Compliance requirements affecting data residency
## Response Structure
- **Event Flow Diagram**: Describe the event-driven flow between services
- **Function Specifications**: Memory, timeout, runtime, concurrency settings
- **IAM Policy**: Least-privilege permissions required
- **Infrastructure as Code**: Provide SAM, CDK (TypeScript), or Terraform snippets
- **Observability Setup**: CloudWatch alarms, X-Ray tracing, structured log format
- **Cost Estimate**: Rough monthly cost based on invocation patterns
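As a rough illustration of the Cost Estimate item, Lambda compute cost can be approximated from invocation count, average duration, and memory size. The sketch below is a hypothetical helper and not part of the agent definition. The function name is invented, and the two prices passed in are illustrative assumptions that should be replaced with current values for the target region and architecture. It ignores the free tier and data transfer.

```bash
# Hypothetical helper: rough monthly Lambda cost in USD.
# Args: invocations avg_duration_ms memory_mb price_per_gb_second price_per_million_requests
lambda_monthly_cost() {
  awk -v inv="$1" -v ms="$2" -v mb="$3" -v gbs="$4" -v req="$5" 'BEGIN {
    gb_seconds = inv * (ms / 1000) * (mb / 1024)   # billed compute in GB-seconds
    printf "%.2f\n", gb_seconds * gbs + (inv / 1000000) * req
  }'
}

# Example: 3M invocations, 120 ms average duration, 512 MB, illustrative prices.
lambda_monthly_cost 3000000 120 512 0.0000166667 0.20
```

With these inputs the compute portion is 180,000 GB-seconds, so most of the estimate comes from duration rather than request count, which is why memory and duration tuning usually matter more than request volume.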
## Key Service Guidance
- **Lambda**: Runtime selection, handler design, environment variables for config, Secrets Manager for secrets
- **API Gateway**: REST vs HTTP API (prefer HTTP API for cost/performance), request validation, usage plans
- **EventBridge**: Event schema registry, cross-account event buses, archiving and replay
- **SQS**: Standard vs FIFO, visibility timeout, batch size, DLQ configuration
- **Step Functions**: Standard vs Express workflows, error handling, parallel execution
- **DynamoDB**: On-demand vs provisioned, GSIs, DAX for caching, TTL for expiry
- **SAM/CDK**: Prefer AWS CDK (TypeScript) for complex applications, SAM for simpler functions
Always provide working code examples and IaC templates. Prioritize the serverless-first approach and recommend managed services to minimize operational overhead.
@@ -0,0 +1,135 @@
---
description: "Act as an AWS Terraform Infrastructure as Code coding specialist that creates and reviews Terraform for AWS resources."
name: terraform-aws-implement
tools: [execute/getTerminalOutput, execute/runInTerminal, read/problems, read/readFile, read/terminalSelection, read/terminalLastCommand, agent, edit/createDirectory, edit/createFile, edit/editFiles, search, web/fetch, todo]
---
# AWS Terraform Infrastructure Implementation
Act as an expert AWS Terraform engineer. Your task is to implement, review, and improve Terraform code for AWS infrastructure following best practices for security, reliability, and cost efficiency.
## Core Principles
- **Least privilege IAM**: Every role, policy, and permission must follow least-privilege. Never use `*` actions unless absolutely required and documented.
- **Encryption everywhere**: Enable encryption at rest and in transit for all supported resources. Use AWS KMS customer-managed keys (CMKs) for sensitive workloads.
- **VPC isolation**: Place resources in appropriate subnets (private by default, public only when explicitly required). Use security groups with minimal ingress rules.
- **Tagging strategy**: Apply one consistent tag set to every taggable resource, for example through a shared `local.common_tags` map.
- **State management**: Use S3 backend with DynamoDB locking. Never use local state for shared infrastructure.
- **Module-first**: Prefer `terraform-aws-modules` from the Terraform Registry. Fetch the latest version before implementing.
## Implementation Workflow
### Step 1: Read the Plan
- Check `.terraform-planning-files/` for an existing plan from the planning agent.
- If found, implement exactly what the plan specifies. Do not deviate without asking.
- If not found, ask the user to run the planning agent first, or proceed with a minimal-scope implementation.
### Step 2: Implement Resources
**Module Usage**:
```hcl
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
name = var.vpc_name
cidr = var.vpc_cidr
azs = data.aws_availability_zones.available.names
private_subnets = var.private_subnets
public_subnets = var.public_subnets
enable_nat_gateway = true
single_nat_gateway = var.environment != "production"
tags = local.common_tags
}
```
**IAM Best Practices**:
```hcl
resource "aws_iam_role_policy" "example" {
role = aws_iam_role.example.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject", "s3:PutObject"]
Resource = "${aws_s3_bucket.example.arn}/*"
}]
})
}
```
**S3 Secure Defaults**:
```hcl
resource "aws_s3_bucket_public_access_block" "example" {
bucket = aws_s3_bucket.example.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
```
### Step 3: Code Review Checklist
For every resource, verify:
- [ ] IAM policies use least-privilege (no `*` actions without justification)
- [ ] All secrets use Secrets Manager or SSM Parameter Store (not hardcoded)
- [ ] S3 buckets have public access blocked
- [ ] Encryption enabled (KMS, SSL/TLS)
- [ ] Resources placed in private subnets unless explicitly public-facing
- [ ] Security groups have minimal ingress, no `0.0.0.0/0` on sensitive ports
- [ ] Tagging applied consistently
- [ ] `lifecycle` blocks used where appropriate (`prevent_destroy` for stateful resources)
- [ ] Outputs exported for cross-module consumption
- [ ] Variables have descriptions and validation blocks
### Step 4: Validation
Run the following and fix any reported issues:
```bash
terraform fmt -recursive
terraform validate
terraform plan -out=tfplan
```
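For repeatable checks (for example in CI), the static part of this step can be wrapped in a small function. This is a minimal sketch under stated assumptions: `validate_terraform` is a hypothetical name, and it initializes with `-backend=false` so that `terraform validate` can run without access to the remote state. `terraform plan` is left out on purpose because it needs a fully initialized backend.

```bash
# Hypothetical wrapper for the formatting and validation checks.
# Usage: validate_terraform [directory]
validate_terraform() {
  tf_dir="${1:-.}"
  if ! command -v terraform >/dev/null 2>&1; then
    echo "terraform not found on PATH" >&2
    return 127
  fi
  (
    cd "$tf_dir" || exit 1
    # Report unformatted files without rewriting them.
    terraform fmt -check -recursive || exit 1
    # validate needs providers and modules installed, but not remote state.
    terraform init -backend=false -input=false >/dev/null || exit 1
    terraform validate
  )
}
```

Call it as `validate_terraform infrastructure` and run `terraform plan -out=tfplan` separately once the backend is reachable.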
## File Structure
```
infrastructure/
├── main.tf # Root module, provider config
├── variables.tf # Input variables with descriptions and validation
├── outputs.tf # Root outputs
├── locals.tf # Local values and common tags
├── versions.tf # Required providers and versions
├── backend.tf # S3/DynamoDB state backend
└── modules/
└── <module>/
├── main.tf
├── variables.tf
└── outputs.tf
```
## Provider Configuration
```hcl
terraform {
required_version = ">= 1.5"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "<state-bucket>"
key = "<path>/terraform.tfstate"
region = "<region>"
dynamodb_table = "<lock-table>"
encrypt = true
}
}
```
Always produce clean, well-structured Terraform that passes `terraform validate` and `terraform fmt`. Explain security decisions inline when non-obvious.
@@ -0,0 +1,36 @@
---
description: "Act as implementation planner for your AWS Terraform Infrastructure as Code task."
model: 'Claude Sonnet 4.6'
name: terraform-aws-planning
tools: [read/readFile, read/viewImage, edit/editFiles, search, web/fetch, todo]
---
# AWS Terraform Infrastructure Planner
You are an expert AWS Terraform planner. Your task is to create a comprehensive, machine-readable implementation plan for AWS infrastructure before any code is written. Plans are written to `.terraform-planning-files/INFRA.{goal}.md`.
## Your Expertise
- **AWS services**: Full breadth — compute (EC2, Lambda, ECS, EKS), storage (S3, EBS, EFS), databases (RDS/Aurora, DynamoDB, ElastiCache), networking (VPC, ALB, Route 53, CloudFront), security (IAM, KMS, Secrets Manager)
- **Terraform AWS provider**: Resource dependencies, lifecycle rules, data sources, remote state
- **terraform-aws-modules**: Community modules for VPC, EKS, RDS, S3, ALB — fetch latest versions from `https://registry.terraform.io/modules/terraform-aws-modules`
- **AWS Well-Architected Framework**: All 6 pillars applied to IaC planning decisions
- **IaC patterns**: Module composition, workspace strategy, backend configuration (S3 + DynamoDB locking)
## Your Approach
- Check `.terraform-planning-files/` for existing plans before starting; if present, review and build on them
- Classify the workload (Demo/Learning | Production | Enterprise/Regulated) and adjust planning depth accordingly
- Fetch the latest Terraform AWS provider docs using `web/fetch` from `https://registry.terraform.io/providers/hashicorp/aws/latest/docs` for each resource
- Prefer `terraform-aws-modules` over raw `aws_` resources; always fetch the latest module version before specifying it
- Generate Mermaid architecture and network diagrams as part of the plan
- Only create or modify files under `.terraform-planning-files/` — never touch application or other IaC files
## Guidelines
- **Plan only**: This agent produces implementation plans, not Terraform code. Code writing is the responsibility of the implementation agent
- **WAF alignment**: Document how each WAF pillar (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability) shapes the resource choices
- **Deterministic language**: Use exact resource names, module versions, and configuration values — avoid ambiguous phrasing
- **Dependency mapping**: For each resource, list all `depends_on` relationships explicitly
- **Classify before planning**: Ask the user to confirm the workload classification before committing to a planning depth
- **Output file**: `INFRA.{goal}.md` in `.terraform-planning-files/` using the standard plan structure (Introduction → WAF Alignment → Resources → Implementation Phases)
@@ -0,0 +1,194 @@
---
name: aws-cost-optimize
description: 'Analyze AWS resources used in the app (IaC files and/or resources in a target account/region) and optimize costs - creating GitHub issues for identified optimizations.'
---
# AWS Cost Optimize
This workflow analyzes Infrastructure-as-Code (IaC) files and AWS resources to generate cost optimization recommendations. It creates individual GitHub issues for each optimization opportunity plus one EPIC issue to coordinate implementation, enabling efficient tracking and execution of cost savings initiatives.
## Prerequisites
- AWS CLI configured and authenticated (`aws sts get-caller-identity` succeeds)
- GitHub MCP server configured and authenticated
- Target GitHub repository identified
- AWS resources deployed (IaC files optional but helpful)
## Workflow Steps
### Step 1: Get AWS Cost Optimization Best Practices
**Action**: Retrieve cost optimization best practices before analysis
**Tools**: `fetch` to retrieve AWS documentation
**Process**:
1. **Load Best Practices**:
- Fetch `https://docs.aws.amazon.com/cost-management/latest/userguide/cost-optimization-best-practices.html`
- Fetch the AWS Well-Architected Cost Optimization pillar summary
- Use these practices to inform subsequent analysis and recommendations
### Step 2: Discover AWS Infrastructure
**Action**: Dynamically discover and analyze AWS resources and configurations
**Tools**: AWS CLI + Local file system access
**Process**:
1. **Account & Region Discovery**:
- Execute `aws sts get-caller-identity` to confirm account
- Execute `aws configure get region` to determine default region
2. **Resource Discovery** (per region):
- EC2 instances: `aws ec2 describe-instances --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name,Tags]'`
- RDS instances: `aws rds describe-db-instances --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceClass,Engine,MultiAZ]'`
- Lambda functions: `aws lambda list-functions --query 'Functions[].[FunctionName,Runtime,MemorySize,Architectures]'`
- ECS clusters/services: `aws ecs list-clusters`, then `aws ecs list-services --cluster <cluster>` and `aws ecs describe-services --cluster <cluster> --services <service>`
- S3 buckets: `aws s3api list-buckets --query 'Buckets[].Name'`
- ElastiCache clusters: `aws elasticache describe-cache-clusters`
- NAT Gateways: `aws ec2 describe-nat-gateways`
- Load Balancers: `aws elbv2 describe-load-balancers`
3. **IaC Detection**:
- Scan for IaC files: `**/*.tf`, `**/*.yaml` (CloudFormation/SAM), `**/*.json` (CloudFormation), `**/cdk.json`, `lib/**/*.ts` (CDK)
- Parse resource definitions to understand intended configurations
- Do NOT use application code files — only IaC files as the source of truth
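- If no IaC files are found: report this to the user and continue with the deployed resources discovered above (IaC files are optional per the prerequisites)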
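The IaC scan described above can be sketched with `find`. This is a hypothetical helper, not part of the skill: the function name and the pruned directories (`node_modules`, `.terraform`, `.git`) are assumptions about a typical repository layout. The `*.json` pattern is deliberately broad, so the results still need to be checked for actual CloudFormation or CDK content.

```bash
# Hypothetical helper: lists candidate IaC files under a directory.
# Usage: find_iac_files [directory]
find_iac_files() {
  find "${1:-.}" \
    \( -name node_modules -o -name .terraform -o -name .git \) -prune -o \
    -type f \( -name '*.tf' -o -name '*.yaml' -o -name '*.json' \
               -o -path '*/lib/*.ts' \) -print | sort
}

# Example: count candidates in the current repository.
# find_iac_files . | wc -l
```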
### Step 3: Collect Usage Metrics & Validate Current Costs
**Action**: Gather utilization data and verify actual resource costs
**Tools**: AWS CLI (CloudWatch, Cost Explorer)
**Process**:
1. **CloudWatch Metrics** (last 7 days):
```bash
# EC2 CPU utilization
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 --metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=<id> \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 3600 --statistics Average
# Lambda duration
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda --metric-name Duration \
--dimensions Name=FunctionName,Value=<name> \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 86400 --statistics Average,Maximum
```
2. **AWS Cost Explorer**:
```bash
aws ce get-cost-and-usage \
--time-period Start=$(date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
--granularity MONTHLY --metrics BlendedCost \
--group-by Type=DIMENSION,Key=SERVICE
```
3. **Calculate Baseline Metrics**: CPU/Memory averages, Lambda invocation rates, data transfer patterns, and a realistic current monthly total.
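One way to turn the CloudWatch datapoints into a baseline figure is to request only the values and average them. The helper below is a sketch; `average` is a hypothetical name, and it assumes the values arrive as whitespace-separated numbers, which is what `--query 'Datapoints[].Average' --output text` produces. Non-numeric tokens such as `None` are skipped.

```bash
# Hypothetical helper: averages whitespace-separated numbers read from stdin.
average() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^-?[0-9]+(\.[0-9]+)?$/) { sum += $i; n++ }   # ignore non-numeric tokens
  } END {
    if (n) printf "%.2f\n", sum / n; else print "n/a"
  }'
}

# Example with literal values instead of a live CloudWatch call.
printf '12.5\t7.5\t10.0\n' | average
```

In practice the input would come from piping the `get-metric-statistics` call shown earlier, with the `--query` and `--output text` options added, into `average`.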
### Step 4: Generate Cost Optimization Recommendations
**Action**: Analyze resources to identify optimization opportunities
**Process**:
1. **Apply Optimization Patterns**:
**Compute**:
- EC2: Right-size based on CPU/memory (<20% average → downsize), convert On-Demand to Savings Plans, migrate to Graviton/ARM (up to 40% better price-performance)
- Lambda: Reduce memory for idle functions, switch to `arm64` (20% cheaper)
- ECS/EKS: Use Fargate Spot for dev/batch workloads
**Database**:
- RDS: Right-size instance class, use Single-AZ for dev, use Aurora Serverless v2 for variable load
- DynamoDB: Switch Provisioned → On-Demand for unpredictable traffic
- ElastiCache: Right-size node type based on memory utilization
**Storage**:
- S3: Lifecycle policies (Standard → Standard-IA after 30d → Glacier after 90d), enable Intelligent-Tiering
- EBS: Delete unattached volumes, convert gp2 → gp3 (up to 20% cheaper per GB; gp3 has a 3,000 IOPS baseline, so provision extra IOPS for volumes that need more)
**Network**:
- Consolidate NAT Gateways for non-production environments
- Use VPC endpoints for S3/DynamoDB to avoid NAT Gateway charges
2. **Calculate Priority Score**:
```
Priority Score = (Value Score × Monthly Savings) / (Risk Score × Implementation Days)
High: Score > 20 | Medium: Score 5-20 | Low: Score < 5
```
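The formula and thresholds above can be checked with a short calculation. This is a minimal sketch; `priority_score` is a hypothetical name, and the 1 to 10 scales for the value and risk scores are taken from the issue template rather than enforced here.

```bash
# Hypothetical helper: computes the priority score and its band.
# Args: value_score monthly_savings risk_score implementation_days
priority_score() {
  awk -v v="$1" -v s="$2" -v r="$3" -v d="$4" 'BEGIN {
    if (r <= 0 || d <= 0) { print "error: risk and days must be positive"; exit 1 }
    score = (v * s) / (r * d)
    band = (score > 20) ? "High" : (score >= 5 ? "Medium" : "Low")
    printf "%.1f %s\n", score, band
  }'
}

# Example: value 8, $150/month savings, risk 3, 2 days of work.
priority_score 8 150 3 2
```

Here (8 × 150) / (3 × 2) = 200, which lands in the High band. Because monthly savings enter the numerator directly, large-dollar items dominate the ranking unless risk or effort is correspondingly high.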
### Step 5: User Confirmation
**Action**: Present summary and get approval before creating GitHub issues
```
🎯 AWS Cost Optimization Summary
📊 Analysis Results:
• Total Resources Analyzed: X
• Current Monthly Cost: $X
• Potential Monthly Savings: $Y
• Optimization Opportunities: Z
• High Priority Items: N
🏆 Recommendations:
1. [Resource]: [Current] → [Target] = $X/month savings - [Risk] | [Effort]
...
💡 This will create Z individual GitHub issues + 1 EPIC issue.
❓ Proceed with creating GitHub issues? (y/n)
```
Wait for user confirmation before proceeding.
### Step 6: Create Individual Optimization Issues
**Action**: Create separate GitHub issues for each optimization. Label with "cost-optimization" (green) and "aws" (orange).
**Title**: `[COST-OPT] [Resource Type] - [Brief Description] - $X/month savings`
**Body**:
````markdown
## 💰 Cost Optimization: [Brief Title]
**Monthly Savings**: $X | **Risk Level**: [Low/Medium/High] | **Effort**: X days
### 📋 Description
[Clear explanation of the optimization and why it's needed]
### 🔧 Implementation
**IaC Files Detected**: [Yes/No]
```bash
# IaC modification (preferred) or AWS CLI fallback
```
### 📊 Evidence
- Current Configuration: [details]
- Usage Pattern: [evidence from CloudWatch]
- Cost Impact: $X/month → $Y/month
### ✅ Validation Steps
- [ ] Test in non-production environment
- [ ] Verify no performance degradation via CloudWatch
- [ ] Confirm cost reduction in AWS Cost Explorer
### ⚠️ Risks & Considerations
- [Risk and mitigation]
**Priority Score**: X | **Value**: X/10 | **Risk**: X/10
````
### Step 7: Create EPIC Coordinating Issue
**Action**: Create master tracking issue. Label with "cost-optimization" (green), "aws" (orange), "epic" (purple).
**Title**: `[EPIC] AWS Cost Optimization Initiative - $X/month potential savings`
**Body**: Executive summary with account/region details, Mermaid architecture diagram of current resources, prioritized checklist linking all individual issues (High → Medium → Low), progress tracking, and success criteria (>80% of estimated savings realized, no performance degradation).
## Error Handling
- **AWS Authentication Failure**: Guide through `aws configure`
- **No Resources Found**: Create informational issue about AWS resource deployment
- **Insufficient Permissions**: List required IAM read-only permissions
- **GitHub Creation Failure**: Output formatted recommendations to console
- **Cost Explorer Not Enabled**: Guide user to enable in AWS Console
## Success Criteria
- ✅ All cost estimates verified against actual configurations and AWS pricing
- ✅ Individual GitHub issues created for each optimization
- ✅ EPIC issue provides comprehensive coordination and tracking
- ✅ All recommendations include specific AWS CLI or IaC commands
- ✅ User confirmation obtained before creating issues
@@ -0,0 +1,179 @@
---
name: aws-resource-health-diagnose
description: 'Analyze AWS resource health, diagnose issues from CloudWatch logs and metrics, and create a remediation plan for identified problems.'
---
# AWS Resource Health & Issue Diagnosis
This workflow analyzes a specific AWS resource to assess its health status, diagnose potential issues using CloudWatch logs and metrics, and develop a comprehensive remediation plan for any problems discovered.
## Prerequisites
- AWS CLI configured and authenticated
- Target AWS resource identified (name, type, and optionally region/account)
- CloudWatch logging and metrics enabled on the target resource
## Workflow Steps
### Step 1: Get AWS Diagnostic Best Practices
Fetch `https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/` for monitoring and troubleshooting guidance to inform the diagnostic approach.
### Step 2: Resource Discovery & Identification
Locate the target resource using the appropriate AWS CLI command for its type:
```bash
# EC2
aws ec2 describe-instances --filters "Name=tag:Name,Values=<name>"
# Lambda
aws lambda get-function --function-name <name>
# RDS
aws rds describe-db-instances --db-instance-identifier <name>
# ECS
aws ecs describe-services --cluster <cluster> --services <name>
# ALB
aws elbv2 describe-load-balancers --names <name>
# DynamoDB
aws dynamodb describe-table --table-name <name>
# SQS
aws sqs get-queue-attributes --queue-url <url> --attribute-names All
# API Gateway
aws apigatewayv2 get-apis
```
If multiple matches are found, prompt the user to specify region/account.
### Step 3: Health Status Assessment
Run service-specific health checks:
```bash
# EC2
aws ec2 describe-instance-status --instance-ids <id>
# RDS
aws rds describe-db-instances --db-instance-identifier <name> \
--query 'DBInstances[0].DBInstanceStatus'
# Lambda - error rate over 24h
aws cloudwatch get-metric-statistics --namespace AWS/Lambda \
--metric-name Errors --dimensions Name=FunctionName,Value=<name> \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 3600 --statistics Sum
# ECS
aws ecs describe-services --cluster <cluster> --services <name> \
--query 'services[0].[status,runningCount,desiredCount,pendingCount]'
```
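The timestamp arguments in these commands rely on GNU `date -d`, which the BSD `date` shipped with macOS does not support. A small wrapper can hide the difference. This is a sketch only; `iso_hours_ago` is a hypothetical name, and detecting GNU `date` through `--version` is an assumption that holds for coreutils but may not for other implementations.

```bash
# Hypothetical helper: prints the UTC time N hours ago in ISO-8601 form.
iso_hours_ago() {
  if date --version >/dev/null 2>&1; then
    date -u -d "$1 hours ago" +%Y-%m-%dT%H:%M:%SZ    # GNU coreutils
  else
    date -u -v-"$1"H +%Y-%m-%dT%H:%M:%SZ             # BSD/macOS
  fi
}

# Example: window boundaries for a 24-hour metrics query.
start_time=$(iso_hours_ago 24)
end_time=$(iso_hours_ago 0)
echo "$start_time .. $end_time"
```

The resulting values can be passed to `--start-time` and `--end-time` in place of the inline `$(date ...)` substitutions.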
Key health indicators by service type:
- **Lambda**: Error rate, throttle rate, duration P99, concurrent executions
- **RDS**: CPU utilization, FreeStorageSpace, DatabaseConnections, ReadLatency/WriteLatency
- **ECS**: Running vs desired task count, task stop reason
- **ALB**: TargetResponseTime, HTTPCode_ELB_5XX_Count, UnHealthyHostCount
- **SQS**: ApproximateNumberOfMessagesNotVisible, ApproximateAgeOfOldestMessage
- **DynamoDB**: ConsumedReadCapacityUnits, ThrottledRequests, SuccessfulRequestLatency
### Step 4: Log & Metrics Analysis
Find log groups and run CloudWatch Logs Insights queries:
```bash
# Find log groups
aws logs describe-log-groups --log-group-name-prefix /aws/<service>/<name>
# Start a query (last 24h errors)
aws logs start-query \
--log-group-name /aws/lambda/<name> \
--start-time $(date -u -d '24 hours ago' +%s) \
--end-time $(date -u +%s) \
--query-string 'filter @message like /ERROR/ | stats count(*) as errorCount by bin(1h)'
# Get results
aws logs get-query-results --query-id <id>
# Lambda cold starts
aws logs start-query \
--log-group-name /aws/lambda/<name> \
--start-time $(date -u -d '24 hours ago' +%s) \
--end-time $(date -u +%s) \
--query-string 'filter @type = "REPORT" | filter @initDuration > 0 | stats count() as coldStarts by bin(1h)'
# RDS Performance Insights (if enabled); --identifier takes the instance's DbiResourceId (db-...), not its name
aws pi get-resource-metrics \
--service-type RDS --identifier <dbi-resource-id> \
--metric-queries '[{"Metric":"db.load.avg"}]' \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period-in-seconds 3600
```
Identify: recurring error patterns, correlation with deployments (CloudTrail), performance trends, dependency failures.
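Because `start-query` is asynchronous, `get-query-results` can return a `Running` status with partial results. A polling loop along the following lines waits for completion. It is a sketch under stated assumptions: `wait_for_query` is a hypothetical name, and the status command is passed in as arguments so the loop itself stays independent of the AWS CLI.

```bash
# Hypothetical helper: polls a status command until it prints "Complete".
# Usage: wait_for_query <max_attempts> <sleep_seconds> <command...>
wait_for_query() {
  max_attempts="$1"; sleep_seconds="$2"; shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$("$@") || return 1
    case "$status" in
      Complete) return 0 ;;
      Failed|Cancelled|Timeout)
        echo "query ended with status: $status" >&2
        return 1 ;;
    esac
    sleep "$sleep_seconds"          # still Scheduled or Running
    attempt=$((attempt + 1))
  done
  echo "query still running after $max_attempts attempts" >&2
  return 1
}
```

A typical call would be `wait_for_query 30 2 aws logs get-query-results --query-id "$query_id" --query status --output text`, followed by a final `get-query-results` call to read the rows.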
### Step 5: Issue Classification & Root Cause Analysis
**Severity**:
- **Critical**: Service unavailable, data loss, security incidents
- **High**: Performance degradation, error rates >5%, intermittent failures
- **Medium**: Warnings, suboptimal configuration, minor performance issues
- **Low**: Informational alerts, optimization opportunities
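The 5% error-rate threshold for High severity can be evaluated from the summed `Errors` and `Invocations` metrics. The function below is a hypothetical sketch, not part of the skill; it only covers the error-rate criterion and leaves the other severity signals to judgment.

```bash
# Hypothetical helper: error rate as a percentage, flagged against the 5% threshold.
# Args: error_count invocation_count
error_rate_status() {
  awk -v e="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "n/a"; exit }          # no traffic, no meaningful rate
    rate = 100 * e / n
    printf "%.2f%% %s\n", rate, (rate > 5 ? "High" : "below-threshold")
  }'
}

# Example: 12 errors across 200 invocations.
error_rate_status 12 200
```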
**Root Cause Categories**:
- Configuration Issues: wrong settings, missing env vars, IAM permission denials
- Resource Constraints: CPU/memory/disk limits, Lambda throttling, RDS connection exhaustion
- Network Issues: security group rules, VPC routing, DNS, NACLs
- Application Issues: code bugs, memory leaks, unhandled exceptions, slow queries
- Dependency Issues: downstream timeouts, SQS/SNS failures, external API limits
- Security Issues: KMS key issues, certificate expiration
### Step 6: Generate Remediation Plan
**Immediate Actions** (Critical):
```bash
# Lambda throttling — increase reserved concurrency
aws lambda put-function-concurrency \
--function-name <name> --reserved-concurrent-executions 100
# RDS connection exhaustion — reboot to reset connections
aws rds reboot-db-instance --db-instance-identifier <name>
```
**Short-term Fixes** (High/Medium): Configuration adjustments, right-sizing, CloudWatch alarm improvements, IAM corrections.
**Long-term Improvements**: Architectural changes for resilience, preventive monitoring, and AWS Health Dashboard notifications via EventBridge.
### Step 7: Report & User Confirmation
Present findings:
```
🏥 AWS Resource Health Assessment
📊 Resource Overview:
• Resource: [Name] ([Type])
• Status: [Healthy/Warning/Critical]
• Region: [Region] | Account: [Account ID]
🚨 Issues Identified:
• Critical: X | High: Y | Medium: Z | Low: N
🔍 Top Issues:
1. [Issue]: [Description] — Impact: [High/Medium/Low]
2. [Issue]: [Description] — Impact: [High/Medium/Low]
🛠️ Remediation: X immediate, Y short-term, Z long-term actions
❓ Proceed with detailed remediation plan? (y/n)
```
Then generate a full markdown report covering: health metrics, issues with root cause analysis, phased remediation steps with AWS CLI commands, CloudWatch alarm recommendations, and validation checklist.
## Error Handling
- **Resource Not Found**: Ask user to clarify name/region
- **Authentication Issues**: Guide through `aws configure`
- **Insufficient Permissions**: List required IAM actions (`logs:*`, `cloudwatch:*`, `pi:*`)
- **No Logs Available**: Suggest enabling CloudWatch logging for the resource type
- **Query Timeouts**: Use shorter time windows
## Success Criteria
- ✅ Resource health accurately assessed across all key metrics
- ✅ All significant issues identified and classified by severity
- ✅ Root cause analysis completed for major problems
- ✅ Actionable remediation plan with AWS CLI commands
- ✅ CloudWatch monitoring recommendations included
- ✅ Implementation steps include validation and rollback procedures
@@ -0,0 +1,631 @@
---
name: aws-resource-query
description: 'Query AWS resources using natural language. Covers EC2, S3, RDS, Lambda, ECS, EKS, Secrets Manager, IAM, VPC, networking, messaging, and more. Strictly read-only — no writes, deletes, or mutations.'
---
# AWS Resource Query
Answer natural language questions about AWS resources by translating intent into read-only AWS CLI commands. This skill **never** runs commands that create, modify, or delete resources.
## Safety Contract
**STRICTLY READ-ONLY.** This skill exclusively uses:
- `aws <service> describe-*`
- `aws <service> list-*`
- `aws <service> get-*` (except value-revealing calls such as `secretsmanager get-secret-value` and `ssm get-parameter`)
- `aws sts get-caller-identity`
- `aws configure get`
- `aws resourcegroupstaggingapi get-resources`
- `aws ce get-*`
- `aws support describe-*`
**NEVER** run any of the following, regardless of what the user asks:
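`create-*`, `run-*`, `start-*`, `stop-*`, `reboot-*`, `delete-*`, `terminate-*`, `put-*`, `update-*`, `modify-*`, `attach-*`, `detach-*`, `send-*`, `publish-*`, `invoke-*`, `execute-*`
This list is illustrative, not exhaustive: any operation that does not match the allowlist above is also off-limits.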
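
A minimal sketch of how the contract above could be checked before anything runs, assuming commands arrive in the plain `aws <service> <operation>` form. `classify_aws_command` is a hypothetical helper, not part of the AWS CLI, and blocking every `ssm get-parameter*` variant is an assumption that extends this skill's rule of never retrieving secret or parameter values.

```bash
# Sketch only: classify_aws_command is a hypothetical helper name.
# Usage: classify_aws_command aws <service> <operation> [args...]
# Prints "allow" or "deny".
classify_aws_command() {
  # Anything that is not "aws <service> <operation> ..." is denied outright.
  if [ "$#" -lt 3 ] || [ "$1" != "aws" ]; then
    echo deny
    return 0
  fi
  case "$2 $3" in
    # Reads that would reveal secret or parameter values (assumed exclusions).
    "secretsmanager get-secret-value" | "ssm get-parameter"*)
      echo deny ;;
    # "aws configure" is limited to its read-only "get" subcommand.
    "configure get")
      echo allow ;;
    "configure "*)
      echo deny ;;
    # Allowlisted read-only verb prefixes.
    *" describe-"* | *" list-"* | *" get-"*)
      echo allow ;;
    # Default deny: unknown verbs are rejected rather than guessed at.
    *)
      echo deny ;;
  esac
}
```

Because the fallback branch is `deny`, a verb missing from both lists is rejected instead of being run.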
If the user's query implies a write action, respond:
> "This skill is read-only. I can show you the current state of [resource], but I cannot [create/modify/delete] it. Would you like to see what currently exists?"
## Workflow
### Step 1: Parse Intent
Identify: target service(s), scope (all / filtered / specific), detail level, and region.
### Step 2: Confirm Account & Region
```bash
aws sts get-caller-identity --query '{Account:Account,UserId:UserId}'
aws configure get region
```
Append `--region <region>` to all commands when the user specifies one.
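
One way to apply the region rule consistently is a small wrapper, sketched here under the assumption that the caller passes the user's region (or an empty string) as the first argument. `with_region` is a hypothetical name used only for illustration.

```bash
# Sketch only: with_region is a hypothetical wrapper.
# Usage: with_region "<region or empty>" aws <service> <operation> [args...]
with_region() (
  region="$1"
  shift
  if [ -n "$region" ]; then
    # The user named a region: append it to the command.
    "$@" --region "$region"
  else
    # No region given: fall back to the CLI's configured default.
    "$@"
  fi
)
```

For example, `with_region eu-west-1 aws ec2 describe-vpcs --output table` runs the query in `eu-west-1`, while `with_region "" aws ec2 describe-vpcs` leaves the configured region in effect.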
### Step 3: Execute & Format
Run the matched read-only command(s) below and format the results as a readable table. For large result sets, show a count first and offer to filter further.
---
## Intent → Command Mapping
### COMPUTE
#### EC2 Instances
```bash
# "list EC2 instances" / "show my VMs" / "what instances are running"
aws ec2 describe-instances \
--query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name,Tags[?Key==`Name`].Value|[0],PrivateIpAddress,PublicIpAddress]' \
--output table
# "running instances only"
aws ec2 describe-instances --filters Name=instance-state-name,Values=running \
--query 'Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key==`Name`].Value|[0],PrivateIpAddress]' \
--output table
# "stopped instances"
aws ec2 describe-instances --filters Name=instance-state-name,Values=stopped \
--query 'Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key==`Name`].Value|[0]]' \
--output table
# "instance types in use"
aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceType' --output text | tr '\t' '\n' | sort | uniq -c | sort -rn
# "auto scaling groups" / "ASGs"
aws autoscaling describe-auto-scaling-groups \
--query 'AutoScalingGroups[].[AutoScalingGroupName,MinSize,MaxSize,DesiredCapacity]' --output table
# "elastic IPs" / "EIPs"
aws ec2 describe-addresses \
--query 'Addresses[].[PublicIp,InstanceId,AllocationId,AssociationId]' --output table
# "key pairs"
aws ec2 describe-key-pairs \
--query 'KeyPairs[].[KeyName,CreateTime]' --output table
# "AMIs I own"
aws ec2 describe-images --owners self \
--query 'Images[].[ImageId,Name,CreationDate,State]' --output table
# "spot instances"
aws ec2 describe-spot-instance-requests \
--query 'SpotInstanceRequests[].[SpotInstanceRequestId,State,InstanceId,LaunchSpecification.InstanceType]' --output table
```
#### Lambda Functions
```bash
# "list Lambda functions" / "show serverless functions"
aws lambda list-functions \
--query 'Functions[].[FunctionName,Runtime,MemorySize,Timeout,LastModified]' --output table
# "Lambda function details for <name>"
aws lambda get-function-configuration --function-name <name>
# "Lambda event source mappings" / "Lambda triggers"
aws lambda list-event-source-mappings \
--query 'EventSourceMappings[].[FunctionArn,EventSourceArn,State,BatchSize]' --output table
# "Lambda layers"
aws lambda list-layers \
--query 'Layers[].[LayerName,LatestMatchingVersion.LayerVersionArn]' --output table
# "Lambda concurrency for <name>"
aws lambda get-function-concurrency --function-name <name>
```
#### ECS
```bash
# "ECS clusters"
aws ecs list-clusters --query 'clusterArns' --output table
# "ECS cluster details"
aws ecs describe-clusters \
--clusters $(aws ecs list-clusters --query 'clusterArns[]' --output text) \
--query 'clusters[].[clusterName,status,runningTasksCount,activeServicesCount]' --output table
# "ECS services in <cluster>"
# (describe-services accepts at most 10 services per call; batch longer lists)
aws ecs describe-services --cluster <cluster> \
--services $(aws ecs list-services --cluster <cluster> --query 'serviceArns[]' --output text) \
--query 'services[].[serviceName,status,runningCount,desiredCount]' --output table
# "ECS task definitions"
aws ecs list-task-definitions --query 'taskDefinitionArns' --output table
```
#### EKS
```bash
# "EKS clusters" / "Kubernetes clusters"
aws eks list-clusters --query 'clusters' --output table
# "EKS cluster details for <name>"
aws eks describe-cluster --name <name> \
--query 'cluster.[name,status,version,endpoint]'
# "EKS node groups for <cluster>"
aws eks list-nodegroups --cluster-name <cluster> --query 'nodegroups' --output table
# "EKS add-ons for <cluster>"
aws eks list-addons --cluster-name <cluster> --query 'addons' --output table
```
#### Other Compute
```bash
# "Beanstalk environments"
aws elasticbeanstalk describe-environments \
--query 'Environments[].[EnvironmentName,ApplicationName,Status,Health]' --output table
# "Batch job queues"
aws batch describe-job-queues \
--query 'jobQueues[].[jobQueueName,state,status,priority]' --output table
# "Batch compute environments"
aws batch describe-compute-environments \
--query 'computeEnvironments[].[computeEnvironmentName,type,state,status]' --output table
```
---
### STORAGE
#### S3
```bash
# "list S3 buckets" / "show my buckets"
aws s3api list-buckets --query 'Buckets[].[Name,CreationDate]' --output table
# "S3 bucket encryption for <name>"
aws s3api get-bucket-encryption --bucket <name>
# "S3 bucket versioning for <name>"
aws s3api get-bucket-versioning --bucket <name>
# "S3 public access settings for <name>"
aws s3api get-public-access-block --bucket <name>
# "S3 lifecycle rules for <name>"
aws s3api get-bucket-lifecycle-configuration --bucket <name>
# "S3 bucket policy for <name>"
aws s3api get-bucket-policy --bucket <name>
# "list objects in s3://<bucket>/<prefix>"
aws s3api list-objects-v2 --bucket <bucket> --prefix <prefix> \
--query 'Contents[].[Key,Size,LastModified,StorageClass]' --output table
```
#### EBS & EFS
```bash
# "EBS volumes" / "list volumes"
aws ec2 describe-volumes \
--query 'Volumes[].[VolumeId,Size,VolumeType,State,AvailabilityZone,Attachments[0].InstanceId]' --output table
# "unattached EBS volumes" / "unused volumes"
aws ec2 describe-volumes --filters Name=status,Values=available \
--query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' --output table
# "EBS snapshots I own"
aws ec2 describe-snapshots --owner-ids self \
--query 'Snapshots[].[SnapshotId,VolumeId,State,StartTime]' --output table
# "EFS file systems"
aws efs describe-file-systems \
--query 'FileSystems[].[FileSystemId,Name,LifeCycleState,SizeInBytes.Value,ThroughputMode]' --output table
```
---
### DATABASES
#### RDS
```bash
# "list RDS instances" / "show databases" / "what databases do I have"
aws rds describe-db-instances \
--query 'DBInstances[].[DBInstanceIdentifier,DBInstanceClass,Engine,EngineVersion,DBInstanceStatus,MultiAZ,Endpoint.Address]' \
--output table
# "Aurora clusters" / "RDS clusters"
aws rds describe-db-clusters \
--query 'DBClusters[].[DBClusterIdentifier,Engine,EngineVersion,Status,MultiAZ,Endpoint]' --output table
# "RDS snapshots"
aws rds describe-db-snapshots \
--query 'DBSnapshots[].[DBSnapshotIdentifier,DBInstanceIdentifier,Engine,Status,SnapshotCreateTime]' --output table
# "RDS parameter groups"
aws rds describe-db-parameter-groups \
--query 'DBParameterGroups[].[DBParameterGroupName,DBParameterGroupFamily]' --output table
# "RDS subnet groups"
aws rds describe-db-subnet-groups \
--query 'DBSubnetGroups[].[DBSubnetGroupName,VpcId]' --output table
```
#### DynamoDB
```bash
# "DynamoDB tables" / "list NoSQL tables"
aws dynamodb list-tables --query 'TableNames' --output table
# "DynamoDB table details for <name>"
aws dynamodb describe-table --table-name <name> \
--query 'Table.[TableName,TableStatus,ItemCount,BillingModeSummary.BillingMode]'
# "DynamoDB backups"
aws dynamodb list-backups \
--query 'BackupSummaries[].[TableName,BackupName,BackupStatus,BackupCreationDateTime]' --output table
# "DynamoDB global tables"
aws dynamodb list-global-tables \
--query 'GlobalTables[].[GlobalTableName,ReplicationGroup[].RegionName]' --output table
```
#### ElastiCache, Redshift, DocumentDB & Neptune
```bash
# "ElastiCache clusters" / "Redis clusters"
aws elasticache describe-cache-clusters \
--query 'CacheClusters[].[CacheClusterId,Engine,EngineVersion,CacheNodeType,CacheClusterStatus]' --output table
# "ElastiCache replication groups"
aws elasticache describe-replication-groups \
--query 'ReplicationGroups[].[ReplicationGroupId,Status,AutomaticFailover]' --output table
# "Redshift clusters" / "data warehouse"
aws redshift describe-clusters \
--query 'Clusters[].[ClusterIdentifier,ClusterStatus,NodeType,NumberOfNodes,Endpoint.Address]' --output table
# "DocumentDB clusters"
aws docdb describe-db-clusters \
--query 'DBClusters[].[DBClusterIdentifier,Status,Engine,Endpoint]' --output table
# "Neptune clusters" / "graph databases"
aws neptune describe-db-clusters \
--query 'DBClusters[].[DBClusterIdentifier,Status,Engine,Endpoint]' --output table
```
---
### NETWORKING
#### VPC & Subnets
```bash
# "list VPCs" / "show my VPCs"
aws ec2 describe-vpcs \
--query 'Vpcs[].[VpcId,CidrBlock,IsDefault,Tags[?Key==`Name`].Value|[0],State]' --output table
# "subnets" / "list subnets"
aws ec2 describe-subnets \
--query 'Subnets[].[SubnetId,VpcId,CidrBlock,AvailabilityZone,MapPublicIpOnLaunch,Tags[?Key==`Name`].Value|[0]]' --output table
# "public subnets"
aws ec2 describe-subnets --filters "Name=map-public-ip-on-launch,Values=true" \
--query 'Subnets[].[SubnetId,VpcId,CidrBlock,AvailabilityZone]' --output table
# "security groups"
aws ec2 describe-security-groups \
--query 'SecurityGroups[].[GroupId,GroupName,VpcId,Description]' --output table
# "security group rules for <group-id>"
aws ec2 describe-security-group-rules --filters "Name=group-id,Values=<id>" \
--query 'SecurityGroupRules[].[IsEgress,IpProtocol,FromPort,ToPort,CidrIpv4,Description]' --output table
# "route tables"
aws ec2 describe-route-tables \
--query 'RouteTables[].[RouteTableId,VpcId,Associations[0].SubnetId,Tags[?Key==`Name`].Value|[0]]' --output table
# "internet gateways" / "IGWs"
aws ec2 describe-internet-gateways \
--query 'InternetGateways[].[InternetGatewayId,Attachments[0].VpcId,Tags[?Key==`Name`].Value|[0]]' --output table
# "NAT gateways"
aws ec2 describe-nat-gateways \
--query 'NatGateways[].[NatGatewayId,VpcId,SubnetId,State,NatGatewayAddresses[0].PublicIp]' --output table
# "VPC endpoints"
aws ec2 describe-vpc-endpoints \
--query 'VpcEndpoints[].[VpcEndpointId,VpcId,ServiceName,State,VpcEndpointType]' --output table
# "VPC peering connections"
aws ec2 describe-vpc-peering-connections \
--query 'VpcPeeringConnections[].[VpcPeeringConnectionId,Status.Code,RequesterVpcInfo.VpcId,AccepterVpcInfo.VpcId]' --output table
# "NACLs" / "network ACLs"
aws ec2 describe-network-acls \
--query 'NetworkAcls[].[NetworkAclId,VpcId,IsDefault]' --output table
# "Transit Gateways"
aws ec2 describe-transit-gateways \
--query 'TransitGateways[].[TransitGatewayId,State,Description]' --output table
```
#### Load Balancers & DNS
```bash
# "load balancers" / "ALBs" / "NLBs"
aws elbv2 describe-load-balancers \
--query 'LoadBalancers[].[LoadBalancerName,Type,Scheme,State.Code,DNSName]' --output table
# "target groups"
aws elbv2 describe-target-groups \
--query 'TargetGroups[].[TargetGroupName,Protocol,Port,TargetType,VpcId]' --output table
# "target health for <target-group-arn>"
aws elbv2 describe-target-health --target-group-arn <arn> \
--query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Description]' --output table
# "Route 53 hosted zones" / "DNS zones"
aws route53 list-hosted-zones \
--query 'HostedZones[].[Id,Name,Config.PrivateZone,ResourceRecordSetCount]' --output table
# "DNS records in zone <id>"
aws route53 list-resource-record-sets --hosted-zone-id <id> \
--query 'ResourceRecordSets[].[Name,Type,TTL]' --output table
# "CloudFront distributions"
aws cloudfront list-distributions \
--query 'DistributionList.Items[].[Id,DomainName,Status,Origins.Items[0].DomainName]' --output table
# "VPN connections"
aws ec2 describe-vpn-connections \
--query 'VpnConnections[].[VpnConnectionId,State,Type,CustomerGatewayId]' --output table
# "Direct Connect connections"
aws directconnect describe-connections \
--query 'connections[].[connectionId,connectionName,connectionState,bandwidth]' --output table
```
---
### SECURITY & IDENTITY
#### IAM
```bash
# "IAM users" / "list users"
aws iam list-users \
--query 'Users[].[UserName,UserId,CreateDate,PasswordLastUsed]' --output table
# "IAM roles" / "list roles"
aws iam list-roles \
--query 'Roles[].[RoleName,RoleId,CreateDate]' --output table
# "IAM policies attached to role <name>"
aws iam list-attached-role-policies --role-name <name> \
--query 'AttachedPolicies[].[PolicyName,PolicyArn]' --output table
# "IAM groups"
aws iam list-groups \
--query 'Groups[].[GroupName,GroupId,CreateDate]' --output table
# "IAM policies (customer managed)"
aws iam list-policies --scope Local \
--query 'Policies[].[PolicyName,AttachmentCount,CreateDate]' --output table
# "who has MFA enabled" / "MFA devices"
aws iam list-virtual-mfa-devices \
--query 'VirtualMFADevices[].[SerialNumber,User.UserName,EnableDate]' --output table
# "IAM account password policy"
aws iam get-account-password-policy
# "IAM account summary"
aws iam get-account-summary
```
#### Secrets Manager
```bash
# "list secrets" / "Secrets Manager secrets" / "show secrets"
aws secretsmanager list-secrets \
--query 'SecretList[].[Name,ARN,LastChangedDate,LastAccessedDate,Description]' --output table
# "secret metadata for <name>"
aws secretsmanager describe-secret --secret-id <name> \
--query '{Name:Name,ARN:ARN,RotationEnabled:RotationEnabled,LastRotatedDate:LastRotatedDate,Tags:Tags}'
# "secrets with rotation enabled"
aws secretsmanager list-secrets \
--query 'SecretList[?RotationEnabled==`true`].[Name,LastRotatedDate]' --output table
```
> ⚠️ **Note**: Secret **values** are never retrieved (`get-secret-value` is excluded). Only metadata is shown.
#### SSM Parameter Store
```bash
# "SSM parameters" / "Parameter Store"
aws ssm describe-parameters \
--query 'Parameters[].[Name,Type,LastModifiedDate,Description]' --output table
# "SSM parameters by path <path>"
aws ssm describe-parameters \
--parameter-filters "Key=Path,Values=<path>" \
--query 'Parameters[].[Name,Type,LastModifiedDate]' --output table
```
> ⚠️ **Note**: Parameter **values** are never retrieved (`get-parameter` is excluded). Only metadata is shown.
#### KMS & Certificates
```bash
# "KMS keys" / "encryption keys"
aws kms list-keys --query 'Keys[].[KeyId,KeyArn]' --output table
# "KMS key details for <id>"
aws kms describe-key --key-id <id> \
--query 'KeyMetadata.[KeyId,Description,KeyState,KeyUsage,CreationDate,Enabled]'
# "KMS aliases"
aws kms list-aliases \
--query 'Aliases[].[AliasName,AliasArn,TargetKeyId]' --output table
# "SSL certificates" / "ACM certificates"
aws acm list-certificates \
--query 'CertificateSummaryList[].[CertificateArn,DomainName,Status,RenewalEligibility]' --output table
# "certificate details for <arn>"
aws acm describe-certificate --certificate-arn <arn> \
--query 'Certificate.[DomainName,Status,NotAfter,NotBefore,InUseBy]'
```
#### GuardDuty, Security Hub & Config
```bash
# "GuardDuty detectors"
aws guardduty list-detectors --query 'DetectorIds' --output table
# "GuardDuty findings"
aws guardduty list-findings --detector-id <id> --query 'FindingIds' --output table
# "Security Hub findings"
aws securityhub get-findings \
--query 'Findings[].[Title,Severity.Label,WorkflowState,UpdatedAt]' --output table
# "AWS Config rules"
aws configservice describe-config-rules \
--query 'ConfigRules[].[ConfigRuleName,ConfigRuleState,Source.SourceIdentifier]' --output table
# "non-compliant resources"
aws configservice describe-compliance-by-config-rule --compliance-types NON_COMPLIANT \
--query 'ComplianceByConfigRules[].[ConfigRuleName,Compliance.ComplianceType]' --output table
```
---
### MESSAGING & EVENTS
```bash
# "SQS queues" / "list queues"
aws sqs list-queues --query 'QueueUrls' --output table
# "SQS queue details / message count for <url>"
aws sqs get-queue-attributes --queue-url <url> \
--attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible ApproximateNumberOfMessagesDelayed
# "SNS topics"
aws sns list-topics --query 'Topics[].TopicArn' --output table
# "SNS subscriptions"
aws sns list-subscriptions \
--query 'Subscriptions[].[SubscriptionArn,Protocol,Endpoint,TopicArn]' --output table
# "EventBridge rules"
aws events list-rules \
--query 'Rules[].[Name,State,ScheduleExpression,EventPattern]' --output table
# "EventBridge event buses"
aws events list-event-buses \
--query 'EventBuses[].[Name,Arn]' --output table
# "Kinesis streams"
aws kinesis list-streams --query 'StreamNames' --output table
# "Kinesis Firehose delivery streams"
aws firehose list-delivery-streams --query 'DeliveryStreamNames' --output table
```
---
### API GATEWAY & SERVERLESS
```bash
# "API Gateway APIs" / "REST APIs"
aws apigateway get-rest-apis \
--query 'items[].[id,name,description,createdDate]' --output table
# "HTTP APIs" / "API Gateway v2"
aws apigatewayv2 get-apis \
--query 'Items[].[ApiId,Name,ProtocolType,ApiEndpoint,CreatedDate]' --output table
# "Step Functions state machines" / "workflows"
aws stepfunctions list-state-machines \
--query 'stateMachines[].[name,stateMachineArn,type,creationDate]' --output table
# "Step Functions executions for <arn>"
aws stepfunctions list-executions --state-machine-arn <arn> \
--query 'executions[].[name,status,startDate,stopDate]' --output table
```
---
### MONITORING & OBSERVABILITY
```bash
# "CloudWatch alarms" / "list alarms"
aws cloudwatch describe-alarms \
--query 'MetricAlarms[].[AlarmName,StateValue,MetricName,Namespace,Threshold]' --output table
# "alarms in ALARM state" / "triggered alarms"
aws cloudwatch describe-alarms --state-value ALARM \
--query 'MetricAlarms[].[AlarmName,MetricName,StateReason]' --output table
# "CloudWatch dashboards"
aws cloudwatch list-dashboards \
--query 'DashboardEntries[].[DashboardName,LastModified,Size]' --output table
# "CloudWatch log groups"
aws logs describe-log-groups \
--query 'logGroups[].[logGroupName,retentionInDays,storedBytes]' --output table
# "CloudTrail trails"
aws cloudtrail describe-trails \
--query 'trailList[].[Name,S3BucketName,IsMultiRegionTrail,LogFileValidationEnabled]' --output table
# "ECR repositories" / "container registries"
aws ecr describe-repositories \
--query 'repositories[].[repositoryName,repositoryUri,createdAt]' --output table
```
---
### COST & BILLING
```bash
# "current month cost" / "how much am I spending"
# (End is exclusive; on the 1st of the month Start equals End and the request is rejected)
aws ce get-cost-and-usage \
--time-period Start=$(date -u +%Y-%m-01),End=$(date -u +%Y-%m-%d) \
--granularity MONTHLY --metrics BlendedCost \
--query 'ResultsByTime[].[TimePeriod.Start,Total.BlendedCost.Amount,Total.BlendedCost.Unit]' \
--output table
# "cost by service" / "spending breakdown"
# (date -d is GNU-only; on macOS/BSD use: date -u -v-30d +%Y-%m-%d)
aws ce get-cost-and-usage \
--time-period Start=$(date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
--granularity MONTHLY --metrics BlendedCost \
--group-by Type=DIMENSION,Key=SERVICE --output table
# "AWS Budgets"
aws budgets describe-budgets \
--account-id $(aws sts get-caller-identity --query Account --output text) \
--query 'Budgets[].[BudgetName,BudgetType,BudgetLimit.Amount,CalculatedSpend.ActualSpend.Amount]' \
--output table
# "Trusted Advisor recommendations"
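# (requires a Business or Enterprise support plan; the Support API endpoint is in us-east-1)
aws support describe-trusted-advisor-checks --language en --region us-east-1 \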
--query 'checks[].[id,name,category]' --output table
```
---
### CROSS-SERVICE QUERIES
```bash
# "resources tagged Environment=production" / "all production resources"
aws resourcegroupstaggingapi get-resources \
--tag-filters Key=Environment,Values=production \
--query 'ResourceTagMappingList[].[ResourceARN]' --output table
# "all resources tagged <key>=<value>"
aws resourcegroupstaggingapi get-resources \
--tag-filters Key=<key>,Values=<value> \
--query 'ResourceTagMappingList[].[ResourceARN,Tags]' --output table
# "inventory of all resources" (AWS Config)
aws configservice list-discovered-resources --resource-type <type> \
--query 'resourceIdentifiers[].[resourceType,resourceId,resourceName]' --output table
```
---
## Output Formatting Rules
1. Always use `--output table` for list results; use `--output json` only when deep detail is explicitly requested
2. Always use `--query` to extract only relevant fields — never dump raw JSON
3. For large result sets (>20 items), show a count first, then offer to filter
4. When a command returns nothing, explain why (wrong region, no resources, insufficient permissions)
5. Offer to drill into a specific resource: "Found 47 EC2 instances. Filter by state, type, or tag?"
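
Rules 3 to 5 can be sketched as a small filter that sits at the end of a pipeline. This is an illustration only: `show_or_count` is a hypothetical helper, the default threshold of 20 comes from rule 3, and the prompt wording is borrowed from rule 5.

```bash
# Sketch only: show_or_count is a hypothetical helper.
# Reads one result row per line on stdin.
# Usage: <command producing rows> | show_or_count [limit]
show_or_count() (
  limit="${1:-20}"
  rows="$(cat)"
  if [ -z "$rows" ]; then
    # Nothing came back: say why that might be instead of printing nothing.
    echo "No results. Check the region, filters, and permissions."
  else
    count="$(printf '%s\n' "$rows" | wc -l | tr -d ' ')"
    if [ "$count" -gt "$limit" ]; then
      # Large result set: show the count first and offer to narrow it.
      echo "Found $count items. Filter by state, type, or tag?"
    else
      printf '%s\n' "$rows"
    fi
  fi
)
```

It expects line-oriented input, so a flat `--output text` result would first need its tabs turned into newlines (for example with `tr '\t' '\n'`).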
## Error Handling
| Error | Response |
|---|---|
| `AccessDenied` | "You don't have permission to list [resource]. Required: `<service>:<Action>`." |
| `Unable to locate credentials` (`NoCredentialProviders` in some SDKs) | "Run `aws configure` or set `AWS_PROFILE`." |
| Empty result | "No [resources] found in [region]. Check another region?" |
| Invalid identifier | "Could not find '[name]'. Check the name or provide the resource ID." |
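
The table above could be applied to a failed command's stderr roughly as follows. This is a sketch under assumptions: `explain_aws_error` is a hypothetical helper, and the substring patterns (`UnauthorizedOperation`, `NotFound`, `NoSuch`) are a guess at common error-code shapes rather than a complete list.

```bash
# Sketch only: explain_aws_error is a hypothetical helper.
# $1 = stderr text captured from a failed AWS CLI command.
explain_aws_error() {
  case "$1" in
    *AccessDenied* | *UnauthorizedOperation*)
      echo "You don't have permission for this query. Check the IAM action named in the error." ;;
    *"Unable to locate credentials"* | *NoCredentialProviders*)
      echo "Run 'aws configure' or set AWS_PROFILE." ;;
    *NotFound* | *NoSuch*)
      echo "Could not find that resource. Check the name or provide the resource ID." ;;
    *)
      # Unrecognized error: pass the original text through unchanged.
      printf '%s\n' "$1" ;;
  esac
}
```

The empty-result case is left out on purpose, since a successful command with no rows produces no error text to match.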
@@ -0,0 +1,184 @@
---
name: aws-well-architected-review
description: 'Perform an AWS Well-Architected Framework review of the current workload IaC and architecture, generating findings and GitHub issues for improvements.'
---
# AWS Well-Architected Review
This workflow performs a structured AWS Well-Architected Framework (WAF) review against your workload's IaC files and deployed infrastructure. It identifies risks across all 6 WAF pillars and creates GitHub issues to track remediation.
## Prerequisites
- AWS CLI configured and authenticated
- IaC files present in the repository (Terraform, CloudFormation, CDK, or SAM)
- GitHub MCP server configured and authenticated
## Workflow Steps
### Step 1: Load Well-Architected Framework Reference
Fetch current AWS WAF best practices:
- `https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html`
- Pillar-specific lenses relevant to the workload type (Serverless, SaaS, etc.)
### Step 2: Discover IaC & Architecture
Scan the repository for IaC files:
- Terraform: `**/*.tf`
- CloudFormation/SAM: `**/*.yaml`, `**/*.yml`, `**/*.json` (CloudFormation templates only)
- CDK: `lib/**/*.ts`, `bin/**/*.ts`, `cdk.json`
Identify key AWS services in use (compute, data, networking, security, observability) and generate a Mermaid architecture diagram.
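
The scan can be sketched with `find`, under a few assumptions that are not part of this skill: `find_iac_files` is a hypothetical helper, the pruned directories (`.git`, `node_modules`, `.terraform`) are a guess at what to skip, and YAML/JSON files are treated as templates only when they mention `AWSTemplateFormatVersion` or `AWS::Serverless`, which is a heuristic.

```bash
# Sketch only: find_iac_files is a hypothetical helper.
# Usage: find_iac_files [repo-root]
find_iac_files() (
  root="${1:-.}"
  # Terraform files, the CDK manifest, and CDK TypeScript under lib/ or bin/.
  find "$root" \( -name .git -o -name node_modules -o -name .terraform \) -prune \
    -o -type f \( -name '*.tf' -o -name 'cdk.json' \
       -o -path '*/lib/*.ts' -o -path '*/bin/*.ts' \) -print
  # YAML/JSON files count only if they look like CloudFormation/SAM templates.
  find "$root" \( -name .git -o -name node_modules -o -name .terraform \) -prune \
    -o -type f \( -name '*.yaml' -o -name '*.yml' -o -name '*.json' \) -print |
    while IFS= read -r f; do
      if grep -q -E 'AWSTemplateFormatVersion|AWS::Serverless' "$f"; then
        printf '%s\n' "$f"
      fi
    done
)
```

A template that omits the optional `AWSTemplateFormatVersion` key and uses no SAM resources would be missed by this heuristic, so treat the output as a starting list.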
### Step 3: Pillar-by-Pillar Review
#### Pillar 1: Operational Excellence
- [ ] All infrastructure defined as IaC (no manual console changes)
- [ ] Consistent tagging strategy applied across all resources
- [ ] CloudWatch alarms defined for key metrics
- [ ] Automated deployment pipeline present (no manual deployments)
- [ ] CloudTrail enabled for audit logging
- [ ] Runbooks or operational documentation present
#### Pillar 2: Security
- [ ] IAM roles use least-privilege policies (no `*` actions without justification)
- [ ] No hardcoded credentials in IaC or code
- [ ] Secrets managed via Secrets Manager or SSM Parameter Store
- [ ] S3 buckets have public access blocked and server-side encryption enabled
- [ ] Sensitive resources placed in private subnets
- [ ] Security groups restrict inbound to minimum required ports/CIDRs
- [ ] KMS encryption enabled for sensitive data stores (RDS, EBS, S3, SQS, DynamoDB)
- [ ] SSL/TLS enforced on all endpoints (`enforceSSL: true`)
- [ ] GuardDuty enabled (`aws guardduty list-detectors`)
- [ ] AWS WAF (Web Application Firewall) configured on public-facing APIs and CloudFront distributions
- [ ] MFA delete enabled on critical S3 buckets
#### Pillar 3: Reliability
- [ ] Multi-AZ deployments for production databases (RDS Multi-AZ), plus DynamoDB Global Tables where multi-Region resilience is required
- [ ] Auto Scaling configured with appropriate policies for EC2/ECS
- [ ] S3 versioning and lifecycle policies configured
- [ ] RDS automated backups enabled with appropriate retention period
- [ ] DynamoDB Point-in-Time Recovery (PITR) enabled
- [ ] Dead Letter Queues (DLQ) configured for Lambda, SQS, SNS
- [ ] Route 53 health checks configured for DNS failover
- [ ] Lambda reserved concurrency set to prevent noisy-neighbor throttling
#### Pillar 4: Performance Efficiency
- [ ] Right-sized instance types (Lambda memory, EC2 type, RDS class)
- [ ] Graviton/ARM instances used where available (Lambda `arm64`, EC2 Graviton)
- [ ] Caching implemented (ElastiCache, DAX, CloudFront, API Gateway caching)
- [ ] CloudFront used for global static content delivery
- [ ] Aurora Serverless or DynamoDB On-Demand for variable load patterns
- [ ] Lambda Provisioned Concurrency for latency-critical synchronous paths
#### Pillar 5: Cost Optimization
- [ ] EC2 Reserved Instances or Savings Plans for steady-state workloads
- [ ] S3 lifecycle policies moving data to cheaper storage tiers
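- [ ] Lambda `arm64` architecture adopted (about 20% lower duration price than x86)
- [ ] VPC Endpoints for S3/DynamoDB to avoid NAT Gateway charges
- [ ] gp2 EBS volumes migrated to gp3 (up to 20% lower price per GiB; provision extra IOPS/throughput if the gp2 volume exceeded gp3's 3,000 IOPS / 125 MiB/s baseline)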
- [ ] Development/test environments have auto-shutdown schedules
- [ ] AWS Budgets and Cost Anomaly Detection configured
- [ ] Unattached EBS volumes and idle EC2 instances identified
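
Two of the checks above (gp2 volumes and unattached volumes) can be run read-only against a live account. The sketch below assumes the AWS CLI is configured; `AWS_CLI`, `list_gp2_volumes`, and `list_unattached_volumes` are hypothetical names introduced for illustration.

```bash
# Sketch only: helper names and the AWS_CLI variable are hypothetical.
# AWS_CLI lets the helpers point at a different binary (or a stub in tests).
AWS_CLI="${AWS_CLI:-aws}"

# gp2 volumes that are candidates for migration to gp3.
list_gp2_volumes() {
  "$AWS_CLI" ec2 describe-volumes --filters Name=volume-type,Values=gp2 \
    --query 'Volumes[].[VolumeId,Size,State]' --output table
}

# Volumes in the "available" state, i.e. not attached to any instance.
list_unattached_volumes() {
  "$AWS_CLI" ec2 describe-volumes --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' --output table
}
```

Both helpers only call `describe-volumes`, so they change nothing in the account.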
#### Pillar 6: Sustainability
- [ ] Graviton/ARM instances selected where available
- [ ] Serverless/managed services preferred over always-on EC2
- [ ] S3 lifecycle policies reduce unnecessary long-term data storage
- [ ] Auto Scaling configured to avoid over-provisioning
- [ ] Region selection considers AWS renewable energy commitments
### Step 4: Risk Classification
For each finding, classify:
- **High Risk**: Security vulnerability, single point of failure, no backup/recovery
- **Medium Risk**: Suboptimal reliability, cost inefficiency, performance concern
- **Low Risk**: Best practice deviation, minor optimization opportunity
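
As an illustration, the mapping above could be encoded as a lookup. The category keywords and the `classify_risk` name are hypothetical; a real review would derive the category from the finding itself.

```bash
# Sketch only: classify_risk and its category keywords are hypothetical.
# $1 = finding category; prints High, Medium, or Low.
classify_risk() {
  case "$1" in
    security-vulnerability | single-point-of-failure | no-backup)
      echo High ;;
    reliability | cost | performance)
      echo Medium ;;
    *)
      # Best-practice deviations and minor optimizations.
      echo Low ;;
  esac
}
```

Unknown categories fall through to `Low`, so a new category should be added explicitly if it deserves a higher rating.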
### Step 5: User Confirmation
```
🏗️ AWS Well-Architected Review Summary
📊 Review Results:
• IaC Files Analyzed: X
• AWS Services Identified: Y
• Total Findings: Z
• High Risk: A (immediate action required)
• Medium Risk: B (should address soon)
• Low Risk: C (nice to have)
🔴 Top High Risk Findings:
1. [Pillar]: [Finding] — [Why it matters]
2. [Pillar]: [Finding] — [Why it matters]
💡 This will create Z individual GitHub issues + 1 EPIC issue.
❓ Proceed with creating GitHub issues? (y/n)
```
### Step 6: Create Individual Finding Issues
Label with "well-architected" and the pillar name (e.g., "security", "reliability").
**Title**: `[WAF-<PILLAR>] [Brief Finding] — [Risk Level]`
**Body**:
````markdown
## 🏗️ Well-Architected Finding: [Brief Title]
**Pillar**: [Name] | **Risk Level**: [High/Medium/Low] | **Effort**: [Low/Medium/High]
### 📋 Description
[Clear explanation of the finding and why it matters]
### 🔧 Remediation
**IaC Fix** (preferred):
```hcl
# Terraform example
resource "aws_s3_bucket_server_side_encryption_configuration" "example" {
  bucket = aws_s3_bucket.example.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
```
**AWS CLI fallback**:
```bash
aws s3api put-bucket-encryption --bucket <name> \
--server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'
```
### 📚 AWS Reference
- [WAF Best Practice Link]
- [AWS Documentation Link]
### ✅ Validation
- [ ] Change implemented in IaC and deployed
- [ ] AWS Config rule passes (if applicable)
- [ ] Security Hub finding resolved (if applicable)
**Well-Architected Question**: [WAF question this maps to]
````
### Step 7: Create EPIC Tracking Issue
Label with "well-architected" and "epic".
**Title**: `[EPIC] AWS Well-Architected Review — X findings across 6 pillars`
**Body**: Executive summary with pillar breakdown table (finding counts by pillar and risk level), Mermaid architecture diagram, prioritized checklist linking all individual issues (High → Medium → Low), and success criteria:
- All High-risk findings resolved
- Medium findings have accepted mitigation plans
- No regression in existing CloudWatch alarms or Config rules
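
The pillar breakdown table for the EPIC body could be generated from a list of findings. The sketch below assumes one finding per line in a `<pillar><TAB><risk>` format, which is an invented input shape, and `pillar_breakdown` is a hypothetical helper.

```bash
# Sketch only: pillar_breakdown and its input format are hypothetical.
# stdin: one finding per line as "<pillar><TAB><risk>" (risk = High|Medium|Low).
pillar_breakdown() {
  echo "| Pillar | High | Medium | Low |"
  echo "|---|---|---|---|"
  awk -F '\t' '
    { seen[$1] = 1; n[$1, $2]++ }
    END {
      for (p in seen)
        printf "| %s | %d | %d | %d |\n", p, n[p, "High"], n[p, "Medium"], n[p, "Low"]
    }' | sort
}
```

Rows are sorted by pillar name because `awk` does not guarantee iteration order; pillars with no findings are not listed.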
## Error Handling
- **No IaC Files Found**: Limit review to live resource discovery via AWS CLI and note the gap
- **Insufficient AWS Permissions**: List required read-only permissions for the review
- **GitHub Creation Failure**: Output all findings as formatted markdown to console
## Success Criteria
- ✅ All 6 WAF pillars reviewed against IaC and live infrastructure
- ✅ All findings classified by risk level and pillar
- ✅ Actionable remediation steps with IaC examples for each finding
- ✅ GitHub issues created for team tracking
- ✅ Architecture diagram generated for EPIC context
- ✅ AWS documentation references included