mirror of https://github.com/github/awesome-copilot.git synced 2026-03-12 04:05:12 +00:00


Fatih f8c2b32140 Add Cloud Design Patterns skill for distributed systems architecture (#942 )

* Fatih: Add Cloud Design Patterns instructions for distributed systems architecture

* Convert Cloud Design Patterns from instruction to skill

* Update skills/cloud-design-patterns/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update skills/cloud-design-patterns/references/reliability-resilience.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

2026-03-12 11:53:00 +11:00

6.2 KiB


Reliability & Resilience Patterns

Ambassador Pattern

Problem: Services need proxy functionality for network requests (logging, monitoring, routing, security).

Solution: Create helper services that send network requests on behalf of a consumer service or application.

When to Use:

Offloading common client connectivity tasks (monitoring, logging, routing)
Adding these capabilities to legacy applications that can't easily be modified
Implementing retry logic, circuit breakers, or timeout handling for remote services

Implementation Considerations:

Deploy the ambassador as a sidecar process or container alongside the application
Account for the network latency that the extra proxy hop introduces
Ensure the ambassador doesn't become a single point of failure
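
A minimal sketch of an ambassador, assuming a single HTTP upstream and GET requests only. It is an out-of-process forwarding proxy built on Python's standard-library http.server and urllib.request. It adds a per-request timeout and request logging, and it maps connection failures to 502, so the application itself needs none of that logic. The upstream address, the timeout value, and the names AmbassadorHandler and serve are hypothetical.

```python
import http.server
import logging
import time
import urllib.error
import urllib.request

log = logging.getLogger("ambassador")


class AmbassadorHandler(http.server.BaseHTTPRequestHandler):
    """Forwards GET requests to one upstream service, adding a timeout and request logging."""

    upstream = "http://127.0.0.1:9000"  # hypothetical address of the remote service
    timeout_s = 2.0  # hypothetical per-request timeout enforced by the ambassador

    def do_GET(self):
        url = self.upstream + self.path
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=self.timeout_s) as resp:
                status, body = resp.status, resp.read()
        except urllib.error.HTTPError as err:
            # The upstream answered with an error status: pass it through unchanged.
            status, body = err.code, err.read()
        except OSError as err:
            # Connection refused, DNS failure, or timeout: report a bad gateway.
            status, body = 502, f"upstream error: {err}".encode()
        elapsed_ms = (time.monotonic() - start) * 1000
        log.info("GET %s -> %d in %.1f ms", url, status, elapsed_ms)
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the ambassador; the application calls http://127.0.0.1:<port>/... instead of the upstream."""
    logging.basicConfig(level=logging.INFO)
    http.server.ThreadingHTTPServer(("127.0.0.1", port), AmbassadorHandler).serve_forever()
```

Run as a sidecar, the ambassador would be started with serve() next to the application. Retries or a circuit breaker for the remote service could be added inside do_GET in the same way.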

Bulkhead Pattern

Problem: A failure in one component can cascade and affect the entire system.

Solution: Isolate elements of an application into pools so that if one fails, the others continue to function.

When to Use:

Isolating critical resources from less critical ones
Preventing resource exhaustion in one area from affecting others
Partitioning consumers and resources to improve availability

Implementation Considerations:

Separate connection pools for different backends
Partition service instances across different groups
Use resource limits (CPU, memory, threads) per partition
Monitor bulkhead health and capacity
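
A minimal in-process sketch, assuming that concurrency limits per backend are the resource to partition. Each backend gets its own bounded semaphore, so a slow or failing backend can exhaust only its own slots. Calls beyond the limit fail fast rather than queue. Separate thread or connection pools per backend are a common alternative. The backend names and limits are hypothetical.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(RuntimeError):
    """Raised when a partition has no free capacity."""


class Bulkhead:
    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Fail fast instead of waiting, so callers' threads are not tied up by a saturated backend.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            yield
        finally:
            self._slots.release()


# One partition per backend (hypothetical names and limits).
bulkheads = {
    "payments": Bulkhead("payments", 2),
    "recommendations": Bulkhead("recommendations", 5),
}
```

A call to a backend is wrapped as `with bulkheads["payments"].slot(): ...`. Counting rejected calls per partition is one simple way to monitor bulkhead capacity.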

Circuit Breaker Pattern

Problem: Applications can waste resources attempting operations that are likely to fail.

Solution: Prevent an application from repeatedly trying to execute an operation that's likely to fail, allowing it to continue without waiting for the fault to be fixed.

When to Use:

Protecting against cascading failures
Failing fast when a remote service is unavailable
Providing fallback behavior when services are down

Implementation Considerations:

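Define the threshold that trips the circuit (e.g., number of failures within a time window)
Implement three states: Closed (requests pass through), Open (requests fail immediately), Half-Open (a limited number of trial requests test whether the fault has cleared)
Set appropriate timeouts for operations, and for how long the circuit stays Open before moving to Half-Open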
Log state transitions and failures for diagnostics
Provide meaningful error messages to clients
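
A minimal single-threaded sketch of the three states. For brevity it assumes that the trip condition is a count of consecutive failures rather than failures per time window. Concurrent callers would need a lock, plus a limit on how many trial calls pass in Half-Open. The clock is injectable so that the Open timeout can be tested. The class and parameter names are hypothetical.

```python
import logging
import time

log = logging.getLogger("circuit")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the operation while the circuit is Open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures that trip the circuit
        self.open_timeout = open_timeout  # seconds to stay Open before allowing a trial call
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, operation, *args, **kwargs):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.open_timeout:
                raise CircuitOpenError("circuit open: failing fast")
            self._set_state("HALF_OPEN")  # timeout elapsed: let a trial request through
        try:
            result = operation(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self._failures += 1
        # A failed trial in Half-Open reopens immediately; in Closed, only at the threshold.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
            self._set_state("OPEN")

    def _record_success(self):
        self._failures = 0
        if self.state != "CLOSED":
            self._set_state("CLOSED")

    def _set_state(self, new_state):
        log.warning("circuit %s -> %s", self.state, new_state)  # log transitions for diagnostics
        self.state = new_state
```

A fallback can be provided by catching CircuitOpenError at the call site and returning cached or default data.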

Compensating Transaction Pattern

Problem: Distributed transactions are difficult to implement and may not be supported.

Solution: Undo the work performed by a sequence of steps that collectively form an eventually consistent operation.

When to Use:

Implementing eventual consistency in distributed systems
Rolling back multi-step business processes that fail partway through
Handling long-running transactions that can't use two-phase commit (2PC)

Implementation Considerations:

Define compensating logic for each step in the transaction
Store enough state to undo operations
Make compensation operations idempotent, since they may be retried after a failure
Consider ordering dependencies between compensating actions
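
A minimal sketch, assuming that the steps are supplied as (name, action, compensation) callables and that undoing completed steps in reverse order is valid for them. Progress is kept only in memory here. A real implementation would persist it durably so that compensation can resume after a crash, and would also need a policy for compensations that themselves fail. The function name and the booking steps in the usage note are hypothetical.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action); all hypothetical callables supplied by the caller
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; if one fails, compensate the completed steps in reverse order.

    Returns True if every step succeeded, False if the operation was compensated.
    """
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            # Undo the steps that did complete, most recent first.
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"compensated {done_name}")
            return False
        # Record the step only after it succeeds, so a failed step is not compensated.
        completed.append((name, compensate))
        log.append(f"{name} done")
    return True
```

For example, suppose the steps book a flight, book a hotel, and then take a payment that is declined. The hotel booking is cancelled first and the flight booking second, and the log records each action.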

Retry Pattern

Problem: Transient failures are common in distributed systems.

Solution: Enable applications to handle anticipated temporary failures by retrying failed operations.

When to Use:

Handling transient faults (network glitches, temporary unavailability)
Operations expected to succeed after a brief delay
Non-idempotent operations only with care, since a retry can repeat a side effect

Implementation Considerations:

Implement exponential backoff between retries
Set a maximum retry count to avoid retrying indefinitely
Distinguish between transient and permanent failures, and retry only the transient ones
Ensure operations are idempotent or track retry attempts
Add jitter to backoff delays to avoid the thundering herd problem
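
A minimal sketch of retry with capped exponential backoff and "full jitter", where each wait is a random duration between zero and the current backoff cap. It assumes that callers signal retryable faults by raising a TransientError, a hypothetical exception type. Any other exception is treated as permanent and propagates immediately. The sleep function is injectable so that the delays can be inspected. The parameter defaults are illustrative, not recommendations.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503 responses, ...)."""


def retry(fn, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call fn, retrying TransientError up to max_attempts total attempts.

    Exceptions other than TransientError are treated as permanent and are not retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the last transient error
            # Exponential backoff capped at max_delay...
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # ...with full jitter, so many clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
```

Full jitter trades slightly longer average waits for much better spreading of retries across clients. That spreading is what prevents the synchronized bursts behind the thundering herd problem.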

Health Endpoint Monitoring Pattern

Problem: External tools need to verify system health and availability.

Solution: Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals.

When to Use:

Monitoring web applications and back-end services
Implementing readiness and liveness probes
Providing detailed health information to orchestrators

Implementation Considerations:

Expose health endpoints (e.g., /health, /ready, /live)
Check critical dependencies (databases, queues, external services)
Return appropriate HTTP status codes (e.g., 200 when healthy, 503 when unhealthy)
Implement authentication/authorization for sensitive health data
Provide different levels of detail based on security context
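
A minimal sketch built on Python's standard-library http.server, using the /live and /ready paths named above. Liveness reports only that the process can answer requests. Readiness runs each dependency check and returns 503 if any check fails or raises. The check_database function and the CHECKS registry are hypothetical placeholders. This sketch returns per-check detail without authentication, which in production would be restricted as described above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def check_database():
    return True  # hypothetical placeholder, e.g. run "SELECT 1" against the real database


CHECKS = {"database": check_database}  # hypothetical dependency checks


def readiness():
    """Run every dependency check; a check that raises counts as unhealthy."""
    results = {}
    for name, check in CHECKS.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return (200 if all(results.values()) else 503), results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/live":
            status, body = 200, {"status": "alive"}
        elif self.path == "/ready":
            status, results = readiness()
            body = {"status": "ready" if status == 200 else "not ready", "checks": results}
        else:
            status, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

An orchestrator's liveness probe would target /live and its readiness probe /ready. The server is started with ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever().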

Leader Election Pattern

Problem: Some distributed tasks must be coordinated by exactly one instance at a time to avoid conflicting work.

Solution: Coordinate actions in a distributed application by electing one instance as the leader that manages collaborating task instances.

When to Use:

Coordinating distributed tasks
Managing shared resources in a cluster
Ensuring single-instance execution of critical tasks

Implementation Considerations:

Use distributed locking mechanisms (Redis, etcd, ZooKeeper)
Handle leader failures with automatic re-election
Implement heartbeats to detect leader health
Ensure followers can become leaders quickly
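
A minimal sketch of lease-based election, assuming a shared store with an atomic "acquire or renew if free, expired, or already mine" operation. The in-memory LeaseStore is a hypothetical stand-in for etcd, ZooKeeper, or Redis. Each candidate calls heartbeat() periodically, well within the lease TTL. If the leader stops heartbeating, its lease expires and another candidate takes over on its next heartbeat. A lease alone does not stop a paused former leader from acting, and fencing tokens are a common mitigation.

```python
import threading
import time


class LeaseStore:
    """In-memory stand-in (hypothetical) for a shared store with atomic compare-and-set."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._holder = None
        self._expires = 0.0
        self._clock = clock

    def try_acquire(self, candidate, ttl):
        """Grant or renew the lease if it is free, expired, or already held by candidate."""
        with self._lock:
            now = self._clock()
            if self._holder in (None, candidate) or now >= self._expires:
                self._holder, self._expires = candidate, now + ttl
                return True
            return False


class Candidate:
    def __init__(self, name, store, ttl=5.0):
        self.name, self.store, self.ttl = name, store, ttl

    def heartbeat(self):
        """Renew or try to take the lease; returns True while this instance is the leader."""
        return self.store.try_acquire(self.name, self.ttl)
```

The leader performs the coordinated work only while heartbeat() keeps returning True. Failover time is therefore bounded by the TTL plus the followers' heartbeat interval.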

Saga Pattern

Problem: Maintaining data consistency across microservices without distributed transactions.

Solution: Maintain data consistency across microservices with a sequence of local transactions, where each local transaction triggers the next step and compensating transactions undo completed steps if a later step fails.

When to Use:

Long-running business processes spanning multiple services
Distributed transactions where 2PC is unavailable or impractical
Eventual consistency requirements across microservices

Implementation Considerations:

Choose between orchestration (centralized) or choreography (event-based)
Define compensating transactions for rollback scenarios
Handle partial failures and rollback logic
Implement idempotency for all saga steps
Provide clear audit trails and monitoring
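
A minimal sketch of the choreography variant. The services react to each other's events, and no central coordinator is involved. An in-process EventBus stands in for a message broker. OrderService, PaymentService, and the event names are hypothetical. Cancelling the order when payment fails is the compensating transaction. PaymentService ignores redelivered events, which keeps its step idempotent. In a real system, committing the local change and publishing its event would need to be atomic, for example via a transactional outbox.

```python
from collections import defaultdict, deque


class EventBus:
    """Tiny in-process stand-in (hypothetical) for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self._queue.append((event_type, payload))

    def drain(self):
        """Deliver queued events (and any events they trigger) until none remain."""
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in self._handlers[event_type]:
                handler(payload)


class OrderService:
    def __init__(self, bus):
        self.bus, self.orders = bus, {}
        bus.subscribe("PaymentCompleted", self.on_payment_completed)
        bus.subscribe("PaymentFailed", self.on_payment_failed)

    def create_order(self, order_id, amount):
        self.orders[order_id] = "PENDING"  # local transaction 1
        self.bus.publish("OrderCreated", {"order_id": order_id, "amount": amount})

    def on_payment_completed(self, event):
        self.orders[event["order_id"]] = "APPROVED"

    def on_payment_failed(self, event):
        self.orders[event["order_id"]] = "CANCELLED"  # compensating transaction


class PaymentService:
    def __init__(self, bus, balance):
        self.bus, self.balance, self.processed = bus, balance, set()
        bus.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event):
        if event["order_id"] in self.processed:  # idempotency: ignore redelivered events
            return
        self.processed.add(event["order_id"])
        if event["amount"] <= self.balance:
            self.balance -= event["amount"]  # local transaction 2
            self.bus.publish("PaymentCompleted", {"order_id": event["order_id"]})
        else:
            self.bus.publish("PaymentFailed", {"order_id": event["order_id"]})
```

An orchestrated saga would move the "what happens next" decisions out of the handlers into one coordinator. That makes the flow easier to follow and audit, at the cost of a central component.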

Sequential Convoy Pattern

Problem: Related messages must be processed in order, but a single ordered queue would block independent message groups behind one another.

Solution: Process a set of related messages in a defined order without blocking other message groups.

When to Use:

Message processing requires strict ordering within groups
Independent message groups can be processed in parallel
Implementing session-based message processing

Implementation Considerations:

Use session IDs or partition keys to group related messages
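Process messages within a group sequentially, while processing different groups in parallel
Decide how a failed message affects the rest of its session (for example, retry it, dead-letter it, or pause the session) so that ordering is not silently broken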
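
A minimal sketch of partition-key routing, assuming that messages arrive as (session_id, payload) pairs. Each session ID is hashed to one of a fixed number of worker threads. All messages in a session therefore go through one queue in arrival order, while different workers run in parallel. zlib.crc32 is used instead of Python's built-in hash(), because string hashing is randomized per process and the mapping should be stable. One trade-off is that sessions sharing a worker can delay each other. Brokers with native sessions (e.g., Azure Service Bus sessions) avoid this. The function name and parameters are hypothetical.

```python
import queue
import threading
import zlib


def run_convoy(messages, handler, num_workers=4):
    """Dispatch (session_id, payload) messages so each session is handled in order by one worker."""
    queues = [queue.Queue() for _ in range(num_workers)]

    def worker(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more messages for this worker
                break
            handler(*item)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for session_id, payload in messages:
        # Stable hash: every message of a session lands on the same worker's queue.
        idx = zlib.crc32(session_id.encode()) % num_workers
        queues[idx].put((session_id, payload))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
```

The handler is called concurrently from different workers, so any state it shares across sessions needs its own synchronization.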
