# Performance Patterns
## Asynchronous Request-Reply Pattern
**Problem**: Client applications expect synchronous responses, but back-end processing is asynchronous.
**Solution**: Decouple back-end processing from a front-end host where back-end processing must be asynchronous, but the front end requires a clear response.
**When to Use**:
- Long-running back-end operations
- Client applications can't wait for synchronous responses
- Offloading compute-intensive operations from web tier
**Implementation Considerations**:
- Return HTTP 202 (Accepted) with location header for status checking
- Implement status endpoint for clients to poll
- Consider webhooks for callback notifications
- Use correlation IDs to track requests
- Implement timeouts for long-running operations
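The flow above can be sketched with an in-memory job store standing in for the back end; the `JobStore` class, the `/status/` path, and the payload are illustrative assumptions, not a prescribed API:

```python
import uuid

class JobStore:
    """In-memory stand-in for back-end job state (illustrative only)."""
    def __init__(self):
        self._jobs = {}

    def submit(self, payload):
        """Accept work and return a 202-style response with a status location."""
        job_id = str(uuid.uuid4())  # correlation ID for tracking the request
        self._jobs[job_id] = {"status": "Running", "result": None}
        return 202, {"Location": f"/status/{job_id}"}, job_id

    def complete(self, job_id, result):
        """Called by the back end when the asynchronous work finishes."""
        self._jobs[job_id] = {"status": "Succeeded", "result": result}

    def poll(self, job_id):
        """Status endpoint the client polls until the work finishes.
        A real API might redirect (302) to the finished resource instead."""
        job = self._jobs[job_id]
        if job["status"] == "Running":
            return 200, {"status": "Running"}
        return 200, {"status": "Succeeded", "result": job["result"]}

store = JobStore()
code, headers, job_id = store.submit({"task": "resize-image"})
store.complete(job_id, "thumbnail.png")   # back end finishes later
_, body = store.poll(job_id)
```

A production version would add the timeout and webhook-callback concerns listed above; this sketch only shows the 202 + status-polling handshake.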
## Cache-Aside Pattern
**Problem**: Applications repeatedly access the same data from a data store.
**Solution**: Load data on demand into a cache from a data store when needed.
**When to Use**:
- Frequently accessed, read-heavy data
- Data that changes infrequently
- Reducing load on primary data store
**Implementation Considerations**:
- Check cache before accessing data store
- Load data into cache on cache miss (lazy loading)
- Set appropriate cache expiration policies
- Implement cache invalidation strategies
- Handle cache failures gracefully (fallback to data store)
- Consider cache coherency in distributed scenarios
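A minimal sketch of the check-miss-load-populate cycle, using a plain dictionary as the cache and a caller-supplied loader; the `CacheAside` class name and TTL handling are assumptions for illustration:

```python
import time

class CacheAside:
    """Minimal cache-aside wrapper around a data-store loader (illustrative)."""
    def __init__(self, load_from_store, ttl_seconds=60.0):
        self._cache = {}              # key -> (value, timestamp)
        self._load = load_from_store
        self._ttl = ttl_seconds

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and time.monotonic() - entry[1] < self._ttl:
            return entry[0]                        # hit: serve from cache
        value = self._load(key)                    # miss: lazy-load from store
        self._cache[key] = (value, time.monotonic())
        return value

    def invalidate(self, key):
        """Call on writes so stale data is not served after an update."""
        self._cache.pop(key, None)

calls = []
def load(key):
    calls.append(key)                 # records each trip to the data store
    return f"row-{key}"

cache = CacheAside(load)
cache.get("a")
cache.get("a")    # second call is served from the cache; load() runs once
```

Graceful failure handling (falling back to the store if the cache itself errors) is omitted here but matters in production, as the considerations above note.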
## CQRS (Command Query Responsibility Segregation) Pattern
**Problem**: Read and write workloads have different requirements and scaling needs.
**Solution**: Separate operations that read data from those that update data by using distinct interfaces.
**When to Use**:
- Read and write workloads have vastly different performance characteristics
- Different teams work on read and write sides
- Need to prevent update conflicts when many users modify the same data in parallel
- Complex business logic differs between reads and writes
**Implementation Considerations**:
- Separate read and write models
- Use event sourcing to synchronize models
- Scale read and write sides independently
- Consider eventual consistency implications
- Implement appropriate security for commands vs queries
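The command/query split can be sketched with a write model that records events and a read model rebuilt from them; the `WriteModel`/`ReadModel` names and the `ItemAdded` event are illustrative assumptions:

```python
class WriteModel:
    """Command side: applies business rules and records changes as events."""
    def __init__(self):
        self.events = []

    def handle_add_item(self, order_id, item):
        # Validation and business logic belong here; only commands mutate state.
        self.events.append(("ItemAdded", order_id, item))

class ReadModel:
    """Query side: a denormalized view rebuilt from events.
    Because synchronization is usually asynchronous, reads are
    eventually consistent with writes."""
    def __init__(self):
        self.orders = {}

    def apply(self, event):
        kind, order_id, item = event
        if kind == "ItemAdded":
            self.orders.setdefault(order_id, []).append(item)

writes = WriteModel()
reads = ReadModel()
writes.handle_add_item("o1", "widget")
for event in writes.events:   # synchronization step, often async in practice
    reads.apply(event)
```

In a real system the event replay would run out-of-band (e.g. via a message bus), which is where the eventual-consistency caveat above comes from.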
## Index Table Pattern
**Problem**: Queries frequently reference fields that aren't indexed efficiently.
**Solution**: Create indexes over the fields in data stores that queries frequently reference.
**When to Use**:
- Improving query performance
- Supporting multiple query patterns
- Working with data stores that offer only primary-key lookups or limited secondary indexing
**Implementation Considerations**:
- Create separate tables/collections optimized for specific queries
- Maintain indexes asynchronously using events or triggers
- Consider storage overhead of duplicate data
- Handle index update failures and inconsistencies
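A toy version of a separate index table over a non-key field; the `customers` data and `city` field are made up for illustration:

```python
# Primary store, keyed only by customer ID
customers = {
    "c1": {"id": "c1", "city": "Seattle"},
    "c2": {"id": "c2", "city": "Portland"},
    "c3": {"id": "c3", "city": "Seattle"},
}

def build_index(store, field):
    """Build a separate 'index table' mapping field value -> primary keys.
    In practice this would be maintained asynchronously on writes."""
    index = {}
    for key, row in store.items():
        index.setdefault(row[field], []).append(key)
    return index

city_index = build_index(customers, "city")

def find_by_city(city):
    # Point lookup in the index table instead of scanning the whole store.
    return [customers[k] for k in city_index.get(city, [])]
```

Note the storage trade-off the considerations mention: the index duplicates the `city` values, and a write path must keep it in sync with the primary store.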
## Materialized View Pattern
**Problem**: Data is poorly formatted for required query operations.
**Solution**: Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for query operations.
**When to Use**:
- Complex queries over normalized data
- Improving read performance for complex joins/aggregations
- Supporting multiple query patterns efficiently
**Implementation Considerations**:
- Refresh views asynchronously using background jobs or triggers
- Consider staleness tolerance for materialized data
- Balance between storage cost and query performance
- Implement incremental refresh where possible
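A minimal full-refresh sketch: raw order rows are pre-aggregated into a read-optimized per-product revenue view. The data and field names are invented for illustration:

```python
# Source-of-truth rows, normalized and poorly shaped for reporting queries
orders = [
    {"product": "widget", "qty": 3, "price": 2.0},
    {"product": "widget", "qty": 1, "price": 2.0},
    {"product": "gadget", "qty": 2, "price": 5.0},
]

def refresh_sales_view(rows):
    """Full refresh of a per-product revenue view.
    An incremental refresh would apply only the delta rows instead
    of recomputing everything, at the cost of more bookkeeping."""
    view = {}
    for row in rows:
        view[row["product"]] = view.get(row["product"], 0.0) + row["qty"] * row["price"]
    return view

# Read-optimized and possibly stale until the next scheduled refresh
sales_by_product = refresh_sales_view(orders)
```

Queries now hit the precomputed view instead of aggregating the source rows on every read, trading storage and staleness for read latency.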
## Priority Queue Pattern
**Problem**: Some requests need faster processing than others.
**Solution**: Prioritize requests sent to services so that requests with a higher priority are processed more quickly.
**When to Use**:
- Providing different service levels to different customers
- Processing critical operations before less important ones
- Managing mixed workloads with varying importance
**Implementation Considerations**:
- Use message priority metadata
- Implement multiple queues for different priority levels
- Prevent starvation of low-priority messages
- Monitor queue depths and processing times per priority
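A heap-based sketch of priority ordering; the message strings are made up, and real systems would typically use broker-level priority metadata or one queue per priority level rather than an in-process heap:

```python
import heapq
import itertools

class PriorityQueue:
    """Heap-based priority queue; lower number = higher priority.
    The monotonic sequence counter preserves FIFO order within a
    priority level and breaks ties deterministically."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def put(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def get(self):
        priority, _, message = heapq.heappop(self._heap)
        return message

q = PriorityQueue()
q.put(2, "batch-report")      # low priority
q.put(0, "premium-order")     # highest priority, arrives last
q.put(1, "standard-order")
```

Strict ordering like this can starve low-priority work under sustained high-priority load; aging (gradually raising the priority of old messages) is one common mitigation.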
## Queue-Based Load Leveling Pattern
**Problem**: Intermittent heavy loads can overwhelm services.
**Solution**: Use a queue as a buffer between a task and a service to smooth intermittent heavy loads.
**When to Use**:
- Protecting services from traffic spikes
- Decoupling producers and consumers
- Enabling asynchronous processing
**Implementation Considerations**:
- Choose appropriate queue technology (Azure Storage Queue, Service Bus, etc.)
- Monitor queue length to detect saturation
- Implement auto-scaling based on queue depth
- Set appropriate message time-to-live (TTL)
- Handle poison messages with dead-letter queues
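The buffering behavior can be sketched with a plain deque standing in for the queue service; the burst size, tick model, and `MAX_PER_TICK` rate are illustrative assumptions:

```python
from collections import deque

queue = deque()
MAX_PER_TICK = 2   # the service's sustainable processing rate per tick

def produce_burst(requests):
    """Producers enqueue at any rate; the queue absorbs the spike."""
    queue.extend(requests)

def service_tick(processed):
    """The consumer drains at its own steady pace and is never overwhelmed,
    even while the queue temporarily holds the backlog."""
    for _ in range(min(MAX_PER_TICK, len(queue))):
        processed.append(queue.popleft())

done = []
produce_burst([f"req-{i}" for i in range(5)])   # burst of 5 arrives at once
service_tick(done)   # tick 1: processes 2
service_tick(done)   # tick 2: processes 2
service_tick(done)   # tick 3: processes the remaining 1
```

Monitoring the queue length here is what detects saturation: a backlog that keeps growing across ticks means the consumer needs to scale out.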
## Rate Limiting Pattern
**Problem**: Service consumption must be controlled to prevent resource exhaustion.
**Solution**: Control the consumption of resources by applications, tenants, or services to avoid resource exhaustion and reduce throttling errors from downstream services.
**When to Use**:
- Protecting backend services from overload
- Implementing fair usage policies
- Preventing one tenant from monopolizing resources
**Implementation Considerations**:
- Implement token bucket, leaky bucket, or fixed window algorithms
- Return HTTP 429 (Too Many Requests) when limits exceeded
- Provide Retry-After headers to clients
- Consider different limits for different clients/tiers
- Make limits configurable and monitorable
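A minimal token-bucket sketch, one of the algorithms listed above; the `TokenBucket` class name and the capacity/rate numbers are illustrative assumptions:

```python
import time

class TokenBucket:
    """Token-bucket limiter: capacity caps bursts, refill_rate caps
    the sustained request rate (tokens per second)."""
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # request proceeds
        return False       # caller should receive HTTP 429 plus Retry-After

bucket = TokenBucket(capacity=2, refill_rate=1.0)
results = [bucket.allow() for _ in range(3)]   # burst of 3 against capacity 2
```

Per-tier limits drop out naturally: instantiate one bucket per client or tenant with tier-specific capacity and refill rate.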
## Sharding Pattern
**Problem**: A single data store may have limitations in storage capacity and performance.
**Solution**: Divide a data store into a set of horizontal partitions or shards.
**When to Use**:
- Scaling beyond single database limits
- Improving query performance by reducing dataset size
- Distributing load across multiple databases
**Implementation Considerations**:
- Choose appropriate shard key (hash, range, or list-based)
- Avoid hot partitions by selecting balanced shard keys
- Handle cross-shard queries carefully
- Plan for shard rebalancing and splitting
- Consider operational complexity of managing multiple shards
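Hash-based shard routing can be sketched as follows; the shard names, key format, and record contents are invented for illustration:

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(key):
    """Hash-based routing: a stable hash spreads keys evenly and avoids
    hot partitions, but it makes range queries inherently cross-shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# One dict stands in for each physical data store
stores = {name: {} for name in SHARDS}

def put(key, value):
    stores[shard_for(key)][key] = value

def get(key):
    # Single-key operations touch exactly one shard.
    return stores[shard_for(key)][key]

put("tenant-42", {"plan": "gold"})
```

Note the rebalancing caveat from the list above: with plain modulo routing, changing the shard count remaps most keys, which is why real systems often use consistent hashing or directory-based lookup instead.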
## Throttling Pattern
**Problem**: Resource consumption must be limited to prevent system overload.
**Solution**: Control the consumption of resources used by an application, tenant, or service.
**When to Use**:
- Ensuring system operates within defined capacity
- Preventing resource exhaustion during peak load
- Enforcing SLA-based resource allocation
**Implementation Considerations**:
- Implement at API gateway or service level
- Use different strategies: reject requests, queue, or degrade service
- Return appropriate HTTP status codes (429, 503)
- Provide clear feedback to clients about throttling
- Monitor throttling metrics to adjust capacity
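The degrade-before-reject strategy can be sketched with soft and hard concurrency thresholds; the `Throttle` class, the thresholds, and the `"degraded"`/`"full"` response labels are illustrative assumptions:

```python
class Throttle:
    """Two-threshold throttle: above the soft limit, serve a reduced
    response; above the hard limit, reject outright with 503."""
    def __init__(self, soft_limit, hard_limit):
        self.soft = soft_limit
        self.hard = hard_limit
        self.active = 0   # requests currently in flight

    def admit(self):
        if self.active >= self.hard:
            return 503, None              # reject: over hard capacity
        self.active += 1
        if self.active > self.soft:
            return 200, "degraded"        # serve a cheaper, reduced response
        return 200, "full"

    def release(self):
        """Call when a request completes to free its slot."""
        self.active -= 1

t = Throttle(soft_limit=2, hard_limit=3)
responses = [t.admit() for _ in range(4)]   # fourth request is rejected
```

In practice this logic typically lives at the API gateway, and the 503/429 responses should carry enough context for clients to back off sensibly.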