mirror of
https://github.com/github/awesome-copilot.git
synced 2026-02-22 03:15:13 +00:00
104 lines
3.1 KiB
Markdown
104 lines
3.1 KiB
Markdown
# Star Schema Design for Power BI
|
|
|
|
## Overview
|
|
|
|
Star schema is the optimal design pattern for Power BI semantic models. It organizes data into:
|
|
- **Dimension tables**: Enable filtering and grouping (the "one" side)
|
|
- **Fact tables**: Enable summarization (the "many" side)
|
|
|
|
## Table Classification
|
|
|
|
### Dimension Tables
|
|
- Contain descriptive attributes for filtering/slicing
|
|
- Have unique key columns (one row per entity)
|
|
- Examples: Customer, Product, Date, Geography, Employee
|
|
- Naming convention: Singular noun (`Customer`, `Product`)
|
|
|
|
### Fact Tables
|
|
- Contain measurable, quantitative data
|
|
- Have foreign keys to dimensions
|
|
- Store data at consistent grain (one row per transaction/event)
|
|
- Examples: Sales, Orders, Inventory, WebVisits
|
|
- Naming convention: Business process noun (`Sales`, `Orders`)
|
|
|
|
## Design Principles
|
|
|
|
### 1. Separate Dimensions from Facts
|
|
```
|
|
BAD: Single denormalized "Sales" table with customer details
|
|
GOOD: "Sales" fact table + "Customer" dimension table
|
|
```
|
|
|
|
### 2. Consistent Grain
|
|
Every row in a fact table represents the same thing:
|
|
- Order line level (most common)
|
|
- Daily aggregation
|
|
- Monthly summary
|
|
|
|
Never mix grains in one table.
|
|
|
|
### 3. Surrogate Keys
|
|
Add surrogate keys when source lacks unique identifiers:
|
|
```m
|
|
// Power Query: Add index column
|
|
= Table.AddIndexColumn(Source, "CustomerKey", 1, 1)
|
|
```
|
|
|
|
### 4. Date Dimension
|
|
Always create a dedicated date table:
|
|
- Mark as date table in Power BI
|
|
- Include fiscal periods if needed
|
|
- Add relative date columns (IsCurrentMonth, IsPreviousYear)
|
|
|
|
```dax
|
|
Date =
|
|
ADDCOLUMNS(
|
|
CALENDAR(DATE(2020,1,1), DATE(2030,12,31)),
|
|
"Year", YEAR([Date]),
|
|
"Month", FORMAT([Date], "MMMM"),
|
|
"MonthNum", MONTH([Date]),
|
|
"Quarter", "Q" & FORMAT([Date], "Q"),
|
|
"WeekDay", FORMAT([Date], "dddd")
|
|
)
|
|
```
|
|
|
|
## Special Dimension Types
|
|
|
|
### Role-Playing Dimensions
|
|
Same dimension used multiple times (e.g., Date for OrderDate, ShipDate):
|
|
- Option 1: Duplicate the table (OrderDate, ShipDate tables)
|
|
- Option 2: Use inactive relationships with USERELATIONSHIP in DAX
|
|
|
|
### Slowly Changing Dimensions (Type 2)
|
|
Track historical changes with version columns:
|
|
- StartDate, EndDate columns
|
|
- IsCurrent flag
|
|
- Requires pre-processing in data warehouse
|
|
|
|
### Junk Dimensions
|
|
Combine low-cardinality flags into one table:
|
|
```
|
|
OrderFlags dimension: IsRush, IsGift, IsOnline
|
|
```
|
|
|
|
### Degenerate Dimensions
|
|
Keep transaction identifiers (OrderNumber, InvoiceID) in fact table.
|
|
|
|
## Anti-Patterns to Avoid
|
|
|
|
| Anti-Pattern | Problem | Solution |
|
|
|--------------|---------|----------|
|
|
| Wide denormalized tables | Poor performance, hard to maintain | Split into star schema |
|
|
| Snowflake (normalized dims) | Extra joins hurt performance | Flatten dimensions |
|
|
| Many-to-many without bridge | Ambiguous results | Add bridge/junction table |
|
|
| Mixed grain facts | Incorrect aggregations | Separate tables per grain |
|
|
|
|
## Validation Checklist
|
|
|
|
- [ ] Each table is clearly dimension or fact
|
|
- [ ] Fact tables have foreign keys to all related dimensions
|
|
- [ ] Dimensions have unique key columns
|
|
- [ ] Date table exists and is marked
|
|
- [ ] No circular relationship paths
|
|
- [ ] Consistent naming conventions
|