Add Power BI modeling skill

Mohammad
2026-02-03 12:29:33 +03:00
parent e32291ee9f
commit 545af30ade
7 changed files with 1040 additions and 0 deletions


@@ -0,0 +1,195 @@
# DAX Measures and Naming Conventions
## Naming Conventions
### General Rules
- Use human-readable names (spaces allowed)
- Be descriptive: `Total Sales Amount` not `TSA`
- Avoid abbreviations unless universally understood
- Use consistent capitalization (Title Case recommended)
- Avoid special characters other than spaces and widely understood symbols such as `%`
### Table Naming
| Type | Convention | Example |
|------|------------|---------|
| Dimension | Singular noun | Customer, Product, Date |
| Fact | Business process | Sales, Orders, Inventory |
| Bridge | Combined names | CustomerAccount, ProductCategory |
| Measure Table | Underscore prefix | _Measures, _KPIs |
### Column Naming
| Type | Convention | Example |
|------|------------|---------|
| Keys | Suffix with "Key" or "ID" | CustomerKey, ProductID |
| Dates | Suffix with "Date" | OrderDate, ShipDate |
| Amounts and quantities | Descriptive with unit hint | SalesAmount, QuantitySold |
| Flags | Prefix with "Is" or "Has" | IsActive, HasDiscount |
### Measure Naming
| Type | Convention | Example |
|------|------------|---------|
| Aggregations | Aggregation word + Noun | Total Sales, Count of Orders |
| Ratios | X per Y or X Rate | Sales per Customer, Conversion Rate |
| Time Intelligence | Period + Metric | YTD Sales, PY Total Sales |
| Comparisons | Metric + vs + Baseline | Sales vs Budget, Growth vs PY |
## Explicit vs Implicit Measures
### Always Create Explicit Measures For:
1. Key business metrics users will query
2. Complex calculations with filter manipulation
3. Measures consumed by MDX clients such as Excel PivotTables, which cannot use implicit measures
4. Controlled aggregation (prevent sum of averages)
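The "sum of averages" problem from item 4 can be seen with a few numbers. This plain-Python sketch uses hypothetical order amounts for two regions: adding the per-region averages gives a figure that is not the average of anything, while an explicit measure that divides total sales by order count returns the true average.

```python
from statistics import mean

# Hypothetical order amounts per region (illustrative data only)
orders = {"West": [100.0, 200.0, 300.0], "East": [50.0]}

# Adding per-region averages: the "sum of averages" an uncontrolled total produces
sum_of_averages = sum(mean(amounts) for amounts in orders.values())

# What a measure like DIVIDE([Total Sales], [Order Count]) returns at the total level
all_amounts = [a for amounts in orders.values() for a in amounts]
true_average = sum(all_amounts) / len(all_amounts)

print(sum_of_averages)  # 250.0
print(true_average)     # 162.5
```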
### Implicit Measures (Column Aggregations)
- Acceptable for simple exploration
- Set the correct SummarizeBy property:
  - Amounts: Sum
  - Keys/IDs: None (Do Not Summarize)
  - Rates/Prices: None or Average
## Measure Patterns
### Basic Aggregations
```dax
Total Sales = SUM(Sales[SalesAmount])
// Counts fact rows: equals the number of orders only if Sales has one row per order
Order Count = COUNTROWS(Sales)
Average Order Value = DIVIDE([Total Sales], [Order Count])
Distinct Customers = DISTINCTCOUNT(Sales[CustomerKey])
```
### Time Intelligence (Requires Date Table)
```dax
YTD Sales = TOTALYTD([Total Sales], 'Date'[Date])
MTD Sales = TOTALMTD([Total Sales], 'Date'[Date])
PY Sales = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date]))
YoY Growth = DIVIDE([Total Sales] - [PY Sales], [PY Sales])
```
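To make the filter windows behind these measures concrete, here is a plain-Python sketch over hypothetical daily sales. It mirrors what TOTALYTD, SAMEPERIODLASTYEAR, and the YoY ratio compute for a given date range, assuming a calendar year ending 31 December; it is not how the engine evaluates them. The 29 February handling in `one_year_back` is this sketch's own choice, and `None` stands in for BLANK.

```python
from datetime import date

# Hypothetical (order date, amount) rows standing in for the Sales fact table
SALES = [
    (date(2023, 1, 15), 100.0),
    (date(2023, 3, 10), 50.0),
    (date(2024, 1, 20), 120.0),
    (date(2024, 3, 5), 90.0),
    (date(2024, 6, 1), 40.0),
]


def total_sales(start: date, end: date) -> float:
    """[Total Sales] evaluated for the inclusive date range start..end."""
    return sum(amount for d, amount in SALES if start <= d <= end)


def ytd_sales(as_of: date) -> float:
    """Like TOTALYTD: from 1 January of as_of's year through as_of."""
    return total_sales(date(as_of.year, 1, 1), as_of)


def one_year_back(d: date) -> date:
    """Shift a date back one year, mapping 29 February to 28 February."""
    if d.month == 2 and d.day == 29:
        return date(d.year - 1, 2, 28)
    return d.replace(year=d.year - 1)


def py_sales(start: date, end: date) -> float:
    """Like SAMEPERIODLASTYEAR: the same range shifted back one year."""
    return total_sales(one_year_back(start), one_year_back(end))


def yoy_growth(start: date, end: date):
    """Like DIVIDE([Total Sales] - [PY Sales], [PY Sales]); None stands in for BLANK."""
    current, prior = total_sales(start, end), py_sales(start, end)
    return None if prior == 0 else (current - prior) / prior


q1_2024 = (date(2024, 1, 1), date(2024, 3, 31))
print(ytd_sales(date(2024, 3, 31)))  # 210.0
print(py_sales(*q1_2024))            # 150.0
print(yoy_growth(*q1_2024))          # 0.4
```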
### Percentage Calculations
```dax
Sales % of Total =
DIVIDE(
    [Total Sales],
    CALCULATE([Total Sales], REMOVEFILTERS(Product))
)
Margin % = DIVIDE([Gross Profit], [Total Sales])
```
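The key point of `Sales % of Total` is that REMOVEFILTERS(Product) clears only the product filter, so other filters such as the year still apply to the denominator. A minimal Python sketch over hypothetical rows, with `None` meaning "no filter" and also standing in for BLANK:

```python
# Hypothetical (product, year, amount) rows
SALES = [
    ("Bikes", 2024, 300.0),
    ("Helmets", 2024, 100.0),
    ("Bikes", 2023, 500.0),
]


def total_sales(product=None, year=None) -> float:
    """[Total Sales] under optional product and year filters (None = no filter)."""
    return sum(
        amount
        for p, y, amount in SALES
        if (product is None or p == product) and (year is None or y == year)
    )


def pct_of_total(product: str, year: int):
    """REMOVEFILTERS(Product) drops the product filter only; the year filter stays."""
    denominator = total_sales(product=None, year=year)
    return None if denominator == 0 else total_sales(product, year) / denominator


print(pct_of_total("Bikes", 2024))  # 0.75 (300 of the 400 sold in 2024)
```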
### Running Totals
```dax
Running Total =
CALCULATE(
    [Total Sales],
    FILTER(
        ALL('Date'),
        'Date'[Date] <= MAX('Date'[Date])
    )
)
```
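The running-total pattern works in two steps: ALL('Date') discards the current date filter, then the FILTER condition keeps every date up to the last date visible in the current cell. This Python sketch assumes hypothetical daily values; note that the total carries forward across dates with no sales.

```python
from datetime import date

# Hypothetical daily [Total Sales] values; dates with no sales are absent
DAILY_SALES = {
    date(2024, 1, 1): 10.0,
    date(2024, 1, 2): 20.0,
    date(2024, 1, 5): 5.0,
}


def running_total(max_visible_date: date) -> float:
    """Sum over all dates up to MAX('Date'[Date]) of the current context."""
    return sum(amount for d, amount in DAILY_SALES.items() if d <= max_visible_date)


print(running_total(date(2024, 1, 2)))   # 30.0
print(running_total(date(2024, 1, 31)))  # 35.0 (a month-level cell uses the month's last date)
```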
## Column References
### Best Practice: Always Qualify Column Names
```dax
// GOOD - Fully qualified
Sales Amount = SUM(Sales[SalesAmount])
// BAD - Unqualified (indistinguishable from a measure reference)
Sales Amount = SUM([SalesAmount])
```
### Measure References: Never Qualify
```dax
// GOOD - Unqualified measure
YTD Sales = TOTALYTD([Total Sales], 'Date'[Date])
// BAD - Qualified measure (reads like a column; breaks if home table changes)
YTD Sales = TOTALYTD(Sales[Total Sales], 'Date'[Date])
```
## Documentation
### Measure Descriptions
Always add descriptions explaining:
- What the measure calculates
- Business context/usage
- Any important assumptions
```
measure_operations(
    operation: "Update",
    definitions: [{
        name: "Total Sales",
        tableName: "Sales",
        description: "Sum of all completed sales transactions. Excludes returns and cancelled orders."
    }]
)
```
### Format Strings
| Data Type | Format String | Example Output |
|-----------|---------------|----------------|
| Currency | $#,##0.00 | $1,234.56 |
| Percentage | 0.0% | 12.3% |
| Whole Number | #,##0 | 1,234 |
| Decimal | #,##0.00 | 1,234.56 |
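A quick way to sanity-check these format strings is to preview them with Python's format specification mini-language. This is an approximation for positive values only, not Power BI's formatter. It also shows that `%` multiplies by 100, so a percentage measure should return 0.123, not 12.3.

```python
# Positive-value approximations of the table's format strings
FORMAT_PREVIEWS = {
    "$#,##0.00": lambda v: f"${v:,.2f}",
    "0.0%": lambda v: f"{v:.1%}",
    "#,##0": lambda v: f"{v:,.0f}",
    "#,##0.00": lambda v: f"{v:,.2f}",
}


def preview(format_string: str, value: float) -> str:
    """Render value roughly as the given format string would display it."""
    return FORMAT_PREVIEWS[format_string](value)


for fmt, value in [("$#,##0.00", 1234.56), ("0.0%", 0.123), ("#,##0", 1234), ("#,##0.00", 1234.56)]:
    print(fmt, "->", preview(fmt, value))
```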
## Display Folders
Organize measures into logical groups:
```
measure_operations(
    operation: "Update",
    definitions: [{
        name: "YTD Sales",
        tableName: "_Measures",
        displayFolder: "Time Intelligence\\Year"
    }]
)
```
Common folder structure:
```
_Measures
├── Sales
│   ├── Total Sales
│   └── Average Sale
├── Time Intelligence
│   ├── Year
│   │   ├── YTD Sales
│   │   └── PY Sales
│   └── Month
│       └── MTD Sales
└── Ratios
    ├── Margin %
    └── Conversion Rate
```
## Variables for Performance
Use variables to:
- Avoid recalculating the same expression
- Improve readability
- Simplify debugging (temporarily `RETURN` an intermediate variable to inspect it)
```dax
Gross Margin % =
VAR TotalSales = [Total Sales]
VAR TotalCost = [Total Cost]
VAR GrossProfit = TotalSales - TotalCost
RETURN
DIVIDE(GrossProfit, TotalSales)
```
## Validation Checklist
- [ ] All key business metrics have explicit measures
- [ ] Measures have clear, descriptive names
- [ ] Measures have descriptions
- [ ] Appropriate format strings applied
- [ ] Display folders organize related measures
- [ ] Column references are fully qualified
- [ ] Measure references are not qualified
- [ ] Variables used for complex calculations


@@ -0,0 +1,215 @@
# Performance Optimization for Power BI Models
## Data Reduction Techniques
### 1. Remove Unnecessary Columns
- Only import columns needed for reporting
- Remove audit columns (CreatedBy, ModifiedDate) unless required
- Remove duplicate/redundant columns
```
column_operations(operation: "List", filter: { tableNames: ["Sales"] })
// Review and remove unneeded columns
```
### 2. Remove Unnecessary Rows
- Filter historical data to relevant period
- Exclude cancelled/void transactions if not needed
- Apply filters in Power Query so excluded rows are never loaded (filtering in DAX only hides rows already in the model)
### 3. Reduce Cardinality
High cardinality (many unique values) impacts:
- Model size
- Refresh time
- Query performance
**Solutions:**
| Column Type | Reduction Technique |
|-------------|---------------------|
| DateTime | Split into Date and Time columns |
| Decimal precision | Round to needed precision |
| Text with patterns | Split repeated prefixes/suffixes into separate columns |
| Text or GUID IDs | Replace with surrogate integer keys |
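Splitting a DateTime column helps because the number of distinct values is what drives column size: the distinct counts of the two parts add rather than multiply. This Python sketch counts distinct values for hypothetical per-minute timestamps over 30 days; actual savings depend on the engine's encoding.

```python
from datetime import datetime, timedelta

# Hypothetical per-minute timestamps covering 30 days
start = datetime(2024, 1, 1)
stamps = [start + timedelta(minutes=i) for i in range(30 * 24 * 60)]

datetime_distinct = len(set(stamps))               # one DateTime column
date_distinct = len({s.date() for s in stamps})    # split: Date column
time_distinct = len({s.time() for s in stamps})    # split: Time column

print(datetime_distinct, date_distinct + time_distinct)  # 43200 1470
```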
### 4. Optimize Data Types
| From | To | Benefit |
|------|-----|---------|
| DateTime | Date (if time not needed) | Far fewer distinct values, better compression |
| Decimal | Fixed Decimal | Better compression |
| Numeric text (e.g., codes stored as text) | Whole Number | Much better compression (watch for leading zeros) |
| Long text | Shorter text | Reduces storage |
### 5. Group and Summarize
Pre-aggregate data when detail not needed:
- Daily instead of transactional
- Monthly instead of daily
- Consider aggregation tables for DirectQuery
## Column Optimization
### Prefer Power Query Columns Over Calculated Columns
| Approach | When to Use |
|----------|-------------|
| Power Query (M) | Can be computed at source, static values |
| Calculated Column (DAX) | Needs model relationships, dynamic logic |
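Power Query columns:
- Are computed as data loads, avoiding the extra recalculation pass calculated columns add to each refresh
- Typically compress better
- Use less memory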
### Avoid Calculated Columns on Relationship Keys
DAX calculated columns used as relationship keys:
- Cannot use source-database indexes in DirectQuery
- Generate complex SQL for DirectQuery
- Can hurt query performance significantly
**Use COMBINEVALUES for multi-column relationships:**
```dax
// If you must use calculated column for composite key
CompositeKey = COMBINEVALUES(",", [Country], [City])
```
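The delimiter in a composite key matters: without one, different value pairs can produce the same key. COMBINEVALUES assumes, but does not check, that distinct inputs give distinct outputs, so the delimiter must not appear in the values. In this Python sketch, `combine_values` is a hypothetical stand-in that just joins strings.

```python
def combine_values(delimiter: str, *values: str) -> str:
    """Hypothetical stand-in for COMBINEVALUES: joins text values with a delimiter."""
    return delimiter.join(values)


pairs = [("AB", "C"), ("A", "BC")]  # two different (Country, City)-style pairs

without_delimiter = {"".join(pair) for pair in pairs}            # both become "ABC"
with_delimiter = {combine_values(",", *pair) for pair in pairs}  # "AB,C" and "A,BC"

print(len(without_delimiter), len(with_delimiter))  # 1 2
```

A value that itself contains the delimiter can still collide, for example `("A,B", "C")` and `("A", "B,C")` both become `"A,B,C"`, so pick a delimiter the data never contains.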
### Set Appropriate Summarization
Prevent accidental aggregation of non-additive columns:
```
column_operations(
    operation: "Update",
    definitions: [{
        tableName: "Product",
        name: "UnitPrice",
        summarizeBy: "None"
    }]
)
```
## Relationship Optimization
### 1. Minimize Bidirectional Relationships
Each bidirectional relationship:
- Increases query complexity
- Can create ambiguous paths
- Reduces performance
### 2. Avoid Many-to-Many When Possible
Many-to-many relationships:
- Generate more complex queries
- Require more memory
- Can produce unexpected results
### 3. Reduce Relationship Cardinality
Keep relationship columns low cardinality:
- Use integer keys over text
- Consider higher-grain relationships
## DAX Optimization
### 1. Use Variables
```dax
// GOOD - Each measure is evaluated once; PriorSales is reused
Sales Growth =
VAR CurrentSales = [Total Sales]
VAR PriorSales = [PY Sales]
RETURN DIVIDE(CurrentSales - PriorSales, PriorSales)
// BAD - References [PY Sales] twice (the engine may evaluate it twice)
Sales Growth =
DIVIDE([Total Sales] - [PY Sales], [PY Sales])
```
### 2. Avoid FILTER with Entire Tables
```dax
// BAD - Iterates every row of the Sales table
Sales High Value =
CALCULATE([Total Sales], FILTER(Sales, Sales[Amount] > 1000))
// GOOD - Boolean filter on one column (replaces any existing filter on Sales[Amount])
Sales High Value =
CALCULATE([Total Sales], Sales[Amount] > 1000)
```
### 3. Use KEEPFILTERS Appropriately
```dax
// Intersects with any existing filter on Product[Category] instead of overriding it
Sales with Filter =
CALCULATE([Total Sales], KEEPFILTERS(Product[Category] = "Bikes"))
```
### 4. Prefer DIVIDE Over Division Operator
```dax
// GOOD - Returns BLANK when [Sales] is zero or blank
Margin % = DIVIDE([Profit], [Sales])
// BAD - Can return Infinity/NaN (or an error) when [Sales] is zero
Margin % = [Profit] / [Sales]
```
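The DIVIDE behavior can be summarized as a small function. This Python sketch models it with `None` standing in for BLANK: a zero or blank denominator returns the alternate result (BLANK unless one is given) instead of failing. It models the semantics only and is not the engine's implementation.

```python
from typing import Optional


def divide(numerator: float, denominator: Optional[float],
           alternate_result: Optional[float] = None) -> Optional[float]:
    """Sketch of DIVIDE semantics; None stands in for BLANK."""
    if denominator is None or denominator == 0:
        return alternate_result
    return numerator / denominator


print(divide(50.0, 200.0))    # 0.25
print(divide(50.0, 0))        # None (BLANK)
print(divide(50.0, 0, 0.0))   # 0.0 (explicit alternate result)
```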
## DirectQuery Optimization
### 1. Minimize Columns and Tables
DirectQuery models:
- Query source for every visual
- Performance depends on source
- Minimize data retrieved
### 2. Avoid Complex Power Query Transformations
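- Power Query steps fold into the SQL sent to the source, often as nested subqueries
- Simple, well-folded queries run faster at the source
- Materialize complex logic at the source (views or tables) when possible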
### 3. Keep Measures Simple Initially
Complex DAX generates complex SQL:
- Start with basic aggregations
- Add complexity gradually
- Monitor query performance
### 4. Disable Auto Date/Time
Disable auto date/time, especially in composite models (it only generates hidden tables for date columns in Import-mode tables):
- Creates hidden calculated tables
- Increases model complexity
- Use explicit date table instead
## Aggregations
### User-Defined Aggregations
Pre-aggregate fact tables for:
- Very large models (billions of rows)
- Hybrid DirectQuery/Import
- Common query patterns
```
table_operations(
    operation: "Create",
    definitions: [{
        name: "SalesAgg",
        mode: "Import",
        mExpression: "..."
    }]
)
```
## Performance Testing
### Use Performance Analyzer
1. Open the Performance Analyzer pane in Power BI Desktop
2. Start recording
3. Interact with visuals
4. Review DAX query times
### Monitor with DAX Studio
External tool for:
- Query timing
- Server timings
- Query plans
## Validation Checklist
- [ ] Unnecessary columns removed
- [ ] Appropriate data types used
- [ ] High-cardinality columns addressed
- [ ] Bidirectional relationships minimized
- [ ] DAX uses variables for repeated expressions
- [ ] No FILTER on entire tables
- [ ] DIVIDE used instead of division operator
- [ ] Auto date/time disabled; explicit date table used
- [ ] Performance tested with representative data


@@ -0,0 +1,147 @@
# Relationships in Power BI
## Relationship Properties
### Cardinality
| Type | Use Case | Notes |
|------|----------|-------|
| One-to-Many (1:*) | Dimension to Fact | Most common, preferred |
| Many-to-One (*:1) | Fact to Dimension | Same relationship viewed from the fact side |
| One-to-One (1:1) | Dimension extensions | Use sparingly |
| Many-to-Many (*:*) | Bridge tables, complex scenarios | Requires careful design |
### Cross-Filter Direction
| Setting | Behavior | When to Use |
|---------|----------|-------------|
| Single | Filters flow from "one" to "many" | Default, best performance |
| Both | Filters flow in both directions | Only when necessary |
## Best Practices
### 1. Prefer One-to-Many Relationships
```
Customer (1) --> (*) Sales
Product (1) --> (*) Sales
Date (1) --> (*) Sales
```
### 2. Use Single-Direction Cross-Filtering
Bidirectional filtering:
- Impacts performance negatively
- Can create ambiguous filter paths
- May produce unexpected results
**Only use bidirectional when:**
- Dimension-to-dimension analysis through fact table
- Specific RLS requirements
**Better alternative:** Use CROSSFILTER in DAX measures:
```dax
Countries Sold =
CALCULATE(
    DISTINCTCOUNT(Customer[Country]),
    CROSSFILTER(Customer[CustomerKey], Sales[CustomerKey], BOTH)
)
```
### 3. One Active Path Between Tables
- Only one active relationship between any two tables
- Use USERELATIONSHIP for role-playing dimensions:
```dax
Sales by Ship Date =
CALCULATE(
    [Total Sales],
    USERELATIONSHIP(Sales[ShipDate], 'Date'[Date])
)
```
### 4. Avoid Ambiguous Paths
Multiple active filter paths between two tables are ambiguous, and Power BI will not allow them. Solutions:
- Deactivate one relationship
- Restructure model
- Use USERELATIONSHIP in measures
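One way to reason about ambiguity is to treat tables as nodes and active relationships as edges in the direction filters flow, then count the distinct paths between two tables. This Python sketch does that for a hypothetical model where Region filters Sales both directly and through Customer. It is a simplified model for intuition, not how Power BI validates relationships; a bidirectional relationship would be listed once in each direction.

```python
from collections import defaultdict


def filter_paths(active_relationships, source, target):
    """Return every simple path a filter can take from source to target.

    active_relationships holds (from_table, to_table) pairs in the direction
    filters flow (one side to many side).
    """
    graph = defaultdict(list)
    for from_table, to_table in active_relationships:
        graph[from_table].append(to_table)

    paths = []

    def walk(table, path):
        if table == target:
            paths.append(path)
            return
        for next_table in graph[table]:
            if next_table not in path:  # simple paths only: no table visited twice
                walk(next_table, path + [next_table])

    walk(source, [source])
    return paths


# Hypothetical model: Region filters Sales directly and through Customer
model = [("Region", "Customer"), ("Customer", "Sales"), ("Region", "Sales")]
print(len(filter_paths(model, "Region", "Sales")))  # 2 -> ambiguous

# Deactivating the direct relationship leaves a single path
model.remove(("Region", "Sales"))
print(len(filter_paths(model, "Region", "Sales")))  # 1
```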
## Relationship Patterns
### Standard Star Schema
```
           [Date]
              |
[Product]--[Sales]--[Customer]
              |
           [Store]
```
### Role-Playing Dimension
```
[Date] --(active)---- [Sales.OrderDate]
   |
   +--(inactive)-- [Sales.ShipDate]
```
### Bridge Table (Many-to-Many)
```
[Customer]--(*)--[CustomerAccount]--(*)--[Account]
```
### Factless Fact Table
```
[Product]--[ProductPromotion]--[Promotion]
```
Used to record events or associations (such as which products were on which promotion) without numeric measures.
## Creating Relationships via MCP
### List Current Relationships
```
relationship_operations(operation: "List")
```
### Create New Relationship
```
relationship_operations(
    operation: "Create",
    definitions: [{
        fromTable: "Sales",
        fromColumn: "ProductKey",
        toTable: "Product",
        toColumn: "ProductKey",
        crossFilteringBehavior: "OneDirection",
        isActive: true
    }]
)
```
### Deactivate Relationship
```
relationship_operations(
    operation: "Deactivate",
    references: [{ name: "relationship-guid-here" }]
)
```
## Troubleshooting
### "Ambiguous Path" Error
Multiple active paths exist between tables.
- Check for: Multiple fact tables sharing dimensions
- Solution: Deactivate redundant relationships
### Bidirectional Not Allowed
Circular reference would be created.
- Solution: Restructure or use DAX CROSSFILTER
### Relationship Not Detected
Columns may have different data types.
- Ensure both columns have identical types
- Check for trailing spaces in text keys
## Validation Checklist
- [ ] All relationships are one-to-many where possible
- [ ] Cross-filter is single direction by default
- [ ] Only one active path between any two tables
- [ ] Role-playing dimensions use inactive relationships
- [ ] No circular reference paths
- [ ] Key columns have matching data types


@@ -0,0 +1,226 @@
# Row-Level Security (RLS) in Power BI
## Overview
Row-Level Security restricts data access at the row level based on user identity. Users see only the data they're authorized to view.
## Design Principles
### 1. Filter on Dimension Tables
Apply RLS to dimensions, not fact tables:
- More efficient (smaller tables)
- Filters propagate through relationships
- Easier to maintain
```dax
// On Customer dimension - filters propagate to Sales
[Region] = "West"
```
### 2. Create Minimal Roles
Avoid many role combinations:
- Each role = separate cache
- Roles are additive (union, not intersection)
- Consolidate where possible
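Because roles are additive, a user in two roles sees every row that either role allows. This Python sketch is a simplified model of that behavior, using two hypothetical static roles that filter the same Customer table.

```python
# Hypothetical Customer rows and two static roles filtering the same table
CUSTOMERS = [
    {"Name": "A", "Region": "West", "Segment": "Retail"},
    {"Name": "B", "Region": "East", "Segment": "Retail"},
    {"Name": "C", "Region": "East", "Segment": "Corporate"},
]

ROLE_FILTERS = {
    "West Region": lambda row: row["Region"] == "West",
    "Retail": lambda row: row["Segment"] == "Retail",
}


def visible_names(user_roles):
    """A row is visible if ANY of the user's roles allows it (union, not intersection)."""
    return [row["Name"] for row in CUSTOMERS if any(ROLE_FILTERS[r](row) for r in user_roles)]


print(visible_names(["West Region"]))            # ['A']
print(visible_names(["West Region", "Retail"]))  # ['A', 'B'], not just ['A']
```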
### 3. Use Dynamic RLS When Possible
Data-driven rules scale better:
- User mapping in a table
- USERPRINCIPALNAME() for identity
- No role changes when users change
## Static vs Dynamic RLS
### Static RLS
Fixed rules per role:
```dax
// Role: West Region
[Region] = "West"
// Role: East Region
[Region] = "East"
```
**Pros:** Simple, clear
**Cons:** Doesn't scale, requires role per group
### Dynamic RLS
User identity drives filtering:
```dax
// Single role filters based on logged-in user
[ManagerEmail] = USERPRINCIPALNAME()
```
**Pros:** Scales, self-maintaining
**Cons:** Requires user mapping data
## Implementation Patterns
### Pattern 1: Direct User Mapping
User email in dimension table:
```dax
// On Customer table
[CustomerEmail] = USERPRINCIPALNAME()
```
### Pattern 2: Security Table
Separate table mapping users to data:
```
SecurityMapping table:
| UserEmail | Region |
|-----------|--------|
| joe@co.com | West |
| sue@co.com | East |
```
```dax
// On Region dimension
[Region] IN
    SELECTCOLUMNS(
        FILTER(SecurityMapping, SecurityMapping[UserEmail] = USERPRINCIPALNAME()),
        "Region", SecurityMapping[Region]
    )
```
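The security-table pattern amounts to "keep the regions mapped to the signed-in user". This Python sketch reuses the mapping rows above plus one hypothetical extra row showing a user mapped to two regions; the Region list is also hypothetical. The comparison is case-insensitive, as DAX text comparison with `=` is, and an unmapped user sees nothing.

```python
# Rows from the SecurityMapping example; ("sue@co.com", "North") is a
# hypothetical addition showing a user mapped to more than one region
SECURITY_MAPPING = [
    ("joe@co.com", "West"),
    ("sue@co.com", "East"),
    ("sue@co.com", "North"),
]
REGIONS = ["West", "East", "North", "South"]  # hypothetical Region dimension


def visible_regions(user_principal_name: str):
    """Region rows kept by the RLS filter for this user."""
    allowed = {
        region
        for email, region in SECURITY_MAPPING
        if email.casefold() == user_principal_name.casefold()
    }
    return [region for region in REGIONS if region in allowed]


print(visible_regions("Sue@co.com"))     # ['East', 'North']
print(visible_regions("nobody@co.com"))  # [] -> unmapped users see nothing
```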
### Pattern 3: Manager Hierarchy
Users see their data plus subordinates:
```dax
// Using PATH functions for hierarchy
PATHCONTAINS(
    Employee[ManagerPath],
    LOOKUPVALUE(Employee[EmployeeID], Employee[Email], USERPRINCIPALNAME())
)
```
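The hierarchy pattern relies on a pipe-delimited path per employee, listing the top manager first. This Python sketch assumes ManagerPath was built like `PATH(Employee[EmployeeID], Employee[ManagerID])`, where ManagerID is a hypothetical parent-key column, and emulates PATHCONTAINS to show who sees which rows. The IDs and emails are made up.

```python
# Hypothetical Employee rows: EmployeeID -> ManagerID (None at the top)
MANAGER_OF = {1: None, 2: 1, 3: 1, 5: 2}
EMPLOYEE_ID_BY_EMAIL = {"ceo@co.com": 1, "ann@co.com": 2, "bob@co.com": 3, "eve@co.com": 5}


def manager_path(employee_id: int) -> str:
    """Pipe-delimited chain from the top manager down to employee_id, e.g. '1|2|5'."""
    chain = []
    current = employee_id
    while current is not None:
        chain.append(str(current))
        current = MANAGER_OF[current]
    return "|".join(reversed(chain))


def path_contains(path: str, item) -> bool:
    """True if item is one of the path's pipe-delimited entries."""
    return str(item) in path.split("|")


def visible_employees(user_principal_name: str):
    """Employees whose path contains the signed-in user's EmployeeID."""
    me = EMPLOYEE_ID_BY_EMAIL.get(user_principal_name)
    if me is None:
        return []  # unknown user: nothing visible
    return sorted(e for e in MANAGER_OF if path_contains(manager_path(e), me))


print(manager_path(5))                  # 1|2|5
print(visible_employees("ann@co.com"))  # [2, 5]
```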
### Pattern 4: Multiple Rules
Combine conditions:
```dax
// Users see their region OR if they're a global viewer
[Region] = LOOKUPVALUE(Users[Region], Users[Email], USERPRINCIPALNAME())
|| LOOKUPVALUE(Users[IsGlobal], Users[Email], USERPRINCIPALNAME()) = TRUE()
```
## Creating Roles via MCP
### List Existing Roles
```
security_role_operations(operation: "List")
```
### Create Role with Permission
```
security_role_operations(
    operation: "Create",
    definitions: [{
        name: "Regional Sales",
        modelPermission: "Read",
        description: "Restricts sales data by region"
    }]
)
```
### Add Table Permission (Filter)
```
security_role_operations(
    operation: "CreatePermissions",
    permissionDefinitions: [{
        roleName: "Regional Sales",
        tableName: "Customer",
        filterExpression: "[Region] IN CALCULATETABLE(VALUES(SecurityMapping[Region]), SecurityMapping[UserEmail] = USERPRINCIPALNAME())"
    }]
)
```
### Get Effective Permissions
```
security_role_operations(
    operation: "GetEffectivePermissions",
    references: [{ name: "Regional Sales" }]
)
```
## Testing RLS
### In Power BI Desktop
1. Modeling tab > View As
2. Select role(s) to test
3. Optionally specify user identity
4. Verify data filtering
### Test Unexpected Values
For dynamic RLS, test:
- Valid users
- Unknown users (should see nothing or error gracefully)
- NULL/blank values
```dax
// Defensive pattern - returns no data for unknown users
IF(
    USERPRINCIPALNAME() IN VALUES(SecurityMapping[UserEmail]),
    [Region] IN SELECTCOLUMNS(...),
    FALSE()
)
```
## Common Mistakes
### 1. RLS on Fact Tables Only
**Problem:** Large table scans, poor performance
**Solution:** Apply to dimension tables, let relationships propagate
### 2. Using LOOKUPVALUE Instead of Relationships
**Problem:** Expensive, doesn't scale
**Solution:** Create proper relationships, let filters flow
### 3. Expecting Intersection Behavior
**Problem:** Multiple roles = UNION (additive), not intersection
**Solution:** Design roles with union behavior in mind
### 4. Forgetting About DirectQuery
**Problem:** RLS filters become WHERE clauses
**Solution:** Ensure source database can handle the query patterns
### 5. Not Testing Edge Cases
**Problem:** Users see unexpected data
**Solution:** Test with: valid users, invalid users, multiple roles
## Bidirectional RLS
For bidirectional relationships with RLS:
```
Enable "Apply security filter in both directions"
```
Only use when:
- RLS requires filtering through many-to-many
- Dimension-to-dimension security needed
**Caution:** Only one bidirectional relationship per path allowed.
## Performance Considerations
- RLS adds a filter to every query (in DirectQuery, a WHERE clause in the generated SQL)
- Complex DAX in filters hurts performance
- Test with realistic user counts
- Consider aggregations for large models
## Object-Level Security (OLS)
Restrict access to entire tables or columns:
```
// Defined through the XMLA endpoint (TMSL/TOM, e.g. with Tabular Editor); not available in the Desktop UI
```
Use for:
- Hiding sensitive columns (salary, SSN)
- Restricting entire tables
- Combined with RLS for comprehensive security
## Validation Checklist
- [ ] RLS applied to dimension tables (not fact tables)
- [ ] Filters propagate correctly through relationships
- [ ] Dynamic RLS uses USERPRINCIPALNAME()
- [ ] Tested with valid and invalid users
- [ ] Edge cases handled (NULL, unknown users)
- [ ] Performance tested under load
- [ ] Role mappings documented
- [ ] Workspace roles understood (Admins, Members, and Contributors bypass RLS; it applies only to Viewers)


@@ -0,0 +1,103 @@
# Star Schema Design for Power BI
## Overview
Star schema is the optimal design pattern for Power BI semantic models. It organizes data into:
- **Dimension tables**: Enable filtering and grouping (the "one" side)
- **Fact tables**: Enable summarization (the "many" side)
## Table Classification
### Dimension Tables
- Contain descriptive attributes for filtering/slicing
- Have unique key columns (one row per entity)
- Examples: Customer, Product, Date, Geography, Employee
- Naming convention: Singular noun (`Customer`, `Product`)
### Fact Tables
- Contain measurable, quantitative data
- Have foreign keys to dimensions
- Store data at consistent grain (one row per transaction/event)
- Examples: Sales, Orders, Inventory, WebVisits
- Naming convention: Business process noun (`Sales`, `Orders`)
## Design Principles
### 1. Separate Dimensions from Facts
```
BAD: Single denormalized "Sales" table with customer details
GOOD: "Sales" fact table + "Customer" dimension table
```
### 2. Consistent Grain
Every row in a fact table represents the same thing:
- Order line level (most common)
- Daily aggregation
- Monthly summary
Never mix grains in one table.
### 3. Surrogate Keys
Add surrogate keys when source lacks unique identifiers:
```m
// Power Query: Add index column
= Table.AddIndexColumn(Source, "CustomerKey", 1, 1)
```
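Adding a surrogate key usually goes together with splitting a dimension out of a denormalized table. This Python sketch, over hypothetical rows, assigns 1-based integer keys to unique customer names in first-seen order (like an index column starting at 1 with step 1) and replaces the name in the fact rows with the key.

```python
# Hypothetical denormalized sales rows (customer details repeated on each row)
RAW_SALES = [
    {"CustomerName": "Contoso", "Amount": 100.0},
    {"CustomerName": "Fabrikam", "Amount": 50.0},
    {"CustomerName": "Contoso", "Amount": 25.0},
]


def split_dimension(rows, attribute, key_name):
    """Return (dimension, fact): unique attribute values get 1-based integer keys,
    and fact rows carry the key instead of the attribute."""
    keys = {}
    for row in rows:
        keys.setdefault(row[attribute], len(keys) + 1)  # first seen -> next key
    dimension = [{key_name: key, attribute: value} for value, key in keys.items()]
    fact = [
        {key_name: keys[row[attribute]], **{c: v for c, v in row.items() if c != attribute}}
        for row in rows
    ]
    return dimension, fact


customer, sales = split_dimension(RAW_SALES, "CustomerName", "CustomerKey")
print(customer)  # [{'CustomerKey': 1, 'CustomerName': 'Contoso'}, {'CustomerKey': 2, 'CustomerName': 'Fabrikam'}]
print(sales)     # [{'CustomerKey': 1, 'Amount': 100.0}, {'CustomerKey': 2, 'Amount': 50.0}, {'CustomerKey': 1, 'Amount': 25.0}]
```

Like an index column, these keys depend on row order, so they can change between refreshes unless the source order is stable.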
### 4. Date Dimension
Always create a dedicated date table:
- Mark as date table in Power BI
- Include fiscal periods if needed
- Add relative date columns (IsCurrentMonth, IsPreviousYear)
```dax
Date =
ADDCOLUMNS(
    CALENDAR(DATE(2020, 1, 1), DATE(2030, 12, 31)),
    "Year", YEAR([Date]),
    "Month", FORMAT([Date], "MMMM"),
    "MonthNum", MONTH([Date]),
    "Quarter", "Q" & FORMAT([Date], "Q"),
    "WeekDay", FORMAT([Date], "dddd")
)
```
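Before marking a table as a date table, it helps to confirm it has one row per day with no gaps or duplicates. This Python sketch builds the same columns as the DAX example for the same range and checks those properties; it is a validation aid under those assumptions, not a replacement for the DAX table.

```python
import calendar
from datetime import date, timedelta


def build_date_table(start: date, end: date):
    """One row per day from start to end inclusive, with the DAX example's columns."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "Date": day,
            "Year": day.year,
            "Month": calendar.month_name[day.month],      # e.g. "January"
            "MonthNum": day.month,
            "Quarter": f"Q{(day.month - 1) // 3 + 1}",
            "WeekDay": calendar.day_name[day.weekday()],  # e.g. "Wednesday"
        })
        day += timedelta(days=1)
    return rows


dates = build_date_table(date(2020, 1, 1), date(2030, 12, 31))
print(len(dates))  # 4018 days (11 years, 3 of them leap years)
```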
## Special Dimension Types
### Role-Playing Dimensions
Same dimension used multiple times (e.g., Date for OrderDate, ShipDate):
- Option 1: Duplicate the table (OrderDate, ShipDate tables)
- Option 2: Use inactive relationships with USERELATIONSHIP in DAX
### Slowly Changing Dimensions (Type 2)
Track historical changes with version columns:
- StartDate, EndDate columns
- IsCurrent flag
- Requires pre-processing in data warehouse
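The pre-processing step for a Type 2 dimension typically assigns each fact row the surrogate key of the version that was valid on the transaction date. This Python sketch assumes hypothetical version rows and treats a missing EndDate as "still open"; some warehouses use a far-future date such as 9999-12-31 instead.

```python
from datetime import date
from typing import Optional

# Hypothetical Type 2 Customer versions: one natural CustomerID, two surrogate keys
CUSTOMER_VERSIONS = [
    {"CustomerKey": 1, "CustomerID": "C001", "City": "Seattle",
     "StartDate": date(2020, 1, 1), "EndDate": date(2022, 6, 30), "IsCurrent": False},
    {"CustomerKey": 2, "CustomerID": "C001", "City": "Denver",
     "StartDate": date(2022, 7, 1), "EndDate": None, "IsCurrent": True},
]


def key_as_of(customer_id: str, as_of: date) -> Optional[int]:
    """Surrogate key of the version valid on as_of (EndDate None = still open)."""
    for version in CUSTOMER_VERSIONS:
        if (version["CustomerID"] == customer_id
                and version["StartDate"] <= as_of
                and (version["EndDate"] is None or as_of <= version["EndDate"])):
            return version["CustomerKey"]
    return None


print(key_as_of("C001", date(2021, 5, 1)))  # 1 (Seattle version)
print(key_as_of("C001", date(2023, 1, 1)))  # 2 (Denver version)
```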
### Junk Dimensions
Combine low-cardinality flags into one table:
```
OrderFlags dimension: IsRush, IsGift, IsOnline
```
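A junk dimension can be generated as every combination of its flags, so the fact table stores one small key instead of several flag columns. This Python sketch builds the hypothetical OrderFlags dimension with keys 1 to 8 and looks up the key for one combination.

```python
from itertools import product

FLAG_NAMES = ["IsRush", "IsGift", "IsOnline"]


def build_junk_dimension(flag_names):
    """One row per combination of boolean flags, keyed 1..2**len(flag_names)."""
    return [
        {"OrderFlagsKey": key, **dict(zip(flag_names, combo))}
        for key, combo in enumerate(product([False, True], repeat=len(flag_names)), start=1)
    ]


def flags_key(dimension, **flags):
    """Look up the OrderFlagsKey for a given flag combination."""
    for row in dimension:
        if all(row[name] == value for name, value in flags.items()):
            return row["OrderFlagsKey"]
    raise KeyError(flags)


order_flags = build_junk_dimension(FLAG_NAMES)
print(len(order_flags))  # 8
print(flags_key(order_flags, IsRush=True, IsGift=False, IsOnline=True))  # 6
```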
### Degenerate Dimensions
Keep transaction identifiers (OrderNumber, InvoiceID) in fact table.
## Anti-Patterns to Avoid
| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Wide denormalized tables | Poor performance, hard to maintain | Split into star schema |
| Snowflake (normalized dims) | Extra joins hurt performance | Flatten dimensions |
| Many-to-many without bridge | Ambiguous results | Add bridge/junction table |
| Mixed grain facts | Incorrect aggregations | Separate tables per grain |
## Validation Checklist
- [ ] Each table is clearly dimension or fact
- [ ] Fact tables have foreign keys to all related dimensions
- [ ] Dimensions have unique key columns
- [ ] Date table exists and is marked
- [ ] No circular relationship paths
- [ ] Consistent naming conventions