# Verification Checklist — Post-Analysis Quality Gates
This file is the single source of truth for all verification rules that must pass before a threat model report is finalized. It is designed to be handed to a verification sub-agent along with the output folder path.

**Authority hierarchy:** This file contains CHECKING rules (pass/fail criteria for quality gates). The AUTHORING rules that produce the content being checked are in `orchestrator.md`. Some rules appear in both files for visibility — if they ever conflict: `orchestrator.md` takes precedence for authoring decisions (how to write), this file takes precedence for pass/fail criteria (what constitutes a valid output). Do NOT remove rules from either file to "deduplicate" — the overlap is intentional for visibility.

**When to use:** After ALL output files are written (0.1-architecture.md through 0-assessment.md), run every check in this file. If any check fails, fix the issue before finalizing.

**Sub-agent delegation:** The orchestrator can delegate this entire file to a verification sub-agent with the prompt:

> "Read verification-checklist.md. For each check, inspect the named output file(s) and report PASS/FAIL with evidence. Fix any failures."
## Inline Quick-Checks (Run Immediately After Each File Write)

**Purpose:** These are lightweight self-checks the WRITING agent runs immediately after creating each file — NOT deferred to Step 10. Since the agent just wrote the file, the content is still in active context, making these checks highly effective.

**How to use:** Before writing each file, read the corresponding skeleton from `skeletons/skeleton-*.md`. After each `create_file` call, scan the content you just wrote for these patterns. If any check fails, fix the file immediately before proceeding to the next step.

**Skeleton compliance rule:** Every output file MUST follow its skeleton's section order, table column headers, and heading names. Do NOT add sections/tables not in the skeleton. Do NOT rename skeleton headings.
### After writing 3-findings.md

- First finding heading starts with `### FIND-01:` (not `F01`, `F-01`, or `Finding 1`)
- Every finding has these exact row labels: `SDL Bugbar Severity`, `Remediation Effort`, `Mitigation Type`, `Exploitability Tier`, `Exploitation Prerequisites`, `Component`
- Every CVSS value contains the `CVSS:4.0/` prefix
- Every `Related Threats` cell contains `](2-stride-analysis.md#` (hyperlink, not plain text)
- Every finding has `#### Description`, `#### Evidence`, `#### Remediation`, and `#### Verification` sub-headings (not `Recommendation`, not `Impact`, not `Mitigation`, not bold `**Description:**` paragraphs) — exactly 4 sub-headings, no extras
- Every `#### Description` section has at least 2 sentences of technical detail (not single-sentence stubs)
- Every `#### Evidence` section cites specific file paths, line numbers, or config keys (not generic statements like "found in codebase")
- Every finding has ALL 10 mandatory attribute rows: `SDL Bugbar Severity`, `CVSS 4.0`, `CWE`, `OWASP`, `Exploitation Prerequisites`, `Exploitability Tier`, `Remediation Effort`, `Mitigation Type`, `Component`, `Related Threats`
- Every CWE value is a hyperlink: contains `](https://cwe.mitre.org/` (not plain text like `CWE-79`)
- Every OWASP value uses the `:2025` suffix (not `:2021`)
- Findings organized by TIER (Tier 1/2/3 headings), NOT by severity (no `## Critical Findings`)
- Tier-Prerequisite consistency (inline): For each finding, use the canonical mapping: `None` → T1; `Authenticated User`/`Privileged User`/`Internal Network`/`Local Process Access` → T2; `Host/OS Access`/`Admin Credentials`/`Physical Access`/`{Component} Compromise`/combos → T3. ⛔ `Application Access` and `Host Access` are FORBIDDEN.
- Count finding headings — they must be sequential: FIND-01, FIND-02, FIND-03...
- No time estimates: search for `~`, `Sprint`, `Phase`, `hour`, `day`, `week` — must not appear
- Threat Coverage Verification table present at end of file with columns `Threat ID | Finding ID | Status`
- Coverage table status values use emoji prefixes: `✅ Covered (FIND-XX)`, `✅ Mitigated (FIND-XX)`, `🔄 Mitigated by Platform` — NOT plain text like "Finding", "Mitigated", "Covered"
- Coverage table column names are exactly `Threat ID | Finding ID | Status` — NOT `Threat | Finding | Status`
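Several of these checks are mechanical regex scans, so they can be automated. A minimal sketch in Python of the heading, sequencing, CVSS-prefix, and time-estimate checks for 3-findings.md — the function name and the exact table-row pattern are illustrative assumptions, not part of the skill:

```python
import re

def check_findings(text: str) -> list[str]:
    """Return FAIL messages for 3-findings.md content (empty list = pass)."""
    fails = []
    # Finding headings must match "### FIND-NN:" exactly
    headings = re.findall(r"^### (FIND-\d{2}):", text, flags=re.MULTILINE)
    bad = re.findall(r"^### (?:F\d+|F-\d+|Finding \d+)", text, flags=re.MULTILINE)
    if bad:
        fails.append(f"non-canonical finding headings: {bad}")
    # IDs must be sequential: FIND-01, FIND-02, ...
    expected = [f"FIND-{i:02d}" for i in range(1, len(headings) + 1)]
    if headings != expected:
        fails.append(f"finding IDs not sequential: {headings}")
    # Every CVSS attribute row needs the CVSS:4.0/ prefix
    for m in re.finditer(r"\| CVSS 4\.0 \| ([^|]+)\|", text):
        if "CVSS:4.0/" not in m.group(1):
            fails.append(f"CVSS missing prefix: {m.group(1).strip()}")
    # Time estimates are forbidden anywhere in the file
    if re.search(r"\b(Sprint|hour|day|week)s?\b|~\d", text):
        fails.append("time estimate language found")
    return fails
```

Running this immediately after writing the file catches the three most common deviations (non-sequential IDs, bare vectors, scheduling language) before they propagate into 0-assessment.md.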
### After writing 0-assessment.md

- First `##` heading is `## Report Files`
- Count `##` headings — exactly 7 with these exact names: Report Files, Executive Summary, Action Summary, Analysis Context & Assumptions, References Consulted, Report Metadata, Classification Reference
- Heading uses `&` not `and`: search for `Analysis Context & Assumptions`
- Count `---` separator lines — at least 5
- `### Quick Wins` heading exists
- `### Priority by Tier and CVSS Score` heading exists under Action Summary, BEFORE Quick Wins
- Priority table has max 10 rows: count data rows in the Priority by Tier and CVSS Score table — must be ≤ 10
- Priority table sort order: All Tier 1 findings come first, then Tier 2, then Tier 3. Within each tier, higher CVSS scores come first. ❌ T2 finding appearing before a T1 finding → FAIL
- Priority table Finding hyperlinks: Every Finding cell is a hyperlink `[FIND-XX](3-findings.md#find-xx-title-slug)`. Search for `](3-findings.md#` in every row — must be present. ❌ Plain text `FIND-XX` without link → FAIL
- Priority table anchor resolution: For each hyperlink, verify the anchor slug matches the actual `### FIND-XX:` heading in 3-findings.md AS WRITTEN. Compute the anchor from the heading text (lowercase, spaces to hyphens, strip special chars). ❌ If any heading contains status tags like `[STILL PRESENT]` or `[NEW]`, that is a FAIL — status tags must NOT appear in headings (see Phase 2 check). Anchors should be computed from clean, tag-free heading text.
- Action Summary tier hyperlinks: Tier 1, Tier 2, Tier 3 cells in the Action Summary table are hyperlinks to `3-findings.md#tier-N` anchors
- `### Needs Verification` heading exists
- `### Finding Overrides` heading exists
- Action Summary has exactly 4 data rows: Tier 1, Tier 2, Tier 3, Total. Search for `| Mitigated |` or `| Platform |` or `| Fixed |` in the Action Summary table — FAIL if found. These are NOT separate tiers.
- Git Commit includes date: The `| Git Commit |` row must contain both the SHA and the commit date (e.g., `f49298ff (2026-03-04)`). If only the hash is shown without a date → FAIL.
- Baseline/Target Commits include dates (incremental mode): `| Baseline Commit |` and `| Target Commit |` rows must each include a date alongside the SHA.
- `### Security Standards` and `### Component Documentation` headings exist (two Reference subsections)
- `| Model |` row exists in Report Metadata table
- `| Analysis Started |` row exists in Report Metadata table
- `| Analysis Completed |` row exists in Report Metadata table
- `| Duration |` row exists in Report Metadata table
- Metadata values wrapped in backticks: check for `` ` `` in metadata value cells
- Report Files table first row: `0-assessment.md` is the FIRST data row (not `0.1-architecture.md`)
- Report Files completeness: Every generated `.md` and `.mmd` file in the output folder has a corresponding row in the Report Files table (`threat-inventory.json` is intentionally excluded)
- Report Files conditional rows: `1.2-threatmodel-summary.mmd` and `incremental-comparison.html` rows present ONLY if those files were actually generated
- Note on threat counts blockquote: Executive Summary contains a `> **Note on threat counts:**` paragraph
- Boundary count: Boundary count in Executive Summary matches actual Trust Boundary Table row count in `1-threatmodel.md`
- Action Summary tier priorities: Tier 1 = 🔴 Critical Risk, Tier 2 = 🟠 Elevated Risk, Tier 3 = 🟡 Moderate Risk. These are FIXED — never modified based on counts.
- Risk Rating heading has NO emojis: `### Risk Rating: Elevated` not `### Risk Rating: 🟠 Elevated`
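The anchor-resolution check above hinges on computing the slug the same way the markdown renderer does. A minimal sketch, assuming GitHub-style slug rules (lowercase, each space becomes a hyphen, remaining special characters dropped); the function name is illustrative and edge cases such as duplicate headings are not handled:

```python
import re

def heading_anchor(heading: str) -> str:
    """Compute an approximate GitHub-style anchor slug for a markdown heading.
    Status tags must already be absent from the heading text."""
    text = heading.lstrip("#").strip().lower()
    text = re.sub(r"\s", "-", text)      # each whitespace char becomes a hyphen
    return re.sub(r"[^\w-]", "", text)   # then drop remaining special characters
```

To validate a Priority table link, compare the slug after `#` in `[FIND-03](3-findings.md#...)` against `heading_anchor(...)` of the corresponding `### FIND-03:` heading.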
### After writing 0.1-architecture.md

- Count `sequenceDiagram` occurrences — at least 3
- First 3 sequence diagrams have `participant` lines and `->>` message arrows (not empty diagram blocks)
- Key Components table row count matches Component Diagram node count
- Every Key Components table row uses a PascalCase name (not kebab-case `my-component` or snake_case `my_component`)
- Every Key Components Type cell is one of: `Process`, `Data Store`, `External Service`, `External Interactor` — no ad-hoc types like `Role`, `Function`
- Technology Stack table has all 5 rows filled: Languages, Frameworks, Data Stores, Infrastructure, Security
- `## Security Infrastructure Inventory` section exists (not missing)
- `## Repository Structure` section exists (not missing)
### After writing 1.1-threatmodel.mmd

- Line 1 starts with `%%{init:`
- Contains `classDef process`, `classDef external`, `classDef datastore`
- No Chakra UI colors (`#4299E1`, `#48BB78`, `#E53E3E`)
- `linkStyle default stroke:#666666,stroke-width:2px` present
- DFD uses `flowchart LR` (NOT `flowchart TB`) — search for `flowchart` and verify the direction is `LR`
- Incremental DFD styling (incremental mode only): If new components exist, verify `classDef newComponent fill:#d4edda,stroke:#28a745` is present AND new component nodes use `:::newComponent` (NOT `:::process`). If removed components exist, verify `classDef removedComponent` with gray dashed styling. ❌ `newComponent fill:#6baed6` (same blue as process) → FAIL (visually invisible).
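The first five checks on the `.mmd` file are string and regex tests, so they can be bundled into one scan. A minimal sketch in Python; the function name is an assumption, and only the three Chakra colors named above are screened (the full allowed palette appears in section 0.6):

```python
import re

CHAKRA = {"#4299E1", "#48BB78", "#E53E3E"}  # forbidden Chakra UI colors

def check_mmd(src: str) -> list[str]:
    """Quick structural checks on 1.1-threatmodel.mmd content."""
    fails = []
    if not src.startswith("%%{init:"):
        fails.append("file does not start with %%{init:")
    for cls in ("process", "external", "datastore"):
        if f"classDef {cls}" not in src:
            fails.append(f"missing classDef {cls}")
    banned = set(re.findall(r"#[0-9a-fA-F]{6}", src)) & CHAKRA
    if banned:
        fails.append(f"Chakra UI colors present: {sorted(banned)}")
    m = re.search(r"flowchart (\w+)", src)
    if not m or m.group(1) != "LR":
        fails.append("DFD must use flowchart LR")
    return fails
```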
### After writing 2-stride-analysis.md

- `## Summary` appears BEFORE any `## ComponentName` section (check line numbers)
- Summary table has columns: `| Component | Link | S | T | R | I | D | E | A | Total | T1 | T2 | T3 | Risk |` — search for `| S | T | R | I | D | E | A |` to verify
- Summary table S/T/R/I/D/E/A columns contain numeric values (0, 1, 2, 3...), NOT all identical 1s for every component
- Every component has `#### Tier 1`, `#### Tier 2`, `#### Tier 3` sub-headings
- No `&`, `/`, `(`, `)`, `:` in `##` headings
- No status tags in headings (ANY file): Search ALL `.md` files for `^##.+\[Existing\]`, `^##.+\[Fixed\]`, `^##.+\[Partial\]`, `^##.+\[New\]`, `^##.+\[Removed\]`, and the same for `###` headings. Also check old-style: `^##.+\[STILL`, `^##.+\[NEW`, `^###.+\[STILL`, `^###.+\[NEW CODE`. ❌ Tags in headings break anchor links and pollute the ToC. Status must be on the first line of the section body as a blockquote (`> **[Tag]**`), not in the heading.
- CRITICAL — A = Abuse, NEVER Authorization: Search for `| Authorization |` in the file. If ANY match is a STRIDE category label (not inside a threat description sentence) → FIX IMMEDIATELY by replacing with `| Abuse |`. The "A" in STRIDE-A stands for "Abuse" (business logic abuse, workflow manipulation, feature misuse). This is the single most common error observed.
- N/A entries not counted: If any component has `N/A — {justification}` for a STRIDE category, verify that category shows `0` (not `1`) in the Summary table
- STRIDE Status values: Every threat row's Status column uses exactly one of: `Open`, `Mitigated`, `Platform`. No `Partial`, `N/A`, `Accepted`, or ad-hoc values.
- Platform ratio: Count threats with `Platform` status vs total threats. If >20% (standalone) or >35% (K8s operator) → re-examine each Platform entry.
- STRIDE column arithmetic: For every Summary table row, verify S+T+R+I+D+E+A = Total AND T1+T2+T3 = Total
- Full category names in threat tables: Category column uses full names (`Spoofing`, `Tampering`, `Repudiation`, `Information Disclosure`, `Denial of Service`, `Elevation of Privilege`, `Abuse`) — NOT abbreviations (`S`, `T`, `DoS`, `EoP`)
- N/A table present: Every component section has a `| Category | Justification |` table listing STRIDE categories with no threats — NOT prose/bullet-point format
- Link column is separate: Summary table 2nd column is `Link` with `[Link](#anchor)` values — component names do NOT contain embedded hyperlinks
- Exploitability Tiers 4th column: The tier definition table must have a 4th column named `Assignment Rule` (NOT `Example`, `Description`, `Criteria`)
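The column-arithmetic check is pure addition once a Summary row has been parsed into numbers. A minimal sketch, assuming each row has already been parsed into a dict keyed by the column names; the function name and dict representation are illustrative:

```python
def check_summary_row(row: dict) -> list[str]:
    """Verify S+T+R+I+D+E+A == Total and T1+T2+T3 == Total for one Summary row."""
    fails = []
    stride_sum = sum(row[c] for c in "STRIDEA")  # the seven category columns
    if stride_sum != row["Total"]:
        fails.append(f"{row['Component']}: STRIDE sum {stride_sum} != Total {row['Total']}")
    tier_sum = row["T1"] + row["T2"] + row["T3"]
    if tier_sum != row["Total"]:
        fails.append(f"{row['Component']}: tier sum {tier_sum} != Total {row['Total']}")
    return fails
```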
### After writing incremental-comparison.html (incremental mode only)

- HTML contains `Trust Boundaries` or `Boundaries` in the metrics bar — search for the text "Boundaries"
- STRIDE heatmap has 13 columns: Component, S, T, R, I, D, E, A, Total, divider, T1, T2, T3 — search for `T1` and `T2` and `T3` in the HTML
- Fixed/New/Previously Unidentified status information appears ONLY in colored status cards, NOT also as small inline badges in the metrics bar
- No `| Authorization |` as a STRIDE category label in the heatmap — search for "Authorization" in heatmap rows
- HTML counts match markdown counts: The total threats in the HTML heatmap must equal the Totals row from `2-stride-analysis.md`. If they differ, regenerate the HTML heatmap from the STRIDE summary data. T1+T2+T3 totals in HTML must also match.
- Comparison cards present: HTML contains a `comparison-cards` div with 3 cards: baseline (hash + date + rating), target (hash + date + rating), trend (direction + duration)
- Commit dates from git log: Baseline and target dates in comparison cards must match actual commit dates (NOT today's date, NOT analysis run date)
- Code Changes box: 5th metrics box shows commit count and PR count (NOT "Time Between")
- No Time Between box: Search for "Time Between" — must NOT appear in metrics bar
- Status cards are concise: Each status card's `card-items` div must contain only a short summary sentence. ❌ Threat IDs (T06.S, T02.E), finding IDs (FIND-14), or component names listed in cards → FAIL. Search for `T\d+\.` and `FIND-\d+` inside `card-items` divs. Detailed item breakdowns belong in the Threat/Finding Status Breakdown section, not in the summary cards.
### After writing any incremental report file (incremental mode — inline check)

- Simplified display tags only: Search ALL `.md` files for old-style tags: `[STILL PRESENT]`, `[NEW CODE]`, `[NEW IN MODIFIED]`, `[PREVIOUSLY UNIDENTIFIED]`, `[PARTIALLY MITIGATED]`, `[REMOVED WITH COMPONENT]`, `[MODIFIED]`. ❌ Any match → FAIL. Replace with simplified tags: `[Existing]`, `[Fixed]`, `[Partial]`, `[New]`, `[Removed]`.
- Valid display tags: Every finding/threat annotation uses exactly one of the 5 simplified tags: `[Existing]`, `[Fixed]`, `[Partial]`, `[New]`, `[Removed]`. Tags must appear as a blockquote on the first line of the body: `> **[Tag]**`.
- Component status simplified: Component status column uses only: `Unchanged`, `Modified`, `New`, `Removed`. ❌ `Restructured` → FAIL (use `Modified` instead).
- Change Summary tables use simplified tags: Threat Status table has 4 rows (Existing/Fixed/New/Removed). Finding Status table has 5 rows (Existing/Fixed/Partial/New/Removed). ❌ Old-style rows like `Still Present`, `New (Code)`, `Partially Mitigated` → FAIL.
### After writing threat-inventory.json (inline check)

- JSON threat count matches STRIDE file: Count unique threat IDs in `2-stride-analysis.md` (grep `^\| T\d+\.`). This count MUST equal the `threats` array length in the JSON. If STRIDE has MORE threats than JSON → threats were dropped during serialization. Rebuild the JSON.
- JSON metrics internally consistent: `metrics.total_threats` must equal the `threats` array length. `metrics.total_findings` must equal the `findings` array length.
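Both JSON checks can be run in one pass over the two artifacts. A minimal sketch, assuming the JSON top-level keys `threats`, `findings`, and `metrics` described in the checklist; the function name is illustrative:

```python
import json
import re

def check_inventory(stride_md: str, inventory_json: str) -> list[str]:
    """Cross-check threat-inventory.json against 2-stride-analysis.md content."""
    fails = []
    inv = json.loads(inventory_json)
    # Unique threat IDs at the start of table rows, e.g. "| T01.S | ..."
    md_ids = set(re.findall(r"^\| (T\d+\.[STRIDEA])\b", stride_md, flags=re.MULTILINE))
    n_json = len(inv["threats"])
    if len(md_ids) != n_json:
        fails.append(f"STRIDE file has {len(md_ids)} threat IDs, JSON has {n_json}")
    if inv["metrics"]["total_threats"] != n_json:
        fails.append("metrics.total_threats != len(threats)")
    if inv["metrics"]["total_findings"] != len(inv["findings"]):
        fails.append("metrics.total_findings != len(findings)")
    return fails
```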
### After writing 0-assessment.md (count validation)

- Element count in Executive Summary matches actual Element Table row count (re-read `1-threatmodel.md` if needed)
- Finding count matches actual `### FIND-` heading count in `3-findings.md`
- Threat count matches Total from summary table in `2-stride-analysis.md`
## Phase 0 — Common Deviation Scan

These are the most frequently observed deviations across all previous runs. After output is generated, scan every output file for these specific patterns. Each check has a WRONG pattern to search for and a CORRECT expected pattern.

**How to use:** For each check, grep/scan the output files for the WRONG pattern. If found → FAIL. Then verify the CORRECT pattern is present. This phase catches recurring mistakes that the generating model tends to make despite instructions.
### 0.1 Structural Deviations

- Findings organized by severity instead of tier — Search for `## Critical Findings`, `## Important Findings`, `## High Findings`. These must NOT exist. ❌ `## Critical Findings` → ✅ `## Tier 1 — Direct Exposure (No Prerequisites)`
- Flat STRIDE tables (no tier sub-sections) — Each component in `2-stride-analysis.md` must have `#### Tier 1`, `#### Tier 2`, `#### Tier 3` sub-headings. ❌ Single flat table per component → ✅ Three separate tier sub-sections
- Missing Exploitability Tier or Remediation Effort on findings — Every `### FIND-` block in `3-findings.md` must contain both `Exploitability Tier` and `Remediation Effort` rows. ❌ Missing either field → ✅ Both MANDATORY
- STRIDE summary missing tier columns — Summary table in `2-stride-analysis.md` must include `T1`, `T2`, `T3` columns. ❌ Only S/T/R/I/D/E/A/Total → ✅ Must also have T1/T2/T3/Risk columns
- STRIDE Summary at bottom — Compare the line number of `## Summary` vs the first `## Component`. ❌ Summary after components → ✅ Summary BEFORE all component sections, immediately after `## Exploitability Tiers`
- Exploitability Tiers table columns — The tier definition table in `2-stride-analysis.md` must have exactly these 4 columns: `Tier | Label | Prerequisites | Assignment Rule`. ❌ `Example`, `Description`, `Criteria` as 4th column → ✅ `Assignment Rule` only. The Assignment Rule cells must contain the rigid rule text, NOT deployment-specific examples.
### 0.2 File Format Deviations

- `.md` wrapped in code fences — Check if any `.md` file starts with `` ```markdown `` or `` ````markdown ``. ❌ `` ```markdown `` then `# Title` → ✅ `# Title` on line 1
- `.mmd` wrapped in code fences — Check if the `.mmd` file starts with `` ```plaintext `` or `` ```mermaid ``. ❌ `` ```mermaid `` then `%%{init:` → ✅ `%%{init:` on line 1
- Leaked skill directives in output — Search ALL `.md` files for `⛔`, `RIGID TIER`, `Do NOT use subjective`, `MANDATORY`, `CRITICAL —`, `decision procedure`. These are internal skill instructions that must NOT appear in report output. ❌ Any match → ✅ Zero matches. Remove any leaked directive lines.
- Nested duplicate output folder — Check if the output folder contains a subfolder with the same name (e.g., `threat-model-20260307-081613/threat-model-20260307-081613/`). ❌ Subfolder exists → ✅ Delete the nested duplicate. The output folder should contain only files, no subfolders.
- STRIDE-A "Authorization" instead of "Abuse" — Search `2-stride-analysis.md` for `| Authorization |` or `**Authorization**` used as a STRIDE category name. The A in STRIDE-A is ALWAYS "Abuse", never "Authorization". ❌ Any match where Authorization is used as a STRIDE category → ✅ Replace with "Abuse". Note: do NOT replace "Authorization" when it appears inside threat descriptions (e.g., "Authorization header", "lacks authorization checks").
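The fence-wrapping deviation is common enough to warrant an automated fix rather than just a detector. A minimal sketch that strips an accidental outer fence from a report file; the function name and return shape are illustrative assumptions:

```python
import re

FENCE_OPEN = re.compile(r"^`{3,4}(markdown|mermaid|plaintext)\s*$")

def strip_fence_wrapper(text: str) -> tuple[str, bool]:
    """Detect and remove an accidental outer ```markdown / ```mermaid wrapper.
    Returns (cleaned_text, was_wrapped)."""
    lines = text.splitlines()
    if len(lines) >= 2 and FENCE_OPEN.match(lines[0]) and re.fullmatch(r"`{3,4}\s*", lines[-1]):
        return "\n".join(lines[1:-1]) + "\n", True
    return text, False
```

Apply it to every `.md` and `.mmd` file before the other Phase 0 checks, since a wrapped file makes line-1 checks fail spuriously.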
### 0.3 Assessment Section Deviations

- Wrong section name for Action Summary — Search for `Priority Remediation Roadmap`, `Top Recommendations`, `Key Recommendations`, `Risk Profile`. ❌ Any of those names → ✅ `## Action Summary` only
- Separate recommendations section — Search for `### Key Recommendations` or `### Top Recommendations` as standalone sections. ❌ Separate section → ✅ Action Summary IS the recommendations
- Missing Quick Wins subsection — Search for `### Quick Wins` under Action Summary. ❌ Missing → ✅ Present (with note if no low-effort T1 findings)
- Missing threat count context — Search for the `> **Note on threat counts:**` blockquote in Executive Summary. ❌ Missing → ✅ Present
- Missing Analysis Context & Assumptions — Search for `## Analysis Context & Assumptions`. ❌ Missing → ✅ Present with `### Needs Verification` and `### Finding Overrides` sub-sections
- Missing mandatory assessment sections — Verify ALL 7 exist: Report Files, Executive Summary, Action Summary, Analysis Context & Assumptions, References Consulted, Report Metadata, Classification Reference. ❌ Any missing → ✅ All 7 present
### 0.4 References & Metadata Deviations

- References Consulted as flat table — Search for the `| Reference | Usage |` pattern. ❌ Two-column flat table → ✅ Two subsections: `### Security Standards` with `| Standard | URL | How Used |` and `### Component Documentation` with `| Component | Documentation URL | Relevant Section |`
- References missing URLs — Every row in References Consulted tables must have a full `https://` URL. ❌ Missing URL column or empty URLs → ✅ Full URLs in every row
- Report Metadata missing Model — Search for the `| **Model** |` or `| Model |` row. ❌ Missing → ✅ Present with actual model name
- Report Metadata missing timestamps — Search for `Analysis Started`, `Analysis Completed`, `Duration` rows. ❌ Any missing → ✅ All three present with computed values
### 0.5 Finding Quality Deviations

- CVSS score without vector or missing prefix — Grep each finding's CVSS field. The value MUST match the pattern `\d+\.\d+ \(CVSS:4\.0/AV:`. Specifically check for the `CVSS:4.0/` prefix — the most common deviation is outputting the vector without this prefix (bare `AV:N/AC:L/...`). ❌ `9.3` (score only) → ❌ `9.3 (AV:N/AC:L/...)` (no prefix) → ✅ `9.3 (CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N)`
- CWE without hyperlink — Grep for `CWE-\d+` without a preceding `[`. ❌ `CWE-78: OS Command Injection` → ✅ `[CWE-78](https://cwe.mitre.org/data/definitions/78.html): OS Command Injection`
- OWASP `:2021` suffix — Grep for `:2021`. ❌ `A01:2021` → ✅ `A01:2025`
- Related Threats as plain text — Grep `Related Threats` rows for values without `](`. ❌ `T-02, T-17, T-23` → ✅ `[T02.S](2-stride-analysis.md#component-name), [T17.I](2-stride-analysis.md#other-component)`
- Finding IDs out of order — Check that FIND-NN IDs are sequential: FIND-01, FIND-02, FIND-03... ❌ `FIND-06` appearing before `FIND-04` → ✅ Sequential numbering top-to-bottom
- CVSS AV:L or PR:H with Tier 1 — Grep every Tier 1 finding's CVSS vector for `AV:L` or `PR:H`. ❌ Tier 1 with local-only access → ✅ Downgrade to T2/T3
- Localhost-only or admin-only finding in Tier 1 — Check deployment context: air-gapped, localhost, single-admin services should NOT be Tier 1. ❌ Tier 1 for admin-only → ✅ T2/T3
- Time estimates in output — Grep for `~1 hour`, `Sprint`, `Phase 1`, `(hours)`, `(days)`, `(weeks)`, `Immediate`. ❌ Any scheduling language → ✅ Only `Low`/`Medium`/`High` effort labels
- "Accepted Risk" in Coverage table — Grep `3-findings.md` for `Accepted Risk`. ❌ Any match → FAIL. The tool does NOT have authority to accept risks. Every `Open` threat MUST have a finding. Replace all `⚠️ Accepted Risk` with `✅ Covered` and create corresponding findings.
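The CVSS-prefix rule compiles directly into a single anchored regex. A minimal sketch; the `[NALP]` character class reflects the CVSS 4.0 Attack Vector values (Network/Adjacent/Local/Physical), and the function name is illustrative:

```python
import re

# Score, a space, then a parenthesized vector that starts with CVSS:4.0/AV:
CVSS_FIELD = re.compile(r"^\d+\.\d+ \(CVSS:4\.0/AV:[NALP]/")

def cvss_field_ok(value: str) -> bool:
    """A finding's CVSS field must carry the score AND a CVSS:4.0/-prefixed vector."""
    return bool(CVSS_FIELD.match(value))
```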
### 0.6 Diagram Deviations

- Wrong color palette — Grep all `#[0-9a-fA-F]{6}` in `.mmd` files and Mermaid blocks. ❌ `#4299E1`, `#48BB78`, `#E53E3E`, `#2B6CB0`, `#2D3748`, `#2F855A`, `#C53030` (Chakra UI) → ✅ Only allowed: `#6baed6`, `#2171b5`, `#fdae61`, `#d94701`, `#74c476`, `#238b45`, `#e31a1c`, `#666666`, `#ffffff`, `#000000`
- Custom themeVariables colors — Search init blocks for `secondaryColor`, `tertiaryColor`, or `primaryTextColor`. ❌ `"primaryColor": "#2D3748", "secondaryColor": "#4299E1"` → ✅ Only `'background': '#ffffff', 'primaryColor': '#ffffff', 'lineColor': '#666666'` in themeVariables
- Missing summary MMD — Count nodes and subgraphs in `1.1-threatmodel.mmd`. If elements > 15 OR subgraphs > 4, `1.2-threatmodel-summary.mmd` MUST exist. ❌ Threshold met but file missing → ✅ File created with summary diagram
- Standalone sidecar nodes (K8s only) — Search diagrams for nodes named `MISE`, `Dapr`, `Envoy`, `Istio`, `Sidecar` as separate entries. ❌ `MISE(("MISE Sidecar"))` → ✅ `InferencingFlow(("Inferencing Flow<br/>+ MISE"))`
- Intra-pod localhost flows (K8s only) — Search for `-->|"localhost"|` arrows between co-located containers. ❌ Present → ✅ Absent (implicit)
- Missing sequence diagrams — First 3 scenarios in `0.1-architecture.md` must each have a `sequenceDiagram` block. ❌ Fewer than 3 → ✅ At least 3
- Technology-specific gaps — For every technology in the repo (Redis, PostgreSQL, Docker, K8s, ML/LLM, NFS, etc.), verify at least one finding or documented mitigation exists. ❌ Technology present but no coverage → ✅ Each technology addressed
### 0.7 Canonical Pattern Checks

- Finding heading pattern — All finding headings match `^### FIND-\d{2}:` (never `F01`, `F-01`, `Finding 1`)
- CVSS prefix pattern — All CVSS fields match `\d+\.\d+ \(CVSS:4\.0/AV:` (never bare `AV:N/AC:L/...`)
- Related Threats link pattern — Every Related Threat token matches `\[T\d{2}\.[STRIDEA]\]\(2-stride-analysis\.md#[a-z0-9-]+\)`
- Assessment section headings exact set — Exactly these `##` headings in `0-assessment.md`: Report Files, Executive Summary, Action Summary, Analysis Context & Assumptions, References Consulted, Report Metadata, Classification Reference
- Forbidden headings absent — No `##` or `###` headings containing: Severity Distribution, Architecture Risk Areas, Methodology Notes, Deliverables, Priority Remediation Roadmap, Key Recommendations, Top Recommendations
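The Related Threats pattern deserves a two-sided check: linked tokens must be present, and bare (unlinked) tokens must be absent, since a cell can contain both. A minimal sketch; the function name is illustrative:

```python
import re

# A properly linked threat token: [T02.S](2-stride-analysis.md#anchor)
LINKED = re.compile(r"\[T\d{2}\.[STRIDEA]\]\(2-stride-analysis\.md#[a-z0-9-]+\)")
# A bare token not wrapped as a link, e.g. "T02.S" outside brackets
BARE = re.compile(r"(?<!\[)\bT\d{2}\.[STRIDEA]\b(?!\])")

def related_threats_linked(cell: str) -> bool:
    """Every threat token in a Related Threats cell must be a markdown link."""
    return bool(LINKED.search(cell)) and not BARE.search(cell)
```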
## Phase 1 — Per-File Structural Checks

These checks validate each file independently. They can run in parallel.
### 1.1 All .md Files

- No code-fence wrapping: No `.md` file starts with `` ```markdown `` or `` ````markdown ``. Every `.md` file must begin with a `# Heading` as its very first line. If any file is wrapped in fences, strip the first and last lines immediately.
- No `.mmd` code-fence wrapping: The `.mmd` file must NOT start with `` ```plaintext `` or `` ```mermaid ``. It must start with `%%{init:` as the very first characters. If wrapped, strip the fence lines.
- No empty files: Every file has substantive content beyond the heading.
### 1.2 0.1-architecture.md

- Required sections present: System Purpose, Key Components, Component Diagram, Top Scenarios, Technology Stack, Deployment Model, Repository Structure
- Component Diagram exists as a Mermaid `flowchart` inside a `` ```mermaid `` code fence
- Architecture styles used — NOT DFD circles `(("Name"))`. Must use `["Name"]` or `(["Name"])` with `service`/`external`/`datastore` classDef names
- At least 3 scenarios have Mermaid `sequenceDiagram` blocks
- No separate `.mmd` files were created for 0.1-architecture.md — all diagrams are inline
- Component Diagram elements match Key Components table — every row in the table has a corresponding node in the diagram, and vice versa. Count both and verify the counts are equal.
- Top Scenarios reflect actual code paths, not hypothetical use cases
- Deployment Model has network details — must mention at least: port numbers OR bind addresses OR network topology
### 1.3 1.1-threatmodel.mmd

- File exists with pure Mermaid code (no markdown wrapper, no `` ```mermaid `` fence)
- Starts with an `%%{init:` block
- Contains `classDef process`, `classDef external`, `classDef datastore`
- Uses DFD shapes: circles `(("Name"))` for processes, rectangles `["Name"]` for externals, cylinders `[("Name")]` for data stores
### 1.4 1-threatmodel.md

- Diagram content identical to `1.1-threatmodel.mmd` — byte-for-byte comparison of the Mermaid block content (excluding the `` ```mermaid `` fence wrapper)
- Element Table present with columns: Element, Type, TMT Category, Description, Trust Boundary
- Data Flow Table present with columns: ID, Source, Target, Protocol, Description
- Trust Boundary Table present with columns: Boundary, Description, Contains
- TMT Category IDs used — Element Table's TMT Category column uses specific TMT element IDs from `tmt-element-taxonomy.md` (e.g., `SE.P.TMCore.WebSvc`, `SE.EI.TMCore.Browser`). NOT generic labels like `Process`, `External`.
- Flow IDs match the `DF\d{2}` pattern — Every flow ID in the Data Flow Table uses `DF01`, `DF02`, etc. format. NOT `F1`, `Flow-1`, `DataFlow1`.
- If >15 elements or >4 boundaries: `1.2-threatmodel-summary.mmd` MUST exist AND `1-threatmodel.md` MUST include a "Summary View" section with the summary diagram AND a "Summary to Detailed Mapping" table. To verify: count nodes (lines with shape syntax) and subgraphs in `1.1-threatmodel.mmd`. If the count exceeds the thresholds but `1.2-threatmodel-summary.mmd` does not exist → FAIL — create the summary diagram before proceeding.
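The threshold test above requires counting nodes and subgraphs in the raw Mermaid source. A minimal sketch; the node regex is a heuristic assumption that a node line starts with an identifier followed by one of the three DFD shape openers (`((`, `["`, `[(`) — it is not a full Mermaid parser:

```python
import re

NODE = re.compile(r'^\s*[A-Za-z][A-Za-z0-9]*\s*(\(\(|\["|\[\()')  # DFD shape openers
SUBGRAPH = re.compile(r"^\s*subgraph\b")

def needs_summary_diagram(mmd: str) -> bool:
    """1.2-threatmodel-summary.mmd is required when elements > 15 or subgraphs > 4."""
    lines = mmd.splitlines()
    nodes = sum(1 for line in lines if NODE.match(line))
    subgraphs = sum(1 for line in lines if SUBGRAPH.match(line))
    return nodes > 15 or subgraphs > 4
```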
### 1.5 2-stride-analysis.md

- Exploitability Tiers section present at top with tier definition table
- Summary table appears BEFORE individual component sections (immediately after Exploitability Tiers, NOT at the bottom of the file)
- Summary table includes columns: Component, Link, S, T, R, I, D, E, A, Total, T1, T2, T3, Risk
- Every component has a `## Component Name` heading followed by Tier 1, Tier 2, Tier 3 sub-sections (all three present even if empty)
- Empty tiers use "No Tier N threats identified for this component."
- Anchor-safe headings: No `##` heading in this file contains ANY of these characters: `&`, `/`, `(`, `)`, `.`, `:`, `'`, `"`, `+`, `@`, `!`. Replace: `&` → `and`, `/` → `-`, parentheses → omit, `:` → omit.
- Pod Co-location line present for K8s components listing co-located sidecars
- STRIDE Status values — Every threat row's Status column uses exactly one of: `Open`, `Mitigated`, `Platform`. No `Partial`, `N/A`, or other ad-hoc values.
- A category labeled Abuse — Search `2-stride-analysis.md` for `| Authorization |` as a STRIDE category label. FAIL if found. The "A" in STRIDE-A is always "Abuse" (business logic abuse, workflow manipulation, feature misuse), NEVER "Authorization". Also check N/A entries: `Authorization — N/A` is WRONG, must be `Abuse — N/A`.
- STRIDE-Coverage Consistency — For every threat ID, the STRIDE Status and Coverage table Status must agree:
  - STRIDE `Open` → Coverage `✅ Covered (FIND-XX)` (finding documents a vulnerability needing remediation)
  - STRIDE `Mitigated` → Coverage `✅ Mitigated (FIND-XX)` (finding documents an existing control the team built)
  - STRIDE `Platform` → Coverage `🔄 Mitigated by Platform`
  - If STRIDE says `Partial` but Coverage says `Mitigated by Platform` → CONFLICT. Fix it.
  - If STRIDE says `Open` but Coverage says `⚠️ Needs Review` → only valid if prerequisites ≠ `None`
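The status mapping is mechanical, so it can be enforced with a small lookup. A minimal sketch, using the exact emoji-prefixed Coverage strings from the checklist; the function name is illustrative and the `Partial`/`Needs Review` conflict cases are treated as plain failures:

```python
import re

COVERAGE_PATTERN = {
    "Open": re.compile(r"^✅ Covered \(FIND-\d{2}\)$"),
    "Mitigated": re.compile(r"^✅ Mitigated \(FIND-\d{2}\)$"),
    "Platform": re.compile(r"^🔄 Mitigated by Platform$"),
}

def coverage_consistent(stride_status: str, coverage_status: str) -> bool:
    """STRIDE Status and Coverage-table Status must agree per the mapping above."""
    pat = COVERAGE_PATTERN.get(stride_status)
    return pat is not None and bool(pat.match(coverage_status))
```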
1.6 3-findings.md
- Organized by tier using exactly: `## Tier 1 — Direct Exposure (No Prerequisites)`, `## Tier 2 — Conditional Risk (...)`, `## Tier 3 — Defense-in-Depth (...)`
- NOT organized by severity — no `## Critical Findings` or `## Important Findings` headings
- Every finding has ALL mandatory attributes: SDL Bugbar Severity, CVSS 4.0, CWE, OWASP (with `:2025` suffix), Exploitation Prerequisites, Exploitability Tier, Remediation Effort, Mitigation Type, Component, Related Threats
- Mitigation Type valid values — Every finding's `Mitigation Type` row is one of exactly: `Redesign`, `Standard Mitigation`, `Custom Mitigation`, `Existing Control`, `Accept Risk`, `Transfer Risk`. ❌ Abbreviated forms (`Custom`, `Accept`, `Standard`) or invented values → FAIL
- SDL Severity valid values — Every finding's severity is one of: `Critical`, `Important`, `Moderate`, `Low`. ❌ `High`, `Medium`, `Info` → FAIL
- Remediation Effort valid values — Every finding's effort is one of: `Low`, `Medium`, `High`. ❌ Time estimates, sprint labels → FAIL
- CVSS 4.0 has full vector: Every finding's CVSS value includes BOTH the numeric score AND the full vector string (e.g., `9.3 (CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N)`). Score-only is NOT acceptable.
- CWE format: Every CWE uses `CWE-NNN: Name` format (not just a number)
- OWASP format: Every OWASP uses `A0N:2025` format (never `:2021`)
- Related Threats use individual links per threat ID: `[T01.S](2-stride-analysis.md#component-name)` — no grouped links like `[T01.S, T01.T](2-stride-analysis.md)`
- Exploitation Prerequisites present — Every `### FIND-` block has a row `| Exploitation Prerequisites |`
- Component field present — Every `### FIND-` block has a row `| Component |`
- No Tier 1 with AV:L or PR:H — For every Tier 1 finding, verify its CVSS vector does NOT contain `AV:L` or `PR:H`. If found → the tier must be downgraded to T2/T3.
- Tier-Prerequisite Consistency (MANDATORY) — For EVERY finding and EVERY threat row, the tier MUST follow mechanically from the prerequisite using the canonical mapping:
  - `None` → T1 (only valid if component's Reachability = External AND Auth = No)
  - `Authenticated User`, `Privileged User`, `Internal Network`, `Local Process Access` → T2
  - `Host/OS Access`, `Admin Credentials`, `Physical Access`, `{Component} Compromise`, any `A + B` → T3
  - ⛔ FORBIDDEN values: `Application Access`, `Host Access` → FAIL. Replace with `Local Process Access` (T2) or `Host/OS Access` (T3).
- Deployment context rule (Rule 20): If Deployment Classification is `LOCALHOST_DESKTOP` or `LOCALHOST_SERVICE`, `None` is FORBIDDEN for all components. Fix the prerequisite to `Local Process Access` or `Host/OS Access`, then derive the tier.
- Exposure table cross-check: For each finding, look up its Component in the Component Exposure Table. The finding's prerequisite MUST be ≥ the component's `Min Prerequisite`. The finding's tier MUST be ≥ the component's `Derived Tier`. Mismatch = FAIL. Fix by adjusting prerequisites to match deployment evidence, then derive the tier from the prerequisite.
- Common violations: `None` on a localhost-only component; `Application Access` (ambiguous); T1 with `Internal Network` prerequisite; T2 with `None` prerequisite.
- Threat Coverage Verification table present at end of file mapping every threat ID → finding ID with status
- Coverage table valid statuses ONLY — Every row in the Coverage table must use exactly one of these three statuses: `✅ Covered (FIND-XX)`, `✅ Mitigated (FIND-XX)`, or `🔄 Mitigated by Platform`. ❌ `⚠️ Accepted Risk` → FAIL (the tool cannot accept risks). ❌ `⚠️ Needs Review` → FAIL (every threat must be resolved). ❌ `—` without a status → FAIL (unaccounted threat).
- Mitigated vs Platform distinction — For every `✅ Mitigated (FIND-XX)` entry: verify the finding documents an existing security control the engineering team built (auth middleware, TLS, input validation, file permissions). For every `🔄 Mitigated by Platform`: verify the mitigation is from a genuinely EXTERNAL system (Azure AD, K8s RBAC, TPM). If "Platform" describes THIS repo's code → reclassify as `✅ Mitigated` and create a finding.
- Platform Mitigation Ratio Audit (MANDATORY) — Count threats marked `🔄 Mitigated by Platform` vs total threats. If Platform > 20% → WARNING: likely overuse of Platform status. For each Platform-mitigated threat, verify ALL three conditions: (1) the mitigation is EXTERNAL to this repo's code, (2) managed by a different team, (3) cannot be disabled by modifying this code. Common violations: "auth middleware" (that's THIS code → should be `Mitigated`), "TLS on localhost" (THIS code → should be `Mitigated`), "file permissions" (THIS code → should be `Mitigated`).
- Coverage Feedback Loop Verification — After the Coverage table is written, verify: (1) every threat with STRIDE status `Open` has a corresponding finding in the table; (2) no `—` dashes without a status; (3) if gaps exist, new findings were created to fill them. The Coverage table is a FEEDBACK LOOP — its purpose is to catch missed findings and force their creation. If gaps remain after the table is written, the loop was not executed.
- "Accepted Risk" in Coverage table — Grep `3-findings.md` for `Accepted Risk`. ❌ Any match → FAIL. The tool does NOT have authority to accept risks. Every `Open` threat MUST have a finding. Every `Mitigated` threat MUST have a finding documenting the team's control.
- "Needs Review" in Coverage table — Grep `3-findings.md` for `Needs Review`. ❌ Any match → FAIL. "Needs Review" has been replaced: threats are either Covered (vulnerability), Mitigated (team built a control), or Platform (external system). There is no deferred category.
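The status rules above lend themselves to mechanical checking. A minimal Python sketch, assuming Coverage-table rows follow a three-cell `| ThreatID | Finding | Status |` layout; the helper names and row shape are illustrative assumptions, not part of the skill:

```python
import re

# Forbidden Coverage-table statuses (the tool cannot accept or defer risks).
FORBIDDEN = ["Accepted Risk", "Needs Review"]

def coverage_status_violations(findings_md: str) -> list[str]:
    """Return any forbidden status strings found in 3-findings.md content."""
    return [s for s in FORBIDDEN if s in findings_md]

def coverage_rows_valid(findings_md: str) -> bool:
    """Every Coverage-table status cell must match one of the three allowed forms."""
    allowed = re.compile(
        r"✅ Covered \(FIND-\d+\)|✅ Mitigated \(FIND-\d+\)|🔄 Mitigated by Platform"
    )
    # Status is assumed to be the third cell of each `| Txx.Y | ... | status |` row.
    rows = re.findall(r"^\|\s*T\d+\.\w+\s*\|[^|]*\|([^|]*)\|", findings_md, re.M)
    return all(allowed.search(cell) for cell in rows)
```

Running `coverage_status_violations` on the raw file is the grep equivalent; `coverage_rows_valid` additionally rejects any status cell outside the three allowed forms.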
### 1.7 `0-assessment.md`
- Section order: Report Files → Executive Summary → Action Summary → Analysis Context & Assumptions → References Consulted → Report Metadata → Classification Reference (last)
- Report Files section is the very first section after the title
- Risk Rating heading has NO emojis: `### Risk Rating: Elevated`, not `### Risk Rating: 🟠 Elevated`
- Threat count context paragraph present as blockquote at end of Executive Summary
- No separate Recommendations section — Action Summary IS the recommendations
- Action Summary table present with Tier, Description, Threats, Findings, Priority columns
- Action Summary is the ONLY name: No sections titled "Priority Remediation Roadmap", "Top Recommendations", "Key Recommendations", or "Risk Profile"
- Quick Wins subsection present (or explicitly omitted if no low-effort T1 findings)
- Needs Verification section present under Analysis Context & Assumptions
- References Consulted has two subsections: `### Security Standards` and `### Component Documentation`
- References Consulted tables use three columns with full URLs: `| Standard | URL | How Used |` and `| Component | Documentation URL | Relevant Section |` — NOT a flat `| Reference | Usage |` table
- Finding Overrides uses table format even when empty (never plain text)
- Report Metadata is the absolute last section before Classification Reference with all required fields
- Metadata timestamps came from actual command execution (not derived from folder names)
- Model field present — value matches the model being used (e.g., `Claude Opus 4.6`, `GPT-5.3 Codex`, `Gemini 3 Pro`)
- Analysis Started and Analysis Completed fields present with UTC timestamps from `Get-Date` commands
- Duration field present — computed from the Analysis Started and Analysis Completed timestamps
- Metadata values in backticks — Every value cell in the Report Metadata table must be wrapped in backticks. Spot-check at least 5 rows.
- Horizontal rules between sections — Count lines matching `---` in the file. Must be ≥ 6 (one between each pair of the 7 `##` sections).
- Classification Reference is last section — `## Classification Reference` present as the final `##` heading. Contains a single 2-column table (Classification | Values) with rows for: Exploitability Tiers, STRIDE + Abuse, SDL Severity, Remediation Effort, Mitigation Type, Threat Status, CVSS, CWE, OWASP. ❌ Missing section or wrong format → FAIL.
- Classification Reference is static — Values in the table must match the skeleton EXACTLY (copied verbatim). No additional rows, no modified descriptions. Compare against the `skeleton-assessment.md` Classification Reference section.
- No forbidden section headings — Search for: `Severity Distribution`, `Architecture Risk Areas`, `Methodology Notes`, `Deliverables`, `Priority Remediation Roadmap`, `Key Recommendations`, `Top Recommendations`. Must return 0 matches.
- Action Summary tier priorities are FIXED — In the Action Summary table of `0-assessment.md`, verify the Priority column: Tier 1 = `🔴 Critical Risk`, Tier 2 = `🟠 Elevated Risk`, Tier 3 = `🟡 Moderate Risk`. ❌ Tier 1 with Low/Moderate/Elevated → FAIL. ❌ Tier 2 with Critical/Low → FAIL. These are FIXED labels that never change regardless of threat/finding counts.
- Action Summary has all 3 tiers — The Action Summary table MUST have rows for Tier 1, Tier 2, AND Tier 3, even if a tier has 0 threats and 0 findings. Missing tiers → FAIL.
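Two of the structural checks above (horizontal-rule count and final-heading position) are easy to automate. A minimal sketch, assuming the assessment file is read into a string; the function name and message wording are illustrative:

```python
def assessment_structure_ok(assessment_md: str) -> list[str]:
    """Minimal structural checks for 0-assessment.md: at least 6 horizontal
    rules, and Classification Reference as the final ## heading."""
    problems = []
    # Count standalone `---` lines (horizontal rules between sections).
    hr_count = sum(1 for line in assessment_md.splitlines() if line.strip() == "---")
    if hr_count < 6:
        problems.append(f"only {hr_count} horizontal rules (need >= 6)")
    # The last ## heading must be the Classification Reference.
    headings = [l for l in assessment_md.splitlines() if l.startswith("## ")]
    if not headings or headings[-1] != "## Classification Reference":
        problems.append("Classification Reference is not the last ## section")
    return problems
```

An empty return list means both checks passed; each entry otherwise is one exact failure to report in the verification summary.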
## Phase 2 — Diagram Rendering Checks
Run against ALL Mermaid blocks across all files. Can be delegated as a focused sub-task.
### 2.1 Init Blocks
- Every flowchart has a `%%{init}%%` block with `'background': '#ffffff'` as the first line
- Every sequence diagram has the full `%%{init}%%` theme variables block with `'background': '#ffffff'`
- NO custom color keys in themeVariables — the init block must NOT contain `primaryColor` (except `#ffffff`), `secondaryColor`, or `tertiaryColor`. All element colors come from classDef only.
### 2.2 Class Definitions & Color Palette
- Every `classDef` includes `color:#000000` (explicit black text)
- DFD diagrams use `process`/`external`/`datastore` class names
- Architecture diagrams use `service`/`external`/`datastore` class names
- EXACT hex codes used — grep all `#[0-9a-fA-F]{6}` values in `.mmd` files. The ONLY allowed fill colors are: `#6baed6`, `#fdae61`, `#74c476`, `#ffffff`, `#000000`. The ONLY allowed stroke colors are: `#2171b5`, `#d94701`, `#238b45`, `#e31a1c`, `#666666`. If ANY other hex color appears (e.g., `#4299E1`, `#48BB78`, `#E53E3E`, `#2B6CB0`), the diagram FAILS this check.
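The palette check above is a pure allowlist comparison. A minimal sketch, assuming the `.mmd` file content is passed as a string (the helper name is illustrative):

```python
import re

# Approved palette from the check above (fills + strokes), lowercased.
ALLOWED_FILLS = {"#6baed6", "#fdae61", "#74c476", "#ffffff", "#000000"}
ALLOWED_STROKES = {"#2171b5", "#d94701", "#238b45", "#e31a1c", "#666666"}
ALLOWED = ALLOWED_FILLS | ALLOWED_STROKES

def disallowed_hex_colors(mmd_text: str) -> set[str]:
    """Return any 6-digit hex colors in a .mmd file outside the approved palette."""
    found = {c.lower() for c in re.findall(r"#[0-9a-fA-F]{6}", mmd_text)}
    return found - ALLOWED
```

A non-empty result set is the FAIL evidence: the exact off-palette hex codes to cite in the verification summary.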
### 2.3 Styling
- Every flowchart has `linkStyle default stroke:#666666,stroke-width:2px`
- Trust boundary styles use `stroke:#e31a1c,stroke-width:3px` (NOT `#ff0000` or `stroke-width:2px`)
- Architecture layer styles use light fills with matching borders (not red dashed trust boundaries)
### 2.4 Syntax Validation
- All labels quoted: `["Name"]`, `(("Name"))`, `[("Name")]`, `-->|"Label"|`, `subgraph ID["Title"]`
- Subgraph/end pairs matched: Every `subgraph` has a closing `end`
- No stray characters or unclosed quotes in any Mermaid block
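The subgraph/end pairing check can be sketched as a simple counter. This is a rough heuristic, not a full Mermaid parser: in flowcharts, `end` may also close other constructs, so a balanced count is necessary but not sufficient. The function name is illustrative:

```python
def subgraph_ends_balanced(mermaid_text: str) -> bool:
    """Heuristic: every `subgraph` line should have a matching `end` line
    (flowchart diagrams only; not a full Mermaid parse)."""
    opens = closes = 0
    for line in mermaid_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("subgraph"):
            opens += 1
        elif stripped == "end":
            closes += 1
    return opens == closes
```

An unbalanced count pinpoints a diagram that will fail to render before attempting a full render.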
### 2.5 Kubernetes Sidecar Rules
Skip this section if the target system is NOT deployed on Kubernetes.
- Every K8s service node annotated with sidecars: `<br/>+ SidecarName` in the node label
- Zero standalone sidecar nodes: Search all diagrams for nodes named `MISE`, `Dapr`, `Envoy`, `Istio`, `Sidecar` — these must NOT exist as separate nodes
- Zero intra-pod localhost flows: No arrows between a container and its sidecars (no `-->|"localhost"` patterns)
- Cross-boundary sidecar flows originate from host container: All arrows to external targets (Azure AD, Redis, etc.) come from the host container node, not from a standalone sidecar node
- Element Table: No separate rows for sidecars — described in host container's description column
## Phase 3 — Cross-File Consistency Checks
These checks validate relationships between files. They require reading multiple files together.
### 3.1 Component Coverage (Architecture → STRIDE → Findings)
- Every component in the `0.1-architecture.md` Key Components table has a corresponding `## Component` section in `2-stride-analysis.md`
- Every element in the `1-threatmodel.md` Element Table that is a Process has a corresponding `## Component` section in `2-stride-analysis.md`
- No orphaned components in `2-stride-analysis.md` that don't appear in the Element Table
- Summary table component count matches the number of `## Component` sections in the file
- Component count exact match — Count rows in the `0.1-architecture.md` Key Components table (excluding header/separator). Count `##` component sections in `2-stride-analysis.md` (excluding `## Exploitability Tiers`, `## Summary`). These counts MUST be equal.
### 3.2 Data Flow Coverage (STRIDE ↔ DFD)
- Every Data Flow ID (`DF01`, `DF02`, ...) from the `1-threatmodel.md` Data Flow Table appears in at least one "Affected Flow" cell in `2-stride-analysis.md`
- No orphaned flow IDs in STRIDE analysis that aren't defined in the Data Flow Table
### 3.3 Threat-to-Finding Traceability (STRIDE ↔ Findings)
This is the most critical cross-file check. It ensures no identified threat is silently dropped.
- Every threat ID in `2-stride-analysis.md` (e.g., T01.S, T01.T1, T02.I) is referenced by at least one finding in `3-findings.md` via its Related Threats field
- Collect all threat IDs from all tier tables in `2-stride-analysis.md`
- Collect all threat IDs referenced in Related Threats fields in `3-findings.md`
- Coverage gap report: List any threat ID present in STRIDE but missing from findings. If gaps exist → either add a finding or group the threat into an existing related finding
### 3.4 Finding-to-STRIDE Anchor Integrity (Findings → STRIDE)
- Every Related Threats link in `3-findings.md` uses the format `[ThreatID](2-stride-analysis.md#component-anchor)`
- Every `#component-anchor` resolves to an actual `## Heading` in `2-stride-analysis.md`
- Anchor construction verified: heading → lowercase → spaces to hyphens → strip non-alphanumeric except hyphens
- Spot-check at least 3 anchors by following the link and confirming the threat ID exists under that heading
### 3.5 Count Consistency (Assessment ↔ All Files)
- Element count in Executive Summary matches actual Element Table row count in `1-threatmodel.md`
- Finding count in Executive Summary matches actual finding count in `3-findings.md`
- Threat count in Executive Summary matches Total from summary table in `2-stride-analysis.md`
- Tier counts in the threat count context paragraph match actual T1/T2/T3 totals from `2-stride-analysis.md`
- Action Summary tier table counts match actual per-tier counts from `3-findings.md` (findings column) and `2-stride-analysis.md` (threats column)
Verification methods for count checks:
- Element count: count `|` rows in the Element Table of `1-threatmodel.md`, subtract 2 (header + separator)
- Finding count: count `### FIND-` headings in `3-findings.md`
- Threat count: read the Totals row in the `2-stride-analysis.md` Summary table, take the `Total` column value
- Tier counts: from the same Totals row, take the T1, T2, T3 column values
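The first two counting methods above can be sketched directly. Assumptions: the Element Table is the only pipe-delimited table in the excerpt passed in, and finding headings follow the `### FIND-` convention exactly; the helper names are illustrative:

```python
import re

def element_count(threatmodel_md: str) -> int:
    """Count `|` rows in the (assumed only) table, minus header and separator."""
    rows = [l for l in threatmodel_md.splitlines() if l.strip().startswith("|")]
    return max(len(rows) - 2, 0)

def finding_count(findings_md: str) -> int:
    """Count `### FIND-` headings in 3-findings.md."""
    return len(re.findall(r"^### FIND-", findings_md, re.M))
```

These derived counts are then compared against the numbers quoted in the Executive Summary.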
### 3.6 STRIDE Summary Table Arithmetic
- Per-row: S + T + R + I + D + E + A = Total for every component
- Per-row: T1 + T2 + T3 = Total for every component
- Totals row: Each column sum across all component rows equals the Totals row value
- Row count cross-check: Number of threat rows in each component's detail tables equals its Total in the summary table
- No artificial all-1s pattern: Check the Summary table for the pattern where every STRIDE column (S,T,R,I,D,E,A) is exactly 1 for every component. If ALL components have exactly 1 threat in every STRIDE category → FAIL (indicates formulaic "minimum 1 per category" inflation rather than genuine analysis). A valid analysis should have varying counts per category reflecting actual attack surface: some categories may be 0 (with N/A justification), others 2-3. Uniform 1s across all components is a strong signal of artificial padding.
- N/A entries excluded from totals: If any component has `N/A — {justification}` entries for STRIDE categories, verify those categories show 0 in the Summary table (not 1). N/A entries do NOT count as threats.
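The per-row arithmetic and the all-1s padding detector are both mechanical once the Summary table is parsed into dicts. A minimal sketch, assuming each row is a dict keyed by the column names above (the parsing step and function names are illustrative):

```python
def summary_row_ok(row: dict) -> bool:
    """Per-row arithmetic: STRIDE-A columns and tier columns must both sum to Total."""
    stride_sum = sum(row[c] for c in ("S", "T", "R", "I", "D", "E", "A"))
    tier_sum = row["T1"] + row["T2"] + row["T3"]
    return stride_sum == row["Total"] == tier_sum

def all_ones_pattern(rows: list[dict]) -> bool:
    """Detect formulaic inflation: every STRIDE column exactly 1 for every component."""
    return bool(rows) and all(
        all(r[c] == 1 for c in ("S", "T", "R", "I", "D", "E", "A")) for r in rows
    )
```

`summary_row_ok` failing on any row, or `all_ones_pattern` returning true for the whole table, is a direct FAIL under the checks above.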
### 3.7 Sort Order (Findings)
- Within each tier section: Findings appear in order Critical → Important → Moderate → Low
- Within each severity band: Higher-CVSS findings appear before lower-CVSS findings
- No misordering: Scan sequentially and confirm no reversal
### 3.8 Report Files Table (Assessment ↔ Output Folder)
- Every file listed in the Report Files table of `0-assessment.md` exists in the output folder
- `0.1-architecture.md` is listed in the Report Files table
- If `1.2-threatmodel-summary.mmd` was not generated: it is omitted from the Report Files table (not listed with an "N/A" note)
## Phase 4 — Evidence Quality Checks
These checks validate the substance of findings, not just structure. Ideally run by a sub-agent with code access.
### 4.1 Finding Evidence
- Every finding has an Evidence section citing specific files/lines/configs
- Evidence is concrete: Shows actual code or config, not just "absence of config"
- For "missing security" claims: Evidence proves the platform default is insecure (not just that explicit config is absent)
### 4.2 Verify-Before-Flagging Compliance
- Security infrastructure inventory was performed before STRIDE analysis (check for platform security defaults verification in findings)
- No false positive patterns: No finding claims "missing mTLS" when Dapr Sentry is present, or "missing RBAC" on K8s ≥1.6, etc.
- Finding classification applied: Every documented finding is "Confirmed" (not "Needs Verification" — those belong in `0-assessment.md`)
### 4.3 Needs Verification Placement
- All "Needs Verification" items are in
0-assessment.mdunder Analysis Context & Assumptions — NOT in3-findings.md - No ambiguous findings: Findings in
3-findings.mdhave positive evidence of a vulnerability
## Verification Summary Template
After running all checks, produce a summary.
Sub-agent output MUST include:
- Phase name
- Total checks, Passed, Failed
- For each failure: Check ID, file, evidence, exact fix instruction
- Re-run status after fixes
Do not return "looks good" without counts.
## Verification Results
| Phase | Checks | Passed | Failed | Notes |
|-------|--------|--------|--------|-------|
| 0 — Common Deviation Scan | [N] | [N] | [N] | [pattern matches] |
| 1 — Per-File Structural | [N] | [N] | [N] | [files with issues] |
| 2 — Diagram Rendering | [N] | [N] | [N] | [specific failures] |
| 3 — Cross-File Consistency | [N] | [N] | [N] | [gaps found] |
| 4 — Evidence Quality | [N] | [N] | [N] | [false positive risks] |
| 5 — JSON Schema | [N] | [N] | [N] | [schema issues] |
### Failed Checks Detail
<!-- For each failed check, list: check ID, file(s), what's wrong, suggested fix -->
## Phase 5 — `threat-inventory.json` Schema Validation
These checks validate the JSON inventory file generated in Step 8b. This file is critical for comparison mode.
### 5.1 Schema Fields
- `schema_version` field — Present and equals `"1.0"` (standalone) or `"1.1"` (incremental). If the report contains `"incremental": true`, schema_version MUST be `"1.1"`. Otherwise `"1.0"`.
- `commit` field — Present (short SHA or `"Unknown"`)
- `components` array — Non-empty, has at least 1 entry
- Component IDs — Every component has `id` (PascalCase), `display`, `type`, `boundary`
- Component field name compliance — Components use `"display"` (NOT `"display_name"`). Grep: `"display_name"` must return 0 matches.
- Threat field name compliance — Threats use `"stride_category"` (NOT `"category"`). Threats have BOTH `"title"` AND `"description"` (NOT just `description` alone, NOT `"name"`). The threat→component link is inside `"identity_key"."component_id"` (NOT a top-level `"component_id"` on the threat object). Grep: a top-level `"category":` outside identity_key must return 0 matches. Grep: every threat object must contain `"title":`.
- `boundaries` array — Present (can be empty for flat systems)
- `flows` array — Present, each flow has canonical ID format `DF_{Source}_to_{Target}`
- `threats` array — Non-empty
- `findings` array — Non-empty
- `metrics` object — Present with `total_components`, `total_threats`, `total_findings`
### 5.2 Metrics Consistency
- `metrics.total_components == components.length` — Array length matches count
- `metrics.total_threats == threats.length` — Array length matches count
- `metrics.total_findings == findings.length` — Array length matches count
- Metrics match markdown reports — `total_threats` equals the Total from the STRIDE summary table, `total_findings` equals the `### FIND-` count in `3-findings.md`
- Truncation recovery gate — If ANY array length mismatch was detected above, verify that the file was regenerated (not patched). Check: file size > 10KB for repos with >40 threats; the threats array has entries for EVERY component that appears in `2-stride-analysis.md`
- Pre-write strategy compliance — If `metrics.total_threats > 50`, verify that the JSON was written via sub-agent delegation, a Python script, or chunked append — NOT a single `create_file` call. Evidence: check the log for an `agent` invocation, an `_extract.py` script, or multiple `replace_string_in_file` operations on the JSON file.
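The three length checks above can be run in one pass over the parsed inventory. A minimal sketch, assuming the caller has already done `json.load` on `threat-inventory.json`; the function name and message format are illustrative:

```python
def metrics_mismatches(inventory: dict) -> list[str]:
    """Compare `metrics` counters against actual array lengths in a parsed
    threat-inventory.json (e.g., the result of json.load)."""
    problems = []
    for metric, array in (
        ("total_components", "components"),
        ("total_threats", "threats"),
        ("total_findings", "findings"),
    ):
        declared = inventory["metrics"][metric]
        actual = len(inventory.get(array, []))
        if declared != actual:
            problems.append(f"{metric}={declared} but len({array})={actual}")
    return problems
```

Any returned mismatch also triggers the truncation recovery gate above, so the fix is regeneration, not patching the counter.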
### 5.3 Deterministic Identity Stability (for comparison readiness)
- Components include deterministic identity fields — every component has `aliases` (array), `boundary_kind`, and `fingerprint`
- `boundary_kind` valid values — every component's `boundary_kind` is one of: `MachineBoundary`, `NetworkBoundary`, `ClusterBoundary`, `ProcessBoundary`, `PrivilegeBoundary`, `SandboxBoundary`. ❌ Any other value (e.g., `DataStorage`, `ApplicationCore`, `deployment`, `trust`) → FAIL
- Boundaries include deterministic identity fields — every boundary has `kind`, `aliases` (array), and `contains_fingerprint`
- Boundary `kind` valid values — every boundary's `kind` is one of the same 6 TMT-aligned values as `boundary_kind`. ❌ Any other value → FAIL
- No duplicate canonical component IDs — `components[].id` values are unique after normalization
- Alias mapping is coherent — no alias appears under two unrelated component IDs in the same inventory
- Fingerprint evidence fields are stable-only — `fingerprint` uses source files/topology/type/protocols, not freeform prose
- Deterministic ordering applied — arrays sorted by canonical key (`components.id`, `boundaries.id`, `flows.id`, `threats.id`, `findings.id`)
### 5.4 Comparison Drift Guardrails (when validating comparison outputs)
- High-confidence rename candidates are not left as add/remove — component pairs with strong alias/source-file/topology overlap are classified as `renamed`/`modified`
- Boundary rename candidates use containment overlap — same `kind` + high `contains` overlap are classified as boundary `renamed`, not `added` + `removed`
- Split/merge boundary transitions recognized — one-to-many and many-to-one containment transitions are mapped to `split`/`merged` categories
### 5.5 Comparison Integrity Checks (when validating comparison outputs)
- Baseline ≠ Current commit — `metadata.json` → `baseline.commit` must differ from `current.commit`. Same-commit comparisons are invalid (zero real code changes to compare).
- Files changed > 0 — `metadata.json` → `git_diff_stats.files_changed` must be > 0. A comparison with 0 files changed has no code delta and is meaningless.
- Duration > 0 — `metadata.json` → `duration` must NOT be `"0m 0s"` or any value under 2 minutes. A genuine comparison requires reading two inventories, performing multi-signal matching, computing heatmaps, and generating HTML — this takes real time.
- No external folder references — `metadata.json` and all output files must NOT contain references to `D:\One\tm` or any folder outside the repository being analyzed. Reports should only reference folders within the current repo.
- Anti-reuse verification — The comparison output must be freshly generated, not copied from a prior `threat-model-compare-*` folder. Verify by checking that `metadata.json` timestamps are from the current run.
- Methodology drift ratio — If `diff-result.json` → `metrics.methodology_drift_ratio` > 0.50, verify the HTML report contains a methodology drift warning banner. If the ratio is not computed but >50% of component renames share the same aliases/fingerprints, flag as a validation failure.
## Phase 6 — Deterministic Identity & Naming Stability
These checks validate that component/boundary/flow naming follows deterministic rules, enabling reproducible outputs across independent runs of the same code.
### 6.1 Component ID Determinism
- Component IDs derived from code artifacts — Every component ID in `threat-inventory.json` must trace to an actual class name, file path, deployment manifest `metadata.name`, or config key. No abstract concepts (`ConfigurationStore`, `DataLayer`, `LocalFileSystem`). Grep component IDs against source file names and class names — at least 80% should have a direct match.
- Component anchor verification — Every process-type component in `threat-inventory.json` must have non-empty `fingerprint.source_files` or `fingerprint.source_directories`. If both are empty → FAIL (the component has no code anchor).
- Helm/K8s workload naming — For K8s-deployed components, verify the component ID matches the `metadata.name` from the Deployment/StatefulSet YAML, not the Helm template filename or directory. Example: `DevPortal` (from the deployment name), NOT `templates-knowledge-deployment` (from the file path).
- External service anchoring — External services (no source code in repo) must anchor to their integration point: client class name, config key, or SDK dependency. Verify `fingerprint.config_keys` or `fingerprint.class_names` is populated.
- Forbidden naming patterns absent — No component ID is a generic label: grep for `ConfigurationStore`, `DataLayer`, `LocalFileSystem`, `SecurityModule`, `NetworkLayer`, `DatabaseAccess`. → Must return 0 matches.
- Acronym consistency — Well-known acronyms must be ALL-CAPS in PascalCase IDs: `API`, `NFS`, `LLM`, `SQL`, `DB`, `AD`, `UI`. Grep for `Api` (should be `API`), `Nfs` (should be `NFS`), `Llm` (should be `LLM`). → Must return 0 matches.
- Common technology naming exactness — Verify these exact IDs where applicable: `Redis` (not `RedisCache`), `Milvus` (not `MilvusDB`), `NginxIngress` (not `IngressNginx`), `AzureAD` (not `AzureAd`), `PostgreSQL` (not `Postgres`).
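The forbidden-label and acronym greps above reduce to simple membership and substring tests over the component ID list. A minimal sketch covering a subset of the rules (the function name and the choice of acronym spellings checked are illustrative):

```python
# Generic labels that must never appear as component IDs.
FORBIDDEN_IDS = {"ConfigurationStore", "DataLayer", "LocalFileSystem",
                 "SecurityModule", "NetworkLayer", "DatabaseAccess"}
# Mixed-case spellings that must be ALL-CAPS inside PascalCase IDs.
BAD_ACRONYM_SPELLINGS = ("Api", "Nfs", "Llm")

def naming_violations(component_ids: list[str]) -> list[str]:
    """Flag generic labels and lower-cased acronym spellings in component IDs."""
    bad = [cid for cid in component_ids if cid in FORBIDDEN_IDS]
    bad += [cid for cid in component_ids
            if any(s in cid for s in BAD_ACRONYM_SPELLINGS)]
    return bad
```

Each returned ID is one exact FAIL to report; the substring test is case-sensitive, so correctly capitalized IDs like `DevPortalAPI` pass.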
### 6.2 Boundary Naming Stability
- Boundary IDs are PascalCase — Every boundary ID in `threat-inventory.json` uses PascalCase derived from deployment topology (e.g., `K8sCluster`, `External`, `Application`). NOT code architecture layers (`PresentationLayer`, `BusinessLogic`).
- No code-layer boundaries for single-process apps — If the system is a single process (one .exe, one container), there should be exactly 1 `Application` boundary — NOT 4+ boundaries for Presentation/Business/Data layers. Count boundaries and verify the proportion.
- K8s multi-service sub-boundaries — For K8s namespaces with multiple Deployments, verify sub-boundaries exist: `BackendServices`, `DataStorage`, `MLModels`, `Agentic` (as applicable).
### 6.3 Data Flow Completeness
- Bidirectional flows for ingress/reverse proxy — If an ingress component (Nginx, Traefik) routes to backends, verify BOTH directions exist: `DF_Ingress_to_Backend` AND `DF_Backend_to_Ingress`. Count forward flows through the ingress and verify matching response flows.
- Bidirectional flows for databases — For every `DF_Service_to_Datastore` flow, verify a corresponding `DF_Datastore_to_Service` read flow exists. Datastores: Redis, Milvus, PostgreSQL, MongoDB, etc.
- Flow count stability — Count flows in `threat-inventory.json`. Two independent runs on the same code should produce the same count (±3 acceptable). If the flow count differs by >5 between old and HEAD analyses for unchanged components, flag as naming drift.
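The database response-flow check can be sketched by parsing the canonical `DF_{Source}_to_{Target}` IDs. Assumptions: component names are PascalCase with no underscores, and the caller supplies the set of datastore component IDs; the function name is illustrative:

```python
import re

def missing_response_flows(flow_ids: list[str], datastores: set[str]) -> list[str]:
    """For every DF_Service_to_Datastore flow, expect a DF_Datastore_to_Service
    read flow; return the reverse flow IDs that are missing."""
    existing = set(flow_ids)
    missing = []
    for fid in flow_ids:
        m = re.fullmatch(r"DF_([A-Za-z0-9]+)_to_([A-Za-z0-9]+)", fid)
        if m and m.group(2) in datastores:
            reverse = f"DF_{m.group(2)}_to_{m.group(1)}"
            if reverse not in existing:
                missing.append(reverse)
    return missing
```

The same pattern extends to ingress components by swapping the `datastores` set for the set of ingress IDs.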
### 6.4 Count Stability (Cross-Run Determinism)
- Component count within tolerance — If comparing two analyses of the same code, component count must be within ±1. Difference ≥3 = FAIL.
- Boundary count within tolerance — Same code → boundary count within ±1.
- Fingerprint completeness for process components — Every component with `type: "process"` must have non-empty `fingerprint.source_directories` and `fingerprint.class_names`. Empty arrays for process components → FAIL.
- STRIDE category single-letter enforcement — Every `threats[].stride_category` in the JSON is exactly one letter: S, T, R, I, D, E, or A. Grep for full names (`"Spoofing"`, `"Tampering"`, `"Denial of Service"`) → Must return 0 matches. This prevents heatmap computation errors.
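The single-letter enforcement is a one-line membership check over the parsed threats array. A minimal sketch, assuming threats are dicts as loaded from `threat-inventory.json`; the function name is illustrative:

```python
# The seven valid single-letter STRIDE-A categories.
VALID_STRIDE = set("STRIDEA")

def invalid_stride_categories(threats: list[dict]) -> list[str]:
    """Return stride_category values that are not a single STRIDE-A letter."""
    return [t.get("stride_category") for t in threats
            if t.get("stride_category") not in VALID_STRIDE]
```

Any returned value (a full name like `"Spoofing"`, or a missing field surfacing as `None`) is exactly the kind of entry that breaks heatmap computation.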
## Phase 7 — Evidence-Based Prerequisites & Coverage Completeness
These checks validate that prerequisites, tiers, and coverage follow deterministic evidence-based rules.
### 7.1 Prerequisite Determination Evidence
- No prerequisite without deployment evidence — For every finding with `Exploitation Prerequisites` ≠ `None`, verify the prerequisite reflects actual deployment config (Helm values, Dockerfile, service type, ingress rules). If the prerequisite says `Internal Network` but no evidence of network restriction exists → FAIL.
- Prerequisite consistency across same code — If two analyses of the same code produce different prerequisites for the same vulnerability, the skill rules are insufficient. Flag for investigation.
### 7.1b Deployment Classification Gate (MANDATORY)
- Deployment Classification present — `0.1-architecture.md` must contain a `**Deployment Classification:**` line with one of: `LOCALHOST_DESKTOP`, `LOCALHOST_SERVICE`, `AIRGAPPED`, `K8S_SERVICE`, `NETWORK_SERVICE`. ❌ Missing → FAIL.
- Component Exposure Table present — `0.1-architecture.md` must contain a `### Component Exposure Table` with columns: Component, Listens On, Auth Required, Reachability, Min Prerequisite, Derived Tier. ❌ Missing → FAIL.
- Exposure table completeness — Every component in the Key Components table has a corresponding row in the Component Exposure Table. ❌ Missing rows → FAIL.
- Deployment classification enforced on T1 — If Deployment Classification is `LOCALHOST_DESKTOP` or `LOCALHOST_SERVICE`:
  - Count findings with `Exploitation Prerequisites` = `None`. ❌ Count > 0 → FAIL (must be `Local Process Access` or `Host/OS Access` minimum).
  - Count findings in `## Tier 1`. ❌ Count > 0 → FAIL (must be T2+ for localhost/desktop apps).
  - For each finding with `AV:N` in CVSS, check the component's `Reachability` column. ❌ `AV:N` with `Reachability ≠ External` → FAIL.
- Prerequisite floor enforced — For EVERY finding, look up the finding's `Component` in the exposure table. The finding's `Exploitation Prerequisites` must be ≥ the `Min Prerequisite` in the table. The finding's tier must be ≥ the `Derived Tier`. ❌ Finding has `None` but the table says `Local Process Access` → FAIL.
- Prerequisite basis in Evidence — Every finding's `#### Evidence` section must contain a `**Prerequisite basis:**` line citing the specific code/config that determines the prerequisite. ❌ Missing or generic ("found in codebase") → FAIL.
### 7.2 Coverage Completeness
- Technology coverage check — For each major technology in the repo (Redis, PostgreSQL, Docker, K8s, ML/LLM, NFS, etc.), verify at least one finding or documented mitigation addresses it. Scan the `0.1-architecture.md` Technology Stack table → for each technology, grep `3-findings.md` for a matching finding.
- Minimum finding threshold — Small repo (<20 files): ≥8 findings; Medium (20-100): ≥12; Large (100+): ≥18. Count `### FIND-` headings and verify against repo size.
- Platform ratio within context-aware limit — Detect the deployment pattern: if go.mod contains `controller-runtime`/`kubebuilder`/`operator-sdk` → K8s Operator (limit ≤35%); otherwise → Standalone App (limit ≤20%). Count Platform-status threats / total threats. If the limit is exceeded → FAIL. Document the detected pattern in the assessment.
- DoS with None prerequisites = Finding — Every DoS threat (`.D`) with `Prerequisites: None` must have a corresponding finding. Grep the STRIDE analysis for `.D` threats with None prerequisites and verify each maps to a finding ID in the Coverage table.
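The size-based threshold above is a simple step function. A minimal sketch (function names are illustrative; the boundary at exactly 100 files is resolved here as Medium, matching the 20-100 band):

```python
def minimum_findings(repo_file_count: int) -> int:
    """Size-based finding floor: <20 files -> 8, 20-100 -> 12, 100+ -> 18."""
    if repo_file_count < 20:
        return 8
    if repo_file_count <= 100:
        return 12
    return 18

def finding_count_ok(num_findings: int, repo_file_count: int) -> bool:
    """Compare the `### FIND-` heading count against the repo-size floor."""
    return num_findings >= minimum_findings(repo_file_count)
```

A count below the floor signals under-analysis rather than an automatically wrong report, so the verifier should flag it for a second pass, not auto-fix it.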
### 7.3 Security Infrastructure Awareness
- Security infrastructure inventory mentioned — Verify `0.1-architecture.md` or `2-stride-analysis.md` references security components (service mesh, cert management, auth middleware) if they exist in the codebase. If Dapr Sentry is deployed, mTLS cannot be flagged as "missing."
- Burden of proof for missing-security claims — Every finding that claims "missing X" must prove the platform default is insecure, not just that explicit config is absent. Spot-check the highest-severity "missing" finding.
## Phase 8 — Comparison HTML Report Structure (comparison outputs only)
These checks validate the HTML comparison report structure.
### 8.1 HTML Comparison Report Structure
- Exactly 4 `<h2>` sections — The HTML must have exactly these `<h2>` headings in order: "Executive Summary", "Threat Tier Distribution", "STRIDE-A Heatmap (with Delta Indicators)", "Comparison Basis — Component Mapping". ❌ Extra sections like "Overall Risk Shift", "Key Delta Metrics", "Metrics Overview", "Findings Diff" as `<h2>` → FAIL (these are either inline elements or removed). ❌ Missing any of the 4 → FAIL.
- No Findings Diff section — The HTML must NOT contain a "Findings Diff" `<h2>` section or any findings diff subsections (Fixed, Removed, Analysis Gaps, New, Changed, Unchanged). If present → FAIL.
- No delta metric cards — The HTML must NOT contain `.risk-delta` cards (Findings Fixed, New Findings, Net Change, Removed, Analysis Gaps, Code-Verified). If present → FAIL.
- Risk shift and metrics bar as inline elements — The risk shift and metrics bar (Components/Threats/Boundaries/Flows/Time) are inline card elements, NOT `<h2>` sections. If they appear as `<h2>` → FAIL.
- Metrics bar includes trust boundaries — The metrics bar MUST show trust boundary counts (e.g., `2 → 2`). If boundaries are missing from the metrics bar → FAIL. Components, Threats, Trust Boundaries, Findings, and Code Changes are the 5 required metric boxes.
- Metrics bar 5th box is Code Changes — The 5th metrics box MUST show commit count and PR count (e.g., `142 commits, 23 PRs`). ❌ "Time Between" → FAIL. The duration/dates are now in the comparison cards (Section 1), not the metrics bar.
- Comparison cards structure — Section 1 MUST contain a `comparison-cards` div with 3 sub-cards: Baseline (hash, date, rating), Target (hash, date, rating), Trend (direction, duration). ❌ Old-style `subtitle` div with `Baseline: SHA → Target: SHA` → FAIL. ❌ Separate `risk-shift` div → FAIL (merged into comparison cards).
- No duplicate status indicators — Status information (Fixed/New/Previously Unidentified counts) MUST appear in ONLY ONE place: the colored status summary cards. They MUST NOT also appear as small inline badges or text in the metrics bar. If the same counts appear in both the metrics bar AND the colored cards → FAIL (remove from the metrics bar, keep the colored cards).
- Tier labels match analysis reports — The Threat Tier Distribution section in the HTML must use EXACTLY these labels: "Tier 1 — Direct Exposure", "Tier 2 — Conditional Risk", "Tier 3 — Defense-in-Depth". ❌ "Probable Exposure", "Theoretical", "High Risk", or any invented variant → FAIL.
- Section title is "Comparison Basis" not "Architecture Changes" — The component mapping section must be titled "Comparison Basis — Component Mapping", NOT "Architecture Changes".
- Heatmap has 13 columns — The STRIDE-A heatmap grid must have: Component | S | T | R | I | D | E | A | Total | divider | T1 | T2 | T3. If T1/T2/T3 columns are missing → FAIL. The heatmap title must include "(with Delta Indicators)".
### 8.2 Heatmap Accuracy (comparison outputs)
- **Heatmap not all zeros** — Sum all `baseline.Total` and `current.Total` values in `stride_heatmap.components`. If either sum is 0 but the corresponding inventory has threats → FAIL (heatmap computation bug).
- **No duplicate renamed component rows** — For every entry in `components_diff.renamed`, verify the heatmap has exactly ONE row for the renamed component (using the current name), not TWO rows (one all-zero baseline, one all-zero current).
- **Heatmap anomaly detection executed** — For every heatmap row with `baseline.Total > 0, current.Total == 0` (disappeared) and every row with `baseline.Total == 0, current.Total > 0` (appeared): verify that fingerprint cross-checking was performed. If a disappeared-appeared pair shares source files, class names, or namespaces → it is a missed rename and must be reclassified. The heatmap should NOT have matching all-zero/all-new pairs with shared source files.
- **Comparison confidence score present** — `diff-result.json` must contain a `comparison_confidence` field ("high" or "low"). If more than 3 unresolved heatmap anomalies exist → confidence must be "low", with a warning banner in the HTML.
- **Per-component STRIDE arithmetic** — For each heatmap row: `S+T+R+I+D+E+A == Total` AND `T1+T2+T3 == Total`, for both baseline and current. Any mismatch → FAIL.
- **Delta arrows match JSON data** — For each heatmap cell, `delta = current - baseline`. If delta == 0, no arrow. If delta > 0, ▲. If delta < 0, ▼. Spot-check at least 3 components.
- **Component removal source file verification** — For every component in `components_diff.removed`, verify its `source_files` are genuinely absent from the current commit. If source files still exist → reclassify as renamed or as a methodology gap.
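The arithmetic and delta-arrow checks above can also be spot-checked programmatically. The sketch below assumes each entry in `stride_heatmap.components` carries a `name`, `baseline`/`current` count objects (with `S`…`A`, `Total`, `T1`–`T3` keys, as stated in this checklist), and a per-cell `arrows` map; the `arrows` key and the `check_heatmap` helper are illustrative assumptions, not a documented schema.

```python
STRIDE_KEYS = ["S", "T", "R", "I", "D", "E", "A"]

def check_heatmap(diff_result: dict) -> list[str]:
    """Return failure messages for per-component arithmetic and delta-arrow checks."""
    failures = []
    for comp in diff_result["stride_heatmap"]["components"]:
        name = comp["name"]
        # S+T+R+I+D+E+A == Total and T1+T2+T3 == Total, for both sides.
        for side in ("baseline", "current"):
            row = comp[side]
            if sum(row[k] for k in STRIDE_KEYS) != row["Total"]:
                failures.append(f"{name}/{side}: STRIDE sum != Total")
            if row["T1"] + row["T2"] + row["T3"] != row["Total"]:
                failures.append(f"{name}/{side}: tier sum != Total")
        # Delta arrows: ▲ if current > baseline, ▼ if lower, no arrow if equal.
        for k in STRIDE_KEYS + ["Total"]:
            delta = comp["current"][k] - comp["baseline"][k]
            expected = "▲" if delta > 0 else "▼" if delta < 0 else ""
            if comp.get("arrows", {}).get(k, "") != expected:
                failures.append(f"{name}: arrow mismatch for {k}")
    return failures
```

Run it over the parsed `diff-result.json`; any non-empty result corresponds to a FAIL under the bullets above.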