mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-12 19:25:55 +00:00
# Writing the Quality Constitution (File 1: QUALITY.md)

The quality constitution defines what "quality" means for this specific project and makes the bar explicit, persistent, and inherited by every AI session.

## Template

```markdown
# Quality Constitution: [Project Name]

## Purpose

[2–3 paragraphs grounding quality in three principles:]

- **Deming** ("quality is built in, not inspected in") — Quality is built into the context files
  and the quality playbook so every AI session inherits the same bar.
- **Juran** ("fitness for use") — Define fitness specifically for this project. Not "tests pass"
  but the actual real-world requirement. Example: "generates correct output that survives
  input schema changes without silently producing wrong results."
- **Crosby** ("quality is free") — Building a quality playbook upfront costs less than
  debugging problems found after deployment.

## Coverage Targets

| Subsystem | Target | Why |
|-----------|--------|-----|
| [Most fragile module] | 90–95% | [Real edge case or past bug] |
| [Core logic module] | 85–90% | [Concrete risk] |
| [I/O or integration layer] | 80% | [Explain] |
| [Configuration/utilities] | 75–80% | [Explain] |

The rationale column is essential. It must reference specific risks or past failures.
If you can't explain why a subsystem needs high coverage with a concrete example,
the target is arbitrary.

## Coverage Theater Prevention

[Define what constitutes a fake test for this project.]

Generic examples that apply to most projects:

- Asserting a function returned *something* without checking what
- Testing with synthetic data that lacks the quirks of real data
- Asserting an import succeeded
- Asserting that a mock returns what it was configured to return
- Calling a function and only asserting that no exception was thrown

[Add project-specific examples based on what you learned during exploration.
For a data pipeline: "counting output records without checking their values."
For a web app: "checking for HTTP 200 without checking the response body."
For a compiler: "checking that the output compiles without checking its behavior."]

## Fitness-to-Purpose Scenarios

[5–10 scenarios. Every scenario must include a `[Req: tier — source]` tag linking it to its requirement source. Use the template below:]

### Scenario N: [Memorable Name]

**Requirement tag:** [Req: formal — Spec §X] *(or `user-confirmed` / `inferred` — see SKILL.md Phase 1, Step 1 for tier definitions)*

**What happened:** [The architectural vulnerability, edge case, or design decision.
Reference actual code — function names, file names, line numbers. Frame it as "this architecture permits the following failure mode."]

**The requirement:** [What the code must do to prevent this failure.
Be specific enough that an AI can verify it.]

**How to verify:** [A concrete test or query that would fail if this regressed.
Include exact commands, test names, or assertions.]

---

[Repeat for each scenario]

## AI Session Quality Discipline

1. Read QUALITY.md before starting work.
2. Run the full test suite before marking any task complete.
3. Add tests for new functionality (not just the happy path — include edge cases).
4. Update this file if new failure modes are discovered.
5. Output a Quality Compliance Checklist before ending a session.
6. Never remove a fitness-to-purpose scenario. Only add new ones.

## The Human Gate

[List things that require human judgment:]

- Output that "looks right" (requires domain knowledge)
- UX and responsiveness
- Documentation accuracy
- Security review of auth changes
- Backward compatibility decisions
```
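
To make the coverage-theater distinction concrete, here is a minimal sketch in Python. The `parse_record` function and its tests are hypothetical, not part of the template:

```python
# Hypothetical function under test: parses a "name,age" line into a dict.
def parse_record(line: str) -> dict:
    name, age = line.split(",")
    return {"name": name.strip(), "age": int(age)}

# Coverage theater: executes every line but asserts almost nothing.
def test_theater():
    result = parse_record("Ada, 36")
    assert result is not None  # passes even if every value is wrong

# Meaningful: checks the actual values, including whitespace handling.
def test_real_values():
    assert parse_record("Ada, 36") == {"name": "Ada", "age": 36}

# Meaningful: malformed real-world input must fail loudly, not silently.
def test_missing_age_fails():
    try:
        parse_record("Ada")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a record with no age")
```

Note that the theater test earns full line coverage of `parse_record` while being unable to fail for any wrong value.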
## Where Scenarios Come From

Scenarios come from two sources — **code exploration** and **domain knowledge** — and the best scenarios combine both.

### Source 1: Defensive Code Patterns (Code Exploration)

Every defensive pattern is evidence of a past failure or known risk:

1. **Defensive code** — Every `if value is None: return` guard is a scenario. Why was it needed?
2. **Normalization functions** — Every function that cleans input exists because raw input once caused problems
3. **Configuration that could be hardcoded** — If a value is read from config instead of being hardcoded, someone learned that the value varies
4. **Git blame / commit messages** — "Fix crash when X is missing" → Scenario: X can be missing
5. **Comments explaining "why"** — "We use hash(id), not a sequential index, because..." → Scenario about correctness under that constraint

### Source 2: What Could Go Wrong (Domain Knowledge)
Don't limit yourself to what the code already defends against. Use your knowledge of similar systems to generate realistic failure scenarios that the code **should** handle. For every major subsystem, ask:
- "What happens if this process is killed mid-operation?" (state machines, file I/O, batch processing)
- "What happens if external input is subtly wrong?" (validation pipelines, API integrations)
- "What happens if this runs at 10x scale?" (batch processing, databases, queues)
- "What happens if two operations overlap?" (concurrency, file locks, shared state)
- "What produces correct-looking output that is actually wrong?" (randomness, statistical operations, type coercion)

These are not hypothetical — they are things that happen to every system of this type. Write them as **architectural vulnerability analyses**: "Because `save_state()` lacks an atomic rename pattern, a mid-write crash during a 10,000-record batch will leave a corrupted state file — the next run gets JSONDecodeError and cannot resume without manual intervention. At scale (9,240 records across 64 batches), this pattern risks silent loss of 1,693+ records with nothing to flag them as missing." Concrete numbers and specific consequences make scenarios authoritative and non-negotiable. An AI session reading "records can be lost" will argue the standard down. An AI session reading a specific failure mode with quantified impact will not.
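
The atomic rename pattern referenced above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `save_state` that persists a dict as JSON; `os.replace` provides the atomic rename on both POSIX and Windows:

```python
import json
import os
import tempfile

# Vulnerable pattern: a crash between open() and the end of json.dump()
# leaves the state file truncated; the next run gets JSONDecodeError.
def save_state_naive(state: dict, path: str = "state.json") -> None:
    with open(path, "w") as f:
        json.dump(state, f)  # crash mid-dump => corrupted file

# Defensive pattern: write to a temp file in the same directory, then
# atomically rename it over the target. Readers see the old file or the
# new file, never a half-written one.
def save_state_atomic(state: dict, path: str = "state.json") -> None:
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # discard the partial temp file
        raise
```

A crash at any point before `os.replace` returns leaves the previous complete file in place, so the worst case is losing the latest batch, never the entire state.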
### The Narrative Voice

Each scenario's "What happened" must read like an architectural vulnerability analysis, not an abstract specification. Include:

- **Specific quantities** — "308 records across 64 batches" not "some records"
- **Cascade consequences** — "cascading through all subsequent pipeline steps, requiring reprocessing of 4,300 records instead of 308"
- **Detection difficulty** — "nothing would flag them as missing" or "only statistical verification would catch it"
- **Root cause in code** — "`random.seed(index)` creates correlated sequences because sequential integers produce related random streams"

The narrative voice serves a critical purpose: it makes standards non-negotiable. Abstract requirements ("records should not be lost") invite rationalization. Specific failure modes with quantified impact ("a mid-batch crash silently loses 1,693 records with no detection mechanism") do not. Frame these as "this architecture permits the following failure" — grounded in the actual code, not fabricated as past incidents.
### Combining Both Sources
The strongest scenarios combine a defensive pattern found in code with domain knowledge about why it matters:
1. Find the defensive code: `save_state()` writes to a temp file then renames
2. Ask what failure this prevents: a mid-write crash leaves a corrupted state file
3. Write the scenario as a vulnerability analysis: "Without the atomic rename pattern, a crash mid-write leaves state.json 50% complete. The next run gets a JSONDecodeError and cannot resume without manual intervention."
4. Ground it in code: "Read persistence.py line ~340: verify temp file + rename pattern"
### The "Why" Requirement
Every coverage target, every quality gate, every standard must have a "why" that references a specific scenario or risk. Without rationale, a future AI session will optimize for speed and argue the standard down.

Bad: "Core logic: 100% coverage"

Good: "Core logic: 100% — because `random.seed(index)` created correlated sequences that produced 77.5% bias instead of 50/50. Subtle bugs here produce plausible-but-wrong output. Only statistical verification catches them."

The "why" is not documentation — it is protection against erosion.
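
As an illustration of a check that value-level assertions cannot replace, here is a sketch of statistical verification. The `assign_group` function, the sample size, and the tolerance are hypothetical:

```python
import random

# Hypothetical assignment: each record is routed to group A or B
# with equal probability, using a per-record seeded stream.
def assign_group(record_id: int) -> str:
    rng = random.Random(record_id)  # independent stream per record
    return "A" if rng.random() < 0.5 else "B"

# Every single output is a valid "A" or "B", so a value-level test
# passes no matter how biased the split is. Only an aggregate
# assertion over many records can detect bias or seed correlation.
def test_split_is_roughly_balanced(n: int = 10_000, tolerance: float = 0.03):
    share_a = sum(assign_group(i) == "A" for i in range(n)) / n
    assert abs(share_a - 0.5) < tolerance, f"biased split: {share_a:.3f}"
```

A 77.5%/22.5% split like the one described above fails this test immediately, while every per-record assertion would still pass.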
## Calibrating Scenario Count
Aim for 2+ scenarios per core module (the modules identified as most complex or fragile). For a medium-sized project, this typically yields 8–10 scenarios. Fewer is fine for small projects; more for complex ones. If you're finding very few scenarios, it usually means the exploration was shallow rather than the project being simple — go back and read function bodies more carefully. Quality matters more than count: one scenario that precisely captures an architectural vulnerability is worth more than three generic "what if the input is bad" scenarios.
## Self-Critique Before Finishing
After drafting all scenarios, review each one and ask:
1. **"Would an AI session argue this standard down?"** If yes, the "why" isn't concrete enough. Add numbers, consequences, and detection difficulty.
2. **"Does the 'What happened' read like a vulnerability analysis or an abstract spec?"** If it reads like a spec, rewrite it with specific quantities, cascading consequences, and grounding in actual code.
3. **"Is there a scenario I'm not seeing?"** Think about what a different AI model would flag. Architecture models catch data flow problems. Edge-case models catch boundary conditions. What are you blind to?
## Critical Rule
Each scenario's "How to verify" section must map to at least one automated test in the functional test file. If a scenario can't be automated, note why (it may require the Human Gate) — but most scenarios should be testable.
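
One way to keep that mapping auditable is to name each test after its scenario and assert exactly the behavior the scenario's "How to verify" section describes. A sketch, where scenario 3 and the state-file layout are hypothetical:

```python
import json
import os
import tempfile

# Hypothetical Scenario 3: "a mid-write crash must not corrupt state.json".
# The test simulates the crash by leaving a truncated temp file behind
# and asserts that the previously committed state is still readable.
def test_scenario_3_crash_mid_write_preserves_state():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "state.json")

    # Commit a known-good state first.
    with open(path, "w") as f:
        json.dump({"processed": 308}, f)

    # Simulate a crash: a partial write goes to a temp file that was
    # never renamed over the real state file (the atomic pattern).
    with open(path + ".tmp", "w") as f:
        f.write('{"processed": 9')  # truncated mid-write

    # The verification from the scenario: committed state must survive.
    with open(path) as f:
        assert json.load(f) == {"processed": 308}
```

The scenario number in the test name gives reviewers a direct grep target when auditing whether every scenario is covered.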