Add quality-playbook skill (#1168)

Andrew Stellman
2026-03-25 19:09:58 -04:00
committed by GitHub
parent 25205651c6
commit 50f87bdc13
10 changed files with 2081 additions and 0 deletions

# Writing Functional Tests
This is the most important deliverable. The Markdown files are documentation. The functional test file is the automated safety net. Name it using the project's conventions: `test_functional.py` (Python/pytest), `FunctionalSpec.scala` (Scala/ScalaTest), `FunctionalTest.java` (Java/JUnit), `functional.test.ts` (TypeScript/Jest), `functional_test.go` (Go), etc.
## Structure: Three Test Groups
Organize tests into three logical groups using whatever structure the test framework provides — classes (Python/Java), describe blocks (TypeScript/Jest), traits (Scala), or subtests (Go):
```
Spec Requirements
— One test per testable spec section
— Each test's documentation cites the spec requirement
Fitness Scenarios
— One test per QUALITY.md scenario (1:1 mapping)
— Named to match: test_scenario_N_memorable_name (or equivalent convention)
Boundaries and Edge Cases
— One test per defensive pattern from Step 5
— Targets null guards, try/catch, normalization, fallbacks
```
## Test Count Heuristic
**Target = (testable spec sections) + (QUALITY.md scenarios) + (defensive patterns from Step 5)**
Example: 12 spec sections + 10 scenarios + 15 defensive patterns = 37 tests as a target.
For a medium-sized project (5-15 source files), this typically yields 35-50 functional tests. Significantly fewer suggests missed requirements or shallow exploration. Don't pad to hit a number — every test should exercise real project code and verify a meaningful property.
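The heuristic is a single sum; a minimal sketch using the illustrative counts from the example above:

```python
# Illustrative counts from the worked example above (not from a real project)
spec_sections = 12
scenarios = 10
defensive_patterns = 15

# Target = (testable spec sections) + (scenarios) + (defensive patterns)
target = spec_sections + scenarios + defensive_patterns
assert target == 37  # matches the worked example
```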
## Import Pattern: Match the Existing Tests
Before writing any test code, read 2-3 existing test files and identify how they import project modules. This is critical — projects handle imports differently and getting it wrong means every test fails with resolution errors.
Common patterns by language:
**Python:**
- `sys.path.insert(0, "src/")` then bare imports (`from module import func`)
- Package imports (`from myproject.module import func`)
- Relative imports with conftest.py path manipulation
**Java:**
- `import com.example.project.Module;` matching the package structure
- Test source root must mirror main source root
**Scala:**
- `import com.example.project._` or `import com.example.project.{ClassA, ClassB}`
- SBT project layout: `src/test/scala/` mirrors `src/main/scala/`
**TypeScript/JavaScript:**
- `import { func } from '../src/module'` with relative paths
- Path aliases from `tsconfig.json` (e.g., `@/module`)
**Go:**
- Same package: test files in the same directory with `package mypackage`
- Black-box testing: `package mypackage_test` with explicit imports
- Internal packages may require specific import paths
**Rust:**
- `use crate::module::function;` for unit tests in the same crate
- `use myproject::module::function;` for integration tests in `tests/`
Whatever pattern the existing tests use, copy it exactly. Do not guess or invent a different pattern.
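As a hedged illustration of the first Python pattern above, a root-level `conftest.py` that prepends `src/` to the import path (the `src/` layout is an assumption; copy whatever the project actually does):

```python
# conftest.py at the project root (assumed layout: sources under src/)
import sys

# Prepend src/ so test files can use bare imports like `from module import func`
if "src/" not in sys.path:
    sys.path.insert(0, "src/")
```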
## Create Test Setup BEFORE Writing Tests
Every test framework has a mechanism for shared setup. If your tests use shared fixtures or test data, you MUST create the setup file before writing tests. Test frameworks do not auto-discover fixtures from other directories.
**By language:**
**Python (pytest):** Create `quality/conftest.py` defining every fixture. Fixtures in `tests/conftest.py` are NOT available to `quality/test_functional.py`. Preferred: write tests that create data inline using `tmp_path` to eliminate conftest dependency.
**Java (JUnit):** Use `@BeforeEach`/`@BeforeAll` methods in the test class, or create a shared `TestFixtures` utility class in the same package.
**Scala (ScalaTest):** Mix in a trait with `before`/`after` blocks, or use inline data builders. If using SBT, ensure the test file is in the correct source tree.
**TypeScript (Jest):** Use `beforeAll`/`beforeEach` in the test file, or create a `quality/testUtils.ts` with factory functions.
**Go (testing):** Helper functions in the same `_test.go` file with `t.Helper()`. Use `t.TempDir()` for temporary directories. Go convention strongly prefers inline setup — avoid shared test state.
**Rust (cargo test):** Helper functions in a `#[cfg(test)] mod tests` block or a `test_utils.rs` module. Use builder patterns for constructing test data. For integration tests, place files in `tests/`.
**Rule: Every fixture or test helper referenced must be defined.** If a test depends on shared setup that doesn't exist, the test will error during setup (not fail during assertion) — producing broken tests instead of meaningful results.
**Preferred approach across all languages:** Write tests that create their own data inline. This eliminates cross-file dependencies:
```python
# Python
def test_config_validation(tmp_path):
config = {"pipeline": {"name": "Test", "steps": [...]}}
```
```java
// Java
@Test
void testConfigValidation(@TempDir Path tempDir) {
var config = Map.of("pipeline", Map.of("name", "Test"));
}
```
```typescript
// TypeScript
test('config validation', () => {
const config = { pipeline: { name: 'Test', steps: [] } };
});
```
```go
// Go
func TestConfigValidation(t *testing.T) {
tmpDir := t.TempDir()
config := Config{Pipeline: Pipeline{Name: "Test"}}
}
```
```rust
// Rust
#[test]
fn test_config_validation() {
let config = Config { pipeline: Pipeline { name: "Test".into() } };
}
```
**After writing all tests, run the test suite and check for setup errors.** Setup errors (fixture not found, import failures) count as broken tests regardless of how the framework categorizes them.
## No Placeholder Tests
Every test must import and call actual project code. If a test body is `pass`, or its only assertion is `assert isinstance(errors, list)`, or it checks a trivial property like `assert hasattr(cls, 'validate')`, delete it and write a real test, or drop it entirely. A test that doesn't exercise project code is worse than no test — it inflates the count and creates false confidence.
If you genuinely cannot write a meaningful test for a defensive pattern (e.g., it requires a running server or external service), note it as untestable in a comment rather than writing a placeholder.
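To make the distinction concrete, a sketch with a hypothetical `Validator` stand-in (the class and its behavior are invented for illustration):

```python
# Hypothetical stand-in for a real project class
class Validator:
    def validate(self, data):
        return [] if data.get("name") else ["name is required"]

# WRONG: placeholder, proves the attribute exists but nothing about behavior
def test_validate_exists():
    assert hasattr(Validator, "validate")

# RIGHT: calls real logic and asserts a specific property
def test_validate_rejects_empty_name():
    errors = Validator().validate({"name": ""})
    assert errors == ["name is required"]
```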
## Read Before You Write: The Function Call Map
Before writing a single test, build a function call map. For every function you plan to test:
1. **Read the function/method signature** — not just the name, but every parameter, its type, and default value. In Python, read the `def` line and type hints. In Java, read the method signature and generics. In Scala, read the method definition and implicit parameters. In TypeScript, read the type annotations.
2. **Read the documentation** — docstrings, Javadoc, TSDoc, ScalaDoc. They often specify return types, exceptions, and edge case behavior.
3. **Read one existing test that calls it** — existing tests show you the exact calling convention, fixture shape, and assertion pattern.
4. **Read real data files** — if the function processes configs, schemas, or data files, read an actual file from the project. Your test fixtures must match this shape exactly.
**Common failure pattern:** The agent explores the architecture, understands conceptually what a function does, then writes a test call with guessed parameters. The test fails because the real function takes `(config, items_data, limit)` not `(items, seed, strategy)`. Reading the actual signature takes 5 seconds and prevents this entirely.
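In Python, one quick way to confirm a signature before writing the call is `inspect.signature` (the `process` function below is a hypothetical stand-in with the parameter list described above):

```python
import inspect

# Hypothetical project function; in real use, inspect the actual function object
def process(config, items_data, limit=100):
    return items_data[:limit]

# Read the real parameter names and defaults instead of guessing them
sig = inspect.signature(process)
assert list(sig.parameters) == ["config", "items_data", "limit"]
assert sig.parameters["limit"].default == 100
```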
**Library version awareness:** Check the project's dependency manifest (`requirements.txt`, `build.sbt`, `package.json`, `pom.xml`, `build.gradle`, `Cargo.toml`) to verify what's available. Use the test framework's skip mechanism for optional dependencies: Python `pytest.importorskip()`, JUnit `Assumptions.assumeTrue()`, ScalaTest `assume()`, Jest conditional `describe.skip`, Go `t.Skip()`, Rust `#[ignore]` with a comment explaining the prerequisite.
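For the Python case, a minimal `pytest.importorskip()` sketch; `json` stands in here for a genuinely optional dependency so the snippet stays runnable:

```python
import pytest

# Skips every test in this module if the dependency is missing;
# in real use, name an optional package from the dependency manifest
json = pytest.importorskip("json")

def test_round_trip():
    assert json.loads(json.dumps({"a": 1})) == {"a": 1}
```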
## Writing Spec-Derived Tests
Walk each spec document section by section. For each section, ask: "What testable requirement does this state?" Then write a test.
Each test should:
1. **Set up** — Load a fixture, create test data, configure the system
2. **Execute** — Call the function, run the pipeline, make the request
3. **Assert specific properties** the spec requires
```python
# Python (pytest)
class TestSpecRequirements:
def test_requirement_from_spec_section_N(self, fixture):
"""[Req: formal — Design Doc §N] X should produce Y."""
result = process(fixture)
assert result.property == expected_value
```
```java
// Java (JUnit 5)
class SpecRequirementsTest {
@Test
@DisplayName("[Req: formal — Design Doc §N] X should produce Y")
void testRequirementFromSpecSectionN() {
var result = process(fixture);
assertEquals(expectedValue, result.getProperty());
}
}
```
```scala
// Scala (ScalaTest)
class SpecRequirements extends AnyFlatSpec with Matchers {
// [Req: formal — Design Doc §N] X should produce Y
"Section N requirement" should "produce Y from X" in {
val result = process(fixture)
result.property should equal (expectedValue)
}
}
```
```typescript
// TypeScript (Jest)
describe('Spec Requirements', () => {
test('[Req: formal — Design Doc §N] X should produce Y', () => {
const result = process(fixture);
expect(result.property).toBe(expectedValue);
});
});
```
```go
// Go (testing)
func TestSpecRequirement_SectionN_XProducesY(t *testing.T) {
// [Req: formal — Design Doc §N] X should produce Y
result := Process(fixture)
if result.Property != expectedValue {
t.Errorf("expected %v, got %v", expectedValue, result.Property)
}
}
```
```rust
// Rust (cargo test)
#[test]
fn test_spec_requirement_section_n_x_produces_y() {
// [Req: formal — Design Doc §N] X should produce Y
let result = process(&fixture);
assert_eq!(result.property, expected_value);
}
```
## What Makes a Good Functional Test
- **Traceable** — Test name, display name, or documentation comment says which spec requirement it verifies
- **Specific** — Checks a specific property, not just "something happened"
- **Robust** — Uses real data (fixtures from the actual system), not synthetic data
- **Cross-variant** — If the project handles multiple input types, test all of them
- **Tests at the right layer** — Test the *behavior* you care about. If the requirement is "invalid data doesn't produce wrong output," test the pipeline output — don't just test that the schema validator rejects the input.
## Cross-Variant Testing Strategy
If the project handles multiple input types, cross-variant coverage is where silent bugs hide. Aim for roughly 30% of tests exercising all variants — the exact percentage matters less than ensuring every cross-cutting property is tested across all variants.
Use your framework's parametrization mechanism:
```python
# Python (pytest)
@pytest.mark.parametrize("variant", [variant_a, variant_b, variant_c])
def test_feature_works(variant):
output = process(variant.input)
assert output.has_expected_property
```
```java
// Java (JUnit 5)
@ParameterizedTest
@MethodSource("variantProvider")
void testFeatureWorks(Variant variant) {
var output = process(variant.getInput());
assertTrue(output.hasExpectedProperty());
}
```
```scala
// Scala (ScalaTest)
Seq(variantA, variantB, variantC).foreach { variant =>
it should s"work for ${variant.name}" in {
val output = process(variant.input)
output should have ('expectedProperty (true))
}
}
```
```typescript
// TypeScript (Jest)
test.each([variantA, variantB, variantC])(
'feature works for %s', (variant) => {
const output = process(variant.input);
expect(output).toHaveProperty('expectedProperty');
});
```
```go
// Go (testing) — table-driven tests
func TestFeatureWorksAcrossVariants(t *testing.T) {
variants := []Variant{variantA, variantB, variantC}
for _, v := range variants {
t.Run(v.Name, func(t *testing.T) {
output := Process(v.Input)
if !output.HasExpectedProperty() {
t.Errorf("variant %s: missing expected property", v.Name)
}
})
}
}
```
```rust
// Rust (cargo test) — iterate over cases
#[test]
fn test_feature_works_across_variants() {
let variants = [variant_a(), variant_b(), variant_c()];
for v in &variants {
let output = process(&v.input);
assert!(output.has_expected_property(),
"variant {}: missing expected property", v.name);
}
}
```
If parametrization doesn't fit, loop explicitly within a single test.
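A minimal explicit-loop sketch (the variants and `process` are invented stand-ins; a real test would call project code):

```python
from types import SimpleNamespace

# Invented stand-ins for real project variants and pipeline output
variants = [
    SimpleNamespace(name="csv", value=1),
    SimpleNamespace(name="json", value=2),
]

def process(variant):
    return variant.value > 0  # placeholder for a real output property

def test_feature_works_across_variants():
    for v in variants:
        # Name the variant in the message so a single failure is traceable
        assert process(v), f"variant {v.name}: missing expected property"
```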
**Which tests should be cross-variant?** Any test verifying a property that *should* hold regardless of input type: entity identity, structural properties, required links, temporal fields, domain-specific semantics.
**After writing all tests, do a cross-variant audit.** Divide the number of cross-variant tests by the total test count. If the ratio is below 30%, convert more tests.
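The audit itself is one division; with the illustrative counts from the heuristic example:

```python
# Illustrative counts (not from a real project)
cross_variant_tests = 12
total_tests = 37

ratio = cross_variant_tests / total_tests
assert ratio >= 0.30  # meets the rough 30% target; below it, convert more tests
```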
## Anti-Patterns to Avoid
These patterns look like tests but don't catch real bugs:
- **Existence-only checks** — Finding one correct result doesn't mean all are correct. Also check count or verify comprehensively.
- **Presence-only assertions** — Asserting a value exists only proves presence, not correctness. Assert the actual value.
- **Single-variant testing** — Testing one input type and hoping others work. Use parametrization.
- **Positive-only testing** — You must test that invalid input does NOT produce bad output.
- **Incomplete negative assertions** — When testing rejection, assert ALL consequences are absent, not just one.
- **Catching exceptions instead of checking output** — Testing that code crashes in a specific way isn't testing that it handles input correctly. Test the output.
### The Exception-Catching Anti-Pattern in Detail
```java
// Java — WRONG: tests the validation mechanism
@Test
void testBadValueRejected() {
fixture.setField("invalid"); // Schema rejects this!
assertThrows(ValidationException.class, () -> process(fixture));
// Tells you nothing about output
}
// Java — RIGHT: tests the requirement
@Test
void testBadValueNotInOutput() {
fixture.setField(null); // Schema accepts null for Optional
var output = process(fixture);
assertFalse(output.contains(badProperty)); // Bad data absent
assertTrue(output.contains(expectedType)); // Rest still works
}
```
```scala
// Scala — WRONG: tests the decoder, not the requirement
"bad value" should "be rejected" in {
val input = fixture.copy(field = "invalid") // Circe decoder fails!
a [DecodingFailure] should be thrownBy process(input)
// Tells you nothing about output
}
// Scala — RIGHT: tests the requirement
"missing optional field" should "not produce bad output" in {
val input = fixture.copy(field = None) // Option[String] accepts None
val output = process(input)
output should not contain badProperty // Bad data absent
output should contain (expectedType) // Rest still works
}
```
```typescript
// TypeScript — WRONG: tests the validation mechanism
test('bad value rejected', () => {
fixture.field = 'invalid'; // Zod schema rejects this!
expect(() => process(fixture)).toThrow(ZodError);
// Tells you nothing about output
});
// TypeScript — RIGHT: tests the requirement
test('bad value not in output', () => {
fixture.field = undefined; // Schema accepts undefined for optional
const output = process(fixture);
expect(output).not.toContain(badProperty); // Bad data absent
expect(output).toContain(expectedType); // Rest still works
});
```
```python
# Python — WRONG: tests the validation mechanism
def test_bad_value_rejected(fixture):
fixture.field = "invalid" # Schema rejects this!
with pytest.raises(ValidationError):
process(fixture)
# Tells you nothing about output
# Python — RIGHT: tests the requirement
def test_bad_value_not_in_output(fixture):
fixture.field = None # Schema accepts None for Optional
output = process(fixture)
    assert bad_property not in output  # Bad data absent
assert expected_type in output # Rest still works
```
```go
// Go — WRONG: tests the error, not the outcome
func TestBadValueRejected(t *testing.T) {
fixture.Field = "invalid" // Validator rejects this!
_, err := Process(fixture)
if err == nil { t.Fatal("expected error") }
// Tells you nothing about output
}
// Go — RIGHT: tests the requirement
func TestBadValueNotInOutput(t *testing.T) {
fixture.Field = "" // Zero value is valid
output, err := Process(fixture)
if err != nil { t.Fatalf("unexpected error: %v", err) }
if containsBadProperty(output) { t.Error("bad data should be absent") }
if !containsExpectedType(output) { t.Error("expected data should be present") }
}
```
```rust
// Rust — WRONG: tests the error, not the outcome
#[test]
fn test_bad_value_rejected() {
let input = Fixture { field: "invalid".into(), ..default() };
assert!(process(&input).is_err()); // Tells you nothing about output
}
// Rust — RIGHT: tests the requirement
#[test]
fn test_bad_value_not_in_output() {
let input = Fixture { field: None, ..default() }; // Option accepts None
let output = process(&input).expect("should succeed");
assert!(!output.contains(bad_property)); // Bad data absent
assert!(output.contains(expected_type)); // Rest still works
}
```
Always check your Step 5b schema map before choosing mutation values.
## Testing at the Right Layer
Ask: "What does the *spec* say should happen?" The spec says "invalid data should not appear in output" — not "validation layer should reject it." Test the spec, not the implementation.
**Exception:** When a spec explicitly mandates a specific mechanism (e.g., "must fail-fast at the schema layer"), testing that mechanism is appropriate. But this is rare.
## Fitness-to-Purpose Scenario Tests
For each scenario in QUALITY.md, write a test. This is a 1:1 mapping:
```scala
// Scala (ScalaTest)
class FitnessScenarios extends AnyFlatSpec with Matchers {
// [Req: formal — QUALITY.md Scenario 1]
"Scenario 1: [Name]" should "prevent [failure mode]" in {
val result = process(fixture)
result.property should equal (expectedValue)
}
}
```
```python
# Python (pytest)
class TestFitnessScenarios:
"""Tests for fitness-to-purpose scenarios from QUALITY.md."""
def test_scenario_1_memorable_name(self, fixture):
"""[Req: formal — QUALITY.md Scenario 1] [Name].
Requirement: [What the code must do].
"""
result = process(fixture)
assert condition_that_prevents_the_failure
```
```java
// Java (JUnit 5)
class FitnessScenariosTest {
@Test
@DisplayName("[Req: formal — QUALITY.md Scenario 1] [Name]")
void testScenario1MemorableName() {
var result = process(fixture);
assertTrue(conditionThatPreventsFailure(result));
}
}
```
```typescript
// TypeScript (Jest)
describe('Fitness Scenarios', () => {
test('[Req: formal — QUALITY.md Scenario 1] [Name]', () => {
const result = process(fixture);
expect(conditionThatPreventsFailure(result)).toBe(true);
});
});
```
```go
// Go (testing)
func TestScenario1_MemorableName(t *testing.T) {
// [Req: formal — QUALITY.md Scenario 1] [Name]
// Requirement: [What the code must do]
result := Process(fixture)
if !conditionThatPreventsFailure(result) {
t.Error("scenario 1 failed: [describe expected behavior]")
}
}
```
```rust
// Rust (cargo test)
#[test]
fn test_scenario_1_memorable_name() {
// [Req: formal — QUALITY.md Scenario 1] [Name]
// Requirement: [What the code must do]
let result = process(&fixture);
assert!(condition_that_prevents_the_failure(&result));
}
```
## Boundary and Negative Tests
One test per defensive pattern from Step 5:
```typescript
// TypeScript (Jest)
describe('Boundaries and Edge Cases', () => {
test('[Req: inferred — from functionName() guard] guards against X', () => {
const input = { ...validFixture, field: null };
const result = process(input);
expect(result).not.toContainBadOutput();
});
});
```
```python
# Python (pytest)
class TestBoundariesAndEdgeCases:
"""Tests for boundary conditions, malformed input, error handling."""
def test_defensive_pattern_name(self, fixture):
"""[Req: inferred — from function_name() guard] guards against X."""
# Mutate to trigger defensive code path
# Assert graceful handling
```
```java
// Java (JUnit 5)
class BoundariesAndEdgeCasesTest {
@Test
@DisplayName("[Req: inferred — from methodName() guard] guards against X")
void testDefensivePatternName() {
fixture.setField(null); // Trigger defensive code path
var result = process(fixture);
assertNotNull(result); // Assert graceful handling
assertFalse(result.containsBadData());
}
}
```
```scala
// Scala (ScalaTest)
class BoundariesAndEdgeCases extends AnyFlatSpec with Matchers {
// [Req: inferred — from methodName() guard]
"defensive pattern: methodName()" should "guard against X" in {
val input = fixture.copy(field = None) // Trigger defensive code path
val result = process(input)
    result shouldBe defined
result.get should not contain badData
}
}
```
```go
// Go (testing)
func TestDefensivePattern_FunctionName_GuardsAgainstX(t *testing.T) {
// [Req: inferred — from FunctionName() guard] guards against X
input := defaultFixture()
input.Field = nil // Trigger defensive code path
result, err := Process(input)
if err != nil {
t.Fatalf("expected graceful handling, got: %v", err)
}
// Assert result is valid despite edge-case input
}
```
```rust
// Rust (cargo test)
#[test]
fn test_defensive_pattern_function_name_guards_against_x() {
// [Req: inferred — from function_name() guard] guards against X
let input = Fixture { field: None, ..default_fixture() };
let result = process(&input).expect("expected graceful handling");
// Assert result is valid despite edge-case input
}
```
Use your Step 5b schema map when choosing mutation values. Every mutation must use a value the schema accepts.
Systematic approach:
- **Missing fields** — Optional field absent? Set to null.
- **Wrong types** — Field gets different type? Use schema-valid alternative.
- **Empty values** — Empty list? Empty string? Empty dict?
- **Boundary values** — Zero, negative, maximum, first, last.
- **Cross-module boundaries** — Module A produces unusual but valid output — does B handle it?
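The checklist can be sketched as a runnable mutation loop, with an invented `process` whose guards mirror the patterns above (the names and schema are assumptions):

```python
# Invented config processor whose guards mirror the checklist above
def process(config):
    steps = config.get("steps") or []               # empty/missing list guard
    limit = config.get("limit")
    limit = 10 if limit is None else max(limit, 0)  # None and negative guards
    return {"step_count": len(steps), "limit": limit}

base = {"steps": [{"op": "load"}], "limit": 5}

def test_boundary_mutations():
    # Each mutation is schema-valid and targets one checklist row
    for mutation in ({"limit": None},   # missing optional field
                     {"steps": []},     # empty list
                     {"limit": 0},      # boundary: zero
                     {"limit": -1}):    # boundary: negative
        result = process({**base, **mutation})
        assert result["limit"] >= 0 and result["step_count"] >= 0
```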
If you found 10+ defensive patterns but wrote only 4 boundary tests, go back and write more. Target a 1:1 ratio.