# Writing Functional Tests This is the most important deliverable. The Markdown files are documentation. The functional test file is the automated safety net. Name it using the project's conventions: `test_functional.py` (Python/pytest), `FunctionalSpec.scala` (Scala/ScalaTest), `FunctionalTest.java` (Java/JUnit), `functional.test.ts` (TypeScript/Jest), `functional_test.go` (Go), etc. ## Structure: Three Test Groups Organize tests into three logical groups using whatever structure the test framework provides — classes (Python/Java), describe blocks (TypeScript/Jest), traits (Scala), or subtests (Go): ``` Spec Requirements — One test per testable spec section — Each test's documentation cites the spec requirement Fitness Scenarios — One test per QUALITY.md scenario (1:1 mapping) — Named to match: test_scenario_N_memorable_name (or equivalent convention) Boundaries and Edge Cases — One test per defensive pattern from Step 5 — Targets null guards, try/catch, normalization, fallbacks ``` ## Test Count Heuristic **Target = (testable spec sections) + (QUALITY.md scenarios) + (defensive patterns from Step 5)** Example: 12 spec sections + 10 scenarios + 15 defensive patterns = 37 tests as a target. For a medium-sized project (5–15 source files), this typically yields 35–50 functional tests. Significantly fewer suggests missed requirements or shallow exploration. Don't pad to hit a number — every test should exercise real project code and verify a meaningful property. ## Import Pattern: Match the Existing Tests Before writing any test code, read 2–3 existing test files and identify how they import project modules. This is critical — projects handle imports differently and getting it wrong means every test fails with resolution errors. Common patterns by language: **Python:** - `sys.path.insert(0, "src/")` then bare imports (`from module import func`) - Package imports (`from myproject.module import func`) - Relative imports with conftest.py path manipulation **Java:** - `import com.example.project.Module;` matching the package structure - Test source root must mirror main source root **Scala:** - `import com.example.project._` or `import com.example.project.{ClassA, ClassB}` - SBT project layout: `src/test/scala/` mirrors `src/main/scala/` **TypeScript/JavaScript:** - `import { func } from '../src/module'` with relative paths - Path aliases from `tsconfig.json` (e.g., `@/module`) **Go:** - Same package: test files in the same directory with `package mypackage` - Black-box testing: `package mypackage_test` with explicit imports - Internal packages may require specific import paths **Rust:** - `use crate::module::function;` for unit tests in the same crate - `use myproject::module::function;` for integration tests in `tests/` Whatever pattern the existing tests use, copy it exactly. Do not guess or invent a different pattern. ## Create Test Setup BEFORE Writing Tests Every test framework has a mechanism for shared setup. If your tests use shared fixtures or test data, you MUST create the setup file before writing tests. Test frameworks do not auto-discover fixtures from other directories. **By language:** **Python (pytest):** Create `quality/conftest.py` defining every fixture. Fixtures in `tests/conftest.py` are NOT available to `quality/test_functional.py`. Preferred: write tests that create data inline using `tmp_path` to eliminate conftest dependency. **Java (JUnit):** Use `@BeforeEach`/`@BeforeAll` methods in the test class, or create a shared `TestFixtures` utility class in the same package. **Scala (ScalaTest):** Mix in a trait with `before`/`after` blocks, or use inline data builders. If using SBT, ensure the test file is in the correct source tree. **TypeScript (Jest):** Use `beforeAll`/`beforeEach` in the test file, or create a `quality/testUtils.ts` with factory functions. **Go (testing):** Helper functions in the same `_test.go` file with `t.Helper()`. Use `t.TempDir()` for temporary directories. Go convention strongly prefers inline setup — avoid shared test state. **Rust (cargo test):** Helper functions in a `#[cfg(test)] mod tests` block or a `test_utils.rs` module. Use builder patterns for constructing test data. For integration tests, place files in `tests/`. **Rule: Every fixture or test helper referenced must be defined.** If a test depends on shared setup that doesn't exist, the test will error during setup (not fail during assertion) — producing broken tests that look like they pass. **Preferred approach across all languages:** Write tests that create their own data inline. This eliminates cross-file dependencies: ```python # Python def test_config_validation(tmp_path): config = {"pipeline": {"name": "Test", "steps": [...]}} ``` ```java // Java @Test void testConfigValidation(@TempDir Path tempDir) { var config = Map.of("pipeline", Map.of("name", "Test")); } ``` ```typescript // TypeScript test('config validation', () => { const config = { pipeline: { name: 'Test', steps: [] } }; }); ``` ```go // Go func TestConfigValidation(t *testing.T) { tmpDir := t.TempDir() config := Config{Pipeline: Pipeline{Name: "Test"}} } ``` ```rust // Rust #[test] fn test_config_validation() { let config = Config { pipeline: Pipeline { name: "Test".into() } }; } ``` **After writing all tests, run the test suite and check for setup errors.** Setup errors (fixture not found, import failures) count as broken tests regardless of how the framework categorizes them. ## No Placeholder Tests Every test must import and call actual project code. If a test body is `pass`, or its only assertion is `assert isinstance(errors, list)`, or it checks a trivial property like `assert hasattr(cls, 'validate')`, delete it and write a real test or drop it entirely. A test that doesn't exercise project code is worse than no test — it inflates the count and creates false confidence. If you genuinely cannot write a meaningful test for a defensive pattern (e.g., it requires a running server or external service), note it as untestable in a comment rather than writing a placeholder. ## Read Before You Write: The Function Call Map Before writing a single test, build a function call map. For every function you plan to test: 1. **Read the function/method signature** — not just the name, but every parameter, its type, and default value. In Python, read the `def` line and type hints. In Java, read the method signature and generics. In Scala, read the method definition and implicit parameters. In TypeScript, read the type annotations. 2. **Read the documentation** — docstrings, Javadoc, TSDoc, ScalaDoc. They often specify return types, exceptions, and edge case behavior. 3. **Read one existing test that calls it** — existing tests show you the exact calling convention, fixture shape, and assertion pattern. 4. **Read real data files** — if the function processes configs, schemas, or data files, read an actual file from the project. Your test fixtures must match this shape exactly. **Common failure pattern:** The agent explores the architecture, understands conceptually what a function does, then writes a test call with guessed parameters. The test fails because the real function takes `(config, items_data, limit)` not `(items, seed, strategy)`. Reading the actual signature takes 5 seconds and prevents this entirely. **Library version awareness:** Check the project's dependency manifest (`requirements.txt`, `build.sbt`, `package.json`, `pom.xml`, `build.gradle`, `Cargo.toml`) to verify what's available. Use the test framework's skip mechanism for optional dependencies: Python `pytest.importorskip()`, JUnit `Assumptions.assumeTrue()`, ScalaTest `assume()`, Jest conditional `describe.skip`, Go `t.Skip()`, Rust `#[ignore]` with a comment explaining the prerequisite. ## Writing Spec-Derived Tests Walk each spec document section by section. For each section, ask: "What testable requirement does this state?" Then write a test. Each test should: 1. **Set up** — Load a fixture, create test data, configure the system 2. **Execute** — Call the function, run the pipeline, make the request 3. **Assert specific properties** the spec requires ```python # Python (pytest) class TestSpecRequirements: def test_requirement_from_spec_section_N(self, fixture): """[Req: formal — Design Doc §N] X should produce Y.""" result = process(fixture) assert result.property == expected_value ``` ```java // Java (JUnit 5) class SpecRequirementsTest { @Test @DisplayName("[Req: formal — Design Doc §N] X should produce Y") void testRequirementFromSpecSectionN() { var result = process(fixture); assertEquals(expectedValue, result.getProperty()); } } ``` ```scala // Scala (ScalaTest) class SpecRequirements extends FlatSpec with Matchers { // [Req: formal — Design Doc §N] X should produce Y "Section N requirement" should "produce Y from X" in { val result = process(fixture) result.property should equal (expectedValue) } } ``` ```typescript // TypeScript (Jest) describe('Spec Requirements', () => { test('[Req: formal — Design Doc §N] X should produce Y', () => { const result = process(fixture); expect(result.property).toBe(expectedValue); }); }); ``` ```go // Go (testing) func TestSpecRequirement_SectionN_XProducesY(t *testing.T) { // [Req: formal — Design Doc §N] X should produce Y result := Process(fixture) if result.Property != expectedValue { t.Errorf("expected %v, got %v", expectedValue, result.Property) } } ``` ```rust // Rust (cargo test) #[test] fn test_spec_requirement_section_n_x_produces_y() { // [Req: formal — Design Doc §N] X should produce Y let result = process(&fixture); assert_eq!(result.property, expected_value); } ``` ## What Makes a Good Functional Test - **Traceable** — Test name, display name, or documentation comment says which spec requirement it verifies - **Specific** — Checks a specific property, not just "something happened" - **Robust** — Uses real data (fixtures from the actual system), not synthetic data - **Cross-variant** — If the project handles multiple input types, test all of them - **Tests at the right layer** — Test the *behavior* you care about. If the requirement is "invalid data doesn't produce wrong output," test the pipeline output — don't just test that the schema validator rejects the input. ## Cross-Variant Testing Strategy If the project handles multiple input types, cross-variant coverage is where silent bugs hide. Aim for roughly 30% of tests exercising all variants — the exact percentage matters less than ensuring every cross-cutting property is tested across all variants. Use your framework's parametrization mechanism: ```python # Python (pytest) @pytest.mark.parametrize("variant", [variant_a, variant_b, variant_c]) def test_feature_works(variant): output = process(variant.input) assert output.has_expected_property ``` ```java // Java (JUnit 5) @ParameterizedTest @MethodSource("variantProvider") void testFeatureWorks(Variant variant) { var output = process(variant.getInput()); assertTrue(output.hasExpectedProperty()); } ``` ```scala // Scala (ScalaTest) Seq(variantA, variantB, variantC).foreach { variant => it should s"work for ${variant.name}" in { val output = process(variant.input) output should have ('expectedProperty (true)) } } ``` ```typescript // TypeScript (Jest) test.each([variantA, variantB, variantC])( 'feature works for %s', (variant) => { const output = process(variant.input); expect(output).toHaveProperty('expectedProperty'); }); ``` ```go // Go (testing) — table-driven tests func TestFeatureWorksAcrossVariants(t *testing.T) { variants := []Variant{variantA, variantB, variantC} for _, v := range variants { t.Run(v.Name, func(t *testing.T) { output := Process(v.Input) if !output.HasExpectedProperty() { t.Errorf("variant %s: missing expected property", v.Name) } }) } } ``` ```rust // Rust (cargo test) — iterate over cases #[test] fn test_feature_works_across_variants() { let variants = [variant_a(), variant_b(), variant_c()]; for v in &variants { let output = process(&v.input); assert!(output.has_expected_property(), "variant {}: missing expected property", v.name); } } ``` If parametrization doesn't fit, loop explicitly within a single test. **Which tests should be cross-variant?** Any test verifying a property that *should* hold regardless of input type: entity identity, structural properties, required links, temporal fields, domain-specific semantics. **After writing all tests, do a cross-variant audit.** Count cross-variant tests divided by total. If below 30%, convert more. ## Anti-Patterns to Avoid These patterns look like tests but don't catch real bugs: - **Existence-only checks** — Finding one correct result doesn't mean all are correct. Also check count or verify comprehensively. - **Presence-only assertions** — Asserting a value exists only proves presence, not correctness. Assert the actual value. - **Single-variant testing** — Testing one input type and hoping others work. Use parametrization. - **Positive-only testing** — You must test that invalid input does NOT produce bad output. - **Incomplete negative assertions** — When testing rejection, assert ALL consequences are absent, not just one. - **Catching exceptions instead of checking output** — Testing that code crashes in a specific way isn't testing that it handles input correctly. Test the output. ### The Exception-Catching Anti-Pattern in Detail ```java // Java — WRONG: tests the validation mechanism @Test void testBadValueRejected() { fixture.setField("invalid"); // Schema rejects this! assertThrows(ValidationException.class, () -> process(fixture)); // Tells you nothing about output } // Java — RIGHT: tests the requirement @Test void testBadValueNotInOutput() { fixture.setField(null); // Schema accepts null for Optional var output = process(fixture); assertFalse(output.contains(badProperty)); // Bad data absent assertTrue(output.contains(expectedType)); // Rest still works } ``` ```scala // Scala — WRONG: tests the decoder, not the requirement "bad value" should "be rejected" in { val input = fixture.copy(field = "invalid") // Circe decoder fails! a [DecodingFailure] should be thrownBy process(input) // Tells you nothing about output } // Scala — RIGHT: tests the requirement "missing optional field" should "not produce bad output" in { val input = fixture.copy(field = None) // Option[String] accepts None val output = process(input) output should not contain badProperty // Bad data absent output should contain (expectedType) // Rest still works } ``` ```typescript // TypeScript — WRONG: tests the validation mechanism test('bad value rejected', () => { fixture.field = 'invalid'; // Zod schema rejects this! expect(() => process(fixture)).toThrow(ZodError); // Tells you nothing about output }); // TypeScript — RIGHT: tests the requirement test('bad value not in output', () => { fixture.field = undefined; // Schema accepts undefined for optional const output = process(fixture); expect(output).not.toContain(badProperty); // Bad data absent expect(output).toContain(expectedType); // Rest still works }); ``` ```python # Python — WRONG: tests the validation mechanism def test_bad_value_rejected(fixture): fixture.field = "invalid" # Schema rejects this! with pytest.raises(ValidationError): process(fixture) # Tells you nothing about output # Python — RIGHT: tests the requirement def test_bad_value_not_in_output(fixture): fixture.field = None # Schema accepts None for Optional output = process(fixture) assert field_property not in output # Bad data absent assert expected_type in output # Rest still works ``` ```go // Go — WRONG: tests the error, not the outcome func TestBadValueRejected(t *testing.T) { fixture.Field = "invalid" // Validator rejects this! _, err := Process(fixture) if err == nil { t.Fatal("expected error") } // Tells you nothing about output } // Go — RIGHT: tests the requirement func TestBadValueNotInOutput(t *testing.T) { fixture.Field = "" // Zero value is valid output, err := Process(fixture) if err != nil { t.Fatalf("unexpected error: %v", err) } if containsBadProperty(output) { t.Error("bad data should be absent") } if !containsExpectedType(output) { t.Error("expected data should be present") } } ``` ```rust // Rust — WRONG: tests the error, not the outcome #[test] fn test_bad_value_rejected() { let input = Fixture { field: "invalid".into(), ..default() }; assert!(process(&input).is_err()); // Tells you nothing about output } // Rust — RIGHT: tests the requirement #[test] fn test_bad_value_not_in_output() { let input = Fixture { field: None, ..default() }; // Option accepts None let output = process(&input).expect("should succeed"); assert!(!output.contains(bad_property)); // Bad data absent assert!(output.contains(expected_type)); // Rest still works } ``` Always check your Step 5b schema map before choosing mutation values. ## Testing at the Right Layer Ask: "What does the *spec* say should happen?" The spec says "invalid data should not appear in output" — not "validation layer should reject it." Test the spec, not the implementation. **Exception:** When a spec explicitly mandates a specific mechanism (e.g., "must fail-fast at the schema layer"), testing that mechanism is appropriate. But this is rare. ## Fitness-to-Purpose Scenario Tests For each scenario in QUALITY.md, write a test. This is a 1:1 mapping: ```scala // Scala (ScalaTest) class FitnessScenarios extends FlatSpec with Matchers { // [Req: formal — QUALITY.md Scenario 1] "Scenario 1: [Name]" should "prevent [failure mode]" in { val result = process(fixture) result.property should equal (expectedValue) } } ``` ```python # Python (pytest) class TestFitnessScenarios: """Tests for fitness-to-purpose scenarios from QUALITY.md.""" def test_scenario_1_memorable_name(self, fixture): """[Req: formal — QUALITY.md Scenario 1] [Name]. Requirement: [What the code must do]. """ result = process(fixture) assert condition_that_prevents_the_failure ``` ```java // Java (JUnit 5) class FitnessScenariosTest { @Test @DisplayName("[Req: formal — QUALITY.md Scenario 1] [Name]") void testScenario1MemorableName() { var result = process(fixture); assertTrue(conditionThatPreventsFailure(result)); } } ``` ```typescript // TypeScript (Jest) describe('Fitness Scenarios', () => { test('[Req: formal — QUALITY.md Scenario 1] [Name]', () => { const result = process(fixture); expect(conditionThatPreventsFailure(result)).toBe(true); }); }); ``` ```go // Go (testing) func TestScenario1_MemorableName(t *testing.T) { // [Req: formal — QUALITY.md Scenario 1] [Name] // Requirement: [What the code must do] result := Process(fixture) if !conditionThatPreventsFailure(result) { t.Error("scenario 1 failed: [describe expected behavior]") } } ``` ```rust // Rust (cargo test) #[test] fn test_scenario_1_memorable_name() { // [Req: formal — QUALITY.md Scenario 1] [Name] // Requirement: [What the code must do] let result = process(&fixture); assert!(condition_that_prevents_the_failure(&result)); } ``` ## Boundary and Negative Tests One test per defensive pattern from Step 5: ```typescript // TypeScript (Jest) describe('Boundaries and Edge Cases', () => { test('[Req: inferred — from functionName() guard] guards against X', () => { const input = { ...validFixture, field: null }; const result = process(input); expect(result).not.toContainBadOutput(); }); }); ``` ```python # Python (pytest) class TestBoundariesAndEdgeCases: """Tests for boundary conditions, malformed input, error handling.""" def test_defensive_pattern_name(self, fixture): """[Req: inferred — from function_name() guard] guards against X.""" # Mutate to trigger defensive code path # Assert graceful handling ``` ```java // Java (JUnit 5) class BoundariesAndEdgeCasesTest { @Test @DisplayName("[Req: inferred — from methodName() guard] guards against X") void testDefensivePatternName() { fixture.setField(null); // Trigger defensive code path var result = process(fixture); assertNotNull(result); // Assert graceful handling assertFalse(result.containsBadData()); } } ``` ```scala // Scala (ScalaTest) class BoundariesAndEdgeCases extends FlatSpec with Matchers { // [Req: inferred — from methodName() guard] "defensive pattern: methodName()" should "guard against X" in { val input = fixture.copy(field = None) // Trigger defensive code path val result = process(input) result should equal (defined) result.get should not contain badData } } ``` ```go // Go (testing) func TestDefensivePattern_FunctionName_GuardsAgainstX(t *testing.T) { // [Req: inferred — from FunctionName() guard] guards against X input := defaultFixture() input.Field = nil // Trigger defensive code path result, err := Process(input) if err != nil { t.Fatalf("expected graceful handling, got: %v", err) } // Assert result is valid despite edge-case input } ``` ```rust // Rust (cargo test) #[test] fn test_defensive_pattern_function_name_guards_against_x() { // [Req: inferred — from function_name() guard] guards against X let input = Fixture { field: None, ..default_fixture() }; let result = process(&input).expect("expected graceful handling"); // Assert result is valid despite edge-case input } ``` Use your Step 5b schema map when choosing mutation values. Every mutation must use a value the schema accepts. Systematic approach: - **Missing fields** — Optional field absent? Set to null. - **Wrong types** — Field gets different type? Use schema-valid alternative. - **Empty values** — Empty list? Empty string? Empty dict? - **Boundary values** — Zero, negative, maximum, first, last. - **Cross-module boundaries** — Module A produces unusual but valid output — does B handle it? If you found 10+ defensive patterns but wrote only 4 boundary tests, go back and write more. Target a 1:1 ratio.