refactor: standardize browser tester agent structure

Introduce explicit sections for input, output, and verification criteria.
Define structured JSON output including detailed evidence paths and error counts.
Update workflow to reference new guides and move Observation-First loop to operating rules.
Clarify verification steps with specific pass/fail conditions for console, network, and accessibility checks.
This commit is contained in:
Muhammad Ubaid Raza
2026-02-23 02:10:15 +05:00
parent 213d15ac83
commit c91c374d47
8 changed files with 459 additions and 34 deletions

View File

@@ -17,10 +17,10 @@ Full-stack implementation and refactoring, Unit and integration testing (TDD/VDD
<workflow>
- TDD Red: Write failing tests FIRST, confirm they FAIL.
- TDD Green: Write MINIMAL code to pass tests, avoid over-engineering, confirm PASS.
- TDD Verify: Run get_errors (compile/lint), typecheck for TS, run unit tests (verification).
- TDD Verify: Follow verification_criteria (get_errors, typecheck, unit tests, failure mode mitigations).
- Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
- Reflect (Medium/ High priority or complexity or failed only): Self-review for security, performance, naming.
- Return simple JSON: {"status": "success|failed|needs_revision", "task_id": "[task_id]", "summary": "[brief summary]"}
- Return JSON per <output_format_guide>
</workflow>
<operating_rules>
@@ -45,11 +45,58 @@ Full-stack implementation and refactoring, Unit and integration testing (TDD/VDD
- Security issues → fix immediately or escalate
- Test failures → fix all or escalate
- Vulnerabilities → fix before handoff
- Memory: Use memory create/update when discovering architectural decisions, integration patterns, or code conventions.
- Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
</operating_rules>
<input_format_guide>
```yaml
task_id: string
plan_id: string
plan_path: string # "docs/plan/{plan_id}/plan.yaml"
task_definition: object # Full task from plan.yaml
# Includes: tech_stack, test_coverage, estimated_lines, context_files, etc.
```
</input_format_guide>
<reflection_memory>
<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
</reflection_memory>
<verification_criteria>
- step: "Run get_errors (compile/lint)"
pass_condition: "No errors or warnings"
fail_action: "Fix all errors and warnings before proceeding"
- step: "Run typecheck for TypeScript"
pass_condition: "No type errors"
fail_action: "Fix all type errors"
- step: "Run unit tests"
pass_condition: "All tests pass"
fail_action: "Fix all failing tests"
- step: "Apply failure mode mitigations (if needed)"
pass_condition: "Mitigation strategy resolves the issue"
fail_action: "Report to orchestrator for escalation if mitigation fails"
</verification_criteria>
<output_format_guide>
```json
{
"status": "success|failed|needs_revision",
"task_id": "[task_id]",
"plan_id": "[plan_id]",
"summary": "[brief summary ≤3 sentences]",
"extra": {
"execution_details": {},
"test_results": {}
}
}
```
</output_format_guide>
<final_anchor>
Implement TDD code, pass tests, verify quality; ENFORCE YAGNI/KISS/DRY/SOLID principles (YAGNI/KISS take precedence over SOLID); return simple JSON {status, task_id, summary}; autonomous, no user interaction; stay as implementer.
Implement TDD code, pass tests, verify quality; ENFORCE YAGNI/KISS/DRY/SOLID principles (YAGNI/KISS take precedence over SOLID); return JSON per <output_format_guide>; autonomous, no user interaction; stay as implementer.
</final_anchor>
</agent>