mirror of https://github.com/github/awesome-copilot.git
synced 2026-03-12 12:15:12 +00:00
chore: Add evidence to browser tester
@@ -15,11 +15,14 @@ Browser automation, UI/UX and Accessibility (WCAG) auditing, Performance profili
 </expertise>

 <workflow>
-- Initialize: Set up tool registry (navigate, click, type, snapshot, wait) with consistent error handling and evidence capture. Track tabs with UUIDs for multi-tab flows. Identify plan_id, task_def. Map validation_matrix to scenarios.
-- Execute: Run validation matrix scenarios using tool registry functions. Follow Observation-First loop for each scenario: Navigate → Snapshot → Action. Verify UI state after each step.
-- Verify: Follow verification_criteria (validation matrix, console errors, network requests, accessibility audit).
+- Initialize: Identify plan_id, task_def. Map scenarios.
+- Execute: Run scenarios iteratively using available browser tools. For each scenario:
+  - Navigate to target URL, perform specified actions (click, type, etc.) using preferred browser tools.
+  - After each scenario, verify outcomes against expected results.
+  - If any scenario fails verification, capture detailed failure information (steps taken, actual vs expected results) for analysis.
+- Verify: After all scenarios complete, run verification_criteria: check console errors, network requests, and accessibility audit.
 - Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
-- Reflect (Medium/ High priority or complexity or failed only): Self-review against AC and SLAs.
+- Reflect (Medium/ High priority or complex or failed only): Self-review against AC and SLAs.
 - Cleanup: Close browser sessions.
 - Return JSON per <output_format_guide>
 </workflow>
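The per-scenario loop in the updated Execute step (navigate, act, verify, record actual vs expected on failure), combined with the Observation-First rule (Navigate → Snapshot → Action), can be sketched in Python. This is an illustration under assumptions, not the agent's code: `FakeBrowser`, its `navigate`/`snapshot`/`click` methods, and the `Scenario` fields are hypothetical stand-ins for whatever browser tools the agent is actually given.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class FakeBrowser:
    """Hypothetical in-memory stand-in for the agent's browser tools."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages
        self.current = ""

    def navigate(self, url: str) -> None:
        self.current = self.pages[url]

    def snapshot(self) -> str:
        # A real tool would return an accessibility tree; page text stands in here.
        return self.current

    def click(self, label: str) -> None:
        self.current += f" clicked:{label}"


@dataclass
class Scenario:
    name: str
    url: str
    actions: list[Callable[[FakeBrowser], None]]
    expected: str  # text that must appear in the final snapshot


def run_scenarios(browser: FakeBrowser, scenarios: list[Scenario]) -> list[dict]:
    """Navigate, then Snapshot before every Action; verify after each scenario."""
    failures = []
    for sc in scenarios:
        browser.navigate(sc.url)
        steps = [f"navigate {sc.url}"]
        for action in sc.actions:
            browser.snapshot()  # observe current state before acting
            action(browser)
            steps.append(getattr(action, "__name__", repr(action)))
        actual = browser.snapshot()
        if sc.expected not in actual:
            # Keep steps taken plus actual vs expected for later analysis.
            failures.append({"scenario": sc.name, "steps": steps,
                             "expected": sc.expected, "actual": actual})
    return failures
```

Keeping each failure as plain data makes it straightforward to map onto the `failures` array of the output JSON.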
@@ -30,10 +33,9 @@ Browser automation, UI/UX and Accessibility (WCAG) auditing, Performance profili
 - Think-Before-Action: Validate logic and simulate expected outcomes via an internal <thought> block before any tool execution or final response; verify pathing, dependencies, and constraints to ensure "one-shot" success.
 - Context-efficient file/ tool output reading: prefer semantic search, file outlines, and targeted line-range reads; limit to 200 lines per read
 - Follow Observation-First loop (Navigate → Snapshot → Action).
-- Prefer accessibility_snapshot over visual screenshots for element identification - accessibility snapshots provide structured DOM/ARIA data that's more reliable for automation than pixel-based visual analysis.
-- Use reference_cache for WCAG standards when performing accessibility audits.
+- Always use accessibility snapshot over visual screenshots for element identification or visual state verification. Accessibility snapshots provide structured DOM/ARIA data that's more reliable for automation than pixel-based visual analysis.
+- For failure evidence, capture screenshots to visually document issues, but never use screenshots for element identification or state verification.
+- Evidence storage (in case of failures): directory structure docs/plan/{plan_id}/evidence/{task_id}/ with subfolders screenshots/, logs/, network/. Files named by timestamp and scenario.
 - Use UIDs from take_snapshot; avoid raw CSS/XPath.
 - Never navigate to production without approval.
 - Retry Transient Failures: For click, type, navigate actions - retry 2-3 times with 1s delay on transient errors (timeout, element not found, network issues). Escalate after max retries.
 - Errors: transient→handle, persistent→escalate
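A minimal Python sketch of the evidence layout and the retry rule. The directory structure, the subfolder names, the 2-3 retries, and the 1s delay come from the rules above; the `{timestamp}_{scenario}.{ext}` file-name pattern, the UTC timestamp format, and the `TransientError`/`EscalationError` classes are assumptions made for illustration.

```python
from __future__ import annotations

import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

# Subfolders named in the evidence-storage rule.
EVIDENCE_KINDS = ("screenshots", "logs", "network")


def evidence_file(plan_id: str, task_id: str, kind: str, scenario: str,
                  ext: str, now: datetime | None = None) -> Path:
    """docs/plan/{plan_id}/evidence/{task_id}/{kind}/{timestamp}_{scenario}.{ext}"""
    if kind not in EVIDENCE_KINDS:
        raise ValueError(f"unknown evidence kind: {kind}")
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    # Portable names: anything other than letters, digits, '-' or '_' becomes '-'.
    safe = "".join(c if c.isalnum() or c in "-_" else "-" for c in scenario)
    return Path("docs", "plan", plan_id, "evidence", task_id, kind,
                f"{stamp}_{safe}.{ext}")


class TransientError(Exception):
    """Assumed marker for timeouts, missing elements, or network blips."""


class EscalationError(Exception):
    """Raised when a transient error persists past the retry budget."""


def with_retry(action: Callable[[], T], attempts: int = 3, delay_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """One initial try plus (attempts - 1) retries, sleeping delay_s between them.

    Only TransientError is retried; any other exception propagates at once.
    """
    last: TransientError | None = None
    for attempt in range(attempts):
        try:
            return action()
        except TransientError as exc:
            last = exc
            if attempt < attempts - 1:
                sleep(delay_s)
    raise EscalationError(f"failed after {attempts} attempts") from last
```

Injecting `sleep` keeps the helper testable without real 1s waits.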
@@ -52,8 +54,8 @@ task_definition: object # Full task from plan.yaml
 </input_format_guide>

 <reflection_memory>
-<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
-<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
+- Learn from execution, user guidance, decisions, patterns
+- Complete → Store discoveries → Next: Read & apply
 </reflection_memory>

 <verification_criteria>
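The reflection_memory cycle (Complete → Store discoveries → Next: Read & apply) can be illustrated with an append-only JSON Lines file. The storage format, file location, and helper names are assumptions; the source does not say how or where discoveries are kept. The four record kinds mirror the list "execution, user guidance, decisions, patterns".

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical record kinds, mirroring "execution, user guidance, decisions, patterns".
KINDS = {"execution", "user_guidance", "decision", "pattern"}


def store_discovery(memory: Path, kind: str, note: str) -> None:
    """Complete -> Store discoveries: append one record as a JSON line."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    memory.parent.mkdir(parents=True, exist_ok=True)
    with memory.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"kind": kind, "note": note}) + "\n")


def read_discoveries(memory: Path, kind: str | None = None) -> list[dict]:
    """Next: Read & apply: load earlier records, optionally filtered by kind."""
    if not memory.exists():
        return []
    with memory.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r for r in records if kind is None or r["kind"] == kind]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mem = Path(d) / "discoveries.jsonl"
        store_discovery(mem, "pattern", "Login form needs a wait before typing")
        print(read_discoveries(mem, "pattern"))
```

An append-only file means a crashed run loses at most its last record, and earlier discoveries are never rewritten.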
@@ -85,7 +87,14 @@ task_definition: object # Full task from plan.yaml
 "console_errors": 0,
 "network_failures": 0,
 "accessibility_issues": 0,
-"evidence_path": "docs/plan/{plan_id}/evidence/{task_id}/"
+"evidence_path": "docs/plan/{plan_id}/evidence/{task_id}/",
+"failures": [
+{
+"criteria": "console_errors|network_requests|accessibility|validation_matrix",
+"details": "Description of failure with specific errors",
+"scenario": "Scenario name if applicable"
+}
+]
 }
 }
 ```

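The updated output fragment can be produced and checked as below. Only the key names and the four `criteria` values come from the hunk above; `build_result` is a hypothetical helper, and the fields of the enclosing object outside this hunk are omitted because they are not shown here.

```python
from __future__ import annotations

import json
from typing import Literal, TypedDict, get_args

# The four values listed for "criteria" in the output format.
Criteria = Literal["console_errors", "network_requests", "accessibility", "validation_matrix"]


class Failure(TypedDict):
    criteria: Criteria
    details: str
    scenario: str  # "if applicable": may be an empty string


def build_result(plan_id: str, task_id: str, console_errors: int,
                 network_failures: int, accessibility_issues: int,
                 failures: list[Failure]) -> dict:
    """Hypothetical helper assembling the fields shown in this hunk."""
    allowed = set(get_args(Criteria))
    for f in failures:
        if f["criteria"] not in allowed:
            raise ValueError(f"unknown criteria: {f['criteria']}")
    return {
        "console_errors": console_errors,
        "network_failures": network_failures,
        "accessibility_issues": accessibility_issues,
        "evidence_path": f"docs/plan/{plan_id}/evidence/{task_id}/",
        "failures": list(failures),
    }


if __name__ == "__main__":
    print(json.dumps(build_result("p1", "t2", 0, 0, 0, []), indent=2))
```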
@@ -20,7 +20,7 @@ Containerization (Docker) and Orchestration (K8s), CI/CD pipeline design and aut
 - Execute: Run infrastructure operations using idempotent commands. Use atomic operations.
 - Verify: Follow verification_criteria (infrastructure deployment, health checks, CI/CD pipeline, idempotency).
 - Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
-- Reflect (Medium/ High priority or complexity or failed only): Self-review against quality standards.
+- Reflect (Medium/ High priority or complex or failed only): Self-review against quality standards.
 - Cleanup: Remove orphaned resources, close connections.
 - Return JSON per <output_format_guide>
 </workflow>
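One common way to make a file-based change both idempotent and atomic is to compare current and desired state first, then write through a temporary file and swap it in with `os.replace`. This is a generic Python sketch, not the DevOps agent's tooling; `ensure_file` is a hypothetical name, and the atomic swap assumes the temporary file and the target live on the same filesystem.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def ensure_file(path: Path, content: str) -> bool:
    """Bring `path` to the desired content; return True only if it changed."""
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False  # already in desired state, so re-running is a no-op
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace can swap it in atomically.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # readers see either the old or the new file
    except BaseException:
        # Do not leave an orphaned temp file behind on failure.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    return True
```

The boolean result doubles as a change report, which is what an idempotency check in verification would look at: a second run with the same input should report no change.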
@@ -59,8 +59,8 @@ task_definition: object # Full task from plan.yaml
 </input_format_guide>

 <reflection_memory>
-<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
-<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
+- Learn from execution, user guidance, decisions, patterns
+- Complete → Store discoveries → Next: Read & apply
 </reflection_memory>

 <verification_criteria>

@@ -50,8 +50,8 @@ task_definition: object # Full task from plan.yaml
 </input_format_guide>

 <reflection_memory>
-<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
-<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
+- Learn from execution, user guidance, decisions, patterns
+- Complete → Store discoveries → Next: Read & apply
 </reflection_memory>

 <verification_criteria>

@@ -15,11 +15,13 @@ Full-stack implementation and refactoring, Unit and integration testing (TDD/VDD
 </expertise>

 <workflow>
-- TDD Red: Write failing tests FIRST, confirm they FAIL.
-- TDD Green: Write MINIMAL code to pass tests, avoid over-engineering, confirm PASS.
-- TDD Verify: Follow verification_criteria (get_errors, typecheck, unit tests, failure mode mitigations).
+- Analyze: Parse plan_id, objective. Read research findings efficiently (`docs/plan/{plan_id}/research_findings_*.yaml`) to extract relevant insights for planning.
+- Execute: Implement code changes using TDD approach:
+  - TDD Red: Write failing tests FIRST, confirm they FAIL.
+  - TDD Green: Write MINIMAL code to pass tests, avoid over-engineering, confirm PASS.
+  - TDD Verify: Follow verification_criteria (get_errors, typecheck, unit tests, failure mode mitigations).
 - Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
-- Reflect (Medium/ High priority or complexity or failed only): Self-review for security, performance, naming.
+- Reflect (Medium/ High priority or complex or failed only): Self-review for security, performance, naming.
 - Return JSON per <output_format_guide>
 </workflow>

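The implementer's Red → Green order can be demonstrated on a toy function. `slugify`, its stub, and the test are invented for illustration; what matters is that the same test runs against the stub first (and fails), then against the minimal implementation (and passes).

```python
import io
import re
import unittest
from typing import Callable


def slugify_stub(text: str) -> str:
    # Red phase: no behaviour yet, so the test below must fail.
    raise NotImplementedError


def slugify(text: str) -> str:
    # Green phase: the minimal code that satisfies the test, nothing more.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def passes(impl: Callable[[str], str]) -> bool:
    """Run the test-first spec against a given implementation."""

    class SlugifyTest(unittest.TestCase):
        def test_lowercases_and_hyphenates(self) -> None:
            self.assertEqual(impl("Hello, World!"), "hello-world")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    assert not passes(slugify_stub), "Red: the test must fail first"
    assert passes(slugify), "Green: the minimal implementation must pass"
    print("red -> green confirmed")
```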
@@ -60,8 +62,8 @@ task_definition: object # Full task from plan.yaml
 </input_format_guide>

 <reflection_memory>
-<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
-<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
+- Learn from execution, user guidance, decisions, patterns
+- Complete → Store discoveries → Next: Read & apply
 </reflection_memory>

 <verification_criteria>

@@ -163,8 +163,8 @@ research_findings_paths: [string] # Paths to research_findings_*.yaml files
 </input_format_guide>

 <reflection_memory>
-<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
-<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
+- Learn from execution, user guidance, decisions, patterns
+- Complete → Store discoveries → Next: Read & apply
 </reflection_memory>

 <verification_criteria>

@@ -218,8 +218,8 @@ complexity: "simple|medium|complex" # Optional, auto-detected
 </input_format_guide>

 <reflection_memory>
-<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
-<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
+- Learn from execution, user guidance, decisions, patterns
+- Complete → Store discoveries → Next: Read & apply
 </reflection_memory>

 <verification_criteria>

@@ -62,8 +62,8 @@ task_definition: object # Full task from plan.yaml
 </input_format_guide>

 <reflection_memory>
-<purpose>Learn from execution, user guidance, decisions, patterns</purpose>
-<workflow>Complete → Store discoveries → Next: Read & apply</workflow>
+- Learn from execution, user guidance, decisions, patterns
+- Complete → Store discoveries → Next: Read & apply
 </reflection_memory>

 <verification_criteria>