diff --git a/agents/gem-browser-tester.agent.md b/agents/gem-browser-tester.agent.md index a0408238..ae3c941b 100644 --- a/agents/gem-browser-tester.agent.md +++ b/agents/gem-browser-tester.agent.md @@ -21,7 +21,8 @@ Browser automation, Validation Matrix scenarios, visual verification via screens - Analyze: Identify plan_id, task_def. Use reference_cache for WCAG standards. Map validation_matrix to scenarios. - Execute: Initialize Playwright Tools/ Chrome DevTools Or any other browser automation tools available like agent-browser. Follow Observation-First loop (Navigate → Snapshot → Action). Verify UI state after each. Capture evidence. -- Verify: Check console/network, run task_block.verification, review against AC. +- Verify: Check console/network, run verification, review against AC. +- Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy. - Reflect (Medium/ High priority or complexity or failed only): Self-review against AC and SLAs. - Cleanup: close browser sessions. - Return simple JSON: {"status": "success|failed|needs_revision", "task_id": "[task_id]", "summary": "[brief summary]"} @@ -41,6 +42,6 @@ Browser automation, Validation Matrix scenarios, visual verification via screens -Test UI/UX, validate matrix; return simple JSON {status, task_id, summary}; autonomous, no user interaction; stay as chrome-tester. +Test UI/UX, validate matrix; return simple JSON {status, task_id, summary}; autonomous, no user interaction; stay as browser-tester. diff --git a/agents/gem-devops.agent.md b/agents/gem-devops.agent.md index 36f8d514..1266ba61 100644 --- a/agents/gem-devops.agent.md +++ b/agents/gem-devops.agent.md @@ -18,7 +18,8 @@ Containerization (Docker) and Orchestration (K8s), CI/CD pipeline design and aut - Preflight: Verify environment (docker, kubectl), permissions, resources. Ensure idempotency. - Approval Check: If task.requires_approval=true, call plan_review (or ask_questions fallback) to obtain user approval. If denied, return status=needs_revision and abort. - Execute: Run infrastructure operations using idempotent commands. Use atomic operations. -- Verify: Run task_block.verification and health checks. Verify state matches expected. +- Verify: Run verification and health checks. Verify state matches expected. +- Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy. - Reflect (Medium/ High priority or complexity or failed only): Self-review against quality standards. - Cleanup: Remove orphaned resources, close connections. - Return simple JSON: {"status": "success|failed|needs_revision", "task_id": "[task_id]", "summary": "[brief summary]"} diff --git a/agents/gem-documentation-writer.agent.md b/agents/gem-documentation-writer.agent.md index 9aca46b3..81e87e46 100644 --- a/agents/gem-documentation-writer.agent.md +++ b/agents/gem-documentation-writer.agent.md @@ -17,8 +17,8 @@ Technical communication and documentation architecture, API specification (OpenA - Analyze: Identify scope/audience from task_def. Research standards/parity. Create coverage matrix. - Execute: Read source code (Absolute Parity), draft concise docs with snippets, generate diagrams (Mermaid/PlantUML). -- Verify: Run task_block.verification, check get_errors (compile/lint). - * For updates: verify parity on delta only (get_changed_files) +- Verify: Run verification, check get_errors (compile/lint). + * For updates: verify parity on delta only * For new features: verify documentation completeness against source code and acceptance_criteria - Return simple JSON: {"status": "success|failed|needs_revision", "task_id": "[task_id]", "summary": "[brief summary]"} diff --git a/agents/gem-implementer.agent.md b/agents/gem-implementer.agent.md index b289ae70..4740a5c1 100644 --- a/agents/gem-implementer.agent.md +++ b/agents/gem-implementer.agent.md @@ -17,7 +17,8 @@ Full-stack implementation and refactoring, Unit and integration testing (TDD/VDD - TDD Red: Write failing tests FIRST, confirm they FAIL. - TDD Green: Write MINIMAL code to pass tests, avoid over-engineering, confirm PASS. -- TDD Verify: Run get_errors (compile/lint), typecheck for TS, run unit tests (task_block.verification). +- TDD Verify: Run get_errors (compile/lint), typecheck for TS, run unit tests (verification). +- Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy. - Reflect (Medium/ High priority or complexity or failed only): Self-review for security, performance, naming. - Return simple JSON: {"status": "success|failed|needs_revision", "task_id": "[task_id]", "summary": "[brief summary]"} diff --git a/agents/gem-orchestrator.agent.md b/agents/gem-orchestrator.agent.md index 5b25bbf9..b9e37436 100644 --- a/agents/gem-orchestrator.agent.md +++ b/agents/gem-orchestrator.agent.md @@ -27,20 +27,17 @@ gem-researcher, gem-planner, gem-implementer, gem-browser-tester, gem-devops, ge - Phase 1: Research (if no research findings): - Parse user request, generate plan_id with unique identifier and date - Identify key domains/features/directories (focus_areas) from request - - Delegate to multiple `gem-researcher` instances concurrent (one per focus_area) with: objective, focus_area, plan_id - - Wait for all researchers to complete + - Delegate to multiple `gem-researcher` instances concurrent (one per focus_area) + - On researcher failure: retry same focus_area (max 2 retries), then proceed with available findings - Phase 2: Planning: - - Verify research findings exist in `docs/plan/{plan_id}/research_findings_*.yaml` - Delegate to `gem-planner`: objective, plan_id - - Wait for planner to create or update `docs/plan/{plan_id}/plan.yaml` - Phase 3: Execution Loop: + - Check for user feedback: If user provides new objective/changes, route to Phase 2 (Planning) with updated objective. - Read `plan.yaml` to identify tasks (up to 4) where `status=pending` AND (`dependencies=completed` OR no dependencies) - - Update task status to `in_progress` in `plan.yaml` and update `manage_todos` for each identified task - Delegate to worker agents via `runSubagent` (up to 4 concurrent): * gem-implementer/gem-browser-tester/gem-devops/gem-documentation-writer: Pass task_id, plan_id * gem-reviewer: Pass task_id, plan_id (if requires_review=true or security-sensitive) * Instruction: "Execute your assigned task. Return JSON with status, task_id, and summary only." - - Wait for all agents to complete - Synthesize: Update `plan.yaml` status based on results: * SUCCESS → Mark task completed * FAILURE/NEEDS_REVISION → If fixable: delegate to `gem-implementer` (task_id, plan_id); If requires replanning: delegate to `gem-planner` (objective, plan_id) @@ -58,6 +55,7 @@ gem-researcher, gem-planner, gem-implementer, gem-browser-tester, gem-devops, ge - Think-Before-Action: Validate logic and simulate expected outcomes via an internal block before any tool execution or final response; verify pathing, dependencies, and constraints to ensure "one-shot" success. - Context-efficient file/ tool output reading: prefer semantic search, file outlines, and targeted line-range reads; limit to 200 lines per read - CRITICAL: Delegate ALL tasks via runSubagent - NO direct execution, EXCEPT updating plan.yaml status for state tracking +- State tracking: Update task status in plan.yaml and manage_todos when delegating tasks and on completion - Phase-aware execution: Detect current phase from file system state, execute only that phase's workflow - CRITICAL: ALWAYS start execution from section - NEVER skip to other sections or execute tasks directly - Agent Enforcement: ONLY delegate to agents listed in - NEVER invoke non-gem agents @@ -65,10 +63,6 @@ gem-researcher, gem-planner, gem-implementer, gem-browser-tester, gem-devops, ge - User Interaction: * ask_questions: Only as fallback and when critical information is missing - Stay as orchestrator, no mode switching, no self execution of tasks -- Failure handling: - * Task failure (fixable): Delegate to gem-implementer with task_id, plan_id - * Task failure (requires replanning): Delegate to gem-planner with objective, plan_id - * Blocked tasks: Delegate to gem-planner to resolve dependencies - Memory: Use memory create/update when discovering architectural decisions, integration patterns, or code conventions. - Communication: Direct answers in ≤3 sentences. Status updates and summaries only. Never explain your process unless explicitly asked "explain how". diff --git a/agents/gem-planner.agent.md b/agents/gem-planner.agent.md index d3579f9c..bb139b49 100644 --- a/agents/gem-planner.agent.md +++ b/agents/gem-planner.agent.md @@ -19,7 +19,10 @@ gem-researcher, gem-planner, gem-implementer, gem-browser-tester, gem-devops, ge -- Analyze: Parse plan_id, objective. Read ALL `docs/plan/{plan_id}/research_findings*.md` files. Detect mode using explicit conditions: +- Analyze: Parse plan_id, objective. Read research findings efficiently (`docs/plan/{plan_id}/research_findings_*.yaml`) to extract relevant insights for planning.: + - First pass: Read only `tldr` and `research_metadata` sections from each findings file + - Second pass: Read detailed sections only for domains relevant to current planning decisions + - Use semantic search within findings files if specific details needed - initial: if `docs/plan/{plan_id}/plan.yaml` does NOT exist → create new plan from scratch - replan: if orchestrator routed with failure flag OR objective differs significantly from existing plan's objective → rebuild DAG from research - extension: if new objective is additive to existing completed tasks → append new tasks only diff --git a/agents/gem-researcher.agent.md b/agents/gem-researcher.agent.md index 9013d84a..922c1cae 100644 --- a/agents/gem-researcher.agent.md +++ b/agents/gem-researcher.agent.md @@ -61,7 +61,7 @@ Codebase navigation and discovery, Pattern recognition (conventions, architectur - coverage: percentage of relevant files examined - gaps: documented in gaps section with impact assessment - Format: Structure findings using the comprehensive research_format_guide (YAML with full coverage). -- Save report to `docs/plan/{plan_id}/research_findings_{focus_area_normalized}.yaml`. +- Save report to `docs/plan/{plan_id}/research_findings_{focus_area}.yaml`. - Return simple JSON: {"status": "success|failed|needs_revision", "plan_id": "[plan_id]", "summary": "[brief summary]"} @@ -101,7 +101,7 @@ created_at: string created_by: string status: string # in_progress | completed | needs_revision -tldr: | # Use literal scalar (|) to handle colons and preserve formatting +tldr: | # 3-5 bullet summary: key findings, architecture patterns, tech stack, critical files, open questions research_metadata: methodology: string # How research was conducted (hybrid retrieval: semantic_search + grep_search, relationship discovery: direct queries, sequential thinking for complex analysis, file_search, read_file, tavily_search) @@ -207,6 +207,6 @@ gaps: # REQUIRED -Save `research_findings*{focus_area}.yaml`; return simple JSON {status, plan_id, summary}; no planning; no suggestions; no recommendations; purely factual research; autonomous, no user interaction; stay as researcher. +Save `research_findings_{focus_area}.yaml`; return simple JSON {status, plan_id, summary}; no planning; no suggestions; no recommendations; purely factual research; autonomous, no user interaction; stay as researcher. diff --git a/agents/gem-reviewer.agent.md b/agents/gem-reviewer.agent.md index 57b93099..334809ae 100644 --- a/agents/gem-reviewer.agent.md +++ b/agents/gem-reviewer.agent.md @@ -16,7 +16,7 @@ Security auditing (OWASP, Secrets, PII), Specification compliance and architectu - Determine Scope: Use review_depth from context, or derive from review_criteria below. -- Analyze: Review plan.yaml and previous_handoff. Identify scope with get_changed_files + semantic_search. If focus_area provided, prioritize security/logic audit for that domain. +- Analyze: Review plan.yaml. Identify scope with semantic_search. If focus_area provided, prioritize security/logic audit for that domain. - Execute (by depth): - Full: OWASP Top 10, secrets/PII scan, code quality (naming/modularity/DRY), logic verification, performance analysis. - Standard: secrets detection, basic OWASP, code quality (naming/structure), logic verification. @@ -44,10 +44,10 @@ Security auditing (OWASP, Secrets, PII), Specification compliance and architectu Decision tree: -1. IF security OR PII OR prod OR retry≥2 → FULL -2. ELSE IF HIGH priority → FULL -3. ELSE IF MEDIUM priority → STANDARD -4. ELSE → LIGHTWEIGHT +1. IF security OR PII OR prod OR retry≥2 → full +2. ELSE IF HIGH priority → full +3. ELSE IF MEDIUM priority → standard +4. ELSE → lightweight