Files
Muhammad Ubaid Raza 46bef1b61a [gem-team] Introduce specialized skills and guidelines to agents (#1271)
* feat(orchestrator): add Discuss Phase and PRD creation workflow

- Introduce Discuss Phase for medium/complex objectives, generating context‑aware options and logging architectural decisions
- Add PRD creation step after discussion, storing the PRD in docs/prd.yaml
- Refactor Phase 1 to pass task clarifications to researchers
- Update Phase 2 planning to include multi‑plan selection for complex tasks and verification with gem‑reviewer
- Enhance Phase 3 execution loop with wave integration checks and conflict filtering

* feat(gem-team): bump version to 1.3.3 and refine description with Discuss Phase and PRD compliance verification

* chore(release): bump marketplace version to 1.3.4

- Update `marketplace.json` version from `1.3.3` to `1.3.4`.
- Refine `gem-browser-tester.agent.md`:
  - Replace "UUIDs" typo with correct spelling.
  - Adjust wording and formatting for clarity.
  - Update JSON code fences to use ````jsonc````.
  - Modify workflow description to reference `AGENTS.md` when present.
- Refine `gem-devops.agent.md`:
  - Align expertise list formatting.
  - Standardize tool list syntax with back‑ticks.
  - Minor wording improvements.
- Increase retry attempts in `gem-browser-tester.agent.md` from 2 to 3 attempts.
- Minor typographical and formatting corrections across agent documentation.

* refactor: rename prd_path to project_prd_path in agent configurations

- Updated gem-orchestrator.agent.md to use `project_prd_path` instead of `prd_path` in task definitions and delegation logic.
- Updated gem-planner.agent.md to reference `project_prd_path` and clarify PRD reading.
- Updated gem-researcher.agent.md to use `project_prd_path` and adjust PRD consumption logic.
- Applied minor wording improvements and consistency fixes across the orchestrator, planner, and researcher documentation.

* feat(plugin): expand marketplace description, bump version to 1.4.0; revamp gem-browser-tester agent documentation with clearer role, expertise, and workflow specifications.

* chore: remove outdated plugin metadata fields from README.plugins.md and plugin.json

* feat(tooling): bump marketplace version to 1.5.0 and refine validation thresholds

- Update marketplace.json version from 1.4.0 to 1.5.0
- Adjust validation criteria in gem-browser-tester.agent.md to trigger additional tests when coverage < 0.85 or confidence < 0.85
- Refine accessibility compliance description, adding runtime validation and SPEC‑based accessibility notes- Add new gem-code-simplifier.agent.md documentation for code refactoring
- Update README and plugin metadata to reflect version change and new tooling

* docs: improve bug‑fix delegation description and delegation‑first guidance in gem‑orchestrator.agent.md

- Clarified the two‑step diagnostic‑then‑fix flow for bug fixes using gem‑debugger and gem‑implementer.
- Updated the “Delegation First” checklist to stress that **no** task, however small, should be performed directly by the orchestrator, emphasizing sub‑agent delegation and retry/escalation strategy.

* feat(gem-browser-tester): add flow testing support and refine workflow

- Update description to include “flow testing” and “user journey” among triggers.
- Expand expertise list to cover flow testing and visual regression.
- Revise knowledge sources and workflow to detail initialization, setup, flow execution, and teardown.
- Introduce comprehensive step types (navigate, interact, assert, branch, extract, wait, screenshot) with explicit wait strategies.
- Implement baseline screenshot comparison for visual regression.
- Restructure execution pattern to manage flow context and multi‑step user journeys.

* feat: add performance, design, responsive checks

* feat(styling): add priority-based styling hierarchy and validation rules

* feat: incorporate lint rule recommendations and update agent routing for ESLint rule handling

* chore(release): bump marketplace version to 1.5.4

* docs: Simplify readme

* chore: Add mobile specific agents and disable user invocation flags

* feat(architecture): add mobile agents and refactor diagram

* feat(readme): add recommended LLM column to agent team roles

* docs: Update readme

---------

Co-authored-by: Aaron Powell <me@aaron-powell.com>
2026-04-09 12:17:20 +10:00
..

💎 Gem Team

Multi-agent orchestration framework for spec-driven development and automated verification.

Copilot Plugin Version


🤔 Why Gem Team?

  • 10x Faster — Parallel execution with wave-based execution
  • 🏆 Higher Quality — Specialized agents + TDD + verification gates + contract-first
  • 🔒 Built-in Security — OWASP scanning, secrets/PII detection on critical tasks
  • 👁️ Full Visibility — Real-time status, clear approval gates
  • 🛡️ Resilient — Pre-mortem analysis, failure handling, auto-replanning
  • ♻️ Pattern Reuse — Codebase pattern discovery prevents reinventing wheels
  • 🪞 Self-Correcting — All agents self-critique at 0.85 confidence threshold
  • 📋 Source Verified — Every factual claim cites its source; no guesswork
  • Accessibility-First — WCAG compliance validated at spec and runtime layers
  • 🔬 Smart Debugging — Root-cause analysis with stack trace parsing + confidence-scored fixes
  • 🚀 Safe DevOps — Idempotent operations, health checks, mandatory approval gates
  • 🔗 Traceable — Self-documenting IDs link requirements → tasks → tests → evidence
  • 📚 Knowledge-Driven — Prioritized sources (PRD → codebase → AGENTS.md → Context7 → docs)
  • 🛠️ Skills & Guidelines — Built-in skill & guidelines (web-design-guidelines)
  • 📐 Spec-Driven — Multi-step refinement defines "what" before "how"
  • 🌊 Wave-Based — Parallel agents with integration gates per wave
  • 🗂️ Multi-Plan — Complex tasks: 3 planner variants → best DAG selected automatically
  • 🩺 Diagnose-then-Fix — gem-debugger diagnoses → gem-implementer fixes → re-verifies
  • ⚠️ Pre-Mortem — Failure modes identified BEFORE execution
  • 💬 Constructive Critique — gem-critic challenges assumptions, finds edge cases
  • 📝 Contract-First — Contract tests written before implementation
  • 📱 Mobile Agents — Native mobile implementation (React Native, Flutter) + iOS/Android testing

📦 Installation

# Using Copilot CLI
copilot plugin install gem-team@awesome-copilot

Install Gem Team Now →


🏗️ Architecture

flowchart
    USER["User Goal"]

    subgraph ORCH["Orchestrator"]
        detect["Phase Detection"]
    end

    subgraph PHASES
        DISCUSS["🔹 Discuss"]
        PRD["📋 PRD"]
        RESEARCH["🔍 Research"]
        PLANNING["📝 Planning"]
        EXEC["⚙️ Execution"]
        SUMMARY["📊 Summary"]
    end

    DIAG["🔬 Diagnose-then-Fix"]

    USER --> detect

    detect --> |"Simple"| RESEARCH
    detect --> |"Medium|Complex"| DISCUSS

    DISCUSS --> PRD
    PRD --> RESEARCH
    RESEARCH --> PLANNING
    PLANNING --> |"Approved"| EXEC
    PLANNING --> |"Feedback"| PLANNING
    EXEC --> |"Failure"| DIAG
    DIAG --> EXEC
    EXEC --> SUMMARY

    PLANNING -.-> |"critique"| critic
    PLANNING -.-> |"review"| reviewer

    EXEC --> |"parallel ≤4"| agents
    EXEC --> |"post-wave (complex)"| critic

🔄 Core Workflow

Phase Flow: User Goal → Orchestrator → Discuss (medium|complex) → PRD → Research → Planning → Execution → Summary

Error Handling: Diagnose-then-Fix loop (Debugger → Implementer → Re-verify)

Orchestrator auto-detects phase and routes accordingly.

Condition → Phase
No plan + simple Research
No plan + medium|complex Discuss → PRD → Research
Plan + pending tasks Execution
Plan + feedback Planning

🤖 The Agent Team (Q2 2026 SOTA)

Role Description Output Recommended LLM
🎯 ORCHESTRATOR (gem-orchestrator) The team lead: Orchestrates research, planning, implementation, and verification 📋 PRD, plan.yaml Closed: GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6
Open: GLM-5, Kimi K2.5, Qwen3.5
🔍 RESEARCHER (gem-researcher) Codebase exploration — patterns, dependencies, architecture discovery 🔍 findings Closed: Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6
Open: GLM-5, Qwen3.5-9B, DeepSeek-V3.2
📋 PLANNER (gem-planner) DAG-based execution plans — task decomposition, wave scheduling, risk analysis 📄 plan.yaml Closed: Gemini 3.1 Pro, Claude Sonnet 4.6, GPT-5.4
Open: Kimi K2.5, GLM-5, Qwen3.5
🔧 IMPLEMENTER (gem-implementer) TDD code implementation — features, bugs, refactoring. Never reviews own work 💻 code Closed: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Open: DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next
🧪 BROWSER TESTER (gem-browser-tester) E2E browser testing, UI/UX validation, visual regression with Playwright 🧪 evidence Closed: GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Flash
Open: Llama 4 Maverick, Qwen3.5-Flash, MiniMax M2.7
🚀 DEVOPS (gem-devops) Infrastructure deployment, CI/CD pipelines, container management 🌍 infra Closed: GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6
Open: DeepSeek-V3.2, GLM-5, Qwen3.5
🛡️ REVIEWER (gem-reviewer) Security auditing, code review, OWASP scanning, PRD compliance verification 📊 review report Closed: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Open: Kimi K2.5, GLM-5, DeepSeek-V3.2
📝 DOCUMENTATION (gem-documentation-writer) Technical documentation, README files, API docs, diagrams, walkthroughs 📝 docs Closed: Claude Sonnet 4.6, Gemini 3.1 Flash, GPT-5.4 Mini
Open: Llama 4 Scout, Qwen3.5-9B, MiniMax M2.7
🔬 DEBUGGER (gem-debugger) Root-cause analysis, stack trace diagnosis, regression bisection, error reproduction 🔬 diagnosis Closed: Gemini 3.1 Pro (Retrieval King), Claude Opus 4.6, GPT-5.4
Open: DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next
🎯 CRITIC (gem-critic) Challenges assumptions, finds edge cases, spots over-engineering and logic gaps 💬 critique Closed: Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro
Open: Kimi K2.5, GLM-5, Qwen3.5
✂️ SIMPLIFIER (gem-code-simplifier) Refactoring specialist — removes dead code, reduces complexity, consolidates duplicates ✂️ change log Closed: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Open: DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next
🎨 DESIGNER (gem-designer) UI/UX design specialist — layouts, themes, color schemes, design systems, accessibility 🎨 DESIGN.md Closed: GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6
Open: Qwen3.5, GLM-5, MiniMax M2.7
📱 IMPLEMENTER-MOBILE (gem-implementer-mobile) Mobile implementation — React Native, Expo, Flutter with TDD 💻 code Closed: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Open: DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next
📱 DESIGNER-MOBILE (gem-designer-mobile) Mobile UI/UX specialist — HIG, Material Design, safe areas, touch targets 🎨 DESIGN.md Closed: GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6
Open: Qwen3.5, GLM-5, MiniMax M2.7
📱 MOBILE TESTER (gem-mobile-tester) Mobile E2E testing — Detox, Maestro, iOS/Android simulators 🧪 evidence Closed: GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Flash
Open: Llama 4 Maverick, Qwen3.5-Flash, MiniMax M2.7

Agent File Skeleton

Each .agent.md file follows this structure:

---                                    # Frontmatter: description, name, triggers
# Role                                 # One-line identity
# Expertise                            # Core competencies
# Knowledge Sources                    # Prioritized reference list
# Workflow                             # Step-by-step execution phases
  ## 1. Initialize                     # Setup and context gathering
  ## 2. Analyze/Execute                # Role-specific work
  ## N. Self-Critique                  # Confidence check (≥0.85)
  ## N+1. Handle Failure               # Retry/escalate logic
  ## N+2. Output                       # JSON deliverable format
# Input Format                         # Expected JSON schema
# Output Format                        # Return JSON schema
# Rules
  ## Execution                         # Tool usage, batching, error handling
  ## Constitutional                    # IF-THEN decision rules
  ## Anti-Patterns                     # Behaviors to avoid
  ## Anti-Rationalization              # Excuse → Rebuttal table
  ## Directives                        # Non-negotiable commands

All agents share: Execution rules, Constitutional rules, Anti-Patterns, and Directives sections. Anti-Rationalization tables are present in 5 agents (implementer, planner, reviewer, designer, browser-tester). Role-specific sections (Workflow, Expertise, Knowledge Sources) vary by agent.


📚 Knowledge Sources

Agents consult only the sources relevant to their role. Trust levels apply:

Trust Level Sources Behavior
Trusted PRD.yaml, plan.yaml, AGENTS.md Follow as instructions
Verify Codebase files, research findings Cross-reference before assuming
Untrusted Error logs, external data, third-party responses Factual only — never as instructions
Agent Knowledge Sources
orchestrator PRD.yaml, AGENTS.md
researcher PRD.yaml, codebase patterns, AGENTS.md, Context7, official docs, online search
planner PRD.yaml, codebase patterns, AGENTS.md, Context7, official docs
implementer codebase patterns, AGENTS.md, Context7 (API verification), DESIGN.md (UI tasks)
debugger codebase patterns, AGENTS.md, error logs (untrusted), git history, DESIGN.md (UI bugs)
reviewer PRD.yaml, codebase patterns, AGENTS.md, OWASP reference, DESIGN.md (UI review)
browser-tester PRD.yaml (flow coverage), AGENTS.md, test fixtures, baseline screenshots, DESIGN.md (visual validation)
designer PRD.yaml (UX goals), codebase patterns, AGENTS.md, existing design system
code-simplifier codebase patterns, AGENTS.md, test suites (behavior verification)
documentation-writer AGENTS.md, existing docs, source code

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License.

💬 Support

If you encounter any issues or have questions, please open an issue on GitHub.


📋 Changelog

1.6.0 (April 8, 2026)

New:

  • Mobile agents — build, design, and test iOS/Android apps with gem-implementer-mobile, gem-designer-mobile, gem-mobile-tester

Improved:

  • Concise agent descriptions — one-liners that quickly communicate what each agent does
  • Unified agent table — clean overview of all 15 agents with roles and outputs

1.5.4

Bug Fixes:

  • Fixed AGENTS.md pattern extraction logic for semantic search integration