Add duplicate resource detector agentic workflow

Weekly scheduled workflow that scans agents, prompts, instructions, and
skills for potential duplicates based on name, description, and content
similarity. Reports findings as a GitHub issue with task list checkboxes
for review. Checks previous duplicate-review issues to exclude known
accepted pairs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
Aaron Powell
2026-02-19 15:43:54 +11:00
parent 8ac0e41cb0
commit 3436da9619
2 changed files with 1172 additions and 0 deletions

1045
.github/workflows/duplicate-resource-detector.lock.yml generated vendored Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,127 @@
---
description: Weekly scan of agents, prompts, instructions, and skills to identify potential duplicate resources and report them for review
on:
schedule: weekly
permissions:
contents: read
issues: read
tools:
github:
toolsets: [repos, issues]
safe-outputs:
create-issue:
max: 1
close-older-issues: true
labels:
- duplicate-review
noop:
---
# Duplicate Resource Detector
You are an AI agent that audits the resources in this repository to find potential duplicates — resources that appear to serve the same or very similar purpose.
## Your Task
Scan all resources in the following directories and identify groups of resources that may be duplicates or near-duplicates based on their **name**, **description**, and **content**:
- `agents/` (`.agent.md` files)
- `prompts/` (`.prompt.md` files)
- `instructions/` (`.instructions.md` files)
- `skills/` (folders — check `SKILL.md` inside each)
### Step 1: Gather Resource Metadata
For each resource, extract:
1. **File name** (the path)
2. **Front matter `description`** field
3. **Front matter `name`** field (if present)
4. **First ~20 lines of body content** (the markdown after the front matter)
Use bash to read files efficiently. For skills, read `skills/<name>/SKILL.md`.
### Step 2: Identify Potential Duplicates
Compare resources and flag groups that look like potential duplicates. Consider resources as potential duplicates when they share **two or more** of the following signals:
- **Similar names** — file names or `name` fields that share key terms (e.g., `react-testing.prompt.md` and `react-unit-testing.prompt.md`)
- **Similar descriptions** — descriptions that describe the same task, technology, or domain with only minor wording differences
- **Overlapping scope** — resources that target the same language/framework/tool and the same activity (e.g., two separate "Python best practices" instructions)
- **Cross-type overlap** — an agent and a prompt (or instruction and agent) that cover the same topic so thoroughly that one may make the other redundant
Be pragmatic. Resources that cover related but distinct topics are NOT duplicates. For example:
- `react.instructions.md` (general React coding standards) and `react-testing.prompt.md` (React testing prompts) are **not** duplicates — they serve different purposes.
- `python-fastapi.instructions.md` and `python-flask.instructions.md` are **not** duplicates — they target different frameworks.
- `code-review.agent.md` and `code-review.prompt.md` that both do the same style of code review **are** potential duplicates worth flagging.
### Step 3: Check for Known Accepted Duplicates
Before finalizing the report, search for **previous issues** labeled `duplicate-review` in this repository:
```
Search for issues with label "duplicate-review" that are closed
```
Read the comments and body of those past issues to find any pairs or groups that reviewers have explicitly marked as **"accepted"** or **"not duplicates"**. Look for phrases like:
- "accepted as-is"
- "not duplicates"
- "intentionally separate"
- "keep both"
- checked task list items (i.e., `- [x]`)
Exclude those known-accepted pairs from the current report. If you include a group that was previously reviewed, add a note: `(previously reviewed — see #<issue-number>)`.
### Step 4: Produce the Report
Create an issue titled: `🔍 Duplicate Resource Review`
Format the body as follows:
```markdown
### Summary
- **Potential duplicate groups found:** N
- **Resources involved:** M
- **Known accepted (excluded):** K pairs from previous reviews
### How to Use This Report
Review each group below. If the resources are intentionally separate, check the box to mark them as accepted. These will be excluded from future reports.
### Potential Duplicates
#### Group 1: <Short description of what they share>
- [ ] Reviewed — these are intentionally separate
| Resource | Type | Description |
|----------|------|-------------|
| `agents/foo.agent.md` | Agent | Does X for Y |
| `prompts/foo.prompt.md` | Prompt | Also does X for Y |
**Why flagged:** <Brief explanation of the similarity>
---
#### Group 2: ...
<repeat for each group>
```
Use `<details>` blocks to collapse groups if there are more than 10.
### Safe Output Guidance
- If you find potential duplicates: use `create-issue` to file the report.
- If **no** potential duplicates are found (after excluding known accepted ones): call `noop` with the message: "No potential duplicate resources detected. All resources appear to serve distinct purposes."
## Guidelines
- Be conservative — only flag resources where there is a genuine risk of redundancy.
- Group related duplicates together (don't list the same pair twice in separate groups).
- Sort groups by confidence (strongest duplicate signals first).
- Include cross-type duplicates (e.g., an agent and a prompt doing the same thing).
- Limit the report to the top 20 most likely duplicate groups to keep it actionable.
- For skills, use the folder name and description from `SKILL.md`.
- Process resources in batches to stay within time limits — prioritize name and description comparison, then spot-check content for top candidates.