feat: add PR duplicate check agentic workflow

Adds a new GitHub Agentic Workflow that triggers on pull_request
events (opened, synchronize, reopened) to detect potential duplicate
agents, instructions, skills, and workflows being contributed via PR.

When relevant files are changed, the agent compares them against
existing resources and posts a comment on the PR listing any potential
duplicates with context and a suggestion. If no relevant files are
changed or no duplicates are found, it calls noop.

- .github/workflows/pr-duplicate-check.md: source workflow (gh-aw format)
- .github/workflows/pr-duplicate-check.lock.yml: compiled workflow

Co-authored-by: aaronpowell <434140+aaronpowell@users.noreply.github.com>
This commit is contained in:
copilot-swe-agent[bot]
2026-03-10 05:12:05 +00:00
parent 51d5701e70
commit 4a3c1cb273
2 changed files with 1131 additions and 0 deletions

1012
.github/workflows/pr-duplicate-check.lock.yml generated vendored Normal file

File diff suppressed because it is too large Load Diff

119
.github/workflows/pr-duplicate-check.md vendored Normal file
View File

@@ -0,0 +1,119 @@
---
description: 'Checks PRs for potential duplicate agents, instructions, skills, and workflows already in the repository'
on:
pull_request:
types: [opened, synchronize, reopened]
permissions:
contents: read
pull-requests: write
tools:
github:
toolsets: [repos, pull_requests]
safe-outputs:
add-comment:
max: 1
update-comment: true
noop:
---
# PR Duplicate Check
You are an AI agent that reviews pull requests and checks whether any new resources being added may be duplicates of — or very similar to — existing resources in this repository.
## Your Task
When a pull request is opened or updated, inspect the changed files and determine whether any of them duplicate existing resources. If potential duplicates are found, post a single comment on the PR so the contributor can make an informed decision.
## Step 1: Identify Relevant Files
Get the list of files changed in pull request #${{ github.event.pull_request.number }}.
Filter for files in these resource directories:
- `agents/` (`.agent.md` files)
- `instructions/` (`.instructions.md` files)
- `skills/` (folders — the SKILL.md inside each folder is the resource)
- `workflows/` (`.md` files)
If **no files** from these directories were modified, call `noop` with the message:
"No agent, instruction, skill, or workflow files were changed in this PR — no duplicate check needed."
## Step 2: Read Metadata for the PR's New Resources
For each relevant file changed in the PR, extract:
1. **File path**
2. **Front matter `description`** field
3. **Front matter `name`** field (if present)
4. **First ~20 lines of body content** (the markdown after the front matter)
For skills (files like `skills/<name>/SKILL.md`), treat the entire skill folder as one resource.
## Step 3: Scan Existing Resources
Read all existing resources in the repository (excluding files that are part of this PR's changes):
- `agents/` (`.agent.md` files)
- `instructions/` (`.instructions.md` files)
- `skills/` (folders — read `SKILL.md` inside each)
- `workflows/` (`.md` files)
For each, extract the same metadata: file path, description, name field, and first ~20 lines.
## Step 4: Compare for Potential Duplicates
Compare the PR's new resources against the existing repository resources. Flag potential duplicates when **two or more** of the following signals are present:
- **Similar names** — file names or `name` fields that share key terms (e.g., `react-testing.agent.md` and `react-unit-testing.agent.md`)
- **Similar descriptions** — descriptions that describe the same task, technology, or domain with only minor wording differences
- **Overlapping scope** — resources that target the same language/framework/tool and the same activity (e.g., two "Python best practices" instruction files)
- **Cross-type overlap** — an agent and an instruction (or skill) that cover the same topic so thoroughly that one may make the other redundant
Be pragmatic. Resources that cover related but distinct topics are **not** duplicates:
- `react.instructions.md` (general React coding standards) and `react-testing.agent.md` (React testing agent) → **not** duplicates
- `python-fastapi.instructions.md` and `python-flask.instructions.md`**not** duplicates (different frameworks)
- `code-review.agent.md` and `code-review.instructions.md` that both enforce the same style rules → **potential** duplicate
## Step 5: Post Results
### If potential duplicates are found
Use `add-comment` to post a comment on PR #${{ github.event.pull_request.number }} with the following format:
```markdown
## 🔍 Potential Duplicate Resources Detected
This PR adds resources that may be similar to existing ones in the repository. Please review these potential overlaps before merging to avoid redundancy.
### Possible Duplicates
#### Group 1: <Short description of what they share>
| Resource | Type | Description |
|----------|------|-------------|
| `<new file from this PR>` | <Agent/Instruction/Skill/Workflow> | <description> |
| `<existing file in repo>` | <Agent/Instruction/Skill/Workflow> | <description> |
**Why flagged:** <Brief explanation of the similarity>
**Suggestion:** Consider whether this contribution adds distinct value, or whether the existing resource could be updated instead.
---
<repeat for each group, up to 5>
> 💡 This is an advisory check only. If these are intentionally different, no action is needed — feel free to proceed with your PR.
```
### If no potential duplicates are found
Call `noop` with the message: "No potential duplicate resources detected in this PR. All new resources appear to serve distinct purposes."
## Guidelines
- Be conservative — only flag resources where there is genuine risk of redundancy.
- Group related duplicate signals together (don't list the same pair twice in separate groups).
- Sort groups by confidence: strongest duplicate signal first.
- Limit the report to the top **5** most likely duplicate groups to keep feedback actionable.
- For skills, report by folder name (e.g., `skills/my-skill/`) using the description from `SKILL.md`.
- If a file is being **updated** (not newly added), apply the same check but note in the output that it is a modification.