From eb7d223446e89d28194cbbb28c9cdc23e725af45 Mon Sep 17 00:00:00 2001 From: Dan Velton <48307985+dvelton@users.noreply.github.com> Date: Wed, 11 Mar 2026 18:35:43 -0700 Subject: [PATCH] Add doublecheck plugin: three-layer verification pipeline for AI output (#978) * Add doublecheck plugin: three-layer verification pipeline for AI output Adds a new plugin that helps users verify AI-generated content before acting on it. Designed for sensitive contexts (legal, medical, financial, compliance) where hallucinations carry real consequences. Three verification layers: - Self-Audit: extracts verifiable claims, checks internal consistency - Source Verification: web searches per claim, produces URLs for human review - Adversarial Review: assumes errors exist, checks hallucination patterns Supports persistent mode (auto-verifies every factual response inline) and one-shot mode (full report on specific text). Confidence ratings: VERIFIED, PLAUSIBLE, UNVERIFIED, DISPUTED, FABRICATION RISK. Includes: - Skill (skills/doublecheck/) with bundled report template - Agent (agents/doublecheck.agent.md) for interactive verification - Plugin package (plugins/doublecheck/) bundling both Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review: fix tools YAML format, remove materialized artifacts - Fix tools frontmatter in agents/doublecheck.agent.md to use standard YAML list format instead of flow sequence with trailing comma - Remove plugins/doublecheck/agents/ and plugins/doublecheck/skills/ from tracking; these paths are in .gitignore as CI-materialized artifacts that should not be committed Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/plugin/marketplace.json | 6 + agents/doublecheck.agent.md | 99 +++++++ docs/README.agents.md | 1 + docs/README.plugins.md | 1 + docs/README.skills.md | 1 + .../doublecheck/.github/plugin/plugin.json | 24 ++ 
plugins/doublecheck/README.md | 112 ++++++++ skills/doublecheck/SKILL.md | 261 ++++++++++++++++++ .../assets/verification-report-template.md | 92 ++++++ 9 files changed, 597 insertions(+) create mode 100644 agents/doublecheck.agent.md create mode 100644 plugins/doublecheck/.github/plugin/plugin.json create mode 100644 plugins/doublecheck/README.md create mode 100644 skills/doublecheck/SKILL.md create mode 100644 skills/doublecheck/assets/verification-report-template.md diff --git a/.github/plugin/marketplace.json b/.github/plugin/marketplace.json index 0f185213..7de44601 100644 --- a/.github/plugin/marketplace.json +++ b/.github/plugin/marketplace.json @@ -113,6 +113,12 @@ "description": "A focused set of prompts, instructions, and a chat mode to help triage incidents and respond quickly with DevOps tools and Azure resources.", "version": "1.0.0" }, + { + "name": "doublecheck", + "source": "doublecheck", + "description": "Three-layer verification pipeline for AI output. Extracts claims, finds sources, and flags hallucination risks so humans can verify before acting.", + "version": "1.0.0" + }, { "name": "edge-ai-tasks", "source": "edge-ai-tasks", diff --git a/agents/doublecheck.agent.md b/agents/doublecheck.agent.md new file mode 100644 index 00000000..e55e0a09 --- /dev/null +++ b/agents/doublecheck.agent.md @@ -0,0 +1,99 @@ +--- +description: 'Interactive verification agent for AI-generated output. Runs a three-layer pipeline (self-audit, source verification, adversarial review) and produces structured reports with source links for human review.' +name: Doublecheck +tools: + - web_search + - web_fetch +--- + +# Doublecheck Agent + +You are a verification specialist. Your job is to help the user evaluate AI-generated output for accuracy before they act on it. You do not tell the user what is true. You extract claims, find sources, and flag risks so the user can decide for themselves. + +## Core Principles + +1. 
**Links, not verdicts.** Your value is in finding sources the user can check, not in rendering your own judgment about accuracy. "Here's where you can verify this" is useful. "I believe this is correct" is just more AI output. + +2. **Skepticism by default.** Treat every claim as unverified until you find a supporting source. Do not assume something is correct because it sounds reasonable. + +3. **Transparency about limits.** You are the same kind of model that may have generated the output you're reviewing. Be explicit about what you can and cannot check. If you can't verify something, say so rather than guessing. + +4. **Severity-first reporting.** Lead with the items most likely to be wrong. The user's time is limited -- help them focus on what matters most. + +## How to Interact + +### Starting a Verification + +When the user asks you to verify something, ask them to provide or reference the text. Then: + +1. Confirm what you're about to verify: "I'll run a three-layer verification on [brief description]. This covers claim extraction, source verification via web search, and an adversarial review for hallucination patterns." + +2. Run the full pipeline as described in the `doublecheck` skill. + +3. Produce the verification report. + +### Follow-Up Conversations + +After producing a report, the user may want to: + +- **Dig deeper on a specific claim.** Run additional searches, try different search terms, or look at the claim from a different angle. + +- **Verify a source you found.** Fetch the actual page content and confirm the source says what you reported. + +- **Check something new.** Start a fresh verification on different text. + +- **Understand a rating.** Explain why you rated a claim the way you did, including what searches you ran and what you found (or didn't find). + +Be ready for all of these. Maintain context about the claims you've already extracted so you can reference them by ID (C1, C2, etc.) in follow-up discussion. 
+ +### When the User Pushes Back + +If the user says "I know this is correct" about something you flagged: + +- Accept it. Your job is to flag, not to argue. Say something like: "Got it -- I'll note that as confirmed by your domain knowledge. The flag was based on [reason], but you know this area better than I do." + +- Do NOT insist the user is wrong. You might be the one who's wrong. Your adversarial review catches patterns, not certainties. + +### When You're Uncertain + +If you genuinely cannot determine whether a claim is accurate: + +- Say so clearly. "I could not verify or contradict this claim" is a useful finding. +- Suggest where the user might check (specific databases, organizations, or experts). +- Do not hedge by saying it's "likely correct" or "probably fine." Either you found a source or you didn't. + +## Common Verification Scenarios + +### Legal Citations + +The highest-risk category. If the text cites a case, statute, or regulation: +- Search for the exact citation. +- If found, verify the holding/provision matches what the text claims. +- If not found, flag as FABRICATION RISK immediately. Fabricated legal citations are one of the most common and most dangerous hallucination patterns. + +### Statistics and Data Points + +If the text includes a specific number or percentage: +- Search for the statistic and its purported source. +- Check whether the number matches the source, or whether it's been rounded, misattributed, or taken out of context. +- If no source can be found for a precise statistic, flag it. Real statistics have traceable origins. + +### Regulatory and Compliance Claims + +If the text makes claims about what a regulation requires: +- Find the actual regulatory text. +- Check jurisdiction -- a rule that applies in the EU may not apply in the US, and vice versa. +- Check currency -- regulations change, and the text may describe an outdated version. 
+ +### Technical Claims + +If the text makes claims about software, APIs, or security: +- Check official documentation for the specific version referenced. +- Verify that configuration examples, command syntax, and API signatures are accurate. +- Watch for version confusion -- instructions for v2 applied to v3, etc. + +## Tone + +Be direct and professional. No hedging, no filler, no reassurance. The user is here because accuracy matters to their work. Respect that by being precise and efficient. + +When you find something wrong, state it plainly. When you can't find something, state that plainly too. The user can handle it. diff --git a/docs/README.agents.md b/docs/README.agents.md index 9d58c1fa..c0e72e4c 100644 --- a/docs/README.agents.md +++ b/docs/README.agents.md @@ -70,6 +70,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-agents) for guidelines on how to | [Devils Advocate](../agents/devils-advocate.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdevils-advocate.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdevils-advocate.agent.md) | I play the devil's advocate to challenge and stress-test your ideas by finding flaws, risks, and edge cases | | | [DevOps Expert](../agents/devops-expert.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdevops-expert.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdevops-expert.agent.md) | DevOps specialist following the infinity loop principle (Plan → Code → Build → Test → Release → Deploy → Operate → Monitor) with focus on automation, collaboration, and continuous improvement | | | [DiffblueCover](../agents/diffblue-cover.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdiffblue-cover.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdiffblue-cover.agent.md) | Expert agent for creating unit tests for java applications using Diffblue Cover. | DiffblueCover
[![Install MCP](https://img.shields.io/badge/Install-VS_Code-0098FF?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscode?name=DiffblueCover&config=%7B%22command%22%3A%22uv%22%2C%22args%22%3A%5B%22run%22%2C%22--with%22%2C%22fastmcp%22%2C%22fastmcp%22%2C%22run%22%2C%22%252Fplaceholder%252Fpath%252Fto%252Fcover-mcp%252Fmain.py%22%5D%2C%22env%22%3A%7B%7D%7D)
[![Install MCP](https://img.shields.io/badge/Install-VS_Code_Insiders-24bfa5?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscodeinsiders?name=DiffblueCover&config=%7B%22command%22%3A%22uv%22%2C%22args%22%3A%5B%22run%22%2C%22--with%22%2C%22fastmcp%22%2C%22fastmcp%22%2C%22run%22%2C%22%252Fplaceholder%252Fpath%252Fto%252Fcover-mcp%252Fmain.py%22%5D%2C%22env%22%3A%7B%7D%7D)
[![Install MCP](https://img.shields.io/badge/Install-Visual_Studio-C16FDE?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-visualstudio/mcp-install?%7B%22command%22%3A%22uv%22%2C%22args%22%3A%5B%22run%22%2C%22--with%22%2C%22fastmcp%22%2C%22fastmcp%22%2C%22run%22%2C%22%252Fplaceholder%252Fpath%252Fto%252Fcover-mcp%252Fmain.py%22%5D%2C%22env%22%3A%7B%7D%7D) | +| [Doublecheck](../agents/doublecheck.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdoublecheck.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdoublecheck.agent.md) | Interactive verification agent for AI-generated output. Runs a three-layer pipeline (self-audit, source verification, adversarial review) and produces structured reports with source links for human review. | | | [Droid](../agents/droid.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdroid.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdroid.agent.md) | Provides installation guidance, usage examples, and automation patterns for the Droid CLI, with emphasis on droid exec for CI/CD and non-interactive automation | | | [Drupal Expert](../agents/drupal-expert.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdrupal-expert.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdrupal-expert.agent.md) | Expert assistant for Drupal development, architecture, and best practices using PHP 8.3+ and modern Drupal patterns | | | [Dynatrace Expert](../agents/dynatrace-expert.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdynatrace-expert.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fdynatrace-expert.agent.md) | The Dynatrace Expert Agent integrates observability and security capabilities directly into GitHub workflows, enabling development teams to investigate incidents, validate deployments, triage errors, detect performance regressions, validate releases, and manage security vulnerabilities by autonomously analysing traces, logs, and Dynatrace findings. This enables targeted and precise remediation of identified issues directly within the repository. | [dynatrace](https://github.com/mcp/io.github.dynatrace-oss/Dynatrace-mcp)
[![Install MCP](https://img.shields.io/badge/Install-VS_Code-0098FF?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscode?name=dynatrace&config=%7B%22url%22%3A%22https%3A%2F%2Fpia1134d.dev.apps.dynatracelabs.com%2Fplatform-reserved%2Fmcp-gateway%2Fv0.1%2Fservers%2Fdynatrace-mcp%2Fmcp%22%2C%22headers%22%3A%7B%22Authorization%22%3A%22Bearer%20%24COPILOT_MCP_DT_API_TOKEN%22%7D%7D)
[![Install MCP](https://img.shields.io/badge/Install-VS_Code_Insiders-24bfa5?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscodeinsiders?name=dynatrace&config=%7B%22url%22%3A%22https%3A%2F%2Fpia1134d.dev.apps.dynatracelabs.com%2Fplatform-reserved%2Fmcp-gateway%2Fv0.1%2Fservers%2Fdynatrace-mcp%2Fmcp%22%2C%22headers%22%3A%7B%22Authorization%22%3A%22Bearer%20%24COPILOT_MCP_DT_API_TOKEN%22%7D%7D)
[![Install MCP](https://img.shields.io/badge/Install-Visual_Studio-C16FDE?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-visualstudio/mcp-install?%7B%22url%22%3A%22https%3A%2F%2Fpia1134d.dev.apps.dynatracelabs.com%2Fplatform-reserved%2Fmcp-gateway%2Fv0.1%2Fservers%2Fdynatrace-mcp%2Fmcp%22%2C%22headers%22%3A%7B%22Authorization%22%3A%22Bearer%20%24COPILOT_MCP_DT_API_TOKEN%22%7D%7D) | diff --git a/docs/README.plugins.md b/docs/README.plugins.md index ec527ba7..e6033f28 100644 --- a/docs/README.plugins.md +++ b/docs/README.plugins.md @@ -38,6 +38,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-plugins) for guidelines on how t | [dataverse](../plugins/dataverse/README.md) | Comprehensive collection for Microsoft Dataverse integrations. Includes MCP setup commands. | 1 items | dataverse, mcp | | [dataverse-sdk-for-python](../plugins/dataverse-sdk-for-python/README.md) | Comprehensive collection for building production-ready Python integrations with Microsoft Dataverse. Includes official documentation, best practices, advanced features, file operations, and code generation prompts. | 4 items | dataverse, python, integration, sdk | | [devops-oncall](../plugins/devops-oncall/README.md) | A focused set of prompts, instructions, and a chat mode to help triage incidents and respond quickly with DevOps tools and Azure resources. | 3 items | devops, incident-response, oncall, azure | +| [doublecheck](../plugins/doublecheck/README.md) | Three-layer verification pipeline for AI output. Extracts claims, finds sources, and flags hallucination risks so humans can verify before acting. 
| 2 items | verification, hallucination, fact-check, source-citation, trust, safety | | [edge-ai-tasks](../plugins/edge-ai-tasks/README.md) | Task Researcher and Task Planner for intermediate to expert users and large codebases - Brought to you by microsoft/edge-ai | 2 items | architecture, planning, research, tasks, implementation | | [flowstudio-power-automate](../plugins/flowstudio-power-automate/README.md) | Complete toolkit for managing Power Automate cloud flows via the FlowStudio MCP server. Includes skills for connecting to the MCP server, debugging failed flow runs, and building/deploying flows from natural language. | 3 items | power-automate, power-platform, flowstudio, mcp, model-context-protocol, cloud-flows, workflow-automation | | [frontend-web-dev](../plugins/frontend-web-dev/README.md) | Essential prompts, instructions, and chat modes for modern frontend web development including React, Angular, Vue, TypeScript, and CSS frameworks. | 4 items | frontend, web, react, typescript, javascript, css, html, angular, vue | diff --git a/docs/README.skills.md b/docs/README.skills.md index 677ea5c4..dd426af4 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -109,6 +109,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [dotnet-best-practices](../skills/dotnet-best-practices/SKILL.md) | Ensure .NET/C# code meets best practices for the solution/project. | None | | [dotnet-design-pattern-review](../skills/dotnet-design-pattern-review/SKILL.md) | Review the C#/.NET code for design pattern implementation and suggest improvements. | None | | [dotnet-upgrade](../skills/dotnet-upgrade/SKILL.md) | Ready-to-use prompts for comprehensive .NET framework upgrade analysis and execution | None | +| [doublecheck](../skills/doublecheck/SKILL.md) | Three-layer verification pipeline for AI output. 
Extracts verifiable claims, finds supporting or contradicting sources via web search, runs adversarial review for hallucination patterns, and produces a structured verification report with source links for human review. | `assets/verification-report-template.md` | | [editorconfig](../skills/editorconfig/SKILL.md) | Generates a comprehensive and best-practice-oriented .editorconfig file based on project analysis and user preferences. | None | | [ef-core](../skills/ef-core/SKILL.md) | Get best practices for Entity Framework Core | None | | [entra-agent-user](../skills/entra-agent-user/SKILL.md) | Create Agent Users in Microsoft Entra ID from Agent Identities, enabling AI agents to act as digital workers with user identity capabilities in Microsoft 365 and Azure environments. | None | diff --git a/plugins/doublecheck/.github/plugin/plugin.json b/plugins/doublecheck/.github/plugin/plugin.json new file mode 100644 index 00000000..fb926aca --- /dev/null +++ b/plugins/doublecheck/.github/plugin/plugin.json @@ -0,0 +1,24 @@ +{ + "name": "doublecheck", + "description": "Three-layer verification pipeline for AI output. Extracts claims, finds sources, and flags hallucination risks so humans can verify before acting.", + "version": "1.0.0", + "author": { + "name": "Awesome Copilot Community" + }, + "repository": "https://github.com/github/awesome-copilot", + "license": "MIT", + "keywords": [ + "verification", + "hallucination", + "fact-check", + "source-citation", + "trust", + "safety" + ], + "agents": [ + "./agents/doublecheck.md" + ], + "skills": [ + "./skills/doublecheck/" + ] +} diff --git a/plugins/doublecheck/README.md b/plugins/doublecheck/README.md new file mode 100644 index 00000000..54d16005 --- /dev/null +++ b/plugins/doublecheck/README.md @@ -0,0 +1,112 @@ +# Doublecheck + +A three-layer verification pipeline for AI-generated output. 
Extracts verifiable claims, finds sources via web search, runs adversarial review for hallucination patterns, and produces a structured report with source links so humans can verify before acting. + +## Why This Exists + +AI hallucinations are a model-level problem. No plugin can fix them. But the *consequences* of hallucinations -- acting on fabricated citations, relying on made-up statistics, citing nonexistent case law -- can be mitigated by making verification fast and structured. + +Doublecheck doesn't tell you what's true. It extracts every verifiable claim from AI output, searches for sources you can check independently, and flags anything that matches known hallucination patterns. You make the final call. + +## What's Included + +| Component | Type | Description | +|-----------|------|-------------| +| `doublecheck` | Skill | The core verification pipeline. Runs three layers and produces a structured report. | +| `Doublecheck` | Agent | Interactive verification mode for follow-up questions and deeper investigation. | + +## The Three Layers + +**Layer 1: Self-Audit.** Re-reads the target text critically. Extracts every verifiable claim (facts, statistics, citations, dates, causal assertions). Checks for internal contradictions. Categorizes claims for downstream verification. + +**Layer 2: Source Verification.** For each extracted claim, runs web searches to find supporting or contradicting evidence. Produces clickable URLs for independent human review. Gives extra scrutiny to citations, which are the highest-risk category for hallucinations. + +**Layer 3: Adversarial Review.** Switches posture entirely -- assumes the output contains errors and actively tries to find them. Checks against a hallucination pattern checklist: fabricated citations, unsourced statistics, confident specificity on uncertain topics, temporal confusion, overgeneralization, and missing qualifiers. 
+ ## Confidence Ratings + Each claim gets a final rating: + | Rating | Meaning | |--------|---------| | VERIFIED | Supporting source found and linked | | PLAUSIBLE | Consistent with general knowledge, no specific source found | | UNVERIFIED | Could not find supporting or contradicting evidence | | DISPUTED | Contradicting evidence found from a credible source | | FABRICATION RISK | Matches hallucination patterns (e.g., citation that can't be found anywhere) | + ## Usage + ### Persistent Mode ("Always On") + Activate doublecheck mode and it stays on for the rest of your conversation. Every substantive response from Copilot will include an inline verification summary at the bottom -- confidence ratings and source links for each factual claim. + To activate, just say: + ``` +use doublecheck +``` + Once active: - Factual, legal, and analytical responses get automatic inline verification - Code, creative writing, and casual conversation are skipped (verification doesn't apply) - High-risk claims (DISPUTED, FABRICATION RISK) get called out prominently before the verification summary - You can ask for a full deep-dive verification on any response by saying "run full verification" + Turn it off anytime: + ``` +turn off doublecheck +``` + This is the recommended mode for working sessions where accuracy matters -- legal research, compliance analysis, regulatory guidance, executive briefings. + ### One-Shot Verification + If you don't want persistent mode, you can verify specific text on demand: + ``` +use doublecheck to verify: [paste the text you want checked] +``` + This runs the full three-layer pipeline and produces a detailed verification report with every claim extracted, rated, and sourced. 
+ ### Interactive Agent Mode + For a conversational back-and-forth: + ``` +@doublecheck [paste text or describe what you want verified] +``` + The agent mode lets you: - Get the full verification report - Ask follow-up questions about specific flagged claims - Request deeper investigation ("dig deeper on C3") - Get help evaluating whether a source is credible + ### When to Use It + - Before acting on legal analysis, case citations, or regulatory guidance generated by AI - Before including AI-generated statistics or data points in documents - When reviewing AI output that will be shared with clients, leadership, or external parties - When working in domains where errors carry real consequences (legal, medical, financial, security) - Anytime you think "I should probably double-check this" + ### When NOT to Use It + - For creative or subjective content where "accuracy" isn't the goal - For code review (use code-specific review tools instead) - As a substitute for subject matter expertise -- the tool helps you verify faster; it doesn't replace knowing the domain + ## Limitations + Be aware of what this tool cannot do: + - **Same model, same biases.** The verification pipeline uses the same type of model that may have produced the original output. It catches many issues -- particularly structural patterns like missing citations -- but it has the same fundamental knowledge limitations. - **Web search is not comprehensive.** Paywalled content, recently published material, and niche databases may not appear in search results. A claim being "unverified" may mean it's behind a paywall, not that it's wrong. - **VERIFIED means "source found," not "definitely correct."** Sources themselves can be wrong, outdated, or misinterpreted. A supporting link accelerates your verification process; it doesn't complete it. 
+- **The tool cannot catch what it doesn't know it doesn't know.** If a hallucination is sophisticated enough to pass all three layers, a human expert is your last line of defense. + +The honest framing: this tool raises the floor on verification quality and dramatically reduces the time it takes to identify the claims that need human attention. It does not raise the ceiling. Critical decisions should always involve human domain expertise. diff --git a/skills/doublecheck/SKILL.md b/skills/doublecheck/SKILL.md new file mode 100644 index 00000000..7316ee4b --- /dev/null +++ b/skills/doublecheck/SKILL.md @@ -0,0 +1,261 @@ +--- +name: doublecheck +description: 'Three-layer verification pipeline for AI output. Extracts verifiable claims, finds supporting or contradicting sources via web search, runs adversarial review for hallucination patterns, and produces a structured verification report with source links for human review.' +--- + +# Doublecheck + +Run a three-layer verification pipeline on AI-generated output. The goal is not to tell the user what is true -- it is to extract every verifiable claim, find sources the user can check independently, and flag anything that looks like a hallucination pattern. + +## Activation + +Doublecheck operates in two modes: **active mode** (persistent) and **one-shot mode** (on demand). + +### Active Mode + +When the user invokes this skill without providing specific text to verify, activate persistent doublecheck mode. Respond with: + +> **Doublecheck is now active.** I'll automatically verify the factual claims in my responses before presenting them to you. For each substantive response, you'll see an inline verification summary with confidence ratings and source links. You can turn it off anytime by saying "turn off doublecheck." 
+ +Then follow ALL of the rules below for the remainder of the conversation: + +**Rule: Classify every response before sending it.** + +Before producing any substantive response, determine whether it contains verifiable claims. Classify the response: + +| Response type | Contains verifiable claims? | Action | +|--------------|---------------------------|--------| +| Factual analysis, legal guidance, regulatory interpretation | Yes -- high density | Run full inline verification | +| Summary of a document, research, or data | Yes -- moderate density | Run inline verification on key claims | +| Code generation, creative writing, brainstorming | Rarely | Skip verification; note that doublecheck mode doesn't apply to this type of content | +| Casual conversation, clarifying questions, status updates | No | Skip verification silently | + +**Rule: Inline verification for active mode.** + +When active mode applies, do NOT generate a separate full verification report for every response. Instead, embed verification directly into your response using this pattern: + +1. Generate your response normally. +2. After the response, add a `Verification` section. +3. In that section, list each verifiable claim with its confidence rating and a source link where available. + +Format: + +``` +--- +**Verification (N claims checked)** + +- [VERIFIED] "Claim text" -- Source: [URL] +- [VERIFIED] "Claim text" -- Source: [URL] +- [PLAUSIBLE] "Claim text" -- no specific source found +- [FABRICATION RISK] "Claim text" -- could not find this citation; verify before relying on it +``` + +For active mode, prioritize speed. Run web searches for citations, specific statistics, and any claim you have low confidence about. You do not need to search for claims that are common knowledge or that you have high confidence about -- just rate them PLAUSIBLE and move on. 
+ +If any claim rates DISPUTED or FABRICATION RISK, call it out prominently before the verification section so the user sees it immediately: + +``` +**Heads up:** I'm not confident about [specific claim]. I couldn't find a supporting source. You should verify this independently before relying on it. +``` + +**Rule: Offer full verification on request.** + +If the user says "run full verification," "verify that," "doublecheck that," or similar, run the complete three-layer pipeline (described below) and produce the full report using the template in `assets/verification-report-template.md`. + +### One-Shot Mode + +When the user invokes this skill and provides specific text to verify (or references previous output), run the complete three-layer pipeline and produce a full verification report using the template in `assets/verification-report-template.md`. + +### Deactivation + +When the user says "turn off doublecheck," "stop doublecheck," or similar, respond with: + +> **Doublecheck is now off.** I'll respond normally without inline verification. You can reactivate it anytime. + +--- + +## Layer 1: Self-Audit + +Re-read the target text with a critical lens. Your job in this layer is extraction and internal analysis -- no web searches yet. + +### Step 1: Extract Claims + +Go through the target text sentence by sentence and pull out every statement that asserts something verifiable. Categorize each claim: + +| Category | What to look for | Examples | +|----------|-----------------|---------| +| **Factual** | Assertions about how things are or were | "Python was created in 1991", "The GPL requires derivative works to be open-sourced" | +| **Statistical** | Numbers, percentages, quantities | "95% of enterprises use cloud services", "The contract has a 30-day termination clause" | +| **Citation** | References to specific documents, cases, laws, papers, or standards | "Under Section 230 of the CDA...", "In *Mayo v. Prometheus* (2012)..." 
| +| **Entity** | Claims about specific people, organizations, products, or places | "OpenAI was founded by Sam Altman and Elon Musk", "GDPR applies to EU residents" | +| **Causal** | Claims that X caused Y or X leads to Y | "This vulnerability allows remote code execution", "The regulation was passed in response to the 2008 financial crisis" | +| **Temporal** | Dates, timelines, sequences of events | "The deadline is March 15", "Version 2.0 was released before the security patch" | + +Assign each claim a temporary ID (C1, C2, C3...) for tracking through subsequent layers. + +### Step 2: Check Internal Consistency + +Review the extracted claims against each other: +- Does the text contradict itself anywhere? (e.g., states two different dates for the same event) +- Are there claims that are logically incompatible? +- Does the text make assumptions in one section that it contradicts in another? + +Flag any internal contradictions immediately -- these don't need external verification to identify as problems. + +### Step 3: Initial Confidence Assessment + +For each claim, make an initial assessment based only on your own knowledge: +- Do you recall this being accurate? +- Is this the kind of claim where models frequently hallucinate? (Specific citations, precise statistics, and exact dates are high-risk categories.) +- Is the claim specific enough to verify, or is it vague enough to be unfalsifiable? + +Record your initial confidence but do NOT report it as a finding yet. This is input for Layer 2, not output. + +--- + +## Layer 2: Source Verification + +For each extracted claim, search for external evidence. The purpose of this layer is to find URLs the user can visit to verify claims independently. + +### Search Strategy + +For each claim: + +1. **Formulate a search query** that would surface the primary source. For citations, search for the exact title or case name. For statistics, search for the specific number and topic. 
For factual claims, search for the key entities and relationships. + +2. **Run the search** using `web_search`. If the first search doesn't return relevant results, reformulate and try once more with different terms. + +3. **Evaluate what you find:** + - Did you find a primary or authoritative source that directly addresses the claim? + - Did you find contradicting information from a credible source? + - Did you find nothing relevant? (This is itself a signal -- real things usually have a web footprint.) + +4. **Record the result** with the source URL. Always provide the URL even if you also summarize what the source says. + +### What Counts as a Source + +Prefer primary and authoritative sources: +- Official documentation, specifications, and standards +- Court records, legislative texts, regulatory filings +- Peer-reviewed publications +- Official organizational websites and press releases +- Established reference works (encyclopedias, legal databases) + +Note when a source is secondary (news article, blog post, wiki page) vs. primary. The user can weigh accordingly. + +### Handling Citations Specifically + +Citations are the highest-risk category for hallucinations. For any claim that cites a specific case, statute, paper, standard, or document: + +1. Search for the exact citation (case name, title, section number). +2. If you find it, confirm the cited content actually says what the target text claims it says. +3. If you cannot find it at all, flag it as FABRICATION RISK. Models frequently generate plausible-sounding citations for things that don't exist. + +--- + +## Layer 3: Adversarial Review + +Switch your posture entirely. In Layers 1 and 2, you were trying to understand and verify the output. In this layer, **assume the output contains errors** and actively try to find them. + +### Hallucination Pattern Checklist + +Check for these common patterns: + +1. 
**Fabricated citations** -- The text cites a specific case, paper, or statute that you could not find in Layer 2. This is the most dangerous hallucination pattern because it looks authoritative. + +2. **Precise numbers without sources** -- The text states a specific statistic (e.g., "78% of companies...") without indicating where the number comes from. Models often generate plausible-sounding statistics that are entirely made up. + +3. **Confident specificity on uncertain topics** -- The text states something very specific about a topic where specifics are genuinely unknown or disputed. Watch for exact dates, precise dollar amounts, and definitive attributions in areas where experts disagree. + +4. **Plausible-but-wrong associations** -- The text associates a concept, ruling, or event with the wrong entity. For example, attributing a ruling to the wrong court, assigning a quote to the wrong person, or describing a law's provision incorrectly while getting the law's name right. + +5. **Temporal confusion** -- The text describes something as current that may be outdated, or describes a sequence of events in the wrong order. + +6. **Overgeneralization** -- The text states something as universally true when it applies only in specific jurisdictions, contexts, or time periods. Common in legal and regulatory content. + +7. **Missing qualifiers** -- The text presents a nuanced topic as settled or straightforward when significant exceptions, limitations, or counterarguments exist. + +### Adversarial Questions + +For each major claim that passed Layers 1 and 2, ask: +- What would make this claim wrong? +- Is there a common misconception in this area that the model might have picked up? +- If I were a subject matter expert, would I object to how this is stated? +- Is this claim from before or after my training data cutoff, and might it be outdated? 
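Pattern 2 (precise numbers without sources) is concrete enough to illustrate with a crude triage heuristic. The sketch below is an assumption-laden illustration, not how the skill itself works: the regexes, the attribution-cue list, and the naive sentence split are all placeholders, and the real adversarial review applies far more judgment than any pattern match can.

```python
import re

# Matches percentages ("78%") and dollar amounts ("$1,200.50").
# Illustrative only; real statistics take many more forms.
STAT_RE = re.compile(r"\b\d+(?:\.\d+)?%|\$\d[\d,]*(?:\.\d+)?\b")

# Rough cues that a number is attributed to something. Hypothetical list.
ATTRIBUTION_CUES = ("according to", "reported", "survey", "study", "source:")

def flag_unsourced_stats(text: str) -> list[str]:
    """Return sentences containing a precise statistic with no attribution cue."""
    flagged = []
    # Naive sentence split on ., !, ? -- good enough for a first triage pass.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if STAT_RE.search(sentence) and not any(
            cue in sentence.lower() for cue in ATTRIBUTION_CUES
        ):
            flagged.append(sentence.strip())
    return flagged
```

A heuristic like this can only surface candidates for Layer 2 searching; an unflagged statistic is not a verified one, and a flagged statistic may still be accurate.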
+ +### Red Flags to Escalate + +If you find any of these, flag them prominently in the report: +- A specific citation that cannot be found anywhere +- A statistic with no identifiable source +- A legal or regulatory claim that contradicts what authoritative sources say +- A claim that has been stated with high confidence but is actually disputed or uncertain + +--- + +## Producing the Verification Report + +After completing all three layers, produce the report using the template in `assets/verification-report-template.md`. + +### Confidence Ratings + +Assign each claim a final rating: + +| Rating | Meaning | What the user should do | +|--------|---------|------------------------| +| **VERIFIED** | Supporting source found and linked | Spot-check the source link if the claim is critical to your work | +| **PLAUSIBLE** | Consistent with general knowledge, no specific source found | Treat as reasonable but unconfirmed; verify independently if relying on it for decisions | +| **UNVERIFIED** | Could not find supporting or contradicting evidence | Do not rely on this claim without independent verification | +| **DISPUTED** | Found contradicting evidence from a credible source | Review the contradicting source; this claim may be wrong | +| **FABRICATION RISK** | Matches hallucination patterns (e.g., unfindable citation, unsourced precise statistic) | Assume this is wrong until you can confirm it from a primary source | + +### Report Principles + +- Provide links, not verdicts. The user decides what's true, not you. +- When you found contradicting information, present both sides with sources. Don't pick a winner. +- If a claim is unfalsifiable (too vague or subjective to verify), say so. "Unfalsifiable" is useful information. +- Be explicit about what you could not check. "I could not verify this" is different from "this is wrong." +- Group findings by severity. Lead with the items that need the most attention. 
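The report's summary arithmetic -- counts per rating plus a flagged-items-first ordering -- can be sketched as follows. This is a minimal illustration assuming a flat list of rating strings; `SEVERITY_ORDER` and `summarize` are hypothetical names, not part of the report template.

```python
from collections import Counter

# Most severe first, mirroring "lead with the items that need the most attention".
SEVERITY_ORDER = ["FABRICATION RISK", "DISPUTED", "UNVERIFIED", "PLAUSIBLE", "VERIFIED"]

def summarize(ratings: list[str]) -> dict:
    """Build the summary-table numbers for the verification report."""
    counts = Counter(ratings)
    return {
        "total": len(ratings),
        # One entry per rating, in severity order, including zero counts.
        "counts": {r: counts.get(r, 0) for r in SEVERITY_ORDER},
        # "Items requiring attention" in the report summary.
        "needs_attention": counts.get("DISPUTED", 0) + counts.get("FABRICATION RISK", 0),
    }
```

Grouping the full claim list by `SEVERITY_ORDER` before rendering reproduces the report's structure: flagged items first, VERIFIED last.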
+ +### Limitations Disclosure + +Always include this at the end of the report: + +> **Limitations of this verification:** +> - This tool accelerates human verification; it does not replace it. +> - Web search results may not include the most recent information or paywalled sources. +> - The adversarial review uses the same underlying model that may have produced the original output. It catches many issues but cannot catch all of them. +> - A claim rated VERIFIED means a supporting source was found, not that the claim is definitely correct. Sources can be wrong too. +> - Claims rated PLAUSIBLE may still be wrong. The absence of contradicting evidence is not proof of accuracy. + +--- + +## Domain-Specific Guidance + +### Legal Content + +Legal content carries elevated hallucination risk because: +- Case names, citations, and holdings are frequently fabricated by models +- Jurisdictional nuances are often flattened or omitted +- Statutory language may be paraphrased in ways that change the legal meaning +- "Majority rule" and "minority rule" distinctions are often lost + +For legal content, give extra scrutiny to: case citations, statutory references, regulatory interpretations, and jurisdictional claims. Search legal databases when possible. 
+ +### Medical and Scientific Content + +- Check that cited studies actually exist and that the results are accurately described +- Watch for outdated guidelines being presented as current +- Flag dosages, treatment protocols, or diagnostic criteria -- these change and errors can be dangerous + +### Financial and Regulatory Content + +- Verify specific dollar amounts, dates, and thresholds +- Check that regulatory requirements are attributed to the correct jurisdiction and are current +- Watch for tax law claims that may be outdated after recent legislative changes + +### Technical and Security Content + +- Verify CVE numbers, vulnerability descriptions, and affected versions +- Check that API specifications and configuration instructions match current documentation +- Watch for version-specific information that may be outdated diff --git a/skills/doublecheck/assets/verification-report-template.md b/skills/doublecheck/assets/verification-report-template.md new file mode 100644 index 00000000..6d1b4882 --- /dev/null +++ b/skills/doublecheck/assets/verification-report-template.md @@ -0,0 +1,92 @@ +# Verification Report + +## Summary + +**Text verified:** [Brief description of what was checked] +**Claims extracted:** [N total] +**Breakdown:** + +| Rating | Count | +|--------|-------| +| VERIFIED | | +| PLAUSIBLE | | +| UNVERIFIED | | +| DISPUTED | | +| FABRICATION RISK | | + +**Items requiring attention:** [N items rated DISPUTED or FABRICATION RISK] + +--- + +## Flagged Items (Review These First) + +Items rated DISPUTED or FABRICATION RISK. These need your attention before you rely on the source material. 
+ +### [C#] -- [Brief description of the claim] + +- **Claim:** [The specific assertion from the target text] +- **Rating:** [DISPUTED or FABRICATION RISK] +- **Finding:** [What the verification found -- what's wrong or suspicious] +- **Source:** [URL to contradicting or relevant source] +- **Recommendation:** [What the user should do -- e.g., "Verify this citation in Westlaw" or "Remove this statistic unless you can find a primary source"] + +--- + +## All Claims + +Full results for every extracted claim, grouped by confidence rating. + +### VERIFIED + +#### [C#] -- [Brief description] +- **Claim:** [The assertion] +- **Source:** [URL] +- **Notes:** [Any relevant context about the source] + +### PLAUSIBLE + +#### [C#] -- [Brief description] +- **Claim:** [The assertion] +- **Notes:** [Why this is rated plausible rather than verified] + +### UNVERIFIED + +#### [C#] -- [Brief description] +- **Claim:** [The assertion] +- **Notes:** [What was searched, why nothing was found] + +### DISPUTED + +#### [C#] -- [Brief description] +- **Claim:** [The assertion] +- **Contradicting source:** [URL] +- **Details:** [What the source says vs. what the claim says] + +### FABRICATION RISK + +#### [C#] -- [Brief description] +- **Claim:** [The assertion] +- **Pattern:** [Which hallucination pattern this matches] +- **Details:** [Why this is flagged -- e.g., "citation not found in any legal database"] + +--- + +## Internal Consistency + +[Any contradictions found within the target text itself, or "No internal contradictions detected."] + +--- + +## What Was Not Checked + +[List any claims that could not be evaluated -- paywalled sources, claims requiring specialized databases, unfalsifiable assertions, etc.] + +--- + +## Limitations + +- This tool accelerates human verification; it does not replace it. +- Web search results may not include the most recent information or paywalled sources. 
+- The adversarial review uses the same underlying model that may have produced the original output. It catches many issues but cannot catch all of them. +- A claim rated VERIFIED means a supporting source was found, not that the claim is definitely correct. Sources can be wrong too. +- Claims rated PLAUSIBLE may still be wrong. The absence of contradicting evidence is not proof of accuracy.