# Doublecheck

A three-layer verification pipeline for AI-generated output. Extracts verifiable claims, finds sources via web search, runs adversarial review for hallucination patterns, and produces a structured report with source links so humans can verify before acting.

## Why This Exists

AI hallucinations are a model-level problem. No plugin can fix them. But the *consequences* of hallucinations -- acting on fabricated citations, relying on made-up statistics, citing nonexistent case law -- can be mitigated by making verification fast and structured.

Doublecheck doesn't tell you what's true. It extracts every verifiable claim from AI output, searches for sources you can check independently, and flags anything that matches known hallucination patterns. You make the final call.

## What's Included

| Component | Type | Description |
|-----------|------|-------------|
| `doublecheck` | Skill | The core verification pipeline. Runs three layers and produces a structured report. |
| `Doublecheck` | Agent | Interactive verification mode for follow-up questions and deeper investigation. |

## The Three Layers

**Layer 1: Self-Audit.** Re-reads the target text critically. Extracts every verifiable claim (facts, statistics, citations, dates, causal assertions). Checks for internal contradictions. Categorizes claims for downstream verification.

**Layer 2: Source Verification.** For each extracted claim, runs web searches to find supporting or contradicting evidence. Produces clickable URLs for independent human review. Gives extra scrutiny to citations, which are the highest-risk category for hallucinations.

**Layer 3: Adversarial Review.** Switches posture entirely -- assumes the output contains errors and actively tries to find them. Checks against a hallucination pattern checklist: fabricated citations, unsourced statistics, confident specificity on uncertain topics, temporal confusion, overgeneralization, and missing qualifiers.
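The three layers compose into a simple pipeline: extraction feeds search, and search results feed the adversarial pass. The sketch below illustrates that shape only -- the `Claim` structure, function names, and the digit-based "extraction" stub are assumptions for illustration, not the skill's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """One verifiable claim extracted from the target text (illustrative)."""
    text: str
    category: str                                 # e.g. "statistic", "citation", "date"
    sources: list = field(default_factory=list)   # URLs attached in Layer 2
    flags: list = field(default_factory=list)     # patterns flagged in Layer 3

def self_audit(text):
    """Layer 1: extract claims. This stub treats any sentence containing a
    digit as a 'statistic' claim -- a stand-in for real claim extraction."""
    return [Claim(s.strip(), "statistic")
            for s in text.split(". ")
            if any(ch.isdigit() for ch in s)]

def source_verification(claims, search):
    """Layer 2: attach whatever the pluggable search backend returns."""
    for claim in claims:
        claim.sources = search(claim.text)
    return claims

def adversarial_review(claims):
    """Layer 3: flag claims matching a (heavily truncated) hallucination checklist."""
    for claim in claims:
        if claim.category == "statistic" and not claim.sources:
            claim.flags.append("unsourced statistic")
    return claims

def doublecheck(text, search):
    """Run all three layers in order and return the annotated claims."""
    return adversarial_review(source_verification(self_audit(text), search))
```

For example, with a search backend that finds nothing, `doublecheck("Revenue grew 40% in 2023. The sky is blue.", lambda q: [])` extracts the numeric sentence as a claim and flags it as an unsourced statistic.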
## Confidence Ratings

Each claim gets a final rating:

| Rating | Meaning |
|--------|---------|
| VERIFIED | Supporting source found and linked |
| PLAUSIBLE | Consistent with general knowledge, no specific source found |
| UNVERIFIED | Could not find supporting or contradicting evidence |
| DISPUTED | Contradicting evidence found from a credible source |
| FABRICATION RISK | Matches hallucination patterns (e.g., a citation that can't be found anywhere) |

## Usage

### Persistent Mode ("Always On")

Activate doublecheck mode and it stays on for the rest of your conversation. Every substantive response from Copilot will include an inline verification summary at the bottom -- confidence ratings and source links for each factual claim.

To activate, just say:

```
use doublecheck
```

Once active:

- Simple factual lookups and single-claim answers get automatic inline verification summaries
- Factual analysis, legal analysis, regulatory interpretation, compliance guidance, and content with case citations or statutory references automatically get the full verification report instead of inline summaries
- If any claim rates DISPUTED or FABRICATION RISK during inline verification, the full report is generated automatically
- Code, creative writing, and casual conversation are skipped (verification doesn't apply)
- You can ask for a full deep-dive verification on any response by saying "full report" (or the legacy phrase "run full verification")

Turn it off anytime:

```
turn off doublecheck
```

This is the recommended mode for working sessions where accuracy matters -- legal research, compliance analysis, regulatory guidance, executive briefings.
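The escalation rule in persistent mode -- any DISPUTED or FABRICATION RISK rating upgrades an inline summary to the full report -- can be sketched as a few lines of logic. The enum and function below are illustrative assumptions, not the skill's real interface.

```python
from enum import Enum

class Rating(Enum):
    """The five confidence ratings from the table above."""
    VERIFIED = "supporting source found and linked"
    PLAUSIBLE = "consistent with general knowledge, no specific source found"
    UNVERIFIED = "no supporting or contradicting evidence found"
    DISPUTED = "contradicting evidence from a credible source"
    FABRICATION_RISK = "matches known hallucination patterns"

# Ratings that force the full report instead of an inline summary.
ESCALATE = {Rating.DISPUTED, Rating.FABRICATION_RISK}

def needs_full_report(ratings):
    """True if any claim's rating requires escalating to the full report."""
    return any(r in ESCALATE for r in ratings)
```

So a response whose claims rate VERIFIED and PLAUSIBLE keeps its inline summary, while a single DISPUTED claim among them triggers the full report.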
### One-Shot Verification

If you don't want persistent mode, you can verify specific text on demand:

```
use doublecheck to verify: [paste the text you want checked]
```

This runs the full three-layer pipeline and produces a detailed verification report with every claim extracted, rated, and sourced.

### Interactive Agent Mode

For a conversational back-and-forth:

```
@doublecheck [paste text or describe what you want verified]
```

The agent mode lets you:

- Get the full verification report
- Ask follow-up questions about specific flagged claims
- Request deeper investigation ("dig deeper on C3")
- Get help evaluating whether a source is credible

### When to Use It

- Before acting on legal analysis, case citations, or regulatory guidance generated by AI
- Before including AI-generated statistics or data points in documents
- When reviewing AI output that will be shared with clients, leadership, or external parties
- When working in domains where errors carry real consequences (legal, medical, financial, security)
- Anytime you think "I should probably double-check this"

### When NOT to Use It

- For creative or subjective content where "accuracy" isn't the goal
- For code review (use code-specific review tools instead)
- As a substitute for subject matter expertise -- the tool helps you verify faster, it doesn't replace knowing the domain

## Limitations

Be aware of what this tool cannot do:

- **Same model, same biases.** The verification pipeline uses the same type of model that may have produced the original output. It catches many issues -- particularly structural patterns like missing citations -- but it has the same fundamental knowledge limitations.
- **Web search is not comprehensive.** Paywalled content, recently published material, and niche databases may not appear in search results. A claim being "unverified" may mean it's behind a paywall, not that it's wrong.
- **VERIFIED means "source found," not "definitely correct."** Sources themselves can be wrong, outdated, or misinterpreted. A supporting link accelerates your verification process; it doesn't complete it.
- **The tool cannot catch what it doesn't know it doesn't know.** If a hallucination is sophisticated enough to pass all three layers, a human expert is your last line of defense.

The honest framing: this tool raises the floor on verification quality and dramatically reduces the time it takes to identify the claims that need human attention. It does not raise the ceiling. Critical decisions should always involve human domain expertise.