new hook fix-broken-links (#2027)

* new hook fix-broken-links

* codespell: add ans for variable short for answer

* update: scripts hand off alternative url to copilot cmd

* codespell: add ext. arcade-canvas/game/phaser.min.js

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* update: rm em-dash from hook scripts

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
John Haugabook
2026-06-17 21:00:15 -04:00
committed by GitHub
parent 18654630ab
commit 373a548daf
6 changed files with 977 additions and 1 deletions
+3 -1
@@ -54,7 +54,9 @@
# CAF - Microsoft Cloud Adoption Framework acronym
ignore-words-list = numer,wit,aks,edn,ser,ois,gir,rouge,categor,aline,ative,afterall,deques,dateA,dateB,TE,FillIn,alle,vai,LOD,InOut,pixelX,aNULL,Wee,Sherif,queston,Vertexes,nin,FO,CAF,Parth
# ans - bash and powershell variable short for answer
ignore-words-list = numer,wit,aks,edn,ser,ois,gir,rouge,categor,aline,ative,afterall,deques,dateA,dateB,TE,FillIn,alle,vai,LOD,InOut,pixelX,aNULL,Wee,Sherif,queston,Vertexes,nin,FO,CAF,Parth,ans
# Skip certain files and directories
+1
@@ -32,6 +32,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-hooks) for guidelines on how to
| Name | Description | Events | Bundled Assets |
| ---- | ----------- | ------ | -------------- |
| [Dependency License Checker](../hooks/dependency-license-checker/README.md) | Scans newly added dependencies for license compliance (GPL, AGPL, etc.) at session end | sessionEnd | `check-licenses.sh`<br />`hooks.json` |
| [Fix Broken Links](../hooks/fix-broken-links/README.md) | Checks changed web files for broken hyperlinks and SEO anchor issues after each Copilot tool use. | postToolUse | `hooks.json`<br />`link-fix.ps1`<br />`link-fix.sh` |
| [Governance Audit](../hooks/governance-audit/README.md) | Scans Copilot agent prompts for threat signals and logs governance events | sessionStart, sessionEnd, userPromptSubmitted | `audit-prompt.sh`<br />`audit-session-end.sh`<br />`audit-session-start.sh`<br />`hooks.json` |
| [Secrets Scanner](../hooks/secrets-scanner/README.md) | Scans files modified during a Copilot coding agent session for leaked secrets, credentials, and sensitive data | sessionEnd | `hooks.json`<br />`scan-secrets.sh` |
| [Session Auto-Commit](../hooks/session-auto-commit/README.md) | Automatically commits and pushes changes when a Copilot coding agent session ends | sessionEnd | `auto-commit.sh`<br />`hooks.json` |
+177
@@ -0,0 +1,177 @@
---
name: 'Fix Broken Links'
description: 'Checks changed web files for broken hyperlinks and SEO anchor issues after each Copilot tool use.'
tags: ['links', 'seo', 'html', 'markdown', 'post-tool-use']
---
# Fix Broken Links Hook
Scans recently changed web files for broken hyperlinks after each GitHub Copilot
tool use. For each broken URL the hook tries common spelling variations, then hands
the link to the Copilot CLI agent for suggested replacements, and presents an
interactive fix menu. Generic anchor text (`click here`, `read more`, etc.) is
flagged as an SEO issue.
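The generic-anchor check can be illustrated with a small sketch. It assumes a shortened phrase list and a hypothetical helper name, `generic_anchors`; the hook's own list is longer.

```bash
# Sketch: flag generic anchor text in HTML and Markdown links.
# generic_anchors is a hypothetical name; the phrase list is a short sample.
generic_anchors() {
  local phrases='click here|here|read more|learn more'
  # HTML form: <a ...>click here</a>
  grep -oiE "<a[^>]*>[[:space:]]*(${phrases})[[:space:]]*</a>" "$1" || true
  # Markdown form: [click here](
  grep -oiE "\[(${phrases})\]\(" "$1" || true
}

page="$(mktemp)"
printf '%s\n' \
  '<a href="https://example.com/old-page">click here</a>' \
  '[Read more](https://example.com/blog)' \
  '[Installation guide](https://example.com/docs/install)' > "$page"
generic_anchors "$page"
rm -f "$page"
```

Only the first two sample links are reported; the third has descriptive text and passes.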
## Overview
Broken links accumulate silently in web projects. Running on the `postToolUse`
event, this hook checks the web files the agent just edited — and only those —
right after each change, so you can fix, replace, or remove each broken link in
the same terminal session.
The hook has two modes:
- **With file paths** (the edited files injected from the hook payload, or paths
passed on the command line): it checks each link, looks up replacement
candidates, and presents the interactive fix menu.
- **With no file arguments**: it simply lists the broken links it finds — no
replacement lookups and no prompts.
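As a rough illustration of the two modes, the choice depends only on whether any file paths reach the script. `run_mode` below is a hypothetical helper written for this sketch, not a function from the hook.

```bash
# Sketch: the mode follows from whether any file paths were supplied.
# run_mode is a hypothetical helper, not part of the hook.
run_mode() {
  if [ "$#" -gt 0 ]; then
    echo "repair"   # check links, look up replacements, show the fix menu
  else
    echo "list"     # only list broken links
  fi
}

run_mode docs/guide.md index.html   # repair
run_mode                            # list
```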
## Features
- **Self-contained core**: bash and PowerShell ports — no runtime to install (the optional agent
hand-off reuses the Copilot CLI you already have)
- **Edited-files scope**: as a `postToolUse` hook it only checks the files the agent just changed —
never a full repo scan
- **Format-agnostic link scan**: extracts every `http(s)` URL with `grep`, covering HTML, Markdown,
JS/TS, JSON, CSS, SQL, and templates at once
- **Automatic URL healing**: tries scheme (http/https), www, and trailing-slash variations
- **Agent-assisted suggestions**: hands the broken link to the Copilot CLI agent (a lightweight,
low-token `gpt-5-mini` prompt with no tools) for replacement candidates; if the CLI is missing or
errors, it simply offers none
- **SEO audit**: flags anchor text that is too generic to benefit search ranking
- **Large-file guard**: prompts before checking files with more than 50 links
- **Interactive fix menu**: replace with suggestion, enter custom URL, strip tag keeping text, or
skip
- **Standard tools only**: `curl`, `grep`, `sed`, which ship with most Unix-like systems (the bash hook additionally needs `jq`, see Requirements)
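The URL-healing step can be sketched as a function that only generates the candidate variations. The name `url_variations` is an assumption for this example, and no network request is made here; the hook itself would go on to check each candidate and keep the first one that answers 200.

```bash
# Sketch: list candidate variations of a URL (scheme swap, www toggle,
# trailing slash). url_variations is a hypothetical name.
url_variations() {
  local url scheme rest host path last
  url="$1"
  scheme="${url%%://*}"
  rest="${url#*://}"
  host="${rest%%/*}"
  if [ "$rest" = "$host" ]; then path=""; else path="/${rest#*/}"; fi

  # Swap http <-> https.
  case "$scheme" in
    http)  printf '%s\n' "https://${host}${path}" ;;
    https) printf '%s\n' "http://${host}${path}" ;;
  esac

  # Toggle the www. prefix.
  case "$host" in
    www.*) printf '%s\n' "${scheme}://${host#www.}${path}" ;;
    *)     printf '%s\n' "${scheme}://www.${host}${path}" ;;
  esac

  # Add a trailing slash only when the last path segment is non-empty
  # and does not look like a file name.
  last="${path##*/}"
  case "$last" in
    ''|*.*) ;;
    *) printf '%s\n' "${url}/" ;;
  esac
}

url_variations "http://example.com/docs"
```

For the sample URL this prints three candidates: the `https` form, the `www.` form, and the form with a trailing slash.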
## Installation
1. Copy the hook folder to your repository:
```bash
cp -r hooks/fix-broken-links .github/hooks/
```
2. Make the script executable:
```bash
chmod +x .github/hooks/fix-broken-links/link-fix.sh
```
3. Commit the hook configuration to your repository's default branch.
## Configuration
The hook is configured in `hooks.json` to run on the `postToolUse` event:
```json
{
"version": 1,
"hooks": {
"postToolUse": [
{
"type": "command",
"bash": ".github/hooks/fix-broken-links/link-fix.sh",
"powershell": ".github/hooks/fix-broken-links/link-fix.ps1",
"cwd": ".",
"timeoutSec": 120
}
]
}
}
```
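When the event fires, the hook reads the tool payload from stdin and pulls the edited file paths out of it. The sketch below shows one way to do that with `jq`; the payload shape (`toolName`, `toolInput.files`, `toolInput.path`) is an assumption for illustration, and `edited_files` is a hypothetical helper name.

```bash
# Sketch: pull edited file paths out of a postToolUse payload with jq.
# The payload shape used here is an assumption for illustration.
edited_files() {
  printf '%s' "$1" | jq -r '.toolInput.files[]? // .toolInput.path // empty'
}

payload='{"toolName":"edit","toolInput":{"files":["docs/guide.md","index.html"]}}'
if command -v jq >/dev/null 2>&1; then
  edited_files "$payload"
else
  echo "jq not installed; skipping" >&2
fi
```

The `//` alternatives let the same filter accept either a `files` array or a single `path` string.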
## Supported Source Types
Links are found by scanning each file for `http(s)://` URLs, so the same logic
covers every format that embeds absolute URLs:
| Source | Examples matched |
| --- | --- |
| HTML | `<a href>`, `<img src>`, `<script src>`, `<link href>`, `<iframe src>` |
| Markdown | `[text](url)`, `[text][ref]`, bare `<url>` |
| JS / TS / Vue / Svelte | `fetch()`, `XMLHttpRequest.open()`, jQuery, axios, `href:`/`url:` props |
| JSON / JSONL | any string value that is an absolute URL |
| CSS | `url(...)` |
| SQL | URL literals in query strings |
| Templates | Jinja2, ERB, EJS, Handlebars, Pug |
The `d` (remove) action understands HTML `<a>` wrappers and Markdown `[text](url)`
links specifically, keeping the visible text. Other source types support
`r` (replace) and `c` (custom) via literal URL substitution.
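The format-agnostic scan can be illustrated with a short pipeline: match anything that looks like an absolute URL, trim trailing punctuation, and de-duplicate. The sample file content is made up for this sketch.

```bash
# Sketch: extract absolute URLs from any text file, whatever its format.
extract_urls() {
  grep -oiE 'https?://[^"'\''<> )]+' "$1" 2>/dev/null \
    | sed -E 's/[.,;:]+$//' \
    | sort -u
}

sample="$(mktemp)"
cat > "$sample" <<'EOF'
<a href="https://example.com/old-page">click here</a>
See [the guide](https://example.com/docs/install), or https://example.com/faq.
fetch('https://api.example.com/v1/items')
EOF
extract_urls "$sample"
rm -f "$sample"
```

The HTML attribute, the Markdown link, the bare URL, and the JavaScript string literal are all picked up by the same pattern, which is why no per-format parser is needed.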
## Fix Options
For each broken link:
| Key | Action |
| --- | --- |
| `r` | Replace with the suggested URL (a working variation, or an agent-proposed alternative) |
| `d` | Strip the link wrapper, keeping the visible text as plain text |
| `c` | Enter a custom replacement URL |
| `s` | Skip |
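To make the `d` action concrete, here is a minimal sketch that strips HTML and Markdown wrappers for one URL from text on stdin. `strip_link` is a hypothetical helper that uses `sed` for brevity; it is not the hook's own implementation, and unlike the hook it matches case-sensitively.

```bash
# Sketch of the `d` action: drop the link wrapper, keep the visible text.
# strip_link is a hypothetical helper; it reads stdin and writes stdout.
strip_link() {
  local esc
  # Escape ERE metacharacters and "/" (the sed delimiter) in the URL.
  esc="$(printf '%s' "$1" | sed 's,[][\\/.^$*+?(){}|],\\&,g')"
  sed -E \
    -e "s/<a[^>]*href=[\"']${esc}[\"'][^>]*>([^<]*)<\/a>/\1/g" \
    -e "s/\[([^]]*)\]\(${esc}[^)]*\)/\1/g"
}

printf '%s\n' \
  'See <a href="https://example.com/old-page">the old page</a> or [this guide](https://example.com/old-page).' \
  | strip_link 'https://example.com/old-page'
# prints: See the old page or this guide.
```

Links that point at other URLs are left untouched, because the escaped URL is part of both patterns.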
## Example Output
```text
Checking 2 link(s) in docs/guide.md ...
BROKEN (404) https://example.com/old-page
------------------------------------------------------------
SEO anchor issues (consider descriptive link text)
docs/guide.md: <a href="https://example.com/old-page">click here</a>
============================================================
fix-broken-links report
============================================================
[1] docs/guide.md
URL : https://example.com/old-page
HTTP: 404
r Replace -> https://example.com/docs/install
1 Replace -> https://example.com/docs/getting-started
d Remove link, keep text
c Custom replacement URL
s Skip
> r
replaced
1 file(s) updated:
docs/guide.md
```
With no file arguments (or when the edited file carries no checkable links) the
hook stops after the broken-link list — the menu above is skipped.
## Requirements
- `curl`: HTTP status checks for `link-fix.sh` (if it is absent, the script prints a notice to stderr and exits with status 0)
- `grep`, `sed`: link extraction (standard on any POSIX system)
- `jq`: required by the bash hook to parse the postToolUse JSON payload and discover edited files
- Bash 4+ (for `link-fix.sh`); on Windows use Git Bash or WSL, or run the PowerShell 7+ port
`link-fix.ps1`
- `copilot` (GitHub Copilot CLI): optional; powers the agent-suggested replacements. Without it,
only verified spelling variations are offered
- `git`: optional; when the script is run manually with no file arguments it is used to discover changed files, and without it (or with no changes) the script falls back to scanning the whole working tree. Hook runs never widen beyond the edited files
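A quick preflight check along these lines can show which of the tools above are missing before the hook is enabled. `missing_tools` is a hypothetical helper for this sketch.

```bash
# Sketch: report which of the listed tools are missing from PATH.
# missing_tools is a hypothetical helper, not part of the hook.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing_tools curl grep sed jq git copilot | sed 's/^/not found: /'
```

No output means every listed tool was found; `copilot` and `git` appearing in the list is harmless, since both are optional.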
## File Structure
```
.github/hooks/fix-broken-links/
├── hooks.json GitHub Copilot hook configuration
├── link-fix.sh Bash hook implementation
├── link-fix.ps1 PowerShell 7+ port
└── README.md This file
```
## Limitations
- Only checks absolute `http://` and `https://` URLs; relative paths require a running server
- Dynamic links generated at runtime from database queries are not detectable from source alone
- When `copilot` suggestions are enabled, broken URLs are sent to the Copilot service as prompt input
- Agent-suggested replacements are model proposals and are not verified live; confirm each before
accepting
- The `d` (remove) action targets HTML and Markdown link syntax; bare URLs in code are best handled
with `r` or `c`
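Because agent-suggested URLs are not verified, one option is to check a suggestion yourself before accepting it. The sketch below assumes a hypothetical helper, `is_live`, with `curl` flags modeled on the hook's status check; calling it performs a real network request.

```bash
# Sketch: accept a suggested URL only if it answers with HTTP 200.
# is_live is a hypothetical helper, not part of the hook.
is_live() {
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 --location "$1" 2>/dev/null)" || code="000"
  [ "$code" = "200" ]
}

# Usage (performs a real network request):
#   is_live "https://example.com/docs/install" && echo "safe to accept"
```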
+14
@@ -0,0 +1,14 @@
{
"version": 1,
"hooks": {
"postToolUse": [
{
"type": "command",
"bash": ".github/hooks/fix-broken-links/link-fix.sh",
"powershell": ".github/hooks/fix-broken-links/link-fix.ps1",
"cwd": ".",
"timeoutSec": 120
}
]
}
}
+410
@@ -0,0 +1,410 @@
#!/usr/bin/env pwsh
# fix-broken-links - link-fix.ps1 (PowerShell 7+ port of link-fix.sh)
#
# After the agent edits files (postToolUse): take the files it just changed,
# extract every http(s) URL, and check each one.
# • With file paths passed (the edited files, injected from the hook payload, or
# given on the command line) any URL that is not 200 gets spelling variations
# (http/https, www, trailing slash) then a Copilot CLI agent hand-off for more
# alternatives, followed by an interactive menu to replace / remove / skip.
# • With NO file arguments it only lists the broken links - no alternative
# lookups and no prompts.
# Generic anchor text is flagged as an SEO note either way.
#
# Pure PowerShell + .NET (Invoke-WebRequest/regex), plus an optional Copilot CLI
# hand-off for suggestions.
# Covers: HTML · Markdown · JS/TS · JSON · CSS · SQL · templates (all via URL scan)
# Trigger: postToolUse
Set-StrictMode -Off
$ProgressPreference = 'SilentlyContinue' # Invoke-WebRequest is far faster without the bar
# The agent hand-off below invokes `copilot`, which may itself re-fire this hook.
# The child run is marked with this env var; exit immediately if it is present so
# we never recurse.
if ($env:FIX_BROKEN_LINKS_AGENT) { exit 0 }
$LIMIT = 50
$TIMEOUT = 10
$UA = 'Mozilla/5.0 (compatible; fix-broken-links/1.0)'
$AGENT_MODEL = 'gpt-5-mini' # small, low-token model for the suggestion hand-off
$AGENT_TIMEOUT = 60 # seconds before giving up on the agent
$WEB_RE = '\.(html?|xhtml|md|markdown|mdx|js|jsx|ts|tsx|vue|svelte|json|jsonl|css|sql|erb|jinja|j2|twig|ejs|pug|hbs)$'
# Positional args become the file list; the hook payload can also supply them.
$ScriptArgs = [System.Collections.Generic.List[string]]::new()
foreach ($a in $args) { [void]$ScriptArgs.Add([string]$a) }
# ── Hook stdin ────────────────────────────────────────────────────────────────
# When called as a postToolUse hook, extract edited files from the JSON payload
# and inject them as positional args so Get-InputFiles picks them up.
$IsHook = $false
if ($ScriptArgs.Count -eq 0 -and [Console]::IsInputRedirected) {
$IsHook = $true # invoked as a hook: stdin carries the tool payload
$raw = [Console]::In.ReadToEnd()
if ($raw.Trim()) {
try {
$json = $raw | ConvertFrom-Json
$tool = $json.toolName; if (-not $tool) { $tool = $json.tool_name }
if ($tool) {
if ($tool -in 'editFiles','edit','write','str_replace_editor','create_file','multiEdit','applyPatch') {
# Only the files this edit tool just changed - never a wider repo scan.
$hookFiles = $json.tool_input.files; if (-not $hookFiles) { $hookFiles = $json.toolInput.files }
if (-not $hookFiles) { $hookFiles = $json.tool_input.path; if (-not $hookFiles) { $hookFiles = $json.toolInput.path } }
if ($hookFiles) { foreach ($hf in $hookFiles) { [void]$ScriptArgs.Add([string]$hf) } }
}
else {
# Different tool (bash, read, etc.) - nothing to check
exit 0
}
}
# No tool context - called manually with piped input, fall through
} catch { }
}
}
# A non-empty positional list means the caller passed files: the edited files from
# the hook payload above, or paths given on the command line. Only then do we run
# the full repair flow (look up alternatives, then prompt to fix). With no
# parameters we simply list the broken links - no lookups, no prompts.
$HaveParams = $ScriptArgs.Count -gt 0
# Interactive prompts are only possible when input is a real console; once the
# hook JSON has been read from a redirected stdin we report rather than prompt.
$Interactive = [Environment]::UserInteractive -and -not [Console]::IsInputRedirected
function Read-Answer {
param([string]$Prompt)
if (-not $Interactive) { return '' }
[Console]::Out.Write($Prompt)
$ans = [Console]::In.ReadLine()
if ($null -eq $ans) { return '' }
return $ans
}
# ── Helpers ───────────────────────────────────────────────────────────────────
function Get-HttpStatus {
param([string]$Url)
try {
$resp = Invoke-WebRequest -Uri $Url -MaximumRedirection 5 -TimeoutSec $TIMEOUT `
-UserAgent $UA -ErrorAction Stop
return [string][int]$resp.StatusCode
} catch {
$resp = $_.Exception.Response
if ($resp -and $resp.StatusCode) { return [string][int]$resp.StatusCode }
return 'ERR'
}
}
# Split a URL into scheme/host/path the same way the bash port does (string ops,
# not [uri], so wildcards and odd paths survive intact).
function Split-Url {
param([string]$Url)
$scheme = ($Url -split '://',2)[0]
$rest = $Url -replace '^[a-zA-Z][a-zA-Z0-9+.-]*://',''
$hostName = ($rest -split '/',2)[0]
if ($rest -eq $hostName) { $path = '' } else { $path = '/' + ($rest -split '/',2)[1] }
[pscustomobject]@{ Scheme = $scheme; Host = $hostName; Path = $path }
}
# Every http(s) URL in a file, trailing punctuation trimmed, de-duplicated.
function Get-Urls {
param([string]$File)
$text = [System.IO.File]::ReadAllText($File)
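# \s (not just a space) ends the match: the file is read as one string, so a
# bare URL at the end of a line must not run on into the next line.
[regex]::Matches($text, 'https?://[^\s"''<>)]+', 'IgnoreCase') |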
ForEach-Object { $_.Value -replace '[.,;:]+$','' } |
Sort-Object -Unique
}
# Generic anchor text that weakens SEO.
function Get-SeoIssues {
param([string]$File)
$text = [System.IO.File]::ReadAllText($File)
$reA = '<a[^>]*>\s*(click here|click|here|read more|more|this page|this|learn more|see more|view|visit|details|info)\s*</a>'
$reB = '\[(click here|click|here|read more|more|this page|learn more|see more|details|info)\]\('
@([regex]::Matches($text, $reA, 'IgnoreCase')) +
@([regex]::Matches($text, $reB, 'IgnoreCase')) | ForEach-Object { $_.Value }
}
# Try common URL variations; return the first that returns 200, else ''.
function Find-Variation {
param([string]$Url)
$p = Split-Url $Url
$scheme = $p.Scheme; $hostName = $p.Host; $path = $p.Path
$cands = [System.Collections.Generic.List[string]]::new()
if ($scheme -eq 'http') { [void]$cands.Add("https://$hostName$path") }
if ($scheme -eq 'https') { [void]$cands.Add("http://$hostName$path") }
if ($hostName -like 'www.*') { [void]$cands.Add("$scheme`://$($hostName.Substring(4))$path") }
else { [void]$cands.Add("$scheme`://www.$hostName$path") }
if ($path -and $path -notmatch '/$' -and (($path -split '/')[-1]) -notmatch '\.') {
[void]$cands.Add(($Url -replace '/$','') + '/')
}
foreach ($c in $cands) {
if ($c -eq $Url) { continue }
if ((Get-HttpStatus $c) -eq '200') { return $c }
}
return ''
}
# Hand the broken link to the Copilot CLI agent and let it propose alternatives.
# A deliberately lightweight, low-token hand-off: one non-interactive prompt to a
# small model with no tools enabled (so it answers from its own knowledge - no web
# fetches, no permission prompts, no archive lookups on our side). The model may
# prefix a prose line, so we pull http(s) tokens from anywhere in the output, trim
# trailing punctuation, drop the broken URL itself, and de-duplicate. The call runs
# as a job so it can be capped at $AGENT_TIMEOUT seconds.
function Get-AgentAlts {
param([string]$Url,[int]$Max)
if (-not (Get-Command copilot -ErrorAction SilentlyContinue)) { return @() }
$snappy = $AGENT_TIMEOUT - 5
$prompt = "In under $snappy seconds, find up to $Max working alternative URLs for the broken link $Url. Hierarchically consider 1. Path and/or page spelling; 2. web.archive.org/wayback; 3. Redirects using redirect destination; 4. The context of the link's text; in order to resolve. Output only the URLs. One per line, and no: prose, numbering, markdown, backticks, special characters, post formatting."
$out = ''
try {
# FIX_BROKEN_LINKS_AGENT marks the child run so a re-entrant hook exits early.
$job = Start-Job -ScriptBlock {
param($Prompt, $Model)
$env:FIX_BROKEN_LINKS_AGENT = '1'
copilot -p $Prompt -s --no-color --model $Model --available-tools 2>$null
} -ArgumentList $prompt, $AGENT_MODEL
# Only read output from a job that completed cleanly; a failed/errored copilot
# run yields no alternatives.
if ((Wait-Job $job -Timeout $AGENT_TIMEOUT) -and $job.State -eq 'Completed') {
$out = (Receive-Job $job -ErrorAction SilentlyContinue | Out-String)
}
Remove-Job $job -Force -ErrorAction SilentlyContinue
} catch { $out = '' }
if (-not $out) { return @() }
$seen = @{}
$result = [System.Collections.Generic.List[string]]::new()
foreach ($m in [regex]::Matches($out, 'https?://[^\s"''<>)\]]+', 'IgnoreCase')) {
if ($result.Count -ge $Max) { break }
$u = $m.Value -replace '[.,;:]+$',''
$key = $u.ToLower()
if ($key -eq $Url.ToLower()) { continue }
if ($seen.ContainsKey($key)) { continue }
$seen[$key] = $true
[void]$result.Add($u)
}
return ,$result.ToArray()
}
# Up to MAX viable replacement URLs for a broken link, best first:
# 1. a working scheme/www/slash variation (verified live 200)
# 2. alternatives proposed by the Copilot CLI agent (see Get-AgentAlts)
# De-duplicated case-insensitively. The first item is what `r` uses; the rest
# become the numbered alternatives.
function Get-SuggestedAlts {
param([string]$Url,[int]$Max = 6)
$seen = @{}
$out = [System.Collections.Generic.List[string]]::new()
$v = Find-Variation $Url
if ($v) { [void]$out.Add($v); $seen[$v.ToLower()] = $true }
foreach ($a in (Get-AgentAlts $Url $Max)) {
if ($out.Count -ge $Max) { break }
if (-not $a) { continue }
$key = $a.ToLower()
if ($seen.ContainsKey($key)) { continue }
[void]$out.Add($a); $seen[$key] = $true
}
return ,$out.ToArray()
}
# Replace a literal URL everywhere in a file (plain string replace, no regex).
function Set-UrlReplacement {
param([string]$File,[string]$Old,[string]$New)
$content = [System.IO.File]::ReadAllText($File)
[System.IO.File]::WriteAllText($File, $content.Replace($Old, $New))
}
# Remove the link wrapper but keep the visible text:
# <a href="URL">text</a> -> text
# [text](URL) -> text
function Remove-LinkWrapper {
param([string]$File,[string]$Url)
$content = [System.IO.File]::ReadAllText($File)
$esc = [regex]::Escape($Url)
# Each element is parenthesized: the comma operator binds tighter than '+', so
# without the parens the three concatenations collapse into a single string and
# the array would hold one bogus pattern instead of three.
$patterns = @(
('<a[^>]*href="' + $esc + '"[^>]*>([^<]*)</a>'),
("<a[^>]*href='" + $esc + "'[^>]*>([^<]*)</a>"),
('\[([^\]]*)\]\(' + $esc + '[^)]*\)')
)
foreach ($pat in $patterns) {
$content = [regex]::Replace($content, $pat, '$1', 'IgnoreCase')
}
[System.IO.File]::WriteAllText($File, $content)
}
# ── File discovery ────────────────────────────────────────────────────────────
function Get-InputFiles {
if ($ScriptArgs.Count -gt 0) { return $ScriptArgs.ToArray() }
# Fired as a hook but the payload carried no (web) files: do nothing rather than
# fall back to scanning unrelated files - the hook only ever checks edited files.
if ($IsHook) { return @() }
$out = @()
if (Get-Command git -ErrorAction SilentlyContinue) {
git rev-parse --git-dir *> $null
if ($LASTEXITCODE -eq 0) {
$out = @(git diff --name-only HEAD 2>$null) + @(git diff --name-only --cached 2>$null)
}
}
if ($out.Count -gt 0) { return $out }
Get-ChildItem -Recurse -File -ErrorAction SilentlyContinue |
Where-Object { $_.FullName -notmatch '[\\/](\.git|node_modules|dist|build|\.next|\.venv|__pycache__)[\\/]' } |
ForEach-Object { Resolve-Path -Relative -LiteralPath $_.FullName }
}
$seenFiles = @{}
$FILES = [System.Collections.Generic.List[string]]::new()
foreach ($f in (Get-InputFiles)) {
if (-not $f) { continue }
$f = ([string]$f).Trim()
if (-not (Test-Path -LiteralPath $f -PathType Leaf)) { continue }
# (^|[\\/]) also catches relative paths such as dist/app.js from `git diff`.
if ($f -match '(^|[\\/])(node_modules|\.git|dist|build)[\\/]') { continue }
if ($f -notmatch $WEB_RE) { continue }
if ($seenFiles.ContainsKey($f)) { continue }
$seenFiles[$f] = $true
[void]$FILES.Add($f)
}
if ($FILES.Count -eq 0) { exit 0 }
# ── Scan ──────────────────────────────────────────────────────────────────────
$B_FILE = [System.Collections.Generic.List[string]]::new()
$B_URL = [System.Collections.Generic.List[string]]::new()
$B_STATUS = [System.Collections.Generic.List[string]]::new()
$B_ALT = [System.Collections.Generic.List[object]]::new()
$SEO_LINES = [System.Collections.Generic.List[string]]::new()
foreach ($file in $FILES) {
foreach ($line in (Get-SeoIssues $file)) {
if ($line) { [void]$SEO_LINES.Add("${file}: $line") }
}
$urls = @(Get-Urls $file)
if ($urls.Count -eq 0) { continue }
if ($HaveParams -and $urls.Count -gt $LIMIT) {
$ans = Read-Answer " $file has $($urls.Count) links (limit $LIMIT). Continue? [Y/n] "
if ($ans -in 'n','N','no','NO') { continue }
}
Write-Host ""
Write-Host " Checking $($urls.Count) link(s) in $file ..."
foreach ($url in $urls) {
$status = Get-HttpStatus $url
if ($status -eq '200') { continue }
Write-Host " BROKEN ($status) $url"
# Only look up replacements when files were passed; otherwise just list.
$alts = @()
if ($HaveParams) { $alts = Get-SuggestedAlts $url 6 }
[void]$B_FILE.Add($file)
[void]$B_URL.Add($url)
[void]$B_STATUS.Add($status)
[void]$B_ALT.Add($alts)
}
}
# ── SEO report ────────────────────────────────────────────────────────────────
if ($SEO_LINES.Count -gt 0) {
Write-Host ""
Write-Host "------------------------------------------------------------"
Write-Host " SEO anchor issues (consider descriptive link text)"
foreach ($s in $SEO_LINES) { Write-Host " $s" }
}
if ($B_URL.Count -eq 0) {
Write-Host ""
Write-Host " No broken links found."
Write-Host ""
exit 0
}
# ── Interactive fix ───────────────────────────────────────────────────────────
Write-Host ""
Write-Host "============================================================"
Write-Host " fix-broken-links report"
Write-Host "============================================================"
$CHANGED = @{}
$n = $B_URL.Count
for ($i = 0; $i -lt $n; $i++) {
$file = $B_FILE[$i]
$url = $B_URL[$i]
$status = $B_STATUS[$i]
$alts = @($B_ALT[$i])
Write-Host ""
Write-Host " [$($i + 1)] $file"
Write-Host " URL : $url"
$note = ''
if ($status -in 'ERR','000','TIMEOUT') { $note = ' (unreachable)' }
Write-Host " HTTP: $status$note"
# No file parameters → report-only: list the broken link and move on.
if (-not $HaveParams) { continue }
Write-Host ""
if ($alts.Count -gt 0) {
Write-Host " r Replace -> $($alts[0])"
for ($k = 1; $k -lt $alts.Count; $k++) {
Write-Host " $k Replace -> $($alts[$k])"
}
}
Write-Host " d Remove link, keep text"
Write-Host " c Custom replacement URL"
Write-Host " s Skip"
if (-not $Interactive) {
Write-Host " (no terminal - reporting only)"
continue
}
while ($true) {
$ch = Read-Answer ' > '
if ($ch -eq 's' -or $ch -eq '') { break }
elseif ($ch -eq 'd') {
Remove-LinkWrapper $file $url; $CHANGED[$file] = $true; Write-Host " removed"; break
}
elseif ($ch -eq 'r') {
if ($alts.Count -gt 0) {
Set-UrlReplacement $file $url $alts[0]; $CHANGED[$file] = $true
Write-Host " replaced -> $($alts[0])"; break
}
Write-Host " no suggestion available"
}
elseif ($ch -match '^[1-9]$') {
$idx = [int]$ch
if ($idx -lt $alts.Count) {
Set-UrlReplacement $file $url $alts[$idx]; $CHANGED[$file] = $true
Write-Host " replaced -> $($alts[$idx])"; break
}
Write-Host " invalid choice"
}
elseif ($ch -eq 'c') {
$u = Read-Answer ' URL: '
if ($u) { Set-UrlReplacement $file $url $u; $CHANGED[$file] = $true; Write-Host " replaced"; break }
}
else {
Write-Host " invalid choice"
}
}
}
if ($CHANGED.Count -gt 0) {
Write-Host ""
Write-Host " $($CHANGED.Count) file(s) updated:"
foreach ($f in $CHANGED.Keys) { Write-Host " $f" }
Write-Host ""
}
exit 0
+372
@@ -0,0 +1,372 @@
#!/bin/bash
# fix-broken-links - link-fix.sh
#
# After the agent edits files (postToolUse): take the files it just changed,
# extract every http(s) URL, and check each with curl.
# • With file paths passed (the edited files, injected from the hook payload, or
# given on the command line) any URL that is not 200 gets spelling variations
# (http/https, www, trailing slash) then a Copilot CLI agent hand-off for more
# alternatives, followed by an interactive menu to replace / remove / skip.
# • With NO file arguments it only lists the broken links - no alternative
# lookups and no prompts.
# Generic anchor text is flagged as an SEO note either way.
#
# Pure bash + grep/sed/curl, plus an optional Copilot CLI hand-off for suggestions.
# Covers: HTML · Markdown · JS/TS · JSON · CSS · SQL · templates (all via URL scan)
# Requires: curl, grep, sed, awk | Optional: jq (hook payload parsing), copilot | Trigger: postToolUse
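set -uo pipefail
# bash 5.2+ expands an unquoted '&' in the replacement of ${var//pat/repl} to the
# matched text, which would corrupt URLs with query strings. Turn that off; the
# option does not exist on older bash, so ignore the error there.
shopt -u patsub_replacement 2>/dev/null || true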
# The agent hand-off below invokes `copilot`, which may itself re-fire this hook.
# The child run is marked with this env var; exit immediately if it is present so
# we never recurse.
[ -n "${FIX_BROKEN_LINKS_AGENT:-}" ] && exit 0
LIMIT=50
TIMEOUT=10
UA='Mozilla/5.0 (compatible; fix-broken-links/1.0)'
AGENT_MODEL='gpt-5-mini' # small, low-token model for the suggestion hand-off
AGENT_TIMEOUT=60 # seconds before giving up on the agent
# Cap the agent call with `timeout` when it is available (coreutils; absent on
# some minimal / Git-Bash setups), otherwise run copilot unbounded.
if command -v timeout >/dev/null 2>&1; then AGENT_RUN="timeout ${AGENT_TIMEOUT}"; else AGENT_RUN=""; fi
WEB_RE='\.(html?|xhtml|md|markdown|mdx|js|jsx|ts|tsx|vue|svelte|json|jsonl|css|sql|erb|jinja|j2|twig|ejs|pug|hbs)$'
command -v curl >/dev/null 2>&1 || { printf 'fix-broken-links: curl not found\n' >&2; exit 0; }
# ── Hook stdin ────────────────────────────────────────────────────────────────
# When called as a postToolUse hook, extract edited files from the JSON payload
# and inject them as positional args so collect_input picks them up.
_HOOK=""
if [ "$#" -eq 0 ] && [ ! -t 0 ]; then
_HOOK=1 # invoked as a hook: stdin carries the tool payload
_INPUT=$(cat)
if command -v jq >/dev/null 2>&1; then
_TOOL=$(printf '%s' "$_INPUT" | jq -r '.toolName // .tool_name // empty' 2>/dev/null)
case "$_TOOL" in
editFiles|edit|write|str_replace_editor|create_file|multiEdit|applyPatch)
# Only the files this edit tool just changed - never a wider repo scan.
mapfile -t _FILES < <(
printf '%s' "$_INPUT" \
| jq -r '.tool_input.files[]? // .toolInput.files[]? // .tool_input.path // .toolInput.path // empty' 2>/dev/null
)
[ "${#_FILES[@]}" -gt 0 ] && set -- "${_FILES[@]}"
;;
"")
# No tool context - called manually with piped input, fall through
;;
*)
# Different tool (bash, read, etc.) - nothing to check
exit 0
;;
esac
fi
fi
# A non-empty positional list means the caller passed files: the edited files from
# the hook payload above, or paths given on the command line. Only then do we run
# the full repair flow (look up alternatives, then prompt to fix). With no
# parameters we simply list the broken links - no lookups, no prompts.
[ "$#" -gt 0 ] && HAVE_PARAMS=1 || HAVE_PARAMS=0
# Interactive input comes from the terminal, since stdin may carry hook JSON.
# Probe by actually opening /dev/tty - a mere -r/-w test can pass where open fails.
if { true >/dev/tty; } 2>/dev/null && { true </dev/tty; } 2>/dev/null; then
TTY=/dev/tty
else
TTY=""
fi
ask() {
local p="$1" ans=""
[ -z "$TTY" ] && { printf '%s' ""; return; }
printf '%s' "$p" > "$TTY"
IFS= read -r ans < "$TTY" || ans=""
printf '%s' "$ans"
}
# ── Helpers ───────────────────────────────────────────────────────────────────
http_status() {
curl -s -o /dev/null -w '%{http_code}' --max-time "$TIMEOUT" --location -A "$UA" "$1" 2>/dev/null
}
# Escape ERE metacharacters so a literal string can be used safely inside a bash
# [[ =~ ]] pattern. Only true metacharacters are escaped - backslash-escaping an
# ordinary character (e.g. '\:') is undefined in ERE and would fail to match.
re_escape() {
local s="$1" out="" c i bs='\' meta='.^$*+?()[]{}|\'
for ((i = 0; i < ${#s}; i++)); do
c="${s:i:1}"
if [[ "$meta" == *"$c"* ]]; then out+="$bs$c"; else out+="$c"; fi
done
printf '%s' "$out"
}
# Read an entire file into a variable, preserving newlines.
read_file() { IFS= read -rd '' "$1" < "$2" || true; }
# Escape glob metacharacters (\ * ? [) so a string is matched literally inside
# ${var//pattern/repl}, which otherwise interprets the pattern as a glob. URLs
# and Markdown link spans routinely contain ? and [ ], so this is required for a
# correct fixed-string replacement.
glob_escape() {
local s="$1" out="" c i
for ((i = 0; i < ${#s}; i++)); do
c="${s:i:1}"
case "$c" in
'\'|'*'|'?'|'[') out+="\\$c" ;;
*) out+="$c" ;;
esac
done
printf '%s' "$out"
}
# Print every http(s) URL in a file, trailing punctuation trimmed, de-duplicated.
extract_urls() {
grep -oiE 'https?://[^"'\''<> )]+' "$1" 2>/dev/null \
| sed -E 's/[.,;:]+$//' \
| sort -u
}
# Generic anchor text that weakens SEO.
seo_scan() {
grep -oiE '<a[^>]*>[[:space:]]*(click here|click|here|read more|more|this page|this|learn more|see more|view|visit|details|info)[[:space:]]*</a>' "$1" 2>/dev/null
grep -oiE '\[(click here|click|here|read more|more|this page|learn more|see more|details|info)\]\(' "$1" 2>/dev/null
}
# Try common URL variations; echo the first that returns 200, else nothing.
find_variation() {
local url="$1" scheme rest host path cand
scheme="${url%%://*}"
rest="${url#*://}"
host="${rest%%/*}"
if [ "$rest" = "$host" ]; then path=""; else path="/${rest#*/}"; fi
local cands=()
case "$scheme" in
http) cands+=("https://${host}${path}") ;;
https) cands+=("http://${host}${path}") ;;
esac
if [[ "$host" == www.* ]]; then
cands+=("${scheme}://${host#www.}${path}")
else
cands+=("${scheme}://www.${host}${path}")
fi
if [ -n "$path" ] && [[ "$path" != */ ]] && [[ "${path##*/}" != *.* ]]; then
cands+=("${url%/}/")
fi
for cand in "${cands[@]}"; do
[ "$cand" = "$url" ] && continue
[ "$(http_status "$cand")" = "200" ] && { printf '%s' "$cand"; return 0; }
done
return 1
}
# Hand the broken link to the Copilot CLI agent and let it propose alternatives.
# This is a deliberately lightweight, low-token hand-off: a single non-interactive
# prompt to a small model, with no tools enabled - the agent answers from its own
# knowledge, so there are no web fetches, no permission prompts, and no archive
# lookups on our side. The model may prefix a prose line, so we pull http(s) tokens
# from anywhere in the output, trim trailing punctuation, drop the broken URL
# itself, and de-duplicate (case-insensitively). Up to MAX lines, one URL each.
agent_alts() {
local url="$1" max="$2" prompt out
command -v copilot >/dev/null 2>&1 || return 0
prompt="In under $((AGENT_TIMEOUT - 5)) seconds, find up to ${max} working alternative URLs for the broken link ${url}. Hierarchically consider 1. Path and/or page spelling; 2. web.archive.org/wayback; 3. Redirects using redirect destination; 4. The context of the link's text; in order to resolve. Output only the URLs. One per line, and no: prose, numbering, markdown, backticks, special characters, post formatting."
# FIX_BROKEN_LINKS_AGENT marks the child run so a re-entrant hook exits early.
out="$(FIX_BROKEN_LINKS_AGENT=1 $AGENT_RUN copilot -p "$prompt" \
-s --no-color --model "$AGENT_MODEL" --available-tools 2>/dev/null)"
# If copilot errored, timed out, or produced nothing, offer no alternatives.
[ $? -eq 0 ] && [ -n "$out" ] || return 0
printf '%s\n' "$out" \
| grep -oiE 'https?://[^][:space:]"'\''<>)]+' \
| sed -E 's/[.,;:]+$//' \
| awk -v bad="$url" 'tolower($0) != tolower(bad) && !seen[tolower($0)]++' \
| head -n "$max"
}
# Emit up to MAX viable replacement URLs for a broken link, best first:
# 1. a working scheme/www/slash variation (verified live 200)
# 2. alternatives proposed by the Copilot CLI agent (see agent_alts); these
#    are suggestions only and are not re-checked with http_status
# Output is newline-delimited and de-duplicated (case-insensitively). The first
# line is what `r` uses; the remainder become the numbered alternatives.
suggest_alts() {
local url="$1" max="${2:-6}" cand key
local -A seen=()
local out=()
cand="$(find_variation "$url")" && [ -n "$cand" ] && { out+=("$cand"); seen["${cand,,}"]=1; }
while IFS= read -r cand; do
[ "${#out[@]}" -ge "$max" ] && break
[ -z "$cand" ] && continue
key="${cand,,}"; [ -n "${seen[$key]:-}" ] && continue
out+=("$cand"); seen[$key]=1
done < <(agent_alts "$url" "$max")
[ "${#out[@]}" -eq 0 ] && return 0
printf '%s\n' "${out[@]}"
}
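# Replace a literal URL everywhere in a file (pure bash, no regex).
# The replacement is quoted so that a '&' in the new URL (common in query
# strings) stays literal: bash 5.2+ enables patsub_replacement by default, which
# would otherwise expand an unquoted '&' to the matched text. The substitution
# is done in a plain assignment so the quotes are removed on older bash too.
replace_url() {
local file="$1" old="$2" new="$3" content pat
read_file content "$file"
pat="$(glob_escape "$old")"
content=${content//$pat/"$new"}
printf '%s' "$content" > "$file"
}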
# Remove the link wrapper but keep the visible text:
# <a href="URL">text</a> -> text
# [text](URL) -> text
# Each matched wrapper is swapped for its inner text via literal replacement.
remove_link() {
local file="$1" url="$2" content esc re pat
read_file content "$file"
esc="$(re_escape "$url")"
for re in \
'<a[^>]*href="'"$esc"'"[^>]*>([^<]*)</a>' \
"<a[^>]*href='${esc}'[^>]*>([^<]*)</a>" \
'\[([^]]*)\]\('"$esc"'[^)]*\)'; do
while [[ $content =~ $re ]]; do
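# The matched span often contains [ and ] (Markdown), which are glob
# metacharacters, so escape it before the literal substitution. The
# replacement is quoted so a '&' in the link text (e.g. &amp;) stays literal
# under bash 5.2+ (patsub_replacement) instead of expanding to the match.
pat="$(glob_escape "${BASH_REMATCH[0]}")"
content=${content//$pat/"${BASH_REMATCH[1]}"}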
done
done
printf '%s' "$content" > "$file"
}
# ── File discovery ────────────────────────────────────────────────────────────
collect_input() {
if [ "$#" -gt 0 ]; then printf '%s\n' "$@"; return; fi
# Fired as a hook but the payload carried no (web) files: do nothing rather than
# fall back to scanning unrelated files - the hook only ever checks edited files.
[ -n "$_HOOK" ] && return
local out=""
if command -v git >/dev/null 2>&1 && git rev-parse --git-dir >/dev/null 2>&1; then
out="$({ git diff --name-only HEAD; git diff --name-only --cached; } 2>/dev/null)"
fi
if [ -n "$out" ]; then printf '%s\n' "$out"; return; fi
find . -type d \( -name .git -o -name node_modules -o -name dist -o -name build \
-o -name .next -o -name .venv -o -name __pycache__ \) -prune \
-o -type f -print 2>/dev/null
}
declare -A SEEN
FILES=()
while IFS= read -r f; do
[ -z "$f" ] && continue
[ -f "$f" ] || continue
# Prefix a slash so top-level paths from git (e.g. dist/index.html) match too.
case "/$f" in */node_modules/*|*/.git/*|*/dist/*|*/build/*) continue ;; esac
printf '%s\n' "$f" | grep -qiE "$WEB_RE" || continue
[ -n "${SEEN[$f]:-}" ] && continue
SEEN[$f]=1
FILES+=("$f")
done < <(collect_input "$@")
[ "${#FILES[@]}" -eq 0 ] && exit 0
# ── Scan ──────────────────────────────────────────────────────────────────────
B_FILE=(); B_URL=(); B_STATUS=(); B_ALT=()
SEO_LINES=()
for file in "${FILES[@]}"; do
while IFS= read -r line; do
[ -n "$line" ] && SEO_LINES+=("$file: $line")
done < <(seo_scan "$file")
mapfile -t urls < <(extract_urls "$file")
[ "${#urls[@]}" -eq 0 ] && continue
if [ "$HAVE_PARAMS" = "1" ] && [ "${#urls[@]}" -gt "$LIMIT" ]; then
ans="$(ask " ${file} has ${#urls[@]} links (limit ${LIMIT}). Continue? [Y/n] ")"
case "$ans" in n|N|no|NO) continue ;; esac
fi
printf '\n Checking %d link(s) in %s ...\n' "${#urls[@]}" "$file"
for url in "${urls[@]}"; do
status="$(http_status "$url")"
[ "$status" = "200" ] && continue
printf ' BROKEN (%s) %s\n' "$status" "$url"
# Only look up replacements when files were passed; otherwise just list.
alts=""
[ "$HAVE_PARAMS" = "1" ] && alts="$(suggest_alts "$url" 6)"
B_FILE+=("$file"); B_URL+=("$url"); B_STATUS+=("$status"); B_ALT+=("$alts")
done
done
# ── SEO report ────────────────────────────────────────────────────────────────
if [ "${#SEO_LINES[@]}" -gt 0 ]; then
printf '\n%s\n SEO anchor issues (consider descriptive link text)\n' "------------------------------------------------------------"
for s in "${SEO_LINES[@]}"; do printf ' %s\n' "$s"; done
fi
if [ "${#B_URL[@]}" -eq 0 ]; then
printf '\n No broken links found.\n\n'
exit 0
fi
# ── Interactive fix ───────────────────────────────────────────────────────────
printf '\n%s\n fix-broken-links report\n%s\n' "============================================================" "============================================================"
declare -A CHANGED
n="${#B_URL[@]}"
for ((i=0; i<n; i++)); do
file="${B_FILE[$i]}"; url="${B_URL[$i]}"; status="${B_STATUS[$i]}"
printf '\n [%d] %s\n' "$((i+1))" "$file"
printf ' URL : %s\n' "$url"
note=""; case "$status" in ERR|000|TIMEOUT) note=" (unreachable)" ;; esac
printf ' HTTP: %s%s\n' "$status" "$note"
# No file parameters → report-only: list the broken link and move on.
[ "$HAVE_PARAMS" = "1" ] || continue
alts=(); [ -n "${B_ALT[$i]}" ] && mapfile -t alts <<< "${B_ALT[$i]}"
printf '\n'
if [ "${#alts[@]}" -gt 0 ]; then
printf ' r Replace -> %s\n' "${alts[0]}"
for ((k=1; k<${#alts[@]}; k++)); do
printf ' %d Replace -> %s\n' "$k" "${alts[$k]}"
done
fi
printf ' d Remove link, keep text\n'
printf ' c Custom replacement URL\n'
printf ' s Skip\n'
if [ -z "$TTY" ]; then
printf ' (no terminal - reporting only)\n'
continue
fi
while true; do
ch="$(ask ' > ')"
case "$ch" in
s|"") break ;;
d) remove_link "$file" "$url"; CHANGED[$file]=1; printf ' removed\n'; break ;;
r) if [ "${#alts[@]}" -gt 0 ]; then
replace_url "$file" "$url" "${alts[0]}"; CHANGED[$file]=1; printf ' replaced -> %s\n' "${alts[0]}"; break
fi
printf ' no suggestion available\n' ;;
[1-9]) if [ "$ch" -lt "${#alts[@]}" ]; then
replace_url "$file" "$url" "${alts[$ch]}"; CHANGED[$file]=1; printf ' replaced -> %s\n' "${alts[$ch]}"; break
else printf ' invalid choice\n'; fi ;;
c) u="$(ask ' URL: ')"
if [ -n "$u" ]; then replace_url "$file" "$url" "$u"; CHANGED[$file]=1; printf ' replaced\n'; break; fi ;;
*) printf ' invalid choice\n' ;;
esac
done
done
if [ "${CHANGED[*]+x}" = x ] && [ "${#CHANGED[@]}" -gt 0 ]; then
printf '\n %d file(s) updated:\n' "${#CHANGED[@]}"
for f in "${!CHANGED[@]}"; do printf ' %s\n' "$f"; done
printf '\n'
fi
exit 0