mirror of
https://github.com/github/awesome-copilot.git
synced 2026-06-18 21:51:27 +00:00
new hook fix-broken-links (#2027)
* new hook fix-broken-links
* codespell: add ans for variable short for answer
* update: scripts hand off alternative url to copilot cmd
* codespell: add ext. arcade-canvas/game/phaser.min.js
* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* update: rm em-dash from hook scripts

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
+3
-1
@@ -54,7 +54,9 @@

 # CAF - Microsoft Cloud Adoption Framework acronym

-ignore-words-list = numer,wit,aks,edn,ser,ois,gir,rouge,categor,aline,ative,afterall,deques,dateA,dateB,TE,FillIn,alle,vai,LOD,InOut,pixelX,aNULL,Wee,Sherif,queston,Vertexes,nin,FO,CAF,Parth
+# ans - bash and powershell variable short for answer
+
+ignore-words-list = numer,wit,aks,edn,ser,ois,gir,rouge,categor,aline,ative,afterall,deques,dateA,dateB,TE,FillIn,alle,vai,LOD,InOut,pixelX,aNULL,Wee,Sherif,queston,Vertexes,nin,FO,CAF,Parth,ans

 # Skip certain files and directories

@@ -32,6 +32,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-hooks) for guidelines on how to
 | Name | Description | Events | Bundled Assets |
 | ---- | ----------- | ------ | -------------- |
 | [Dependency License Checker](../hooks/dependency-license-checker/README.md) | Scans newly added dependencies for license compliance (GPL, AGPL, etc.) at session end | sessionEnd | `check-licenses.sh`<br />`hooks.json` |
+| [Fix Broken Links](../hooks/fix-broken-links/README.md) | Checks changed web files for broken hyperlinks and SEO anchor issues after each Copilot tool use. | postToolUse | `hooks.json`<br />`link-fix.ps1`<br />`link-fix.sh` |
 | [Governance Audit](../hooks/governance-audit/README.md) | Scans Copilot agent prompts for threat signals and logs governance events | sessionStart, sessionEnd, userPromptSubmitted | `audit-prompt.sh`<br />`audit-session-end.sh`<br />`audit-session-start.sh`<br />`hooks.json` |
 | [Secrets Scanner](../hooks/secrets-scanner/README.md) | Scans files modified during a Copilot coding agent session for leaked secrets, credentials, and sensitive data | sessionEnd | `hooks.json`<br />`scan-secrets.sh` |
 | [Session Auto-Commit](../hooks/session-auto-commit/README.md) | Automatically commits and pushes changes when a Copilot coding agent session ends | sessionEnd | `auto-commit.sh`<br />`hooks.json` |
@@ -0,0 +1,177 @@
---
name: 'Fix Broken Links'
description: 'Checks changed web files for broken hyperlinks and SEO anchor issues after each Copilot tool use.'
tags: ['links', 'seo', 'html', 'markdown', 'post-tool-use']
---

# Fix Broken Links Hook

Scans recently changed web files for broken hyperlinks after each GitHub Copilot
tool use. For each broken URL the hook tries common spelling variations, then hands
the link to the Copilot CLI agent for suggested replacements, and presents an
interactive fix menu. Generic anchor text (`click here`, `read more`, etc.) is
flagged as an SEO issue.
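
The generic-anchor check can be approximated with a single `grep`. The sketch below is an illustration rather than the hook's own code: the function name `seo_issues` and the shortened phrase list are assumptions, and it only recognizes HTML `<a>` text and the opening of Markdown links.

```bash
# Hypothetical helper: print generic anchor text found in a file, one match per line.
# -o prints only the matched text, -i ignores case, -E enables extended regex.
seo_issues() {
  grep -oiE '<a[^>]*>[[:space:]]*(click here|read more|learn more|see more|here|more)[[:space:]]*</a>|\[(click here|read more|learn more|see more|here|more)\]\(' "$1"
}
```

Calling `seo_issues docs/guide.md` would list each offending anchor; descriptive link text such as `[Installation guide](...)` is not reported.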

## Overview

Broken links accumulate silently in web projects. Running on the `postToolUse`
event, this hook checks only the web files the agent just edited,
right after each change, so you can fix, replace, or remove each broken link in
the same terminal session.

The hook has two modes:

- **With file paths** (the edited files injected from the hook payload, or paths
  passed on the command line): it checks each link, looks up replacement
  candidates, and presents the interactive fix menu.
- **With no file arguments**: it simply lists the broken links it finds, with no
  replacement lookups and no prompts.
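
The choice between the two modes comes down to whether any positional arguments are present. A minimal sketch of that decision follows; the function name `select_mode` and the labels `repair` and `list` are hypothetical and used only for illustration.

```bash
# Hypothetical helper: report which mode a given argument list would select.
select_mode() {
  if [ "$#" -gt 0 ]; then
    printf 'repair\n'   # file paths given: look up replacements, then prompt
  else
    printf 'list\n'     # no arguments: list broken links only
  fi
}
```

For example, `select_mode docs/guide.md` prints `repair`, while `select_mode` with no arguments prints `list`.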

## Features

- **Self-contained core**: bash and PowerShell ports, with no runtime to install (the optional agent
  hand-off reuses the Copilot CLI you already have)
- **Edited-files scope**: as a `postToolUse` hook it only checks the files the agent just changed,
  never a full repo scan
- **Format-agnostic link scan**: extracts every `http(s)` URL with `grep`, covering HTML, Markdown,
  JS/TS, JSON, CSS, SQL, and templates at once
- **Automatic URL healing**: tries http/https, www, and trailing-slash variations
- **Agent-assisted suggestions**: hands the broken link to the Copilot CLI agent (a lightweight,
  low-token `gpt-5-mini` prompt with no tools) for replacement candidates; if the CLI is missing or
  errors, it simply offers none
- **SEO audit**: flags anchor text that is too generic to benefit search ranking
- **Large-file guard**: prompts before checking files with more than 50 links
- **Interactive fix menu**: replace with suggestion, enter custom URL, strip tag keeping text, or
  skip
- **Standard tools only**: `curl`, `grep`, `sed`, available on most Unix-like systems (the bash
  hook also uses `jq` to read the hook payload)
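
The URL-healing step can be pictured as generating a handful of candidate spellings and probing each one. The sketch below only generates the candidates and performs no network access; the function name `url_variations` is hypothetical, and the splitting relies on plain parameter expansion rather than a URL parser.

```bash
# Hypothetical helper: print candidate variations of a URL, one per line.
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
url_variations() (
  url=$1
  scheme=${url%%://*}     # text before ://
  rest=${url#*://}        # host plus optional path
  host=${rest%%/*}
  case $rest in
    */*) path=/${rest#*/} ;;
    *)   path= ;;
  esac

  # 1. Swap http and https.
  case $scheme in
    http)  printf 'https://%s%s\n' "$host" "$path" ;;
    https) printf 'http://%s%s\n'  "$host" "$path" ;;
  esac

  # 2. Add or drop the www. prefix.
  case $host in
    www.*) printf '%s://%s%s\n'     "$scheme" "${host#www.}" "$path" ;;
    *)     printf '%s://www.%s%s\n' "$scheme" "$host" "$path" ;;
  esac

  # 3. Append a trailing slash unless the path is empty, already ends in a
  #    slash, or its last segment looks like a file name (contains a dot).
  case ${path##*/} in
    ''|*.*) ;;
    *) printf '%s/\n' "$url" ;;
  esac
)
```

Each printed candidate would then be checked for a live `200` response before being offered as a replacement.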

## Installation

1. Copy the hook folder to your repository:

   ```bash
   cp -r hooks/fix-broken-links .github/hooks/
   ```

2. Make the script executable:

   ```bash
   chmod +x .github/hooks/fix-broken-links/link-fix.sh
   ```

3. Commit the hook configuration to your repository's default branch.

## Configuration

The hook is configured in `hooks.json` to run on the `postToolUse` event:

```json
{
  "version": 1,
  "hooks": {
    "postToolUse": [
      {
        "type": "command",
        "bash": ".github/hooks/fix-broken-links/link-fix.sh",
        "powershell": ".github/hooks/fix-broken-links/link-fix.ps1",
        "cwd": ".",
        "timeoutSec": 120
      }
    ]
  }
}
```
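
To try the hook outside a Copilot session, a payload can be piped to the script by hand. The sketch below builds a minimal one; the helper name `make_payload` is hypothetical, and the field names (`toolName`, `toolInput.path`) are simply among those the scripts look up. The payload a real Copilot session sends is not shown in this document and may carry more fields.

```bash
# Hypothetical helper: build a minimal postToolUse-style payload for one edited file.
make_payload() {
  printf '{"toolName":"edit","toolInput":{"path":"%s"}}\n' "$1"
}

# Example invocation, left commented out because it runs the hook and performs
# live HTTP checks:
# make_payload docs/guide.md | .github/hooks/fix-broken-links/link-fix.sh
```

Because the payload arrives on stdin and no arguments are given, the script treats the run as a hook invocation and checks only the file named in the payload.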

## Supported Source Types

Links are found by scanning each file for `http(s)://` URLs, so the same logic
covers every format that embeds absolute URLs:

| Source | Examples matched |
| --- | --- |
| HTML | `<a href>`, `<img src>`, `<script src>`, `<link href>`, `<iframe src>` |
| Markdown | `[text](url)`, `[text][ref]`, bare `<url>` |
| JS / TS / Vue / Svelte | `fetch()`, `XMLHttpRequest.open()`, jQuery, axios, `href:`/`url:` props |
| JSON / JSONL | any string value that is an absolute URL |
| CSS | `url(...)` |
| SQL | URL literals in query strings |
| Templates | Jinja2, ERB, EJS, Handlebars, Pug |
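
A format-agnostic scan of this kind can be sketched in one pipeline. This is an approximation modeled on the pattern in the PowerShell port, not the bash script's own extraction code; the function name `extract_urls` is hypothetical.

```bash
# Hypothetical helper: list every http(s) URL in a file, de-duplicated.
# A URL ends at a quote, angle bracket, space, or closing parenthesis;
# trailing sentence punctuation (. , ; :) is trimmed afterwards.
extract_urls() {
  grep -oE "https?://[^\"'<> )]+" "$1" | sed -E 's/[.,;:]+$//' | sort -u
}
```

The same call works for an HTML attribute, a Markdown link, and a CSS `url(...)`, because each of them ends the URL with one of the excluded characters.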

The `d` (remove) action understands HTML `<a>` wrappers and Markdown `[text](url)`
links specifically, keeping the visible text. Other source types support
`r` (replace) and `c` (custom) via literal URL substitution.
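
The wrapper removal can be approximated with two `sed` substitutions, one per link syntax. The sketch below is an illustration under stated assumptions: the function name `strip_link` is hypothetical, it reads text on stdin rather than editing a file, and it assumes the URL contains no `|` character (used here as the `sed` delimiter).

```bash
# Hypothetical helper: strip <a href="URL">text</a> and [text](URL) wrappers for
# one URL, keeping the visible text. Reads stdin, writes stdout.
strip_link() (
  url=$1
  # Escape regex metacharacters so the URL is matched literally.
  esc=$(printf '%s' "$url" | sed -e 's/[][\.^$*+?(){}]/\\&/g')
  sed -E \
    -e "s|<a[^>]*href=[\"']${esc}[\"'][^>]*>([^<]*)</a>|\1|g" \
    -e "s|\[([^]]*)\]\(${esc}[^)]*\)|\1|g"
)
```

A call such as `strip_link 'https://example.com/old-page' < docs/guide.md` prints the document with both kinds of wrapper removed for that URL; bare URLs in code are left untouched.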

## Fix Options

For each broken link:

| Key | Action |
| --- | --- |
| `r` | Replace with the suggested URL (a working variation, or an agent-proposed alternative) |
| `d` | Strip the link wrapper, keeping the visible text as plain text |
| `c` | Enter a custom replacement URL |
| `s` | Skip |

## Example Output

```text
Checking 2 link(s) in docs/guide.md ...
BROKEN (404) https://example.com/old-page

------------------------------------------------------------
SEO anchor issues (consider descriptive link text)
docs/guide.md: <a href="https://example.com/old-page">click here</a>

============================================================
fix-broken-links report
============================================================

[1] docs/guide.md
URL : https://example.com/old-page
HTTP: 404

r Replace -> https://example.com/docs/install
1 Replace -> https://example.com/docs/getting-started
d Remove link, keep text
c Custom replacement URL
s Skip
> r
replaced

1 file(s) updated:
docs/guide.md
```

With no file arguments (or when the edited file carries no checkable links) the
hook stops after the broken-link list and skips the menu above.

## Requirements

- `curl`: HTTP status checks in the bash script; if it is absent, the script prints a notice to
  stderr and exits with status 0
- `grep`, `sed`: link extraction (standard on any POSIX system)
- `jq`: required by the bash hook to parse the postToolUse JSON payload and discover edited files
- Bash 4+ (for `link-fix.sh`); on Windows use Git Bash or WSL, or run the PowerShell 7+ port
  `link-fix.ps1`
- `copilot` (GitHub Copilot CLI): optional; powers the agent-suggested replacements. Without it,
  only verified spelling variations are offered
- `git`: used for changed-file discovery when the script is run manually with no file arguments;
  without it, that manual run falls back to a full repo scan (a hook run only ever checks the
  edited files)
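
Before enabling the hook, it can help to confirm which of these tools are on the `PATH`. A minimal sketch follows; the function name `missing_tools` is hypothetical.

```bash
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# List whichever of the hook's tools are absent on this machine.
missing_tools curl grep sed jq copilot git
```

An empty result means every listed tool is available; `copilot` appearing in the output only disables the agent-suggested replacements.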

## File Structure

```
.github/hooks/fix-broken-links/
├── hooks.json       GitHub Copilot hook configuration
├── link-fix.sh      Bash hook implementation
├── link-fix.ps1     PowerShell 7+ port
└── README.md        This file
```

## Limitations

- Only checks absolute `http://` and `https://` URLs; relative paths require a running server
- Dynamic links generated at runtime from database queries are not detectable from source alone
- When `copilot` suggestions are enabled, broken URLs are sent to the Copilot service as prompt input
- Agent-suggested replacements are model proposals and are not verified live; confirm each before
  accepting
- The `d` (remove) action targets HTML and Markdown link syntax; bare URLs in code are best handled
  with `r` or `c`
@@ -0,0 +1,14 @@
{
  "version": 1,
  "hooks": {
    "postToolUse": [
      {
        "type": "command",
        "bash": ".github/hooks/fix-broken-links/link-fix.sh",
        "powershell": ".github/hooks/fix-broken-links/link-fix.ps1",
        "cwd": ".",
        "timeoutSec": 120
      }
    ]
  }
}
@@ -0,0 +1,410 @@
#!/usr/bin/env pwsh
# fix-broken-links - link-fix.ps1 (PowerShell 7+ port of link-fix.sh)
#
# After the agent edits files (postToolUse): take the files it just changed,
# extract every http(s) URL, and check each one.
#   • With file paths passed (the edited files, injected from the hook payload, or
#     given on the command line) any URL that is not 200 gets spelling variations
#     (http/https, www, trailing slash) then a Copilot CLI agent hand-off for more
#     alternatives, followed by an interactive menu to replace / remove / skip.
#   • With NO file arguments it only lists the broken links - no alternative
#     lookups and no prompts.
# Generic anchor text is flagged as an SEO note either way.
#
# Pure PowerShell + .NET (Invoke-WebRequest/regex), plus an optional Copilot CLI
# hand-off for suggestions.
# Covers: HTML · Markdown · JS/TS · JSON · CSS · SQL · templates (all via URL scan)
# Trigger: postToolUse

Set-StrictMode -Off
$ProgressPreference = 'SilentlyContinue'   # Invoke-WebRequest is far faster without the bar

# The agent hand-off below invokes `copilot`, which may itself re-fire this hook.
# The child run is marked with this env var; exit immediately if it is present so
# we never recurse.
if ($env:FIX_BROKEN_LINKS_AGENT) { exit 0 }

$LIMIT = 50
$TIMEOUT = 10
$UA = 'Mozilla/5.0 (compatible; fix-broken-links/1.0)'
$AGENT_MODEL = 'gpt-5-mini'   # small, low-token model for the suggestion hand-off
$AGENT_TIMEOUT = 60           # seconds before giving up on the agent
$WEB_RE = '\.(html?|xhtml|md|markdown|mdx|js|jsx|ts|tsx|vue|svelte|json|jsonl|css|sql|erb|jinja|j2|twig|ejs|pug|hbs)$'

# Positional args become the file list; the hook payload can also supply them.
$ScriptArgs = [System.Collections.Generic.List[string]]::new()
foreach ($a in $args) { [void]$ScriptArgs.Add([string]$a) }

# ── Hook stdin ────────────────────────────────────────────────────────────────
# When called as a postToolUse hook, extract edited files from the JSON payload
# and inject them as positional args so Get-InputFiles picks them up.
$IsHook = $false
if ($ScriptArgs.Count -eq 0 -and [Console]::IsInputRedirected) {
    $IsHook = $true   # invoked as a hook: stdin carries the tool payload
    $raw = [Console]::In.ReadToEnd()
    if ($raw.Trim()) {
        try {
            $json = $raw | ConvertFrom-Json
            $tool = $json.toolName; if (-not $tool) { $tool = $json.tool_name }
            if ($tool) {
                if ($tool -in 'editFiles','edit','write','str_replace_editor','create_file','multiEdit','applyPatch') {
                    # Only the files this edit tool just changed - never a wider repo scan.
                    $hookFiles = $json.tool_input.files; if (-not $hookFiles) { $hookFiles = $json.toolInput.files }
                    if (-not $hookFiles) { $hookFiles = $json.tool_input.path; if (-not $hookFiles) { $hookFiles = $json.toolInput.path } }
                    if ($hookFiles) { foreach ($hf in $hookFiles) { [void]$ScriptArgs.Add([string]$hf) } }
                }
                else {
                    # Different tool (bash, read, etc.) - nothing to check
                    exit 0
                }
            }
            # No tool context - called manually with piped input, fall through
        } catch { }
    }
}

# A non-empty positional list means the caller passed files: the edited files from
# the hook payload above, or paths given on the command line. Only then do we run
# the full repair flow (look up alternatives, then prompt to fix). With no
# parameters we simply list the broken links - no lookups, no prompts.
$HaveParams = $ScriptArgs.Count -gt 0

# Interactive prompts are only possible when input is a real console; once the
# hook JSON has been read from a redirected stdin we report rather than prompt.
$Interactive = [Environment]::UserInteractive -and -not [Console]::IsInputRedirected

function Read-Answer {
    param([string]$Prompt)
    if (-not $Interactive) { return '' }
    [Console]::Out.Write($Prompt)
    $ans = [Console]::In.ReadLine()
    if ($null -eq $ans) { return '' }
    return $ans
}

# ── Helpers ───────────────────────────────────────────────────────────────────

function Get-HttpStatus {
    param([string]$Url)
    try {
        $resp = Invoke-WebRequest -Uri $Url -MaximumRedirection 5 -TimeoutSec $TIMEOUT `
            -UserAgent $UA -ErrorAction Stop
        return [string][int]$resp.StatusCode
    } catch {
        $resp = $_.Exception.Response
        if ($resp -and $resp.StatusCode) { return [string][int]$resp.StatusCode }
        return 'ERR'
    }
}

# Split a URL into scheme/host/path the same way the bash port does (string ops,
# not [uri], so wildcards and odd paths survive intact).
function Split-Url {
    param([string]$Url)
    $scheme = ($Url -split '://',2)[0]
    $rest = $Url -replace '^[a-zA-Z][a-zA-Z0-9+.-]*://',''
    $hostName = ($rest -split '/',2)[0]
    if ($rest -eq $hostName) { $path = '' } else { $path = '/' + ($rest -split '/',2)[1] }
    [pscustomobject]@{ Scheme = $scheme; Host = $hostName; Path = $path }
}

# Every http(s) URL in a file, trailing punctuation trimmed, de-duplicated.
function Get-Urls {
    param([string]$File)
    $text = [System.IO.File]::ReadAllText($File)
    [regex]::Matches($text, 'https?://[^"''<> )]+', 'IgnoreCase') |
        ForEach-Object { $_.Value -replace '[.,;:]+$','' } |
        Sort-Object -Unique
}

# Generic anchor text that weakens SEO.
function Get-SeoIssues {
    param([string]$File)
    $text = [System.IO.File]::ReadAllText($File)
    $reA = '<a[^>]*>\s*(click here|click|here|read more|more|this page|this|learn more|see more|view|visit|details|info)\s*</a>'
    $reB = '\[(click here|click|here|read more|more|this page|learn more|see more|details|info)\]\('
    @([regex]::Matches($text, $reA, 'IgnoreCase')) +
        @([regex]::Matches($text, $reB, 'IgnoreCase')) | ForEach-Object { $_.Value }
}

# Try common URL variations; return the first that returns 200, else ''.
function Find-Variation {
    param([string]$Url)
    $p = Split-Url $Url
    $scheme = $p.Scheme; $hostName = $p.Host; $path = $p.Path
    $cands = [System.Collections.Generic.List[string]]::new()
    if ($scheme -eq 'http') { [void]$cands.Add("https://$hostName$path") }
    if ($scheme -eq 'https') { [void]$cands.Add("http://$hostName$path") }
    if ($hostName -like 'www.*') { [void]$cands.Add("$scheme`://$($hostName.Substring(4))$path") }
    else { [void]$cands.Add("$scheme`://www.$hostName$path") }
    if ($path -and $path -notmatch '/$' -and (($path -split '/')[-1]) -notmatch '\.') {
        [void]$cands.Add(($Url -replace '/$','') + '/')
    }
    foreach ($c in $cands) {
        if ($c -eq $Url) { continue }
        if ((Get-HttpStatus $c) -eq '200') { return $c }
    }
    return ''
}

# Hand the broken link to the Copilot CLI agent and let it propose alternatives.
# A deliberately lightweight, low-token hand-off: one non-interactive prompt to a
# small model with no tools enabled (so it answers from its own knowledge - no web
# fetches, no permission prompts, no archive lookups on our side). The model may
# prefix a prose line, so we pull http(s) tokens from anywhere in the output, trim
# trailing punctuation, drop the broken URL itself, and de-duplicate. The call runs
# as a job so it can be capped at $AGENT_TIMEOUT seconds.
function Get-AgentAlts {
    param([string]$Url,[int]$Max)
    if (-not (Get-Command copilot -ErrorAction SilentlyContinue)) { return @() }
    $snappy = $AGENT_TIMEOUT - 5
    $prompt = "In under $snappy seconds, find up to $Max working alternative URLs for the broken link $Url. Hierarchically consider 1. Path and/or page spelling; 2. web.archive.org/wayback; 3. Redirects using redirect destination; 4. The context of the link's text; in order to resolve. Output only the URLs. One per line, and no: prose, numbering, markdown, backticks, special characters, post formatting."
    $out = ''
    try {
        # FIX_BROKEN_LINKS_AGENT marks the child run so a re-entrant hook exits early.
        $job = Start-Job -ScriptBlock {
            param($Prompt, $Model)
            $env:FIX_BROKEN_LINKS_AGENT = '1'
            copilot -p $Prompt -s --no-color --model $Model --available-tools 2>$null
        } -ArgumentList $prompt, $AGENT_MODEL
        # Only read output from a job that completed cleanly; a failed/errored copilot
        # run yields no alternatives.
        if ((Wait-Job $job -Timeout $AGENT_TIMEOUT) -and $job.State -eq 'Completed') {
            $out = (Receive-Job $job -ErrorAction SilentlyContinue | Out-String)
        }
        Remove-Job $job -Force -ErrorAction SilentlyContinue
    } catch { $out = '' }
    if (-not $out) { return @() }

    $seen = @{}
    $result = [System.Collections.Generic.List[string]]::new()
    foreach ($m in [regex]::Matches($out, 'https?://[^\s"''<>)\]]+', 'IgnoreCase')) {
        if ($result.Count -ge $Max) { break }
        $u = $m.Value -replace '[.,;:]+$',''
        $key = $u.ToLower()
        if ($key -eq $Url.ToLower()) { continue }
        if ($seen.ContainsKey($key)) { continue }
        $seen[$key] = $true
        [void]$result.Add($u)
    }
    return ,$result.ToArray()
}

# Up to MAX viable replacement URLs for a broken link, best first:
#   1. a working scheme/www/slash variation (verified live 200)
#   2. alternatives proposed by the Copilot CLI agent (see Get-AgentAlts)
# De-duplicated case-insensitively. The first item is what `r` uses; the rest
# become the numbered alternatives.
function Get-SuggestedAlts {
    param([string]$Url,[int]$Max = 6)
    $seen = @{}
    $out = [System.Collections.Generic.List[string]]::new()

    $v = Find-Variation $Url
    if ($v) { [void]$out.Add($v); $seen[$v.ToLower()] = $true }

    foreach ($a in (Get-AgentAlts $Url $Max)) {
        if ($out.Count -ge $Max) { break }
        if (-not $a) { continue }
        $key = $a.ToLower()
        if ($seen.ContainsKey($key)) { continue }
        [void]$out.Add($a); $seen[$key] = $true
    }
    return ,$out.ToArray()
}

# Replace a literal URL everywhere in a file (plain string replace, no regex).
function Set-UrlReplacement {
    param([string]$File,[string]$Old,[string]$New)
    $content = [System.IO.File]::ReadAllText($File)
    [System.IO.File]::WriteAllText($File, $content.Replace($Old, $New))
}

# Remove the link wrapper but keep the visible text:
#   <a href="URL">text</a> -> text
#   [text](URL)            -> text
function Remove-LinkWrapper {
    param([string]$File,[string]$Url)
    $content = [System.IO.File]::ReadAllText($File)
    $esc = [regex]::Escape($Url)
    # Each element is parenthesized: the comma operator binds tighter than '+', so
    # without the parens the three concatenations collapse into a single string and
    # the array would hold one bogus pattern instead of three.
    $patterns = @(
        ('<a[^>]*href="' + $esc + '"[^>]*>([^<]*)</a>'),
        ("<a[^>]*href='" + $esc + "'[^>]*>([^<]*)</a>"),
        ('\[([^\]]*)\]\(' + $esc + '[^)]*\)')
    )
    foreach ($pat in $patterns) {
        $content = [regex]::Replace($content, $pat, '$1', 'IgnoreCase')
    }
    [System.IO.File]::WriteAllText($File, $content)
}

# ── File discovery ────────────────────────────────────────────────────────────

function Get-InputFiles {
    if ($ScriptArgs.Count -gt 0) { return $ScriptArgs.ToArray() }
    # Fired as a hook but the payload carried no (web) files: do nothing rather than
    # fall back to scanning unrelated files - the hook only ever checks edited files.
    if ($IsHook) { return @() }
    $out = @()
    if (Get-Command git -ErrorAction SilentlyContinue) {
        git rev-parse --git-dir *> $null
        if ($LASTEXITCODE -eq 0) {
            $out = @(git diff --name-only HEAD 2>$null) + @(git diff --name-only --cached 2>$null)
        }
    }
    if ($out.Count -gt 0) { return $out }
    Get-ChildItem -Recurse -File -ErrorAction SilentlyContinue |
        Where-Object { $_.FullName -notmatch '[\\/](\.git|node_modules|dist|build|\.next|\.venv|__pycache__)[\\/]' } |
        ForEach-Object { Resolve-Path -Relative -LiteralPath $_.FullName }
}

$seenFiles = @{}
$FILES = [System.Collections.Generic.List[string]]::new()
foreach ($f in (Get-InputFiles)) {
    if (-not $f) { continue }
    $f = ([string]$f).Trim()
    if (-not (Test-Path -LiteralPath $f -PathType Leaf)) { continue }
    if ($f -match '[\\/](node_modules|\.git|dist|build)[\\/]') { continue }
    if ($f -notmatch $WEB_RE) { continue }
    if ($seenFiles.ContainsKey($f)) { continue }
    $seenFiles[$f] = $true
    [void]$FILES.Add($f)
}

if ($FILES.Count -eq 0) { exit 0 }

# ── Scan ──────────────────────────────────────────────────────────────────────

$B_FILE = [System.Collections.Generic.List[string]]::new()
$B_URL = [System.Collections.Generic.List[string]]::new()
$B_STATUS = [System.Collections.Generic.List[string]]::new()
$B_ALT = [System.Collections.Generic.List[object]]::new()
$SEO_LINES = [System.Collections.Generic.List[string]]::new()

foreach ($file in $FILES) {
    foreach ($line in (Get-SeoIssues $file)) {
        if ($line) { [void]$SEO_LINES.Add("${file}: $line") }
    }

    $urls = @(Get-Urls $file)
    if ($urls.Count -eq 0) { continue }

    if ($HaveParams -and $urls.Count -gt $LIMIT) {
        $ans = Read-Answer " $file has $($urls.Count) links (limit $LIMIT). Continue? [Y/n] "
        if ($ans -in 'n','N','no','NO') { continue }
    }

    Write-Host ""
    Write-Host " Checking $($urls.Count) link(s) in $file ..."
    foreach ($url in $urls) {
        $status = Get-HttpStatus $url
        if ($status -eq '200') { continue }
        Write-Host " BROKEN ($status) $url"
        # Only look up replacements when files were passed; otherwise just list.
        $alts = @()
        if ($HaveParams) { $alts = Get-SuggestedAlts $url 6 }
        [void]$B_FILE.Add($file)
        [void]$B_URL.Add($url)
        [void]$B_STATUS.Add($status)
        [void]$B_ALT.Add($alts)
    }
}

# ── SEO report ────────────────────────────────────────────────────────────────

if ($SEO_LINES.Count -gt 0) {
    Write-Host ""
    Write-Host "------------------------------------------------------------"
    Write-Host " SEO anchor issues (consider descriptive link text)"
    foreach ($s in $SEO_LINES) { Write-Host " $s" }
}

if ($B_URL.Count -eq 0) {
    Write-Host ""
    Write-Host " No broken links found."
    Write-Host ""
    exit 0
}

# ── Interactive fix ───────────────────────────────────────────────────────────

Write-Host ""
Write-Host "============================================================"
Write-Host " fix-broken-links report"
Write-Host "============================================================"

$CHANGED = @{}
$n = $B_URL.Count
for ($i = 0; $i -lt $n; $i++) {
    $file = $B_FILE[$i]
    $url = $B_URL[$i]
    $status = $B_STATUS[$i]
    $alts = @($B_ALT[$i])

    Write-Host ""
    Write-Host " [$($i + 1)] $file"
    Write-Host " URL : $url"
    $note = ''
    if ($status -in 'ERR','000','TIMEOUT') { $note = ' (unreachable)' }
    Write-Host " HTTP: $status$note"

    # No file parameters → report-only: list the broken link and move on.
    if (-not $HaveParams) { continue }

    Write-Host ""
    if ($alts.Count -gt 0) {
        Write-Host " r Replace -> $($alts[0])"
        for ($k = 1; $k -lt $alts.Count; $k++) {
            Write-Host " $k Replace -> $($alts[$k])"
        }
    }
    Write-Host " d Remove link, keep text"
    Write-Host " c Custom replacement URL"
    Write-Host " s Skip"

    if (-not $Interactive) {
        Write-Host " (no terminal - reporting only)"
        continue
    }

    while ($true) {
        $ch = Read-Answer ' > '
        if ($ch -eq 's' -or $ch -eq '') { break }
        elseif ($ch -eq 'd') {
            Remove-LinkWrapper $file $url; $CHANGED[$file] = $true; Write-Host " removed"; break
        }
        elseif ($ch -eq 'r') {
            if ($alts.Count -gt 0) {
                Set-UrlReplacement $file $url $alts[0]; $CHANGED[$file] = $true
                Write-Host " replaced -> $($alts[0])"; break
            }
            Write-Host " no suggestion available"
        }
        elseif ($ch -match '^[1-9]$') {
            $idx = [int]$ch
            if ($idx -lt $alts.Count) {
                Set-UrlReplacement $file $url $alts[$idx]; $CHANGED[$file] = $true
                Write-Host " replaced -> $($alts[$idx])"; break
            }
            Write-Host " invalid choice"
        }
        elseif ($ch -eq 'c') {
            $u = Read-Answer ' URL: '
            if ($u) { Set-UrlReplacement $file $url $u; $CHANGED[$file] = $true; Write-Host " replaced"; break }
        }
        else {
            Write-Host " invalid choice"
        }
    }
}

if ($CHANGED.Count -gt 0) {
    Write-Host ""
    Write-Host " $($CHANGED.Count) file(s) updated:"
    foreach ($f in $CHANGED.Keys) { Write-Host " $f" }
    Write-Host ""
}
exit 0
@@ -0,0 +1,372 @@
#!/bin/bash
# fix-broken-links - link-fix.sh
#
# After the agent edits files (postToolUse): take the files it just changed,
# extract every http(s) URL, and check each with curl.
#   • With file paths passed (the edited files, injected from the hook payload, or
#     given on the command line) any URL that is not 200 gets spelling variations
#     (http/https, www, trailing slash) then a Copilot CLI agent hand-off for more
#     alternatives, followed by an interactive menu to replace / remove / skip.
#   • With NO file arguments it only lists the broken links - no alternative
#     lookups and no prompts.
# Generic anchor text is flagged as an SEO note either way.
#
# Pure bash + grep/sed/curl, plus an optional Copilot CLI hand-off for suggestions.
# Covers: HTML · Markdown · JS/TS · JSON · CSS · SQL · templates (all via URL scan)
# Requires: curl, grep, sed (and jq when run as a hook, to read the payload)
# Optional: copilot | Trigger: postToolUse
set -uo pipefail

# The agent hand-off below invokes `copilot`, which may itself re-fire this hook.
# The child run is marked with this env var; exit immediately if it is present so
# we never recurse.
[ -n "${FIX_BROKEN_LINKS_AGENT:-}" ] && exit 0

LIMIT=50
TIMEOUT=10
UA='Mozilla/5.0 (compatible; fix-broken-links/1.0)'
AGENT_MODEL='gpt-5-mini'   # small, low-token model for the suggestion hand-off
AGENT_TIMEOUT=60           # seconds before giving up on the agent
# Cap the agent call with `timeout` when it is available (coreutils; absent on
# some minimal / Git-Bash setups), otherwise run copilot unbounded.
if command -v timeout >/dev/null 2>&1; then AGENT_RUN="timeout ${AGENT_TIMEOUT}"; else AGENT_RUN=""; fi
WEB_RE='\.(html?|xhtml|md|markdown|mdx|js|jsx|ts|tsx|vue|svelte|json|jsonl|css|sql|erb|jinja|j2|twig|ejs|pug|hbs)$'

command -v curl >/dev/null 2>&1 || { printf 'fix-broken-links: curl not found\n' >&2; exit 0; }

# ── Hook stdin ────────────────────────────────────────────────────────────────
# When called as a postToolUse hook, extract edited files from the JSON payload
# and inject them as positional args so collect_input picks them up.
_HOOK=""
if [ "$#" -eq 0 ] && [ ! -t 0 ]; then
    _HOOK=1   # invoked as a hook: stdin carries the tool payload
    _INPUT=$(cat)
    if command -v jq >/dev/null 2>&1; then
        _TOOL=$(printf '%s' "$_INPUT" | jq -r '.toolName // .tool_name // empty' 2>/dev/null)
        case "$_TOOL" in
            editFiles|edit|write|str_replace_editor|create_file|multiEdit|applyPatch)
                # Only the files this edit tool just changed - never a wider repo scan.
                mapfile -t _FILES < <(
                    printf '%s' "$_INPUT" \
                        | jq -r '.tool_input.files[]? // .toolInput.files[]? // .tool_input.path // .toolInput.path // empty' 2>/dev/null
                )
                [ "${#_FILES[@]}" -gt 0 ] && set -- "${_FILES[@]}"
                ;;
            "")
                # No tool context - called manually with piped input, fall through
                ;;
            *)
                # Different tool (bash, read, etc.) - nothing to check
                exit 0
                ;;
        esac
    fi
fi
|
||||
|
||||
# A non-empty positional list means the caller passed files: the edited files from
|
||||
# the hook payload above, or paths given on the command line. Only then do we run
|
||||
# the full repair flow (look up alternatives, then prompt to fix). With no
|
||||
# parameters we simply list the broken links - no lookups, no prompts.
|
||||
[ "$#" -gt 0 ] && HAVE_PARAMS=1 || HAVE_PARAMS=0
|
||||
|
||||
# Interactive input comes from the terminal, since stdin may carry hook JSON.
# Probe by actually opening /dev/tty - a mere -r/-w test can pass where open fails.
if { true >/dev/tty; } 2>/dev/null && { true </dev/tty; } 2>/dev/null; then
  TTY=/dev/tty
else
  TTY=""
fi
|
||||
ask() {
|
||||
local p="$1" ans=""
|
||||
[ -z "$TTY" ] && { printf '%s' ""; return; }
|
||||
printf '%s' "$p" > "$TTY"
|
||||
IFS= read -r ans < "$TTY" || ans=""
|
||||
printf '%s' "$ans"
|
||||
}
|
||||
|
||||
# ── Helpers ───────────────────────────────────────────────────────────────────
|
||||
|
||||
http_status() {
|
||||
curl -s -o /dev/null -w '%{http_code}' --max-time "$TIMEOUT" --location -A "$UA" "$1" 2>/dev/null
|
||||
}
|
||||
|
||||
# Escape ERE metacharacters so a literal string can be used safely inside a bash
|
||||
# [[ =~ ]] pattern. Only true metacharacters are escaped - backslash-escaping an
|
||||
# ordinary character (e.g. '\:') is undefined in ERE and would fail to match.
|
||||
re_escape() {
|
||||
local s="$1" out="" c i bs='\' meta='.^$*+?()[]{}|\'
|
||||
for ((i = 0; i < ${#s}; i++)); do
|
||||
c="${s:i:1}"
|
||||
if [[ "$meta" == *"$c"* ]]; then out+="$bs$c"; else out+="$c"; fi
|
||||
done
|
||||
printf '%s' "$out"
|
||||
}
|
||||
|
||||
# Read an entire file into a variable, preserving newlines.
|
||||
read_file() { IFS= read -rd '' "$1" < "$2" || true; }
|
||||
|
||||
# Escape glob metacharacters (\ * ? [) so a string is matched literally inside
|
||||
# ${var//pattern/repl}, which otherwise interprets the pattern as a glob. URLs
|
||||
# and Markdown link spans routinely contain ? and [ ], so this is required for a
|
||||
# correct fixed-string replacement.
|
||||
glob_escape() {
|
||||
local s="$1" out="" c i
|
||||
for ((i = 0; i < ${#s}; i++)); do
|
||||
c="${s:i:1}"
|
||||
case "$c" in
|
||||
'\'|'*'|'?'|'[') out+="\\$c" ;;
|
||||
*) out+="$c" ;;
|
||||
esac
|
||||
done
|
||||
printf '%s' "$out"
|
||||
}
|
||||
|
||||
# Print every http(s) URL in a file, trailing punctuation trimmed, de-duplicated.
|
||||
extract_urls() {
|
||||
grep -oiE 'https?://[^"'\''<> )]+' "$1" 2>/dev/null \
|
||||
| sed -E 's/[.,;:]+$//' \
|
||||
| sort -u
|
||||
}
|
||||
|
||||
# Generic anchor text that weakens SEO.
|
||||
seo_scan() {
|
||||
grep -oiE '<a[^>]*>[[:space:]]*(click here|click|here|read more|more|this page|this|learn more|see more|view|visit|details|info)[[:space:]]*</a>' "$1" 2>/dev/null
|
||||
grep -oiE '\[(click here|click|here|read more|more|this page|learn more|see more|details|info)\]\(' "$1" 2>/dev/null
|
||||
}
|
||||
|
||||
# Try common URL variations; echo the first that returns 200, else nothing.
|
||||
find_variation() {
|
||||
local url="$1" scheme rest host path cand
|
||||
scheme="${url%%://*}"
|
||||
rest="${url#*://}"
|
||||
host="${rest%%/*}"
|
||||
if [ "$rest" = "$host" ]; then path=""; else path="/${rest#*/}"; fi
|
||||
|
||||
local cands=()
|
||||
case "$scheme" in
|
||||
http) cands+=("https://${host}${path}") ;;
|
||||
https) cands+=("http://${host}${path}") ;;
|
||||
esac
|
||||
if [[ "$host" == www.* ]]; then
|
||||
cands+=("${scheme}://${host#www.}${path}")
|
||||
else
|
||||
cands+=("${scheme}://www.${host}${path}")
|
||||
fi
|
||||
if [ -n "$path" ] && [[ "$path" != */ ]] && [[ "${path##*/}" != *.* ]]; then
|
||||
cands+=("${url%/}/")
|
||||
fi
|
||||
|
||||
for cand in "${cands[@]}"; do
|
||||
[ "$cand" = "$url" ] && continue
|
||||
[ "$(http_status "$cand")" = "200" ] && { printf '%s' "$cand"; return 0; }
|
||||
done
|
||||
return 1
|
||||
}
|
||||
|
||||
# Hand the broken link to the Copilot CLI agent and let it propose alternatives.
# This is a deliberately lightweight, low-token hand-off: a single non-interactive
# prompt to a small model, with no tools enabled - the agent answers from its own
# knowledge, so there are no web fetches, no permission prompts, and no archive
# lookups on our side. The model may prefix a prose line, so we pull http(s) tokens
# from anywhere in the output, trim trailing punctuation, drop the broken URL
# itself, and de-duplicate (case-insensitively). Up to MAX lines, one URL each.
agent_alts() {
  local url="$1" max="$2" prompt out
  command -v copilot >/dev/null 2>&1 || return 0
  prompt="In under $((AGENT_TIMEOUT - 5)) seconds, find up to ${max} working alternative URLs for the broken link ${url}. Consider, in priority order: 1. misspellings in the path or page name; 2. web.archive.org (Wayback Machine) snapshots; 3. the destination of any redirect; 4. the context of the link's text. Output only the URLs, one per line, with no prose, numbering, markdown, backticks, special characters, or post-formatting."
  # FIX_BROKEN_LINKS_AGENT marks the child run so a re-entrant hook exits early.
  out="$(FIX_BROKEN_LINKS_AGENT=1 $AGENT_RUN copilot -p "$prompt" \
    -s --no-color --model "$AGENT_MODEL" --available-tools 2>/dev/null)"
  # If copilot errored, timed out, or produced nothing, offer no alternatives.
  [ $? -eq 0 ] && [ -n "$out" ] || return 0
  printf '%s\n' "$out" \
    | grep -oiE 'https?://[^][:space:]"'\''<>)]+' \
    | sed -E 's/[.,;:]+$//' \
    | awk -v bad="$url" 'tolower($0) != tolower(bad) && !seen[tolower($0)]++' \
    | head -n "$max"
}
|
||||
|
||||
# Emit up to MAX viable replacement URLs for a broken link, best first:
|
||||
# 1. a working scheme/www/slash variation (verified live 200)
|
||||
# 2. alternatives proposed by the Copilot CLI agent (see agent_alts)
|
||||
# Output is newline-delimited and de-duplicated (case-insensitively). The first
|
||||
# line is what `r` uses; the remainder become the numbered alternatives.
|
||||
suggest_alts() {
|
||||
local url="$1" max="${2:-6}" cand key
|
||||
local -A seen=()
|
||||
local out=()
|
||||
|
||||
cand="$(find_variation "$url")" && [ -n "$cand" ] && { out+=("$cand"); seen["${cand,,}"]=1; }
|
||||
|
||||
while IFS= read -r cand; do
|
||||
[ "${#out[@]}" -ge "$max" ] && break
|
||||
[ -z "$cand" ] && continue
|
||||
key="${cand,,}"; [ -n "${seen[$key]:-}" ] && continue
|
||||
out+=("$cand"); seen[$key]=1
|
||||
done < <(agent_alts "$url" "$max")
|
||||
|
||||
[ "${#out[@]}" -eq 0 ] && return 0
|
||||
printf '%s\n' "${out[@]}"
|
||||
}
|
||||
|
||||
# Replace a literal URL everywhere in a file (pure bash, no regex).
replace_url() {
  local file="$1" old="$2" new="$3" content pat
  read_file content "$file"
  pat="$(glob_escape "$old")"
  # Quote the replacement: with bash 5.2+ (patsub_replacement) an unquoted '&'
  # in the replacement expands to the matched text, and URLs often contain '&'.
  content=${content//$pat/"$new"}
  printf '%s' "$content" > "$file"
}
|
||||
|
||||
# Remove the link wrapper but keep the visible text:
#   <a href="URL">text</a>  ->  text
#   [text](URL)             ->  text
# Each matched wrapper is swapped for its inner text via literal replacement.
remove_link() {
  local file="$1" url="$2" content esc re pat
  read_file content "$file"
  esc="$(re_escape "$url")"
  for re in \
    '<a[^>]*href="'"$esc"'"[^>]*>([^<]*)</a>' \
    "<a[^>]*href='${esc}'[^>]*>([^<]*)</a>" \
    '\[([^]]*)\]\('"$esc"'[^)]*\)'; do
    while [[ $content =~ $re ]]; do
      # The matched span often contains [ and ] (Markdown), which are glob
      # metacharacters, so escape it before the literal substitution.
      pat="$(glob_escape "${BASH_REMATCH[0]}")"
      # Quote the replacement so an '&' in the link text stays literal
      # (bash 5.2+ would otherwise expand it to the matched span).
      content=${content//$pat/"${BASH_REMATCH[1]}"}
    done
  done
  printf '%s' "$content" > "$file"
}
|
||||
|
||||
# ── File discovery ────────────────────────────────────────────────────────────
|
||||
|
||||
collect_input() {
|
||||
if [ "$#" -gt 0 ]; then printf '%s\n' "$@"; return; fi
|
||||
# Fired as a hook but the payload carried no (web) files: do nothing rather than
|
||||
# fall back to scanning unrelated files - the hook only ever checks edited files.
|
||||
[ -n "$_HOOK" ] && return
|
||||
local out=""
|
||||
if command -v git >/dev/null 2>&1 && git rev-parse --git-dir >/dev/null 2>&1; then
|
||||
out="$({ git diff --name-only HEAD; git diff --name-only --cached; } 2>/dev/null)"
|
||||
fi
|
||||
if [ -n "$out" ]; then printf '%s\n' "$out"; return; fi
|
||||
find . -type d \( -name .git -o -name node_modules -o -name dist -o -name build \
|
||||
-o -name .next -o -name .venv -o -name __pycache__ \) -prune \
|
||||
-o -type f -print 2>/dev/null
|
||||
}
|
||||
|
||||
declare -A SEEN
|
||||
FILES=()
|
||||
while IFS= read -r f; do
|
||||
[ -z "$f" ] && continue
|
||||
[ -f "$f" ] || continue
|
||||
case "$f" in */node_modules/*|*/.git/*|*/dist/*|*/build/*) continue ;; esac
|
||||
printf '%s\n' "$f" | grep -qiE "$WEB_RE" || continue
|
||||
[ -n "${SEEN[$f]:-}" ] && continue
|
||||
SEEN[$f]=1
|
||||
FILES+=("$f")
|
||||
done < <(collect_input "$@")
|
||||
|
||||
[ "${#FILES[@]}" -eq 0 ] && exit 0
|
||||
|
||||
# ── Scan ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
B_FILE=(); B_URL=(); B_STATUS=(); B_ALT=()
|
||||
SEO_LINES=()
|
||||
|
||||
for file in "${FILES[@]}"; do
|
||||
while IFS= read -r line; do
|
||||
[ -n "$line" ] && SEO_LINES+=("$file: $line")
|
||||
done < <(seo_scan "$file")
|
||||
|
||||
mapfile -t urls < <(extract_urls "$file")
|
||||
[ "${#urls[@]}" -eq 0 ] && continue
|
||||
|
||||
if [ "$HAVE_PARAMS" = "1" ] && [ "${#urls[@]}" -gt "$LIMIT" ]; then
|
||||
ans="$(ask " ${file} has ${#urls[@]} links (limit ${LIMIT}). Continue? [Y/n] ")"
|
||||
case "$ans" in n|N|no|NO) continue ;; esac
|
||||
fi
|
||||
|
||||
printf '\n Checking %d link(s) in %s ...\n' "${#urls[@]}" "$file"
|
||||
for url in "${urls[@]}"; do
|
||||
status="$(http_status "$url")"
|
||||
[ "$status" = "200" ] && continue
|
||||
printf ' BROKEN (%s) %s\n' "$status" "$url"
|
||||
# Only look up replacements when files were passed; otherwise just list.
|
||||
alts=""
|
||||
[ "$HAVE_PARAMS" = "1" ] && alts="$(suggest_alts "$url" 6)"
|
||||
B_FILE+=("$file"); B_URL+=("$url"); B_STATUS+=("$status"); B_ALT+=("$alts")
|
||||
done
|
||||
done
|
||||
|
||||
# ── SEO report ────────────────────────────────────────────────────────────────
|
||||
|
||||
if [ "${#SEO_LINES[@]}" -gt 0 ]; then
|
||||
printf '\n%s\n SEO anchor issues (consider descriptive link text)\n' "------------------------------------------------------------"
|
||||
for s in "${SEO_LINES[@]}"; do printf ' %s\n' "$s"; done
|
||||
fi
|
||||
|
||||
if [ "${#B_URL[@]}" -eq 0 ]; then
|
||||
printf '\n No broken links found.\n\n'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# ── Interactive fix ───────────────────────────────────────────────────────────
|
||||
|
||||
printf '\n%s\n fix-broken-links report\n%s\n' "============================================================" "============================================================"
|
||||
|
||||
declare -A CHANGED
|
||||
n="${#B_URL[@]}"
|
||||
for ((i=0; i<n; i++)); do
|
||||
file="${B_FILE[$i]}"; url="${B_URL[$i]}"; status="${B_STATUS[$i]}"
|
||||
printf '\n [%d] %s\n' "$((i+1))" "$file"
|
||||
printf ' URL : %s\n' "$url"
|
||||
note=""; case "$status" in ERR|000|TIMEOUT) note=" (unreachable)" ;; esac
|
||||
printf ' HTTP: %s%s\n' "$status" "$note"
|
||||
|
||||
# No file parameters → report-only: list the broken link and move on.
|
||||
[ "$HAVE_PARAMS" = "1" ] || continue
|
||||
|
||||
alts=(); [ -n "${B_ALT[$i]}" ] && mapfile -t alts <<< "${B_ALT[$i]}"
|
||||
printf '\n'
|
||||
if [ "${#alts[@]}" -gt 0 ]; then
|
||||
printf ' r Replace -> %s\n' "${alts[0]}"
|
||||
for ((k=1; k<${#alts[@]}; k++)); do
|
||||
printf ' %d Replace -> %s\n' "$k" "${alts[$k]}"
|
||||
done
|
||||
fi
|
||||
printf ' d Remove link, keep text\n'
|
||||
printf ' c Custom replacement URL\n'
|
||||
printf ' s Skip\n'
|
||||
|
||||
if [ -z "$TTY" ]; then
|
||||
printf ' (no terminal - reporting only)\n'
|
||||
continue
|
||||
fi
|
||||
|
||||
while true; do
|
||||
ch="$(ask ' > ')"
|
||||
case "$ch" in
|
||||
s|"") break ;;
|
||||
d) remove_link "$file" "$url"; CHANGED[$file]=1; printf ' removed\n'; break ;;
|
||||
r) if [ "${#alts[@]}" -gt 0 ]; then
|
||||
replace_url "$file" "$url" "${alts[0]}"; CHANGED[$file]=1; printf ' replaced -> %s\n' "${alts[0]}"; break
|
||||
fi
|
||||
printf ' no suggestion available\n' ;;
|
||||
[1-9]) if [ "$ch" -lt "${#alts[@]}" ]; then
|
||||
replace_url "$file" "$url" "${alts[$ch]}"; CHANGED[$file]=1; printf ' replaced -> %s\n' "${alts[$ch]}"; break
|
||||
else printf ' invalid choice\n'; fi ;;
|
||||
c) u="$(ask ' URL: ')"
|
||||
if [ -n "$u" ]; then replace_url "$file" "$url" "$u"; CHANGED[$file]=1; printf ' replaced\n'; break; fi ;;
|
||||
*) printf ' invalid choice\n' ;;
|
||||
esac
|
||||
done
|
||||
done
|
||||
|
||||
if [ "${CHANGED[*]+x}" = x ] && [ "${#CHANGED[@]}" -gt 0 ]; then
|
||||
printf '\n %d file(s) updated:\n' "${#CHANGED[@]}"
|
||||
for f in "${!CHANGED[@]}"; do printf ' %s\n' "$f"; done
|
||||
printf '\n'
|
||||
fi
|
||||
exit 0
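
The literal-replacement approach used by `replace_url` can be tried in isolation. The following is a minimal sketch, assuming bash 4.3 or newer; `replace_literal` is a hypothetical helper name introduced only for this illustration and is not part of the hook, while `glob_escape` is copied from the script above. It shows why the search string is glob-escaped and why the replacement is quoted.

```shell
#!/usr/bin/env bash
# Stand-alone demo of fixed-string replacement with ${var//pattern/repl}.

# Copied from the hook script: escape glob metacharacters (\ * ? [).
glob_escape() {
  local s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      '\'|'*'|'?'|'[') out+="\\$c" ;;
      *) out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper (illustration only): replace every literal OLD in TEXT
# with NEW and print the result without a trailing newline.
replace_literal() {
  local text="$1" old="$2" new="$3" pat
  pat="$(glob_escape "$old")"
  # The quoted replacement keeps '&' literal even when the bash 5.2+
  # patsub_replacement option is enabled.
  text=${text//$pat/"$new"}
  printf '%s' "$text"
}

printf '%s\n' "$(replace_literal \
  'See [docs](http://example.com/a?b=1) now' \
  'http://example.com/a?b=1' \
  'https://example.org/a?b=1&c=2')"
# prints: See [docs](https://example.org/a?b=1&c=2) now
```

Without the escaping step, the `?` in the old URL would match any single character, and a bare `*` would match the whole string, so the substitution could rewrite text that merely resembles the broken link.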