Add geofeed-tuner skill for RFC 8805 IP geolocation feeds (#1138)

* Add geofeed-tuner skill for RFC 8805 IP geolocation feeds

* Fix Codespell errors and apply awesome-copilot contrib guidelines

* Update geofeed-tuner skill description and assets
Punit
2026-03-24 10:55:10 +05:30
committed by GitHub
parent 2f2fb39a82
commit 9856b62b88
10 changed files with 25539 additions and 1 deletions

@@ -48,4 +48,4 @@ ignore-words-list = numer,wit,aks,edn,ser,ois,gir,rouge,categor,aline,ative,afte
 # Skip certain files and directories
-skip = .git,node_modules,package-lock.json,*.lock,website/build,website/.docusaurus,.all-contributorrc
+skip = .git,node_modules,package-lock.json,*.lock,website/build,website/.docusaurus,.all-contributorrc,./skills/geofeed-tuner/assets/*.json,./skills/geofeed-tuner/references/*.txt

@@ -131,6 +131,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
| [game-engine](../skills/game-engine/SKILL.md) | Expert skill for building web-based game engines and games using HTML5, Canvas, WebGL, and JavaScript. Use when asked to create games, build game engines, implement game physics, handle collision detection, set up game loops, manage sprites, add game controls, or work with 2D/3D rendering. Covers techniques for platformers, breakout-style games, maze games, tilemaps, audio, multiplayer via WebRTC, and publishing games. | `assets/2d-maze-game.md`<br />`assets/2d-platform-game.md`<br />`assets/gameBase-template-repo.md`<br />`assets/paddle-game-template.md`<br />`assets/simple-2d-engine.md`<br />`references/3d-web-games.md`<br />`references/algorithms.md`<br />`references/basics.md`<br />`references/game-control-mechanisms.md`<br />`references/game-engine-core-principles.md`<br />`references/game-publishing.md`<br />`references/techniques.md`<br />`references/terminology.md`<br />`references/web-apis.md` |
| [gen-specs-as-issues](../skills/gen-specs-as-issues/SKILL.md) | This workflow guides you through a systematic approach to identify missing features, prioritize them, and create detailed specifications for implementation. | None |
| [generate-custom-instructions-from-codebase](../skills/generate-custom-instructions-from-codebase/SKILL.md) | Migration and code evolution instructions generator for GitHub Copilot. Analyzes differences between two project versions (branches, commits, or releases) to create precise instructions allowing Copilot to maintain consistency during technology migrations, major refactoring, or framework version upgrades. | None |
| [geofeed-tuner](../skills/geofeed-tuner/SKILL.md) | Use this skill whenever the user mentions IP geolocation feeds, RFC 8805, geofeeds, or wants help creating, tuning, validating, or publishing a self-published IP geolocation feed in CSV format. Intended user audience is a network operator, ISP, mobile carrier, cloud provider, hosting company, IXP, or satellite provider asking about IP geolocation accuracy, or geofeed authoring best practices. Helps create, refine, and improve CSV-format IP geolocation feeds with opinionated recommendations beyond RFC 8805 compliance. Do NOT use for private or internal IP address management — applies only to publicly routable IP addresses. | `assets/example`<br />`assets/iso3166-1.json`<br />`assets/iso3166-2.json`<br />`assets/small-territories.json`<br />`references/rfc8805.txt`<br />`references/snippets-python3.md`<br />`scripts/templates` |
| [gh-cli](../skills/gh-cli/SKILL.md) | GitHub CLI (gh) comprehensive reference for repositories, issues, pull requests, Actions, projects, releases, gists, codespaces, organizations, extensions, and all GitHub operations from the command line. | None |
| [git-commit](../skills/git-commit/SKILL.md) | Execute git commit with conventional commit message analysis, intelligent staging, and message generation. Use when user asks to commit changes, create a git commit, or mentions "/commit". Supports: (1) Auto-detecting type and scope from changes, (2) Generating conventional commit messages from diff, (3) Interactive commit with optional type/scope/description overrides, (4) Intelligent file staging for logical grouping | None |
| [git-flow-branch-creator](../skills/git-flow-branch-creator/SKILL.md) | Intelligent Git Flow branch creator that analyzes git status/diff and creates appropriate branches following the nvie Git Flow branching model. | None |

@@ -0,0 +1,864 @@
---
name: geofeed-tuner
description: >
  Use this skill whenever the user mentions IP geolocation feeds, RFC 8805, geofeeds, or wants help creating, tuning, validating, or publishing a
  self-published IP geolocation feed in CSV format. Intended user audience is a network
  operator, ISP, mobile carrier, cloud provider, hosting company, IXP, or satellite provider
  asking about IP geolocation accuracy, or geofeed authoring best practices.
  Helps create, refine, and improve CSV-format IP geolocation feeds with opinionated
  recommendations beyond RFC 8805 compliance. Do NOT use for private or internal IP address
  management — applies only to publicly routable IP addresses.
license: Apache-2.0
metadata:
  author: Sid Mathur <support@getfastah.com>
  version: "0.0.9"
compatibility: Requires Python 3
---
# Geofeed Tuner: Create Better IP Geolocation Feeds
This skill helps you create and improve IP geolocation feeds in CSV format by:
- Ensuring your CSV is well-formed and consistent
- Checking alignment with [RFC 8805](references/rfc8805.txt) (the industry standard)
- Applying **opinionated best practices** learned from real-world deployments
- Suggesting improvements for accuracy, completeness, and privacy
## When to Use This Skill
- Use this skill when a user asks for help **creating, improving, or publishing** an IP geolocation feed file in CSV format.
- Use it to **tune and troubleshoot CSV geolocation feeds** — catching errors, suggesting improvements, and ensuring real-world usability beyond RFC compliance.
- **Intended audience:**
  - Network operators, administrators, and engineers responsible for publicly routable IP address space
  - Organizations such as ISPs, mobile carriers, cloud providers, hosting and colocation companies, Internet Exchange operators, and satellite internet providers
- **Do not use** this skill for private or internal IP address management; it applies **only to publicly routable IP addresses**.
## Prerequisites
- **Python 3** is required.
## Directory Structure and File Management
This skill uses a clear separation between **distribution files** (read-only) and **working files** (generated at runtime).
### Read-Only Directories (Do Not Modify)
The following directories contain static distribution assets. **Do not create, modify, or delete files in these directories:**
| Directory | Purpose |
|----------------|------------------------------------------------------------|
| `assets/` | Static data files (ISO codes, examples) |
| `references/` | RFC specifications and code snippets for reference |
| `scripts/` | Executable code and HTML template files for reports |
### Working Directories (Generated Content)
All generated, temporary, and output files go in these directories:
| Directory | Purpose |
|-----------------|------------------------------------------------------|
| `run/` | Working directory for all agent-generated content |
| `run/data/` | Downloaded CSV files from remote URLs |
| `run/report/` | Generated HTML tuning reports |
### File Management Rules
1. **Never write to `assets/`, `references/`, or `scripts/`** — these are part of the skill distribution and must remain unchanged.
2. **All downloaded input files** (from remote URLs) must be saved to `./run/data/`.
3. **All generated HTML reports** must be saved to `./run/report/`.
4. **All generated Python scripts** must be saved to `./run/`.
5. The `run/` directory may be cleared between sessions; do not store permanent data there.
6. **Working directory for execution:** All generated scripts in `./run/` must be executed with the **skill root directory** (the directory containing `SKILL.md`) as the current working directory, so that relative paths like `assets/iso3166-1.json` and `./run/data/report-data.json` resolve correctly. Do not `cd` into `./run/` before running scripts.
## Processing Pipeline: Sequential Phase Execution
All phases must be executed **in order**, from Phase 1 through Phase 6. Each phase depends on the successful completion of the previous phase. For example, **structure checks** must complete before **quality analysis** can run.
The phases are summarized below. The agent must follow the detailed steps outlined further in each phase section.
| Phase | Name | Description |
|-------|----------------------------|-----------------------------------------------------------------------------------|
| 1 | Understand the Standard | Review the key requirements of RFC 8805 for self-published IP geolocation feeds |
| 2 | Gather Input | Collect IP subnet data from local files or remote URLs |
| 3 | Checks & Suggestions | Validate CSV structure, analyze IP prefixes, and check data quality |
| 4 | Tuning Data Lookup | Use Fastah's MCP tool to retrieve tuning data for improving geolocation accuracy |
| 5 | Generate Tuning Report | Create an HTML report summarizing the analysis and suggestions |
| 6 | Final Review | Verify consistency and completeness of the report data |
**Do not skip phases.** Each phase provides critical checks or data transformations required by subsequent stages.
### Execution Plan Rules
Before executing each phase, the agent MUST generate a visible TODO checklist.
The plan MUST:
- Appear at the very start of the phase
- List every step in order
- Use a checkbox format
- Be updated live as steps complete
### Phase 1: Understand the Standard
The key requirements from RFC 8805 that this skill enforces are summarized below. **Use this summary as your working reference.** Only consult the full [RFC 8805 text](references/rfc8805.txt) for edge cases, ambiguous situations, or when the user asks a standards question not covered here.
#### RFC 8805 Key Facts
**Purpose:** A self-published IP geolocation feed lets network operators publish authoritative location data for their IP address space in a simple CSV format, allowing geolocation providers to incorporate operator-supplied corrections.
**CSV Column Order (Sections 2.1.1.1 through 2.1.1.5):**
| Column | Field | Required | Notes |
|--------|---------------|----------|------------------------------------------------------------|
| 1 | `ip_prefix` | Yes | CIDR notation; IPv4 or IPv6; must be a network address |
| 2 | `alpha2code` | No | ISO 3166-1 alpha-2 country code; empty or "ZZ" = do-not-geolocate |
| 3 | `region` | No | ISO 3166-2 subdivision code (e.g., `US-CA`) |
| 4 | `city` | No | Free-text city name; no authoritative validation set |
| 5 | `postal_code` | No | **Deprecated** — must be left empty or absent |
**Structural rules:**
- Files may contain comment lines beginning with `#` (including the header, if present).
- A header row is optional; if present, it is treated as a comment if it starts with `#`.
- Files must be encoded in UTF-8.
- Subnet host bits must not be set (i.e., `192.168.1.1/24` is invalid; use `192.168.1.0/24`).
- Applies only to **globally routable** unicast addresses — not private, loopback, link-local, or multicast space.
**Do-not-geolocate:** An entry with an empty `alpha2code` or a case-insensitive `ZZ` is an explicit signal that the operator does not want geolocation applied to that prefix. This holds regardless of the region and city values.
**Postal codes deprecated (Section 2.1.1.5):** The fifth column must not contain postal or ZIP codes. They are too fine-grained for IP-range mapping and raise privacy concerns.
### Phase 2: Gather Input
- If the user has not already provided a list of IP subnets or ranges (sometimes referred to as `inetnum` or `inet6num`), prompt them to supply it. Accepted input formats:
  - Text pasted into the chat
  - A local CSV file
  - A remote URL pointing to a CSV file
- If the input is a **remote URL**:
  - Attempt to download the CSV file to `./run/data/` before processing.
  - On an HTTP error (4xx, 5xx, or a redirect loop), **stop immediately** and report to the user:
    `Feed URL is not reachable: HTTP {status_code}. Please verify the URL is publicly accessible.`
    For failures without an HTTP status, such as a timeout or DNS error, report the underlying reason in place of `HTTP {status_code}`.
  - Do not proceed to Phase 3 with an incomplete or empty download.
- If the input is a **local file**, process it directly without downloading.
- **Encoding detection and normalization:**
  1. Attempt to read the file as UTF-8 using the `utf-8-sig` codec. This codec accepts plain UTF-8 and also strips a leading byte-order mark (BOM). Plain `utf-8` decodes a BOM-prefixed file without error, but it leaves `\ufeff` at the start of the first field.
  2. If a `UnicodeDecodeError` is raised, fall back to `latin-1`. Because `latin-1` maps every byte to a character, it never fails, so treat it as a last resort.
  3. Once successfully decoded, re-encode and write the working copy as UTF-8.
  4. If no encoding succeeds, stop and report: `Unable to decode input file. Please save it as UTF-8 and try again.`
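The following is a minimal sketch of the download and decoding steps, using only the standard library (`urllib.request`). It tries `utf-8-sig` before plain `utf-8` because `utf-8-sig` handles both BOM and non-BOM files. The function names `decode_feed_bytes` and `download_feed`, the 30-second timeout, and the caller-chosen destination filename are assumptions for illustration; the skill does not prescribe them.

```python
import os
import urllib.error
import urllib.request

# utf-8-sig accepts plain UTF-8 and strips a leading BOM if present.
# latin-1 maps every byte to a character, so it never raises; it is the last resort.
ENCODINGS = ("utf-8-sig", "latin-1")

UNREACHABLE = "Feed URL is not reachable: {}. Please verify the URL is publicly accessible."


def decode_feed_bytes(raw: bytes) -> tuple:
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for encoding in ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("Unable to decode input file. Please save it as UTF-8 and try again.")


def download_feed(url: str, dest_path: str, timeout: float = 30.0) -> str:
    """Download a remote feed, normalize it to UTF-8, and write it to dest_path.

    Hypothetical helper: dest_path should live under ./run/data/.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            raw = resp.read()
    except urllib.error.HTTPError as exc:  # 4xx/5xx, and 3xx raised for a redirect loop
        raise RuntimeError(UNREACHABLE.format(f"HTTP {exc.code}")) from exc
    except OSError as exc:  # URLError (DNS failure, refused connection) and socket timeouts
        raise RuntimeError(UNREACHABLE.format(getattr(exc, "reason", exc))) from exc
    if not raw.strip():
        raise RuntimeError("Downloaded feed is empty; not proceeding to Phase 3.")
    text, _ = decode_feed_bytes(raw)
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with open(dest_path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)
    return dest_path
```

For example, `download_feed("https://example.com/geofeed.csv", "./run/data/geofeed.csv")` either returns the local path or raises with the user-facing message. The URL and filename in that call are placeholders.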
### Phase 3: Checks & Suggestions
#### Execution Rules
- Generate a **script** for this phase.
- Do NOT combine this phase with others.
- Do NOT precompute future-phase data.
- Store the output as a JSON file at: [`./run/data/report-data.json`](./run/data/report-data.json)
#### Schema Definition
The JSON structure below is **IMMUTABLE** during Phase 3. Phase 4 will later add a `TunedEntry` object to each object in `Entries` — this is the only permitted schema extension and happens in a separate phase.
JSON keys map directly to template placeholders like `{{.CountryCode}}`, `{{.HasError}}`, etc.
```json
{
  "InputFile": "",
  "Timestamp": 0,
  "TotalEntries": 0,
  "IpV4Entries": 0,
  "IpV6Entries": 0,
  "InvalidEntries": 0,
  "Errors": 0,
  "Warnings": 0,
  "OK": 0,
  "Suggestions": 0,
  "CityLevelAccuracy": 0,
  "RegionLevelAccuracy": 0,
  "CountryLevelAccuracy": 0,
  "DoNotGeolocate": 0,
  "Entries": [
    {
      "Line": 0,
      "IPPrefix": "",
      "CountryCode": "",
      "RegionCode": "",
      "City": "",
      "Status": "",
      "IPVersion": "",
      "Messages": [
        {
          "ID": "",
          "Type": "",
          "Text": "",
          "Checked": false
        }
      ],
      "HasError": false,
      "HasWarning": false,
      "HasSuggestion": false,
      "DoNotGeolocate": false,
      "GeocodingHint": "",
      "Tunable": false
    }
  ]
}
```
Field definitions:
**Top-level metadata:**
- `InputFile`: The original input source, either a local filename or a remote URL.
- `Timestamp`: Milliseconds since Unix epoch when the tuning was performed.
- `TotalEntries`: Total number of data rows processed (excluding comment and blank lines).
- `IpV4Entries`: Count of entries that are IPv4 subnets.
- `IpV6Entries`: Count of entries that are IPv6 subnets.
- `InvalidEntries`: Count of entries that failed CSV parsing or IP prefix parsing.
- `Errors`: Total entries whose `Status` is `ERROR`.
- `Warnings`: Total entries whose `Status` is `WARNING`.
- `OK`: Total entries whose `Status` is `OK`.
- `Suggestions`: Total entries whose `Status` is `SUGGESTION`.
- `CityLevelAccuracy`: Count of valid entries, not marked do-not-geolocate, where `City` is non-empty.
- `RegionLevelAccuracy`: Count of valid entries, not marked do-not-geolocate, where `RegionCode` is non-empty and `City` is empty.
- `CountryLevelAccuracy`: Count of valid entries, not marked do-not-geolocate, where `CountryCode` is non-empty, `RegionCode` is empty, and `City` is empty.
- `DoNotGeolocate` (metadata): Count of valid entries whose entry-level `DoNotGeolocate` flag is `true`.
**Entry fields:**
- `Entries`: Array of objects, one per data row, with the following per-entry fields:
  - `Line`: 1-based line number in the original CSV (counting all lines including comments and blanks).
  - `IPPrefix`: The normalized IP prefix in CIDR slash notation.
  - `CountryCode`: The ISO 3166-1 alpha-2 country code, or empty string.
  - `RegionCode`: The ISO 3166-2 region code (e.g., `US-CA`), or empty string.
  - `City`: The city name, or empty string.
  - `Status`: Highest severity assigned: `ERROR` > `WARNING` > `SUGGESTION` > `OK`.
  - `IPVersion`: `"IPv4"` or `"IPv6"` based on the parsed IP prefix.
  - `Messages`: Array of message objects, each with:
    - `ID`: String identifier from the **Validation Rules Reference** table below (e.g., `"1101"`, `"3301"`).
    - `Type`: The severity type: `"ERROR"`, `"WARNING"`, or `"SUGGESTION"`.
    - `Text`: The human-readable validation message string.
    - `Checked`: `true` if the validation rule is auto-tunable (`Tunable: true` in the reference table), `false` otherwise. Controls whether the checkbox in the report is `checked` or `disabled`.
  - `HasError`: `true` if any message has `Type` `"ERROR"`.
  - `HasWarning`: `true` if any message has `Type` `"WARNING"`.
  - `HasSuggestion`: `true` if any message has `Type` `"SUGGESTION"`.
  - `DoNotGeolocate` (entry): `true` if `CountryCode` is empty or `"ZZ"`; the entry is an explicit do-not-geolocate signal.
  - `GeocodingHint`: Always empty string `""` in Phase 3. Reserved for future use.
  - `Tunable`: `true` if **any** message in the entry has `Checked: true`. Computed as logical OR across all messages' `Checked` values. This flag drives the "Tune" button visibility in the report.
#### Validation Rules Reference
When adding messages to an entry, use the `ID`, `Type`, `Text`, and `Checked` values from this table.
| ID | Type | Text | Checked | Condition Reference |
|--------|--------------|------------------------------------------------------------------------------------------------|---------|----------------------------------------|
| `1101` | `ERROR` | IP prefix is empty | `false` | IP Prefix Analysis: empty |
| `1102` | `ERROR` | Invalid IP prefix: unable to parse as IPv4 or IPv6 network | `false` | IP Prefix Analysis: invalid syntax |
| `1103` | `ERROR` | Non-public IP range is not allowed in an RFC 8805 feed | `false` | IP Prefix Analysis: non-public |
| `3101` | `SUGGESTION` | IPv4 prefix is unusually large and may indicate a typo | `false` | IP Prefix Analysis: IPv4 < /22 |
| `3102` | `SUGGESTION` | IPv6 prefix is unusually large and may indicate a typo | `false` | IP Prefix Analysis: IPv6 < /64 |
| `1201` | `ERROR` | Invalid country code: not a valid ISO 3166-1 alpha-2 value | `true` | Country Code Analysis: invalid |
| `1301` | `ERROR` | Invalid region format; expected COUNTRY-SUBDIVISION (e.g., US-CA) | `true` | Region Code Analysis: bad format |
| `1302` | `ERROR` | Invalid region code: not a valid ISO 3166-2 subdivision | `true` | Region Code Analysis: unknown code |
| `1303` | `ERROR` | Region code does not match the specified country code | `true` | Region Code Analysis: mismatch |
| `1401` | `ERROR` | Invalid city name: placeholder value is not allowed | `false` | City Name Analysis: placeholder |
| `1402` | `ERROR` | Invalid city name: abbreviated or code-based value detected | `true` | City Name Analysis: abbreviation |
| `2401` | `WARNING` | City name formatting is inconsistent; consider normalizing the value | `true` | City Name Analysis: formatting |
| `1501` | `ERROR` | Postal codes are deprecated by RFC 8805 and must be removed for privacy reasons | `true` | Postal Code Check |
| `3301` | `SUGGESTION` | Region is usually unnecessary for small territories; consider removing the region value | `true` | Tuning: small territory region |
| `3402` | `SUGGESTION` | City-level granularity is usually unnecessary for small territories; consider removing the city value | `true` | Tuning: small territory city |
| `3303` | `SUGGESTION` | Region code is recommended when a city is specified; choose a region from the dropdown | `true` | Tuning: missing region with city |
| `3104` | `SUGGESTION` | Confirm whether this subnet is intentionally marked as do-not-geolocate or missing location data | `true` | Tuning: unspecified geolocation |
#### Populating Messages
When a validation check matches, add a message to the entry's `Messages` array using the values from the reference table:
```python
entry["Messages"].append({
    "ID": "1201",  # From the table
    "Type": "ERROR",  # From the table
    "Text": "Invalid country code: not a valid ISO 3166-1 alpha-2 value",  # From the table
    "Checked": True,  # From the table (True = tunable)
})
```
After populating all messages for an entry, derive the entry-level flags:
```python
entry["HasError"] = any(m["Type"] == "ERROR" for m in entry["Messages"])
entry["HasWarning"] = any(m["Type"] == "WARNING" for m in entry["Messages"])
entry["HasSuggestion"] = any(m["Type"] == "SUGGESTION" for m in entry["Messages"])
entry["Tunable"] = any(m["Checked"] for m in entry["Messages"])
```
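The flag derivation above does not set `Status`. The following is a minimal sketch that looks messages up from a rules dictionary and derives `Status` from the highest severity present. The `RULES` dictionary is a hypothetical partial copy of the reference table with only a few rows. The helper names `add_message` and `finalize_entry` are illustrative.

```python
# Hypothetical partial copy of the Validation Rules Reference: ID -> (Type, Text, Checked).
RULES = {
    "1102": ("ERROR", "Invalid IP prefix: unable to parse as IPv4 or IPv6 network", False),
    "1201": ("ERROR", "Invalid country code: not a valid ISO 3166-1 alpha-2 value", True),
    "2401": ("WARNING", "City name formatting is inconsistent; consider normalizing the value", True),
    "3303": ("SUGGESTION", "Region code is recommended when a city is specified; choose a region from the dropdown", True),
}

# Highest severity first: ERROR > WARNING > SUGGESTION (> OK when no messages match).
SEVERITY_ORDER = ("ERROR", "WARNING", "SUGGESTION")


def add_message(entry: dict, rule_id: str) -> None:
    """Append a message built from the rules table to entry["Messages"]."""
    msg_type, text, checked = RULES[rule_id]
    entry["Messages"].append({"ID": rule_id, "Type": msg_type, "Text": text, "Checked": checked})


def finalize_entry(entry: dict) -> None:
    """Derive HasError/HasWarning/HasSuggestion/Tunable/Status from the messages."""
    types = {m["Type"] for m in entry["Messages"]}
    entry["HasError"] = "ERROR" in types
    entry["HasWarning"] = "WARNING" in types
    entry["HasSuggestion"] = "SUGGESTION" in types
    entry["Tunable"] = any(m["Checked"] for m in entry["Messages"])
    entry["Status"] = next((s for s in SEVERITY_ORDER if s in types), "OK")
```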
#### Accuracy Level Counting Rules
Accuracy levels are **mutually exclusive**. Evaluate each valid (non-ERROR, non-invalid) entry against the rows below from top to bottom, and assign it to the bucket of the first row that matches. The do-not-geolocate row comes first because an empty or `ZZ` country code overrides any region or city value:
| Condition                                                    | Bucket                      |
|--------------------------------------------------------------|-----------------------------|
| `DoNotGeolocate` (entry) is `true`                           | `DoNotGeolocate` (metadata) |
| `City` is non-empty                                          | `CityLevelAccuracy`         |
| `RegionCode` non-empty AND `City` is empty                   | `RegionLevelAccuracy`       |
| `CountryCode` non-empty, `RegionCode` and `City` empty       | `CountryLevelAccuracy`      |
**Do not count** entries with `HasError: true` or entries in `InvalidEntries` in any accuracy bucket.
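The following is a minimal sketch of the counting step. It checks the entry's `DoNotGeolocate` flag before the city, region, and country checks. That ordering is an assumption based on the Phase 1 rule that an empty or `ZZ` country code means do-not-geolocate regardless of the region and city values. The helpers `accuracy_bucket` and `summarize` are hypothetical. They leave `TotalEntries` and `InvalidEntries` alone because those come from parsing.

```python
from typing import Optional

STATUS_COUNTERS = {"ERROR": "Errors", "WARNING": "Warnings", "SUGGESTION": "Suggestions", "OK": "OK"}
BUCKETS = ("CityLevelAccuracy", "RegionLevelAccuracy", "CountryLevelAccuracy", "DoNotGeolocate")


def accuracy_bucket(entry: dict) -> Optional[str]:
    """Return the single accuracy bucket for an entry, or None if it is not counted."""
    if entry["HasError"]:
        return None  # ERROR and invalid entries are excluded from every bucket
    if entry["DoNotGeolocate"]:
        return "DoNotGeolocate"  # checked first: empty/ZZ country overrides region and city
    if entry["City"]:
        return "CityLevelAccuracy"
    if entry["RegionCode"]:
        return "RegionLevelAccuracy"
    if entry["CountryCode"]:
        return "CountryLevelAccuracy"
    return None


def summarize(report: dict) -> None:
    """Recompute per-status, per-IP-version, and accuracy counters in place."""
    counters = list(STATUS_COUNTERS.values()) + ["IpV4Entries", "IpV6Entries"] + list(BUCKETS)
    for name in counters:
        report[name] = 0  # assigning to an existing key keeps its position in the dict
    for entry in report["Entries"]:
        report[STATUS_COUNTERS[entry["Status"]]] += 1
        if entry["IPVersion"] == "IPv4":
            report["IpV4Entries"] += 1
        elif entry["IPVersion"] == "IPv6":
            report["IpV6Entries"] += 1
        bucket = accuracy_bucket(entry)
        if bucket is not None:
            report[bucket] += 1
```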
The agent MUST NOT:
- Rename fields
- Add or remove fields
- Change data types
- Reorder keys
- Alter nesting
- Wrap the object
- Split into multiple files
If a value is unknown, **leave it empty** — never invent data.
#### Structure & Format Check
This step verifies that your feed is well-formed and parseable. **Critical structural errors** must be resolved before the tuner can analyze geolocation quality.
##### CSV Structure
This subsection defines rules for **CSV-formatted input files** used for IP geolocation feeds.
The goal is to ensure the file can be parsed reliably and normalized into a **consistent internal representation**.
- **CSV Structure Checks**
  - If `pandas` is available, use it for CSV parsing.
  - Otherwise, fall back to Python's built-in `csv` module.
  - Ensure the CSV contains **exactly 4 or 5 logical columns**.
  - Comment lines are allowed.
  - A header row **may or may not** be present.
  - If no header row exists, assume the implicit column order:
    ```
    ip_prefix, alpha2code, region, city, postal code (deprecated)
    ```
  - Refer to the example input file:
    [`assets/example/01-user-input-rfc8805-feed.csv`](assets/example/01-user-input-rfc8805-feed.csv)
- **CSV Cleansing and Normalization**
  - Clean and normalize the CSV using Python logic equivalent to the following operations:
    - Select only the **first five columns**, dropping any columns beyond the fifth.
    - Write the output file with a **UTF-8 BOM**.
- **Comments**
  - Remove comment rows where the **first column begins with `#`**.
    - This also removes a header row if it begins with `#`.
  - Create a map of comments using the **1-based line number** as the key and the full original line as the value. Record blank lines in the same map, with the original empty (or whitespace-only) line as the value.
  - Store this map in a JSON file at: [`./run/data/comments.json`](./run/data/comments.json)
    - Example: `{ "4": "# It's OK for small city states to leave state ISO2 code unspecified" }`
- **Notes**
  - Both implementation paths (`pandas` and built-in `csv`) must write output using the `utf-8-sig` encoding to ensure a **UTF-8 BOM** is present.
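The following is a minimal sketch of the built-in `csv` path. It parses line by line so that each row keeps its exact 1-based line number. The trade-off is that quoted fields spanning several lines are not supported, which geofeeds should not need. Two details are assumptions, not rules from this skill: an un-commented header is recognized by a first column of `ip_prefix`, and the cleaned CSV path is chosen by the caller. The names `parse_feed` and `write_outputs` are hypothetical.

```python
import csv
import json


def parse_feed(text: str):
    """Split feed text into data rows and a map of comment and blank lines.

    rows: list of dicts with the 1-based "Line", the first five "Fields"
    (padded with ""), and the raw "ColumnCount" for the 4-or-5 column check.
    comments: maps str(line_number) -> the original line.
    """
    rows, comments = [], {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            comments[str(line_no)] = line  # comment or blank line
            continue
        fields = next(csv.reader([line]))
        # Assumption: an un-commented header row is recognized by its first column.
        if not rows and fields[0].strip().lower() == "ip_prefix":
            comments[str(line_no)] = line
            continue
        first_five = [f.strip() for f in fields[:5]]  # drop columns beyond the fifth
        rows.append({
            "Line": line_no,
            "Fields": first_five + [""] * (5 - len(first_five)),
            "ColumnCount": len(fields),
        })
    return rows, comments


def write_outputs(rows, comments, cleaned_csv_path, comments_path="./run/data/comments.json"):
    """Write the cleaned five-column CSV with a UTF-8 BOM, plus the comments map."""
    with open(cleaned_csv_path, "w", encoding="utf-8-sig", newline="") as fh:
        writer = csv.writer(fh)
        for row in rows:
            writer.writerow(row["Fields"])
    with open(comments_path, "w", encoding="utf-8") as fh:
        json.dump(comments, fh, ensure_ascii=False, indent=2)
```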
#### IP Prefix Analysis
- Check that the `IPPrefix` field is present and non-empty for each entry.
- Check for duplicate `IPPrefix` values across entries.
  - If duplicates are found, stop the skill and report to the user with the message: `Duplicate IP prefix detected: {ip_prefix_value} appears on lines {line_numbers}`
  - If no duplicates are found, continue with the analysis.
- **Checks**
  - Each subnet must parse cleanly as either an **IPv4 or IPv6 network** using the code snippets in [`references/snippets-python3.md`](references/snippets-python3.md).
  - Subnets must be normalized and displayed in **CIDR slash notation**.
    - Single-host IPv4 subnets must be represented as **`/32`**.
    - Single-host IPv6 subnets must be represented as **`/128`**.
- **ERROR**
  - Report the following conditions as **ERROR**:
    - **Empty prefix**
      - Message ID: `1101`
    - **Invalid subnet syntax**, including prefixes with host bits set (e.g., `192.168.1.1/24`)
      - Message ID: `1102`
    - **Non-public address space**
      - Applies to subnets that are **private, loopback, link-local, multicast, or otherwise non-public**
      - In Python, detect non-public ranges using `is_private` and related address properties as shown in `./references`.
      - Message ID: `1103`
- **SUGGESTION**
  - Report the following conditions as **SUGGESTION**:
    - **Overly large IPv6 subnets**
      - Prefixes shorter than `/64`
      - Message ID: `3102`
    - **Overly large IPv4 subnets**
      - Prefixes shorter than `/22`
      - Message ID: `3101`
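The following is a minimal sketch of these prefix checks using the standard `ipaddress` module. `ip_network(..., strict=True)` raises `ValueError` when host bits are set and turns a bare address into a `/32` or `/128`. The sketch treats a network as non-public if it is not `is_global` or if it is multicast. The multicast test is added because `is_global` alone does not exclude multicast ranges. The helpers `analyze_prefix` and `find_duplicates` are hypothetical. `references/snippets-python3.md` remains the skill's own reference.

```python
import ipaddress
from collections import defaultdict


def analyze_prefix(raw: str):
    """Return (normalized_prefix, ip_version, message_ids) for one IPPrefix value."""
    raw = raw.strip()
    if not raw:
        return "", "", ["1101"]
    try:
        # strict=True rejects host bits (192.168.1.1/24); "8.8.8.8" becomes 8.8.8.8/32.
        net = ipaddress.ip_network(raw, strict=True)
    except ValueError:
        return raw, "", ["1102"]
    version = f"IPv{net.version}"
    ids = []
    if not net.is_global or net.is_multicast:
        ids.append("1103")  # private, loopback, link-local, reserved, multicast, ...
    elif net.version == 4 and net.prefixlen < 22:
        ids.append("3101")
    elif net.version == 6 and net.prefixlen < 64:
        ids.append("3102")
    return str(net), version, ids


def find_duplicates(prefixes_by_line):
    """Map each normalized prefix that occurs more than once to its line numbers."""
    seen = defaultdict(list)
    for line_no, prefix in prefixes_by_line:
        seen[prefix].append(line_no)
    return {prefix: lines for prefix, lines in seen.items() if len(lines) > 1}
```

Run `find_duplicates` on the normalized prefixes, so that differently written forms of the same network (for example upper- and lower-case IPv6) are caught as duplicates.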
#### Geolocation Quality Check
Analyze the **accuracy and consistency** of geolocation data:
- Country codes
- Region codes
- City names
- Deprecated fields
These checks run after the structural checks pass.
##### Country Code Analysis
- Use the locally available data table [`ISO3166-1`](assets/iso3166-1.json) for checking.
  - JSON array of countries and territories with ISO codes
  - Each object includes:
    - `alpha_2`: two-letter country code
    - `name`: short country name
    - `flag`: flag emoji
  - This file represents the **superset of valid `CountryCode` values** for an RFC 8805 CSV.
- Check the entry's `CountryCode` (RFC 8805 Section 2.1.1.2, column `alpha2code`) against the `alpha_2` attribute.
  - Sample code is available in the `references/` directory.
- If a country is found in [`assets/small-territories.json`](assets/small-territories.json), mark the entry internally as a small territory. This flag is used in later checks and suggestions but is **not stored in the output JSON** (it is transient validation state).
  - **Note:** `small-territories.json` contains some historic/disputed codes (`AN`, `CS`, `XK`) that are not present in `iso3166-1.json`. An entry using one of these as its `CountryCode` will fail the country code validation (ERROR) even though it matches as a small territory. The country code ERROR takes precedence; do not suppress it based on the small-territory flag.
- **ERROR**
  - Report the following conditions as **ERROR**:
    - **Invalid country code**
      - Condition: `CountryCode` is present, is not the do-not-geolocate value `ZZ`, and is not found in the `alpha_2` set
      - Message ID: `1201`
- **SUGGESTION**
  - Report the following conditions as **SUGGESTION**:
    - **Unspecified geolocation for subnet**
      - Condition: All geographical fields (`CountryCode`, `RegionCode`, `City`) are empty for a subnet.
      - Action:
        - Set `DoNotGeolocate = true` for the entry.
        - Set `CountryCode` to `ZZ` for the entry.
      - Message ID: `3104`
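The following is a minimal sketch of the country-code checks. Two things are assumptions, not rules from the skill. First, codes are compared case-insensitively. Second, `ZZ` is exempted from the `alpha_2` lookup, since it may not be listed in `iso3166-1.json`. The format of `small-territories.json` is not described here, so the caller is assumed to load it into a set of codes. All function names are hypothetical.

```python
import json


def message(msg_id, msg_type, text, checked):
    """Build one entry in the Messages array."""
    return {"ID": msg_id, "Type": msg_type, "Text": text, "Checked": checked}


def load_alpha2_codes(path="assets/iso3166-1.json"):
    """Collect the alpha_2 values from the ISO 3166-1 table."""
    with open(path, encoding="utf-8") as fh:
        return {item["alpha_2"].upper() for item in json.load(fh)}


def check_country(entry, alpha2_codes, small_territory_codes):
    """Run the country-code checks in place; return the transient small-territory flag."""
    code = entry["CountryCode"].strip().upper()  # case-insensitive comparison (assumption)
    if code in ("", "ZZ"):
        entry["DoNotGeolocate"] = True
        if not code and not entry["RegionCode"].strip() and not entry["City"].strip():
            entry["CountryCode"] = "ZZ"
            entry["Messages"].append(message(
                "3104", "SUGGESTION",
                "Confirm whether this subnet is intentionally marked as do-not-geolocate "
                "or missing location data", True))
        return False
    if code not in alpha2_codes:
        entry["Messages"].append(message(
            "1201", "ERROR", "Invalid country code: not a valid ISO 3166-1 alpha-2 value", True))
    # Historic/disputed codes (AN, CS, XK) can be small territories and still get 1201.
    return code in small_territory_codes
```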
##### Region Code Analysis
- Use the locally available data table [`ISO3166-2`](assets/iso3166-2.json) for checking.
  - JSON array of country subdivisions with ISO-assigned codes
  - Each object includes:
    - `code`: subdivision code prefixed with country code (e.g., `US-CA`)
    - `name`: short subdivision name
  - This file represents the **superset of valid `RegionCode` values** for an RFC 8805 CSV.
- If a `RegionCode` value is provided (RFC 8805 Section 2.1.1.3):
  - Check that the format matches `{COUNTRY}-{SUBDIVISION}` (e.g., `US-CA`, `AU-NSW`).
  - Check the value against the `code` attribute (already prefixed with the country code).
  - **Small-territory exception:** If the entry is a small territory **and** the `RegionCode` value equals the entry's `CountryCode` (e.g., `SG` as both country and region for Singapore), treat the region as acceptable and skip all region validation checks for this entry. Small territories are effectively city-states with no meaningful ISO 3166-2 administrative subdivisions.
- **ERROR**
  - Report the following conditions as **ERROR**:
    - **Invalid region format**
      - Condition: `RegionCode` does not match `{COUNTRY}-{SUBDIVISION}` **and** the small-territory exception does not apply
      - Message ID: `1301`
    - **Unknown region code**
      - Condition: `RegionCode` value is not found in the `code` set **and** the small-territory exception does not apply
      - Message ID: `1302`
    - **Country/region mismatch**
      - Condition: Country portion of `RegionCode` does not match `CountryCode`
      - Message ID: `1303`
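The following is a minimal sketch of the region checks. The format pattern follows ISO 3166-2: a two-letter country code, a hyphen, then one to three letters or digits. The sketch makes three choices that are assumptions, not rules from the skill. Codes are compared case-insensitively. A malformed value gets only `1301`, because it has no reliable country portion to compare. `1302` and `1303` are reported independently. The helper names are hypothetical.

```python
import json
import re

# {COUNTRY}-{SUBDIVISION}: ISO 3166-2 subdivision parts are 1-3 letters or digits.
REGION_FORMAT = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def load_region_codes(path="assets/iso3166-2.json"):
    """Collect the `code` values (e.g. US-CA) from the ISO 3166-2 table."""
    with open(path, encoding="utf-8") as fh:
        return {item["code"].upper() for item in json.load(fh)}


def _error(msg_id, text):
    return {"ID": msg_id, "Type": "ERROR", "Text": text, "Checked": True}


def check_region(entry, region_codes, is_small_territory):
    """Append 1301/1302/1303 messages to entry["Messages"] in place."""
    region = entry["RegionCode"].strip().upper()  # case-insensitive (assumption)
    country = entry["CountryCode"].strip().upper()
    if not region:
        return
    if is_small_territory and region == country:
        return  # small-territory exception, e.g. SG used as both country and region
    if not REGION_FORMAT.fullmatch(region):
        entry["Messages"].append(_error(
            "1301", "Invalid region format; expected COUNTRY-SUBDIVISION (e.g., US-CA)"))
        return  # no reliable country portion to compare
    if region not in region_codes:
        entry["Messages"].append(_error(
            "1302", "Invalid region code: not a valid ISO 3166-2 subdivision"))
    if region.split("-", 1)[0] != country:
        entry["Messages"].append(_error(
            "1303", "Region code does not match the specified country code"))
```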
##### City Name Analysis
- City names are validated using **heuristic checks only**.
  - There is currently **no authoritative dataset** available for validating city names.
- **ERROR**
  - Report the following conditions as **ERROR**:
    - **Placeholder or non-meaningful values**
      - Condition: Placeholder or non-meaningful values including but not limited to:
        - `undefined`
        - `Please select`
        - `null`
        - `N/A`
        - `TBD`
        - `unknown`
      - Message ID: `1401`
    - **Truncated names, abbreviations, or airport codes**
      - Condition: Truncated names, abbreviations, or airport codes that do not represent valid city names:
        - `LA`
        - `Frft`
        - `sin01`
        - `LHR`
        - `SIN`
        - `MAA`
      - Message ID: `1402`
- **WARNING**
  - Report the following conditions as **WARNING**:
    - **Inconsistent casing or formatting**
      - Condition: City names with inconsistent casing, spacing, or formatting that may reduce data quality, for example:
        - `HongKong` vs `Hong Kong`
        - Mixed casing or unexpected script usage
      - Message ID: `2401`
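The following is a minimal sketch of the city heuristics. The placeholder set is taken from the list above. The regular expressions are illustrative assumptions. They flag two- or three-letter all-caps codes and letter-then-digit tokens as abbreviations. They flag joined words, stray or doubled spaces, and all-upper or all-lower names as formatting issues. Like any heuristic, they miss cases such as `Frft` and can misfire on legitimate names such as `McAllen`, so an agent should review the results.

```python
import re

PLACEHOLDERS = {"undefined", "please select", "null", "n/a", "tbd", "unknown"}

# Illustrative patterns only: 2-3 capital letters (LHR, LA) or letters followed by digits (sin01).
ABBREVIATION = re.compile(r"[A-Z]{2,3}|[A-Za-z]+\d+[A-Za-z0-9]*")
# A lowercase ASCII letter directly followed by a capital, as in "HongKong".
JOINED_WORDS = re.compile(r"[a-z][A-Z]")


def check_city(city: str) -> list:
    """Return the message IDs (1401, 1402, or 2401) suggested for one City value."""
    name = city.strip()
    if not name:
        return []
    if name.lower() in PLACEHOLDERS:
        return ["1401"]
    if ABBREVIATION.fullmatch(name):
        return ["1402"]
    if (name != city or "  " in name or JOINED_WORDS.search(name)
            or name.isupper() or name.islower()):
        return ["2401"]
    return []
```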
##### Postal Code Check
- RFC 8805 Section 2.1.1.5 explicitly **deprecates postal or ZIP codes**.
  - Postal codes can represent very small populations and are **not considered privacy-safe** for mapping IP address ranges, which are statistical in nature.
- **ERROR**
  - Report the following conditions as **ERROR**:
    - **Postal code present**
      - Condition: A non-empty value is present in the postal/ZIP code field.
      - Message ID: `1501`
#### Tuning & Recommendations
This step applies **opinionated recommendations** beyond RFC 8805, learned from real-world geofeed deployments, that improve accuracy and usability.
- **SUGGESTION**
  - Report the following conditions as **SUGGESTION**:
    - **Region or city specified for small territory**
      - Condition:
        - Entry is a small territory, **and**
        - `RegionCode` is non-empty **OR** `City` is non-empty.
      - Message IDs: `3301` (for region), `3402` (for city)
    - **Missing region code when city is specified**
      - Condition:
        - `City` is non-empty
        - `RegionCode` is empty
        - Entry is **not** a small territory
      - Message ID: `3303`
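The following is a minimal sketch of these two tuning rules. It takes the transient small-territory flag as a parameter and returns message IDs. The caller looks up each ID's text in the reference table. The function name `tuning_suggestions` is hypothetical.

```python
def tuning_suggestions(entry: dict, is_small_territory: bool) -> list:
    """Return the SUGGESTION message IDs (3301, 3402, 3303) for one entry."""
    region = entry["RegionCode"].strip()
    city = entry["City"].strip()
    ids = []
    if is_small_territory:
        if region:
            ids.append("3301")  # region usually unnecessary for small territories
        if city:
            ids.append("3402")  # city usually unnecessary for small territories
    elif city and not region:
        ids.append("3303")  # recommend a region when a city is given
    return ids
```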
### Phase 4: Tuning Data Lookup
#### Objective
Look up all the `Entries` using Fastah's `rfc8805-row-place-search` tool.
#### Execution Rules
- Generate a new **script** _only_ for payload generation (read the dataset and write one or more payload JSON files; do not call MCP from this script).
- The server accepts at most 1000 entries per request; if there are more than 1000 entries, split them into multiple requests.
- The agent must read the generated payload files, construct the requests from them, and send those requests to the MCP server in batches of at most 1000 entries each.
- **On MCP failure:** If the MCP server is unreachable, returns an error, or returns no results for any batch, log a warning and continue to Phase 5. Set `TunedEntry: {}` for all affected entries. Do not block report generation. Notify the user clearly: `Tuning data lookup unavailable; the report will show validation results only.`
- Suggestions are **advisory only** — **never auto-populate** them.
#### Step 1: Build Lookup Payload with Deduplication
Load the dataset from: [./run/data/report-data.json](./run/data/report-data.json)
- Read the `Entries` array. Each entry will be used to build the MCP lookup payload.
Reduce server requests by deduplicating identical entries:
- For each entry in `Entries`, compute a content hash over its `CountryCode`, `RegionCode`, and `City` values. Combine the three values unambiguously (for example, with a delimiter) so that different splits of the same characters cannot collide.
- Create a deduplication map: `{ contentHash -> { rowKey, payload, entryIndices: [] } }`. `rowKey` is a UUID that is sent to the MCP server and used to match responses.
- If an entry's hash already exists, append its **0-based array index** in `Entries` to that deduplication entry's `entryIndices` array.
- If the hash is new, generate a **UUID (`rowKey`)** and create a new deduplication entry whose `entryIndices` starts with this entry's index.
Build request batches:
- Extract unique deduplicated entries from the map, keeping them in deduplication order.
- Build request batches of up to 1000 items each.
- For each batch, keep an in-memory structure like `[{ rowKey, payload, entryIndices }, ...]` to match responses back by rowKey.
- When writing the MCP payload file, include the `rowKey` field with each payload object:
```json
[
{"rowKey": "550e8400-e29b-41d4-a716-446655440000", "countryCode":"CA","regionCode":"CA-ON","cityName":"Toronto"},
{"rowKey": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "countryCode":"IN","regionCode":"IN-KA","cityName":"Bangalore"},
{"rowKey": "6ba7b811-9dad-11d1-80b4-00c04fd430c8", "countryCode":"IN","regionCode":"IN-KA"}
]
```
- When reading responses, match each response `rowKey` field to the corresponding deduplication entry to retrieve all associated `entryIndices`.
Rules:
- Write payload to: [./run/data/mcp-server-payload.json](./run/data/mcp-server-payload.json)
- Exit the script after writing the payload.
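Below is a minimal Python sketch of the Step 1 payload script. It makes four assumptions:
- `Entries` items carry `CountryCode`, `RegionCode`, and `City` keys.
- The three fields are joined with a `|` separator before hashing, so that `("A", "BC")` and `("AB", "C")` cannot collide.
- When a hash is first seen, the entry's own index is recorded. Otherwise that entry would never receive tuned data.
- Empty fields are left out of the payload, mirroring the third example row above.

The function names are illustrative and are not part of any existing tool.

```python
import hashlib
import json
import uuid

BATCH_SIZE = 1000  # per-request limit stated in the execution rules


def content_hash(entry):
    """Hash CountryCode + RegionCode + City, joined with '|' to avoid ambiguous concatenation."""
    key = "|".join(entry.get(k) or "" for k in ("CountryCode", "RegionCode", "City"))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_payload(entry):
    """Map Entries[] keys to the payload keys used in the example; drop empty fields."""
    fields = {
        "countryCode": entry.get("CountryCode"),
        "regionCode": entry.get("RegionCode"),
        "cityName": entry.get("City"),
    }
    return {k: v for k, v in fields.items() if v}


def deduplicate(entries):
    """Return {contentHash: {rowKey, payload, entryIndices}} in first-seen order."""
    dedup = {}
    for index, entry in enumerate(entries):
        h = content_hash(entry)
        if h not in dedup:
            dedup[h] = {"rowKey": str(uuid.uuid4()),
                        "payload": build_payload(entry),
                        "entryIndices": []}
        dedup[h]["entryIndices"].append(index)  # includes the first-seen entry
    return dedup


def make_batches(dedup, size=BATCH_SIZE):
    """Split deduplicated items into lists of at most `size`, preserving order."""
    items = list(dedup.values())
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_payload(dedup, path="./run/data/mcp-server-payload.json"):
    """Write one flat list of {rowKey, ...payload} objects, as in the example above."""
    rows = [{"rowKey": item["rowKey"], **item["payload"]} for item in dedup.values()]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
```

Typical use: call `dedup = deduplicate(data["Entries"])`, then `write_payload(dedup)`, then exit. `make_batches(dedup)` shows how the rows split into requests of at most 1000 items.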
#### Step 2: Invoke Fastah MCP Tool
- An example `mcp.json`-style configuration for the Fastah MCP server:
```json
"fastah-ip-geofeed": {
"type": "http",
"url": "https://mcp.fastah.ai/mcp"
}
```
- Server: `https://mcp.fastah.ai/mcp`
- Tool and its Schema: before the first `tools/call`, the agent MUST send a `tools/list` request to read the input and output schema for **`rfc8805-row-place-search`**.
Use the discovered schema as the authoritative source for field names, types, and constraints.
- The following is an illustrative example only; always defer to the schema returned by `tools/list`:
```json
[
{"rowKey": "550e8400-...", "countryCode":"CA", ...},
{"rowKey": "690e9301-...", "countryCode":"ZZ", ...}
]
```
- Open [./run/data/mcp-server-payload.json](./run/data/mcp-server-payload.json) and send all deduplicated entries with their rowKeys.
- If there are more than 1000 entries after deduplication, split them into multiple requests of at most 1000 entries each.
- The server will respond with the same `rowKey` field in each response for mapping back.
- Do NOT use local data.
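The dicts built below show roughly what the JSON-RPC 2.0 bodies look like when an MCP client sends these two calls. This is for illustration only. In practice the agent's MCP client handles transport details such as session initialization and HTTP headers.

`ROWS_ARGUMENT` is a **hypothetical** placeholder. The real argument name, and whether rows are passed as a list, must come from the `inputSchema` returned by `tools/list`.

```python
import json

TOOL_NAME = "rfc8805-row-place-search"
ROWS_ARGUMENT = "rows"  # HYPOTHETICAL: read the real name from the tools/list inputSchema


def tools_list_request(request_id=1):
    """JSON-RPC 2.0 body for schema discovery (MCP `tools/list`)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}


def tools_call_requests(rows, batch_size=1000, first_id=2):
    """One MCP `tools/call` body per batch of at most `batch_size` payload rows."""
    requests = []
    for n, start in enumerate(range(0, len(rows), batch_size)):
        requests.append({
            "jsonrpc": "2.0",
            "id": first_id + n,
            "method": "tools/call",
            "params": {
                "name": TOOL_NAME,
                "arguments": {ROWS_ARGUMENT: rows[start:start + batch_size]},
            },
        })
    return requests


def load_rows(path="./run/data/mcp-server-payload.json"):
    """Read the deduplicated rows (each carrying its rowKey) written in Step 1."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```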
#### Step 3: Attach Tuned Data to Entries
- Generate a new **script** for attaching tuned data.
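- Load both [./run/data/report-data.json](./run/data/report-data.json) and the deduplication map. The Step 1 script has already exited, so the map normally has to be re-derived from the payload file. Recompute each entry's content hash and pair it with the `rowKey` of the payload object carrying the same `countryCode`, `regionCode`, and `cityName`.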
- For each response from the MCP server:
  - Extract the `rowKey` from the response.
  - Look up the `entryIndices` array associated with that `rowKey` from the deduplication map.
  - For each index in `entryIndices`, attach the best match to `Entries[index]`.
  - Use the **first (best) match** from the response when available.
Create the field on each affected entry if it does not exist. Remap the MCP API response keys to Go struct field names:
```json
"TunedEntry": {
"Name": "",
"CountryCode": "",
"RegionCode": "",
"PlaceType": "",
"H3Cells": [],
"BoundingBox": []
}
```
The `TunedEntry` field is a **single object** (not an array). It holds the best match from the MCP server.
**MCP response key → JSON key mapping**:
| MCP API response key | JSON key |
|----------------------|----------------------------|
| `placeName` | `Name` |
| `countryCode` | `CountryCode` |
| `stateCode` | `RegionCode` |
| `placeType` | `PlaceType` |
| `h3Cells` | `H3Cells` |
| `boundingBox` | `BoundingBox` |
Entries with no `rowKey` match (i.e., the MCP server returned no response, or no match, for their `rowKey`) must receive an empty `TunedEntry: {}` object. Never leave the field absent.
- Write the dataset back to: [./run/data/report-data.json](./run/data/report-data.json)
- Rules:
  - Maintain all existing validation flags.
  - Do NOT create additional intermediate files.
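The sketch below shows one way to write the Step 3 script. Because the Step 1 script has exited, it re-derives `rowKey -> entryIndices` by matching each entry's location fields against the payload rows.

Two assumptions apply:
- The `matches` key on each response is a **hypothetical** stand-in for wherever the `tools/list` output schema puts the ranked results.
- Fields absent from a match fall back to the empty values of the `TunedEntry` skeleton above.

```python
# MCP response key -> TunedEntry key, taken from the mapping table above.
KEY_MAP = {
    "placeName": "Name",
    "countryCode": "CountryCode",
    "stateCode": "RegionCode",
    "placeType": "PlaceType",
    "h3Cells": "H3Cells",
    "boundingBox": "BoundingBox",
}


def location_key(country, region, city):
    """Normalize the deduplication fields so that missing and empty compare equal."""
    return (country or "", region or "", city or "")


def indices_by_row_key(entries, payload_rows):
    """Re-derive rowKey -> entryIndices from report-data.json and the payload file."""
    row_key_for = {}
    for row in payload_rows:
        key = location_key(row.get("countryCode"), row.get("regionCode"), row.get("cityName"))
        row_key_for[key] = row["rowKey"]
    result = {}
    for index, entry in enumerate(entries):
        key = location_key(entry.get("CountryCode"), entry.get("RegionCode"), entry.get("City"))
        if key in row_key_for:
            result.setdefault(row_key_for[key], []).append(index)
    return result


def to_tuned_entry(match):
    """Remap one MCP match into the TunedEntry shape, defaulting absent fields."""
    tuned = {"Name": "", "CountryCode": "", "RegionCode": "", "PlaceType": "",
             "H3Cells": [], "BoundingBox": []}
    for src, dst in KEY_MAP.items():
        if src in match:
            tuned[dst] = match[src]
    return tuned


def attach_tuned(entries, row_indices, responses):
    """Attach the first (best) match to each entry sharing a rowKey; others keep {}."""
    for entry in entries:
        entry.setdefault("TunedEntry", {})  # never leave the field absent
    for response in responses:
        matches = response.get("matches") or []  # HYPOTHETICAL key name
        if not matches:
            continue
        for index in row_indices.get(response.get("rowKey"), []):
            entries[index]["TunedEntry"] = to_tuned_entry(matches[0])
    return entries
```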
### Phase 5: Generate Tuning Report
Generate a **self-contained HTML report** by rendering the template at `./scripts/templates/index.html` with data from `./run/data/report-data.json` and `./run/data/comments.json`.
Write the completed report to `./run/report/geofeed-report.html`. After generating, attempt to open it in the system's default browser (e.g., `webbrowser.open()`). If running in a headless environment, CI pipeline, or remote container where no browser is available, skip the browser step and instead present the file path to the user so they can open or download it.
**The template uses Go `html/template` syntax** (`{{.Field}}`, `{{range}}`, `{{if eq}}`, etc.). Write a Python script that reads the template, builds a rendering context from the JSON data files, and processes the template placeholders to produce final HTML. Do not modify the template file itself — all processing happens in the Python script at render time.
#### Step 1: Replace Metadata Placeholders
Replace each `{{.Metadata.X}}` placeholder in the template with the corresponding value from `report-data.json`. Because the JSON keys match the placeholder names, the mapping is direct: `{{.Metadata.InputFile}}` maps to the `InputFile` key, and so on.
| Template placeholder | JSON key (`report-data.json`) |
|----------------------------------------|-----------------------------------|
| `{{.Metadata.InputFile}}` | `InputFile` |
| `{{.Metadata.Timestamp}}` | `Timestamp` |
| `{{.Metadata.TotalEntries}}` | `TotalEntries` |
| `{{.Metadata.IpV4Entries}}` | `IpV4Entries` |
| `{{.Metadata.IpV6Entries}}` | `IpV6Entries` |
| `{{.Metadata.InvalidEntries}}` | `InvalidEntries` |
| `{{.Metadata.Errors}}` | `Errors` |
| `{{.Metadata.Warnings}}` | `Warnings` |
| `{{.Metadata.Suggestions}}` | `Suggestions` |
| `{{.Metadata.OK}}` | `OK` |
| `{{.Metadata.CityLevelAccuracy}}` | `CityLevelAccuracy` |
| `{{.Metadata.RegionLevelAccuracy}}` | `RegionLevelAccuracy` |
| `{{.Metadata.CountryLevelAccuracy}}` | `CountryLevelAccuracy` |
| `{{.Metadata.DoNotGeolocate}}` | `DoNotGeolocate` (metadata) |
**Note on `{{.Metadata.Timestamp}}`:** This placeholder appears inside a JavaScript `new Date(...)` call. Replace it with the raw integer value (no HTML escaping needed for a numeric literal inside `<script>`). All other metadata values should be HTML-escaped since they appear inside HTML element text.
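A minimal sketch of the metadata substitution follows. It assumes the counters sit in a `Metadata` object inside `report-data.json`; pass the top-level dict instead if they do not. Every value except `Timestamp` goes through the standard-library `html.escape`. `Timestamp` is emitted as a bare integer.

```python
import html
import re

METADATA_RE = re.compile(r"\{\{\.Metadata\.(\w+)\}\}")


def render_metadata(template, metadata):
    """Replace every {{.Metadata.X}} with metadata[X]."""
    def substitute(match):
        key = match.group(1)
        value = metadata.get(key, "")
        if key == "Timestamp":
            # Numeric literal inside new Date(...): no HTML escaping.
            return str(int(value or 0))
        return html.escape(str(value))  # element text: escape <, >, &, quotes
    return METADATA_RE.sub(substitute, template)
```

Typical use: `template = render_metadata(template, data["Metadata"])`.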
#### Step 2: Replace the Comment Map Placeholder
Locate this pattern in the template:
```javascript
const commentMap = {{.Comments}};
```
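Replace `{{.Comments}}` with the serialized JSON object from `./run/data/comments.json`. The JSON is embedded directly as a JavaScript object literal (not inside a string), so no JS string escaping is needed. The one hazard is a comment containing `</script>`, which would close the script element early. Escaping `<` as `\u003c` avoids this, and the result stays valid JSON and valid JavaScript:
```python
import json

# "\u003c" still parses as "<", but the HTML parser can no longer see "</script>".
comments_json = json.dumps(comments).replace("<", "\\u003c")
template = template.replace("{{.Comments}}", comments_json)
```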
#### Step 3: Expand the Entries Range Block
The template contains a `{{range .Entries}}...{{end}}` block inside `<tbody id="entriesTableBody">`. Process it as follows:
1. **Extract** the range block body using regex. **Critical:** The block contains nested `{{end}}` tags (from `{{if eq .Status ...}}`, `{{if .Checked}}`, and `{{range .Messages}}`). A naive non-greedy match like `\{\{range \.Entries\}\}(.*?)\{\{end\}\}` will match the **first** inner `{{end}}`, truncating the block. Instead, anchor the outer `{{end}}` to the `</tbody>` that follows it:
```python
m = re.search(
r'\{\{range \.Entries\}\}(.*?)\{\{end\}\}\s*</tbody>',
template,
re.DOTALL,
)
entry_body = m.group(1) # template text for one entry iteration
```
This ensures you capture the full block body including all three `<tr>` rows and the nested `{{range .Messages}}...{{end}}`.
2. **Iterate** over each entry in `report-data.json`'s `Entries` array.
3. **Expand** the block body for each entry using the processing order below.
4. **Replace** the entire match (from `{{range .Entries}}` through `</tbody>`) with the concatenated expanded HTML followed by `</tbody>`.
**Processing order for each entry** (resolve conditionals and the inner range before simple placeholders, so no stray `{{end}}` confuses later matches):
1. Evaluate `{{if eq .Status ...}}...{{end}}` conditionals (status badge class and icon).
2. Expand the `{{range .Messages}}...{{end}}` inner range. Resolve the `{{if .Checked}}...{{end}}` checkbox conditional inside each message copy (see Step 4); it depends on each message's `Checked` value, so it cannot be resolved once per entry.
3. Replace simple `{{.Field}}` placeholders.
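The extract, expand, and replace steps above can be sketched as follows. `expand_entry` is a **hypothetical** callback supplied by the caller that applies the per-entry processing order. The sketch only locates the outer block and splices the expanded rows back in. It also re-emits the `</tbody>` that the anchored match consumes.

```python
import re

ENTRIES_RE = re.compile(r"\{\{range \.Entries\}\}(.*?)\{\{end\}\}\s*</tbody>", re.DOTALL)


def expand_entries(template, entries, expand_entry):
    """Replace the {{range .Entries}} block with one expanded copy per entry.

    `expand_entry(body, entry)` is a hypothetical callback that resolves the
    status conditionals, the messages range, and then the simple placeholders.
    """
    m = ENTRIES_RE.search(template)
    if m is None:
        raise ValueError("{{range .Entries}} block followed by </tbody> not found")
    entry_body = m.group(1)
    expanded = "".join(expand_entry(entry_body, entry) for entry in entries)
    # The match consumed </tbody>, so write it back after the generated rows.
    return template[:m.start()] + expanded + "</tbody>" + template[m.end():]
```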
##### Entry Field Mapping
Within the range block body, replace these placeholders for each entry. Each template placeholder `{{.X}}` maps directly to the entry's JSON key `X`:
| Template placeholder | JSON key (`Entries[]`) | Notes |
|--------------------------------|------------------------------|--------------------------------------------------------------|
| `{{.Line}}` | `Line` | Direct integer value |
| `{{.IPPrefix}}` | `IPPrefix` | HTML-escaped |
| `{{.CountryCode}}` | `CountryCode` | HTML-escaped |
| `{{.RegionCode}}` | `RegionCode` | HTML-escaped |
| `{{.City}}` | `City` | HTML-escaped |
| `{{.Status}}` | `Status` | HTML-escaped |
| `{{.HasError}}` | `HasError` | Lowercase string: `"true"` or `"false"` |
| `{{.HasWarning}}` | `HasWarning` | Lowercase string: `"true"` or `"false"` |
| `{{.HasSuggestion}}` | `HasSuggestion` | Lowercase string: `"true"` or `"false"` |
| `{{.GeocodingHint}}` | `GeocodingHint` | Empty string `""` |
| `{{.DoNotGeolocate}}` | `DoNotGeolocate` | `"true"` or `"false"` |
| `{{.Tunable}}` | `Tunable` | `"true"` or `"false"` |
| `{{.TunedEntry.CountryCode}}` | `TunedEntry.CountryCode` | `""` if `TunedEntry` is empty `{}` |
| `{{.TunedEntry.RegionCode}}` | `TunedEntry.RegionCode` | `""` if `TunedEntry` is empty `{}` |
| `{{.TunedEntry.Name}}` | `TunedEntry.Name` | `""` if `TunedEntry` is empty `{}` |
| `{{.TunedEntry.H3Cells}}` | `TunedEntry.H3Cells` | Bracket-wrapped space-separated; `"[]"` if empty (see format below) |
| `{{.TunedEntry.BoundingBox}}` | `TunedEntry.BoundingBox` | Bracket-wrapped space-separated; `"[]"` if empty (see format below) |
**`data-h3-cells` and `data-bounding-box` format:** These are **NOT JSON arrays**. They are bracket-wrapped, space-separated values. Do **not** use JSON serialization (no quotes around string elements, no commas between numbers). Examples:
- `[836752fffffffff 836755fffffffff]` — correct
- `["836752fffffffff","836755fffffffff"]` — **WRONG**, quotes will break parsing
- `[-71.70 10.73 -71.52 10.55]` — correct
- `[]` — correct for empty
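One way to produce the simple placeholder values, including the lowercase booleans and the bracket-wrapped lists, is sketched below. It makes three assumptions:
- Boolean flags are stored as JSON booleans.
- Numbers are formatted with Python's default `str()`, so `-71.70` is written as `-71.7`.
- A missing or empty `TunedEntry` is treated as all-empty.

The function names are illustrative.

```python
import html


def esc(value):
    """HTML-escape a value, rendering None as an empty string."""
    return html.escape("" if value is None else str(value))


def flag(value):
    """Render a boolean as the lowercase strings "true" / "false"."""
    return "true" if value else "false"


def bracket_list(values):
    """Format a list as "[a b c]": no quotes, no commas, so deliberately NOT JSON."""
    return "[" + " ".join(str(v) for v in (values or [])) + "]"


def entry_placeholders(entry):
    """Map each simple {{.X}} placeholder to its final string for one entry."""
    tuned = entry.get("TunedEntry") or {}
    return {
        "{{.Line}}": str(entry["Line"]),
        "{{.IPPrefix}}": esc(entry.get("IPPrefix")),
        "{{.CountryCode}}": esc(entry.get("CountryCode")),
        "{{.RegionCode}}": esc(entry.get("RegionCode")),
        "{{.City}}": esc(entry.get("City")),
        "{{.Status}}": esc(entry.get("Status")),
        "{{.HasError}}": flag(entry.get("HasError")),
        "{{.HasWarning}}": flag(entry.get("HasWarning")),
        "{{.HasSuggestion}}": flag(entry.get("HasSuggestion")),
        "{{.GeocodingHint}}": "",
        "{{.DoNotGeolocate}}": flag(entry.get("DoNotGeolocate")),
        "{{.Tunable}}": flag(entry.get("Tunable")),
        "{{.TunedEntry.CountryCode}}": esc(tuned.get("CountryCode")),
        "{{.TunedEntry.RegionCode}}": esc(tuned.get("RegionCode")),
        "{{.TunedEntry.Name}}": esc(tuned.get("Name")),
        "{{.TunedEntry.H3Cells}}": bracket_list(tuned.get("H3Cells")),
        "{{.TunedEntry.BoundingBox}}": bracket_list(tuned.get("BoundingBox")),
    }


def replace_placeholders(body, entry):
    """Apply every simple placeholder replacement to one entry's body."""
    for placeholder, value in entry_placeholders(entry).items():
        body = body.replace(placeholder, value)
    return body
```

Plain `str.replace` is safe here because no placeholder is a substring of another. For example, `{{.CountryCode}}` does not occur inside `{{.TunedEntry.CountryCode}}`.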
##### Evaluating Status Conditionals
**Process these BEFORE replacing simple `{{.Field}}` placeholders**, so the conditional text, including its `{{end}}` markers, is still intact when you match it.
The template uses `{{if eq .Status "..."}}` conditionals for the status badge CSS class and icon. Evaluate these by checking the entry's `Status` value and keeping only the matching branch text.
The status badge line contains **two** `{{if eq .Status ...}}...{{end}}` blocks on a single line: one for the CSS class, one for the icon. Use `re.sub` with a callback to resolve all occurrences:
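```python
import re

STATUS_CSS = {"ERROR": "error", "WARNING": "warning", "SUGGESTION": "suggestion", "OK": "ok"}
STATUS_ICON = {
    "ERROR": "bi-x-circle-fill",
    "WARNING": "bi-exclamation-triangle-fill",
    "SUGGESTION": "bi-lightbulb-fill",
    "OK": "bi-check-circle-fill",
}

# One whole {{if eq .Status "X"}}...{{else}}...{{end}} chain (these chains never nest).
STATUS_IF_RE = re.compile(r'\{\{if eq \.Status "[A-Z]+"\}\}.*?\{\{end\}\}')
# Each {{if|else if eq .Status "X"}}value pair inside one chain.
BRANCH_RE = re.compile(r'\{\{(?:if|else if) eq \.Status "([A-Z]+)"\}\}(.*?)(?=\{\{)')
ELSE_RE = re.compile(r'\{\{else\}\}(.*?)\{\{end\}\}')

def resolve_status_if(match_obj, status):
    """Pick the branch matching `status` from a {{if eq .Status ...}}...{{end}} block."""
    block = match_obj.group(0)
    for branch_status, value in BRANCH_RE.findall(block):
        if branch_status == status:
            return value
    else_match = ELSE_RE.search(block)  # no explicit branch: use the {{else}} value
    return else_match.group(1) if else_match else ""

body = STATUS_IF_RE.sub(lambda m: resolve_status_if(m, status), body)
```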
A simpler approach: since there are exactly two known patterns, replace them as literal strings:
```python
css_class = STATUS_CSS.get(status, "ok")
icon_class = STATUS_ICON.get(status, "bi-check-circle-fill")
body = body.replace(
'{{if eq .Status "ERROR"}}error{{else if eq .Status "WARNING"}}warning{{else if eq .Status "SUGGESTION"}}suggestion{{else}}ok{{end}}',
css_class,
)
body = body.replace(
'{{if eq .Status "ERROR"}}bi-x-circle-fill{{else if eq .Status "WARNING"}}bi-exclamation-triangle-fill{{else if eq .Status "SUGGESTION"}}bi-lightbulb-fill{{else}}bi-check-circle-fill{{end}}',
icon_class,
)
```
This avoids regex entirely and is safe because these exact strings appear verbatim in the template.
#### Step 4: Expand the Nested Messages Range
The `{{range .Messages}}...{{end}}` block contains a **nested** `{{if .Checked}} checked{{else}} disabled{{end}}` conditional, so its inner `{{end}}` would cause a simple non-greedy regex to match too early. Anchor the regex to `</td>` (the tag immediately after the messages range closing `{{end}}`) to capture the full block body:
```python
msg_match = re.search(
r'\{\{range \.Messages\}\}(.*?)\{\{end\}\}\s*(?=</td>)',
body, re.DOTALL
)
```
The lookahead `(?=</td>)` ensures the regex skips past the checkbox conditional's `{{end}}` (which is followed by `>`, not `</td>`) and matches only the range-closing `{{end}}` (which is followed by whitespace then `</td>`).
For each message in the entry's `Messages` array, clone the captured block body and expand it:
1. **Resolve the checkbox conditional** per message (must happen before simple placeholder replacement to remove the nested `{{end}}`):
```python
if msg.get("Checked"):
    msg_body = msg_body.replace(
        '{{if .Checked}} checked{{else}} disabled{{end}}', ' checked'
    )
else:
    msg_body = msg_body.replace(
        '{{if .Checked}} checked{{else}} disabled{{end}}', ' disabled'
    )
```
2. **Replace message field placeholders**:
| Template placeholder | Source | Notes |
|--------------------------|-----------------------------------|--------------------------------|
| `{{.ID}}` | `Messages[i].ID` | Direct string value from JSON |
| `{{.Text}}` | `Messages[i].Text` | HTML-escaped |
3. **Concatenate** all expanded message blocks and replace the original `{{range .Messages}}...{{end}}` match (`msg_match.group(0)`) with the result:
```python
body = body[:msg_match.start()] + "".join(expanded_msgs) + body[msg_match.end():]
```
If `Messages` is empty, replace the entire matched region with an empty string (no message divs — only the issues header remains).
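Putting the Step 4 pieces together, a minimal sketch of the per-entry messages expansion might look like this. It assumes each message is a dict with `ID`, `Text`, and an optional `Checked` flag. If the body has no messages range, it is returned unchanged.

```python
import html
import re

MESSAGES_RE = re.compile(r"\{\{range \.Messages\}\}(.*?)\{\{end\}\}\s*(?=</td>)", re.DOTALL)
CHECKED_IF = "{{if .Checked}} checked{{else}} disabled{{end}}"


def expand_messages(body, messages):
    """Expand the {{range .Messages}} block inside one entry body."""
    m = MESSAGES_RE.search(body)
    if m is None:
        return body  # nothing to expand
    msg_template = m.group(1)
    expanded = []
    for msg in messages or []:
        # Resolve the nested conditional first, so its {{end}} is gone.
        msg_body = msg_template.replace(
            CHECKED_IF, " checked" if msg.get("Checked") else " disabled")
        msg_body = msg_body.replace("{{.ID}}", str(msg.get("ID", "")))
        msg_body = msg_body.replace("{{.Text}}", html.escape(str(msg.get("Text", ""))))
        expanded.append(msg_body)
    # An empty Messages list removes the whole range, leaving only the header.
    return body[:m.start()] + "".join(expanded) + body[m.end():]
```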
#### Output Guarantees
- The report must be readable in any modern browser without extra network dependencies beyond the CDN links already in the template (`leaflet`, `h3-js`, `bootstrap-icons`, Raleway font).
- All values embedded in HTML must be **HTML-escaped** (`<`, `>`, `&`, `"`) to prevent rendering issues.
- `commentMap` is embedded as a direct JavaScript object literal (not inside a string), so no JS string escaping is needed — just emit valid JSON.
- All values must be derived **only from analysis output**, not recomputed heuristically.
### Phase 6: Final Review
Perform a final verification pass using concrete, checkable assertions before presenting results to the user.
**Check 1 — Entry count integrity**
- Count non-comment, non-blank data rows in the original input CSV.
- Assert: `len(Entries)` in `report-data.json` equals `data_row_count`.
- On failure: `Row count mismatch: input has {N} data rows but report contains {M} entries.`
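For Check 1, a small helper can apply the RFC 8805 rules: text from `#` to the end of a line is a comment, and blank lines are ignored. This is a sketch. It assumes the feed has no header row and that `#` never appears inside a quoted field.

```python
def count_data_rows(csv_text):
    """Count feed lines that are neither blank nor comment-only (RFC 8805, Section 2.1)."""
    count = 0
    for line in csv_text.splitlines():
        # Drop everything from the first '#'; a data row must leave something behind.
        if line.split("#", 1)[0].strip():
            count += 1
    return count
```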
**Check 2 — Summary counter integrity**
- These counters are **mutually exclusive**: each entry is counted once, under its highest-severity flag, which is equivalent to counting by the entry's `Status` field. An entry with both `HasError: true` and `HasWarning: true` is counted only in `Errors`, never in `Warnings`.
- Assert all of the following; correct any that fail before generating the report:
  - `Errors == sum(1 for e in Entries if e['HasError'])`
  - `Warnings == sum(1 for e in Entries if e['HasWarning'] and not e['HasError'])`
  - `Suggestions == sum(1 for e in Entries if e['HasSuggestion'] and not e['HasError'] and not e['HasWarning'])`
  - `OK == sum(1 for e in Entries if not e['HasError'] and not e['HasWarning'] and not e['HasSuggestion'])`
  - `Errors + Warnings + Suggestions + OK == TotalEntries - InvalidEntries`
**Check 3 — Accuracy bucket integrity**
- Assert: `CityLevelAccuracy + RegionLevelAccuracy + CountryLevelAccuracy + DoNotGeolocate == TotalEntries - InvalidEntries`
- **Note:** The accuracy buckets defined in Phase 3 say "Do not count entries with `HasError: true`", but the Check 3 formula above uses `TotalEntries - InvalidEntries` (which still includes ERROR entries). This means ERROR entries (those that parsed as valid IPs but failed validation) **are** counted in accuracy buckets by their geo-field presence. Only `InvalidEntries` (unparsable IP prefixes) are excluded. Follow the Check 3 formula as the authoritative rule.
- On failure, trace and fix the bucketing logic before proceeding.
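Checks 2 and 3 can be expressed as one function that returns human-readable failures. It assumes `meta` is the dict holding the summary counters and `entries` is the `Entries` array. The function recomputes the mutually exclusive status counts and compares both sums against `TotalEntries - InvalidEntries`. It does not re-derive the accuracy buckets themselves.

```python
def check_counters(meta, entries):
    """Return failed Check 2 / Check 3 assertions as messages; [] means all pass."""
    def count(pred):
        return sum(1 for e in entries if pred(e))

    expected = {
        "Errors": count(lambda e: e.get("HasError")),
        "Warnings": count(lambda e: e.get("HasWarning") and not e.get("HasError")),
        "Suggestions": count(lambda e: e.get("HasSuggestion")
                             and not e.get("HasError") and not e.get("HasWarning")),
        "OK": count(lambda e: not (e.get("HasError") or e.get("HasWarning")
                                   or e.get("HasSuggestion"))),
    }
    failures = [f"{name}: report says {meta[name]}, entries give {value}"
                for name, value in expected.items() if meta[name] != value]
    valid = meta["TotalEntries"] - meta["InvalidEntries"]
    status_sum = sum(meta[name] for name in expected)
    if status_sum != valid:
        failures.append(f"status counters sum to {status_sum}, expected {valid}")
    buckets = (meta["CityLevelAccuracy"] + meta["RegionLevelAccuracy"]
               + meta["CountryLevelAccuracy"] + meta["DoNotGeolocate"])
    if buckets != valid:
        failures.append(f"accuracy buckets sum to {buckets}, expected {valid}")
    return failures
```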
**Check 4 — No duplicate line numbers**
- Assert: all `Line` values in `Entries` are unique.
- On failure, report the duplicated line numbers to the user.
**Check 5 — TunedEntry completeness**
- Assert: every object in `Entries` has a `TunedEntry` key (even if its value is `{}`).
- On failure, add `"TunedEntry": {}` to any entry missing the key, then re-save `report-data.json`.
**Check 6 — Report file is present and non-empty**
- Confirm `./run/report/geofeed-report.html` was written and has a file size greater than zero bytes.
- On failure, regenerate the report before presenting to the user.
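Checks 4 through 6 are mechanical, and a sketch using only the standard library follows. Paths default to the ones named in this document, and the function names are illustrative.

```python
import json
import os
from collections import Counter


def duplicate_lines(entries):
    """Check 4: Line values that occur more than once, sorted."""
    counts = Counter(e["Line"] for e in entries)
    return sorted(line for line, n in counts.items() if n > 1)


def ensure_tuned_entries(entries):
    """Check 5: add an empty TunedEntry where the key is missing; return how many were added."""
    missing = [e for e in entries if "TunedEntry" not in e]
    for e in missing:
        e["TunedEntry"] = {}
    return len(missing)


def save_if_fixed(data, path="./run/data/report-data.json"):
    """Re-save report-data.json only when Check 5 had to add anything."""
    if ensure_tuned_entries(data["Entries"]):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)


def report_is_present(path="./run/report/geofeed-report.html"):
    """Check 6: the report file exists and is larger than zero bytes."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```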

@@ -0,0 +1,5 @@
202.125.100.144/28,ID,,Jakarta,
2605:59c8:2700::/40,CA,CA-QC,"Montreal"
150.228.170.0/24,SG,SG-01,Singapore,
# It's OK for small city states to leave state ISO2 code unspecified
2406:2d40:8100::/42,SG,SG,Singapore,

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,106 @@
[
"AD",
"AG",
"AI",
"AN",
"AQ",
"AS",
"AW",
"AX",
"BB",
"BH",
"BL",
"BM",
"BN",
"BQ",
"BS",
"BT",
"BV",
"BZ",
"CC",
"CK",
"CS",
"CV",
"CW",
"CX",
"CY",
"DM",
"EH",
"FK",
"FM",
"FO",
"GD",
"GF",
"GG",
"GI",
"GL",
"GM",
"GP",
"GS",
"GU",
"GY",
"HK",
"HM",
"IM",
"IO",
"IS",
"JE",
"JM",
"KI",
"KM",
"KN",
"KY",
"LB",
"LC",
"LI",
"LU",
"MC",
"ME",
"MF",
"MH",
"MO",
"MP",
"MQ",
"MS",
"MT",
"MU",
"MV",
"NC",
"NF",
"NR",
"NU",
"PF",
"PM",
"PN",
"PR",
"PS",
"PW",
"QA",
"RE",
"SB",
"SC",
"SG",
"SH",
"SJ",
"SM",
"SR",
"ST",
"SX",
"TC",
"TF",
"TK",
"TL",
"TO",
"TT",
"TV",
"UM",
"VA",
"VC",
"VG",
"VI",
"VU",
"WF",
"WS",
"XK",
"YT"
]

@@ -0,0 +1,735 @@
Independent Submission E. Kline
Request for Comments: 8805 Loon LLC
Category: Informational K. Duleba
ISSN: 2070-1721 Google
Z. Szamonek
S. Moser
Google Switzerland GmbH
W. Kumari
Google
August 2020
A Format for Self-Published IP Geolocation Feeds
Abstract
This document records a format whereby a network operator can publish
a mapping of IP address prefixes to simplified geolocation
information, colloquially termed a "geolocation feed". Interested
parties can poll and parse these feeds to update or merge with other
geolocation data sources and procedures. This format intentionally
only allows specifying coarse-level location.
Some technical organizations operating networks that move from one
conference location to the next have already experimentally published
small geolocation feeds.
This document describes a currently deployed format. At least one
consumer (Google) has incorporated these feeds into a geolocation
data pipeline, and a significant number of ISPs are using it to
inform them where their prefixes should be geolocated.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This is a contribution to the RFC Series, independently of any other
RFC stream. The RFC Editor has chosen to publish this document at
its discretion and makes no statement about its value for
implementation or deployment. Documents approved for publication by
the RFC Editor are not candidates for any level of Internet Standard;
see Section 2 of RFC 7841.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8805.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Table of Contents
1. Introduction
1.1. Motivation
1.2. Requirements Notation
1.3. Assumptions about Publication
2. Self-Published IP Geolocation Feeds
2.1. Specification
2.1.1. Geolocation Feed Individual Entry Fields
2.1.1.1. IP Prefix
2.1.1.2. Alpha2code (Previously: 'country')
2.1.1.3. Region
2.1.1.4. City
2.1.1.5. Postal Code
2.1.2. Prefixes with No Geolocation Information
2.1.3. Additional Parsing Requirements
2.2. Examples
3. Consuming Self-Published IP Geolocation Feeds
3.1. Feed Integrity
3.2. Verification of Authority
3.3. Verification of Accuracy
3.4. Refreshing Feed Information
4. Privacy Considerations
5. Relation to Other Work
6. Security Considerations
7. Planned Future Work
8. Finding Self-Published IP Geolocation Feeds
8.1. Ad Hoc 'Well-Known' URIs
8.2. Other Mechanisms
9. IANA Considerations
10. References
10.1. Normative References
10.2. Informative References
Appendix A. Sample Python Validation Code
Acknowledgements
Authors' Addresses
1. Introduction
1.1. Motivation
Providers of services over the Internet have grown to depend on best-
effort geolocation information to improve the user experience.
Locality information can aid in directing traffic to the nearest
serving location, inferring likely native language, and providing
additional context for services involving search queries.
When an ISP, for example, changes the location where an IP prefix is
deployed, services that make use of geolocation information may begin
to suffer degraded performance. This can lead to customer
complaints, possibly to the ISP directly. Dissemination of correct
geolocation data is complicated by the lack of any centralized means
to coordinate and communicate geolocation information to all
interested consumers of the data.
This document records a format whereby a network operator (an ISP, an
enterprise, or any organization that deems the geolocation of its IP
prefixes to be of concern) can publish a mapping of IP address
prefixes to simplified geolocation information, colloquially termed a
"geolocation feed". Interested parties can poll and parse these
feeds to update or merge with other geolocation data sources and
procedures.
This document describes a currently deployed format. At least one
consumer (Google) has incorporated these feeds into a geolocation
data pipeline, and a significant number of ISPs are using it to
inform them where their prefixes should be geolocated.
1.2. Requirements Notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
As this is an informational document about a data format and set of
operational practices presently in use, requirements notation
captures the design goals of the authors and implementors.
1.3. Assumptions about Publication
This document describes both a format and a mechanism for publishing
data, with the assumption that the network operator to whom
operational responsibility has been delegated for any published data
wishes it to be public. Any privacy risk is bounded by the format,
and feed publishers MAY omit prefixes or any location field
associated with a given prefix to further protect privacy (see
Section 2.1 for details about which fields exactly may be omitted).
Feed publishers assume the responsibility of determining which data
should be made public.
This document does not incorporate a mechanism to communicate
acceptable use policies for self-published data. Publication itself
is inferred as a desire by the publisher for the data to be usefully
consumed, similar to the publication of information like host names,
cryptographic keys, and Sender Policy Framework (SPF) records
[RFC7208] in the DNS.
2. Self-Published IP Geolocation Feeds
The format described here was developed to address the need of
network operators to rapidly and usefully share geolocation
information changes. Originally, there arose a specific case where
regional operators found it desirable to publish location changes
rather than wait for geolocation algorithms to "learn" about them.
Later, technical conferences that frequently use the same network
prefixes advertised from different conference locations experimented
by publishing geolocation feeds updated in advance of network
location changes in order to better serve conference attendees.
At its simplest, the mechanism consists of a network operator
publishing a file (the "geolocation feed") that contains several text
entries, one per line. Each entry is keyed by a unique (within the
feed) IP prefix (or single IP address) followed by a sequence of
network locality attributes to be ascribed to the given prefix.
2.1. Specification
For operational simplicity, every feed should contain data about all
IP addresses the provider wants to publish. Alternatives, like
publishing only entries for IP addresses whose geolocation data has
changed or differ from current observed geolocation behavior "at
large", are likely to be too operationally complex.
Feeds MUST use UTF-8 [RFC3629] character encoding. Lines are
delimited by a line break (CRLF) (as specified in [RFC4180]), and
blank lines are ignored. Text from a '#' character to the end of the
current line is treated as a comment only and is similarly ignored
(note that this does not strictly follow [RFC4180], which has no
support for comments).
Feed lines that are not comments MUST be formatted as comma-separated
values (CSV), as described in [RFC4180]. Each feed entry is a text
line of the form:
ip_prefix,alpha2code,region,city,postal_code
The IP prefix field is REQUIRED, all others are OPTIONAL (can be
empty), though the requisite minimum number of commas SHOULD be
present.
2.1.1. Geolocation Feed Individual Entry Fields
2.1.1.1. IP Prefix
REQUIRED: Each IP prefix field MUST be either a single IP address or
an IP prefix in Classless Inter-Domain Routing (CIDR) notation in
conformance with Section 3.1 of [RFC4632] for IPv4 or Section 2.3 of
[RFC4291] for IPv6.
Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and
"2001:db8::1" and "2001:db8::/32" for IPv6.
2.1.1.2. Alpha2code (Previously: 'country')
OPTIONAL: The alpha2code field, if non-empty, MUST be a 2-letter ISO
country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2].
Parsers SHOULD treat this field case-insensitively.
Earlier versions of this document called this field "country", and it
may still be referred to as such in existing tools/interfaces.
Parsers MAY additionally support other 2-letter codes outside the ISO
3166-1 alpha 2 codes, such as the 2-letter codes from the
"Exceptionally reserved codes" [ISO-GLOSSARY] set.
Examples include "US" for the United States, "JP" for Japan, and "PL"
for Poland.
2.1.1.3. Region
OPTIONAL: The region field, if non-empty, MUST be an ISO region code
conforming to ISO 3166-2 [ISO.3166.2]. Parsers SHOULD treat this
field case-insensitively.
Examples include "ID-RI" for the Riau province of Indonesia and "NG-
RI" for the Rivers province in Nigeria.
2.1.1.4. City
OPTIONAL: The city field, if non-empty, SHOULD be free UTF-8 text,
excluding the comma (',') character.
Examples include "Dublin", "New York", and "Sao Paulo" (specifically
"S" followed by 0xc3, 0xa3, and "o Paulo").
2.1.1.5. Postal Code
OPTIONAL, DEPRECATED: The postal code field, if non-empty, SHOULD be
free UTF-8 text, excluding the comma (',') character. The use of
this field is deprecated; consumers of feeds should be able to parse
feeds containing these fields, but new feeds SHOULD NOT include this
field due to the granularity of this information. See Section 4 for
additional discussion.
Examples include "106-6126" (in Minato ward, Tokyo, Japan).
2.1.2. Prefixes with No Geolocation Information
Feed publishers may indicate that some IP prefixes should not have
any associated geolocation information. It may be that some prefixes
under their administrative control are reserved, not yet allocated or
deployed, or in the process of being redeployed elsewhere and
existing geolocation information can, from the perspective of the
publisher, safely be discarded.
This special case can be indicated by explicitly leaving blank all
fields that specify any degree of geolocation information. For
example:
192.0.2.0/24,,,,
2001:db8:1::/48,,,,
2001:db8:2::/48,,,,
Historically, the user-assigned alpha2code identifier of "ZZ" has
been used for this same purpose. This is not necessarily preferred,
and no specific interpretation of any of the other user-assigned
alpha2code codes is currently defined.
2.1.3. Additional Parsing Requirements
Feed entries that do not have an IP address or prefix field or have
an IP address or prefix field that fails to parse correctly MUST be
discarded.
While publishers SHOULD follow [RFC5952] for IPv6 prefix fields,
consumers MUST nevertheless accept all valid string representations.
Duplicate IP address or prefix entries MUST be considered an error,
and consumer implementations SHOULD log the repeated entries for
further administrative review. Publishers SHOULD take measures to
ensure there is one and only one entry per IP address and prefix.
Multiple entries that constitute nested prefixes are permitted.
Consumers SHOULD consider the entry with the longest matching prefix
(i.e., the "most specific") to be the best matching entry for a given
IP address.
Feed entries with non-empty optional fields that fail to parse,
either in part or in full, SHOULD be discarded. It is RECOMMENDED
that they also be logged for further administrative review.
For compatibility with future additional fields, a parser MUST ignore
any fields beyond those it expects. The data from fields that are
expected and that parse successfully MUST still be considered valid.
Per Section 7, no extensions to this format are in use nor are any
anticipated.
2.2. Examples
Example entries using different IP address formats and describing
locations at alpha2code ("country code"), region, and city
granularity level, respectively:
192.0.2.0/25,US,US-AL,,
192.0.2.5,US,US-AL,Alabaster,
192.0.2.128/25,PL,PL-MZ,,
2001:db8::/32,PL,,,
2001:db8:cafe::/48,PL,PL-MZ,,
The IETF network publishes geolocation information for the meeting
prefixes, and generally just comment out the last meeting information
and append the new meeting information. The [GEO_IETF], at the time
of this writing, contains:
# IETF106 (Singapore) - November 2019 - Singapore, SG
130.129.0.0/16,SG,SG-01,Singapore,
2001:df8::/32,SG,SG-01,Singapore,
31.133.128.0/18,SG,SG-01,Singapore,
31.130.224.0/20,SG,SG-01,Singapore,
2001:67c:1230::/46,SG,SG-01,Singapore,
2001:67c:370::/48,SG,SG-01,Singapore,
Experimentally, RIPE has published geolocation information for their
conference network prefixes, which change location in accordance with
each new event. [GEO_RIPE_NCC], at the time of writing, contains:
193.0.24.0/21,NL,NL-ZH,Rotterdam,
2001:67c:64::/48,NL,NL-ZH,Rotterdam,
Similarly, ICANN has published geolocation information for their
portable conference network prefixes. [GEO_ICANN], at the time of
writing, contains:
199.91.192.0/21,MA,MA-07,Marrakech
2620:f:8000::/48,MA,MA-07,Marrakech
A longer example is the [GEO_Google] Google Corp Geofeed, which lists
the geolocation information for Google corporate offices.
At the time of writing, Google processes approximately 400 feeds
comprising more than 750,000 IPv4 and IPv6 prefixes.
3. Consuming Self-Published IP Geolocation Feeds
Consumers MAY treat published feed data as a hint only and MAY choose
to prefer other sources of geolocation information for any given IP
prefix. Regardless of a consumer's stance with respect to a given
published feed, there are some points of note for sensibly and
effectively consuming published feeds.
3.1. Feed Integrity
The integrity of published information SHOULD be protected by
securing the means of publication, for example, by using HTTP over
TLS [RFC2818]. Whenever possible, consumers SHOULD prefer retrieving
geolocation feeds in a manner that guarantees integrity of the feed.
3.2. Verification of Authority
Consumers of self-published IP geolocation feeds SHOULD perform some
form of verification that the publisher is in fact authoritative for
the addresses in the feed. The actual means of verification is
likely dependent upon the way in which the feed is discovered. Ad
hoc shared URIs, for example, will likely require an ad hoc
verification process. Future automated means of feed discovery
SHOULD have an accompanying automated means of verification.
A consumer should only trust geolocation information for IP addresses
or prefixes for which the publisher has been verified as
administratively authoritative. All other geolocation feed entries
should be ignored and logged for further administrative review.
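The following is a minimal Python 3 sketch of this filtering and is not part of this specification. It assumes that `authorized` already holds prefixes for which the publisher was verified out of band, for example from RIR allocation data or the BGP-advertised prefixes discussed in Section 8.1. The names `is_covered` and `filter_authoritative` are hypothetical:

```python
import ipaddress
import logging
from typing import Iterable, List, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
log = logging.getLogger("geofeed")


def is_covered(prefix: Network, authorized: Iterable[Network]) -> bool:
    """True if prefix lies entirely within one of the authorized prefixes."""
    # subnet_of() raises TypeError across IP versions, so compare versions first.
    return any(
        prefix.version == auth.version and prefix.subnet_of(auth)
        for auth in authorized
    )


def filter_authoritative(
    prefixes: Iterable[Network], authorized: Iterable[Network]
) -> Tuple[List[Network], List[Network]]:
    """Split feed prefixes into trusted and rejected lists, logging rejections."""
    authorized = list(authorized)
    trusted, rejected = [], []
    for prefix in prefixes:
        if is_covered(prefix, authorized):
            trusted.append(prefix)
        else:
            log.warning("publisher not verified for %s; queued for review", prefix)
            rejected.append(prefix)
    return trusted, rejected
```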
3.3. Verification of Accuracy
Errors and inaccuracies may occur at many levels, and publication and
consumption of geolocation data are no exceptions. To the extent
practical, consumers SHOULD take steps to verify the accuracy of
published locality. Verification methodology, resolution of
discrepancies, and preference for alternative sources of data are
left to the discretion of the feed consumer.
Consumers SHOULD decide on discrepancy thresholds and SHOULD flag,
for administrative review, feed entries that exceed set thresholds.
3.4. Refreshing Feed Information
As a publisher can change geolocation data at any time and without
notification, consumers SHOULD implement mechanisms to periodically
refresh local copies of feed data. In the absence of any other
refresh timing information, consumers SHOULD refresh feeds no less
often than weekly and no more often than is
likely to cause issues to the publisher.
For feeds available via HTTPS (or HTTP), the publisher MAY
communicate refresh timing information by means of the standard HTTP
expiration model ([RFC7234]). Specifically, publishers can include
either an Expires header (Section 5.3 of [RFC7234]) or a
Cache-Control header (Section 5.2 of [RFC7234]) specifying the max-age.
Where practical, consumers SHOULD refresh feed information before the
expiry time is reached.
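The following Python 3 sketch is not part of this specification. It shows one way a consumer could derive its next refresh time from these headers. It makes several assumptions:

- `headers` is a case-insensitive mapping such as `http.client.HTTPMessage`, or a plain dict that uses the canonical header names.
- `fetched_at` is timezone-aware and stands in for the Date header used by Section 4.2.1 of [RFC7234].
- `MIN_INTERVAL` is a hypothetical floor that keeps refreshes from burdening the publisher.

```python
import email.utils
import re
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

WEEKLY = timedelta(days=7)         # refresh no less often than weekly
MIN_INTERVAL = timedelta(hours=1)  # hypothetical floor to spare the publisher


def next_refresh(headers: Mapping[str, str], fetched_at: datetime) -> datetime:
    """Return when to refetch a feed, using the HTTP expiration model."""
    lifetime: Optional[timedelta] = None
    match = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""),
                      re.IGNORECASE)
    if match:
        # max-age takes precedence over Expires (RFC 7234, Section 4.2.1).
        lifetime = timedelta(seconds=int(match.group(1)))
    elif "Expires" in headers:
        try:
            expires = email.utils.parsedate_to_datetime(headers["Expires"])
        except (TypeError, ValueError):
            expires = fetched_at  # an invalid date counts as already expired
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)  # "-0000" zone
        lifetime = expires - fetched_at
    if lifetime is None:
        lifetime = WEEKLY  # the publisher gave no timing information
    # Refresh at least weekly, but not more often than MIN_INTERVAL.
    return fetched_at + min(max(lifetime, MIN_INTERVAL), WEEKLY)
```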
4. Privacy Considerations
Publishers of geolocation feeds are advised to have fully considered
any and all privacy implications of the disclosure of such
information for the users of the described networks prior to
publication. A thorough comprehension of the security considerations
(Section 13 of [RFC6772]) of a chosen geolocation policy is highly
recommended, including an understanding of some of the limitations of
information obscurity (Section 13.5 of [RFC6772]) (see also
[RFC6772]).
As noted in Section 2.1, each location field in an entry is optional,
in order to support expressing only the level of specificity that the
publisher has deemed acceptable. There is no requirement that the
level of specificity be consistent across all entries within a feed.
In particular, the Postal Code field (Section 2.1.1.5) can provide
very specific geolocation, sometimes within a building. Such
specific Postal Code values MUST NOT be published in geofeeds without
the express consent of the parties being located.
Operators who publish geolocation information are strongly encouraged
to inform affected users/customers of this fact and of the potential
privacy-related consequences and trade-offs.
5. Relation to Other Work
While not originally done in conjunction with the GEOPRIV Working
Group [GEOPRIV], Richard Barnes observed that this work is
nevertheless consistent with that which the group has defined, both
for address format and for privacy. The data elements in geolocation
feeds are equivalent to the following XML structure ([RFC5139]
[W3C.REC-xml-20081126]):
<civicAddress>
<country>country</country>
<A1>region</A1>
<A2>city</A2>
<PC>postal_code</PC>
</civicAddress>
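The following minimal Python 3 sketch of this field mapping is not part of this specification. It uses the standard library's `xml.etree.ElementTree`. The function name `to_civic_address` is hypothetical, and XML namespaces are omitted to mirror the structure shown above:

```python
import xml.etree.ElementTree as ET


def to_civic_address(country: str, region: str = "", city: str = "",
                     postal_code: str = "") -> str:
    """Map geofeed location fields onto civicAddress child elements."""
    root = ET.Element("civicAddress")
    for tag, value in (("country", country), ("A1", region),
                       ("A2", city), ("PC", postal_code)):
        if value:  # every location field in a geofeed entry is optional
            ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")
```

Empty fields are left out rather than emitted as empty elements. The XML therefore carries only the level of specificity the publisher chose.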
Providing geolocation information to this granularity is equivalent
to the following privacy policy (the definition of the 'building'
level of disclosure in Section 6.5.1 of [RFC6772]):
<ruleset>
<rule>
<conditions/>
<actions/>
<transformations>
<provide-location profile="civic-transformation">
<provide-civic>building</provide-civic>
</provide-location>
</transformations>
</rule>
</ruleset>
6. Security Considerations
As there is no true security in the obscurity of the location of any
given IP address, self-publication of this data fundamentally opens
no new attack vectors. For publishers, self-published data may
increase the ease with which such location data might be exploited
(it can, for example, make it easier to discover prefixes populated
with customers as distinct from prefixes not generally in use).
For consumers, feed retrieval processes may receive input from
potentially hostile sources (e.g., in the event of hijacked traffic).
As such, proper input validation and defense measures MUST be taken
(see the discussion in Section 3.1).
Similarly, consumers who do not perform sufficient verification of
published data bear the same risks as from other forms of geolocation
configuration errors (see the discussion in Sections 3.2 and 3.3).
Validation of a feed's contents includes verifying that the publisher
is authoritative for the IP prefixes included in the feed. Failure
to verify IP prefix authority would, for example, allow ISP Bob to
make geolocation statements about IP space held by ISP Alice. At
this time, only out-of-band verification methods are implemented
(i.e., an ISP's feed may be verified against publicly available IP
allocation data).
7. Planned Future Work
In order to more flexibly support future extensions, use of a more
expressive feed format has been suggested. Use of JavaScript Object
Notation (JSON) [RFC8259], specifically, has been discussed.
However, at the time of writing, no such specification nor
implementation exists. Nevertheless, work on extensions is deferred
until a more suitable format has been selected.
The authors are planning on writing a document describing such a new
format. This document describes a currently deployed and used
format. Given the extremely limited extensibility of the present
format, no extensions to it are anticipated. Extensibility
requirements are instead expected to be integral to the development
of a new format.
8. Finding Self-Published IP Geolocation Feeds
The issue of finding, and later verifying, geolocation feeds is not
formally specified in this document. At this time, only ad hoc feed
discovery and verification has a modicum of established practice (see
below); discussion of other mechanisms has been removed for clarity.
8.1. Ad Hoc 'Well-Known' URIs
To date, geolocation feeds have been shared informally in the form of
HTTPS URIs exchanged in email threads. Three example URIs
([GEO_IETF], [GEO_RIPE_NCC], and [GEO_ICANN]) describe networks that
change locations periodically, the operators and operational
practices of which are well known within their respective technical
communities.
The contents of the feeds are verified by a similarly ad hoc process,
including:
* personal knowledge of the parties involved in the exchange and
* comparison of feed-advertised prefixes with the BGP-advertised
prefixes of Autonomous System Numbers known to be operated by the
publishers.
Ad hoc mechanisms, while useful for early experimentation by
producers and consumers, are unlikely to be adequate for long-term,
widespread use by multiple parties. Future versions of any such
self-published geolocation feed mechanism SHOULD address scalability
concerns by defining a means for automated discovery and verification
of operational authority of advertised prefixes.
8.2. Other Mechanisms
Previous versions of this document referenced use of the WHOIS
service [RFC3912] operated by Regional Internet Registries (RIRs), as
well as possible DNS-based schemes to discover and validate geofeeds.
To the authors' knowledge, support for such mechanisms has never been
implemented, and this speculative text has been removed to avoid
ambiguity.
9. IANA Considerations
This document has no IANA actions.
10. References
10.1. Normative References
[ISO.3166.1alpha2]
ISO, "ISO 3166-1 decoding table",
<http://www.iso.org/iso/home/standards/country_codes/iso-3166-1_decoding_table.htm>.
[ISO.3166.2]
ISO, "ISO 3166-2:2007",
<http://www.iso.org/iso/home/standards/country_codes.htm#2012_iso3166-2>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2003, <https://www.rfc-editor.org/info/rfc3629>.
[RFC4180] Shafranovich, Y., "Common Format and MIME Type for
Comma-Separated Values (CSV) Files", RFC 4180,
DOI 10.17487/RFC4180, October 2005,
<https://www.rfc-editor.org/info/rfc4180>.
[RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing
Architecture", RFC 4291, DOI 10.17487/RFC4291, February
2006, <https://www.rfc-editor.org/info/rfc4291>.
[RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing
(CIDR): The Internet Address Assignment and Aggregation
Plan", BCP 122, RFC 4632, DOI 10.17487/RFC4632, August
2006, <https://www.rfc-editor.org/info/rfc4632>.
[RFC5952] Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
Address Text Representation", RFC 5952,
DOI 10.17487/RFC5952, August 2010,
<https://www.rfc-editor.org/info/rfc5952>.
[RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
RFC 7234, DOI 10.17487/RFC7234, June 2014,
<https://www.rfc-editor.org/info/rfc7234>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[W3C.REC-xml-20081126]
Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth
Edition)", World Wide Web Consortium Recommendation
REC-xml-20081126, November 2008,
<http://www.w3.org/TR/2008/REC-xml-20081126>.
10.2. Informative References
[GEOPRIV] IETF, "Geographic Location/Privacy (geopriv)",
<http://datatracker.ietf.org/wg/geopriv/>.
[GEO_Google]
Google, LLC, "Google Corp Geofeed",
<https://www.gstatic.com/geofeed/corp_external>.
[GEO_ICANN]
ICANN, "ICANN Meeting Geolocation Data",
<https://meeting-services.icann.org/geo/google.csv>.
[GEO_IETF] Kumari, W., "IETF Meeting Network Geolocation Data",
<https://noc.ietf.org/geo/google.csv>.
[GEO_RIPE_NCC]
Schepers, M., "RIPE NCC Meeting Geolocation Data",
<https://meetings.ripe.net/geo/google.csv>.
[IPADDR_PY]
Shields, M. and P. Moody, "Google's Python IP address
manipulation library",
<http://code.google.com/p/ipaddr-py/>.
[ISO-GLOSSARY]
ISO, "Glossary for ISO 3166",
<https://www.iso.org/glossary-for-iso-3166.html>.
[RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818,
DOI 10.17487/RFC2818, May 2000,
<https://www.rfc-editor.org/info/rfc2818>.
[RFC3912] Daigle, L., "WHOIS Protocol Specification", RFC 3912,
DOI 10.17487/RFC3912, September 2004,
<https://www.rfc-editor.org/info/rfc3912>.
[RFC5139] Thomson, M. and J. Winterbottom, "Revised Civic Location
Format for Presence Information Data Format Location
Object (PIDF-LO)", RFC 5139, DOI 10.17487/RFC5139,
February 2008, <https://www.rfc-editor.org/info/rfc5139>.
[RFC6772] Schulzrinne, H., Ed., Tschofenig, H., Ed., Cuellar, J.,
Polk, J., Morris, J., and M. Thomson, "Geolocation Policy:
A Document Format for Expressing Privacy Preferences for
Location Information", RFC 6772, DOI 10.17487/RFC6772,
January 2013, <https://www.rfc-editor.org/info/rfc6772>.
[RFC7208] Kitterman, S., "Sender Policy Framework (SPF) for
Authorizing Use of Domains in Email, Version 1", RFC 7208,
DOI 10.17487/RFC7208, April 2014,
<https://www.rfc-editor.org/info/rfc7208>.
[RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
Interchange Format", STD 90, RFC 8259,
DOI 10.17487/RFC8259, December 2017,
<https://www.rfc-editor.org/info/rfc8259>.
Appendix A. <CODE EXAMPLE HAS BEEN REMOVED TO AVOID CONFUSING AI AGENTS/LLM>
Acknowledgements
The authors would like to express their gratitude to reviewers and
early implementors, including but not limited to Mikael Abrahamsson,
Andrew Alston, Ray Bellis, John Bond, Alissa Cooper, Andras Erdei,
Stephen Farrell, Marco Hogewoning, Mike Joseph, Maciej Kuzniar,
George Michaelson, Menno Schepers, Justyna Sidorska, Pim van Pelt,
and Bjoern A. Zeeb.
In particular, Richard L. Barnes and Andy Newton contributed
substantial review, text, and advice.
Authors' Addresses
Erik Kline
Loon LLC
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States of America
Email: ek@loon.com
Krzysztof Duleba
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States of America
Email: kduleba@google.com
Zoltan Szamonek
Google Switzerland GmbH
Brandschenkestrasse 110
CH-8002 Zürich
Switzerland
Email: zszami@google.com
Stefan Moser
Google Switzerland GmbH
Brandschenkestrasse 110
CH-8002 Zürich
Switzerland
Email: smoser@google.com
Warren Kumari
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States of America
Email: warren@kumari.net


@@ -0,0 +1,85 @@
# Code examples for Python 3
- Use Python 3's built-in [`ipaddress` package](https://docs.python.org/3/library/ipaddress.html), with `strict=True` passed to constructors where available.
- Be deliberate about IPv4 versus IPv6 parsing. The two families differ in address length, notation, and special-purpose ranges, so use the most specific type or class available.
- Remember that a subnet can contain a single host: use `/128` for IPv6 and `/32` for IPv4.
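The following short sketch illustrates both points (the variable names are illustrative). A bare address parses as a single-host network. A family-specific class such as `IPv4Network` rejects input from the other family:

```python
import ipaddress

# A bare address (no prefix length) becomes a single-host network.
v4_host = ipaddress.ip_network("192.0.2.5", strict=True)
v6_host = ipaddress.ip_network("2001:db8::1", strict=True)
print(v4_host, v4_host.num_addresses)  # 192.0.2.5/32 1
print(v6_host, v6_host.num_addresses)  # 2001:db8::1/128 1

# When the family is known, the family-specific class refuses the other family.
try:
    ipaddress.IPv4Network("2001:db8::/32", strict=True)
except ipaddress.AddressValueError as err:
    print("not an IPv4 network:", err)
```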
## IP address and subnet parsing
- Use the [convenience factory functions in `ipaddress`](https://docs.python.org/3/library/ipaddress.html#convenience-factory-functions).
The following `ipaddress.ip_address(address)` examples parse text into `IPv4Address` and `IPv6Address` objects, respectively:
```python
import ipaddress

ipaddress.ip_address('192.168.0.1')  # IPv4Address('192.168.0.1')
ipaddress.ip_address('2001:db8::')   # IPv6Address('2001:db8::')
```
The following `ipaddress.ip_network(address, strict=True)` example parses a subnet string and returns an `IPv4Network` or `IPv6Network` object, raising `ValueError` on invalid input:
```python
import ipaddress

ipaddress.ip_network('192.168.0.0/28', strict=True)  # IPv4Network('192.168.0.0/28')
```
The following strict-mode call fails (correctly) with `ValueError: 192.168.0.1/30 has host bits set`. Ask the user to fix such errors; do not guess corrections:
```python
import ipaddress

# Raises ValueError: 192.168.0.1/30 has host bits set
ipaddress.ip_network('192.168.0.1/30', strict=True)
```
- Use the strict-form parser [`ipaddress.ip_network(address, strict=True)`](https://docs.python.org/3/library/ipaddress.html#ipaddress.ip_network).
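To report every such error to the user instead of correcting it, a loop over the input can collect messages. The following is a minimal sketch, and `check_prefixes` is a hypothetical helper name:

```python
import ipaddress


def check_prefixes(lines):
    """Parse each prefix strictly; return (valid networks, error messages)."""
    networks, errors = [], []
    for lineno, text in enumerate(lines, start=1):
        try:
            networks.append(ipaddress.ip_network(text.strip(), strict=True))
        except ValueError as err:
            # Report the problem for the user to fix; never auto-correct it.
            errors.append(f"line {lineno}: {err}")
    return networks, errors
```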
## Dictionary of IP subnets
Use a Python dictionary to track subnets and their associated geolocation properties. `IPv4Network`, `IPv6Network`, `IPv4Address`, and `IPv6Address` are all hashable and can be used as dictionary keys.
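One assumed use of such a dictionary is finding the most specific subnet that contains a given address (longest-prefix match). In the following sketch, the `(country, region, city)` tuples and the `lookup` helper are hypothetical:

```python
import ipaddress

# Hypothetical geolocation properties: (country, region, city) per subnet.
geo_by_subnet = {
    ipaddress.ip_network("192.0.2.0/25"): ("US", "US-AL", ""),
    ipaddress.ip_network("192.0.2.5/32"): ("US", "US-AL", "Alabaster"),
    ipaddress.ip_network("2001:db8::/32"): ("PL", "", ""),
    ipaddress.ip_network("2001:db8:cafe::/48"): ("PL", "PL-MZ", ""),
}


def lookup(address):
    """Return the properties of the most specific subnet containing address."""
    addr = ipaddress.ip_address(address)
    # Membership tests are always False across IP versions.
    matches = [net for net in geo_by_subnet if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return geo_by_subnet[best]
```

The linear scan keeps the sketch short. For large feeds, a prefix trie or per-prefix-length tables would scale better.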
## Detecting non-public IP ranges
The SKILL.md references `is_private` for detecting non-public ranges. Use the network's properties:
```python
import ipaddress
from typing import Union


def is_non_public(network: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]) -> bool:
    """Return True if the network is not globally routable.

    Covers private, loopback, link-local, multicast, and reserved space.
    The final `not is_global` test also catches 100.64.0.0/10 (CGNAT),
    for which both is_private and is_global are False.
    """
    return (
        network.is_private
        or network.is_loopback
        or network.is_link_local
        or network.is_multicast
        or network.is_reserved
        or not network.is_global  # catches most non-routable space
    )
```
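**Caution with `is_private`:** The `100.64.0.0/10` shared address space (Carrier-Grade NAT) returns `is_private=False` and also `is_global=False`, so a check on `is_private` alone misses it. The `not network.is_global` test above catches it. Since CGNAT space is not globally routable, flagging it as non-public is correct for RFC 8805 purposes. Recent Python releases also revised which special-purpose ranges these properties report, so results for some edge ranges can differ between interpreter versions.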
## ISO 3166-1 country code validation
Read the valid ISO 2-letter country codes from [assets/iso3166-1.json](../assets/iso3166-1.json), specifically the `alpha_2` attribute:
```python
import json

# Assumes the working directory is the skill root.
with open('assets/iso3166-1.json', encoding='utf-8') as f:
    data = json.load(f)
valid_countries = {c['alpha_2'] for c in data['3166-1']}
```
## ISO 3166-2 region code validation
Read the valid region codes from [assets/iso3166-2.json](../assets/iso3166-2.json), specifically the `code` attribute. The top-level key is `3166-2` (matching the iso3166-1 pattern):
```python
import json

# Assumes the working directory is the skill root.
with open('assets/iso3166-2.json', encoding='utf-8') as f:
    data = json.load(f)
valid_regions = {r['code'] for r in data['3166-2']}
```
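With both sets loaded, a per-entry check might look like the following sketch. `validate_location` is a hypothetical helper. It relies on each ISO 3166-2 code beginning with its country's alpha-2 code and a hyphen (for example, `US-AL`), and it matches codes exactly as written:

```python
from typing import AbstractSet, List


def validate_location(country: str, region: str,
                      valid_countries: AbstractSet[str],
                      valid_regions: AbstractSet[str]) -> List[str]:
    """Return a list of problems with one entry's country and region codes."""
    problems = []
    if country and country not in valid_countries:
        problems.append(f"unknown ISO 3166-1 alpha-2 code: {country!r}")
    if region:
        if region not in valid_regions:
            problems.append(f"unknown ISO 3166-2 code: {region!r}")
        # ISO 3166-2 codes start with their country's alpha-2 code.
        if country and not region.startswith(country + "-"):
            problems.append(f"region {region!r} does not belong to {country!r}")
    return problems
```

Returning a list of problems, rather than raising an exception, lets the caller report every issue in a feed to the user at once.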

File diff suppressed because it is too large