mirror of
https://github.com/github/awesome-copilot.git
synced 2026-05-05 22:52:11 +00:00
746ba555b6
* add mini-context-graph skill * remove pycache files * filename case update to SKILL.md * update readme
164 lines
4.3 KiB
Markdown
164 lines
4.3 KiB
Markdown
# Retrieval Instructions
|
|
|
|
This file defines how the agent answers queries using the two-layer retrieval strategy:
|
|
**wiki-first** (fast path), then **graph traversal with evidence** (deep path).
|
|
|
|
---
|
|
|
|
## Overview
|
|
|
|
Retrieval is a 7-step process:
|
|
|
|
1. Parse the query
|
|
2. **Check the wiki first** (fast path)
|
|
3. Find seed nodes in the graph
|
|
4. Expand the graph via BFS
|
|
5. Prune noisy nodes
|
|
6. Build the subgraph with provenance
|
|
7. Return structured context
|
|
|
|
---
|
|
|
|
## Step 1: Parse the Query
|
|
|
|
Read the query string and identify:
|
|
- **Key noun phrases**: potential entity names (e.g., "system crash", "memory leak")
|
|
- **Keywords**: individual meaningful words (e.g., "crash", "leak", "memory")
|
|
- Normalize all terms to **lowercase**
|
|
|
|
Ignore stopwords (e.g., "the", "a", "is", "why", "does", "how", "what").
|
|
|
|
---
|
|
|
|
## Step 2: Check the Wiki First (Fast Path)
|
|
|
|
Before touching the graph, search the wiki. The wiki contains compiled knowledge —
|
|
cross-references already resolved, contradictions flagged, syntheses written.
|
|
|
|
```python
|
|
from scripts.tools import wiki_store
|
|
|
|
results = wiki_store.search_wiki(query)
|
|
```
|
|
|
|
For each relevant result, read the page:
|
|
|
|
```python
|
|
content = wiki_store.read_page_by_slug(result["slug"])
|
|
```
|
|
|
|
**If the wiki has a sufficient answer:**
|
|
- Synthesize from wiki pages.
|
|
- Cite the source pages (e.g., "According to [[memory-leak]] and [[system-crash]]...").
|
|
- File the answer as a new wiki topic page if it's valuable and not already captured:
|
|
```python
|
|
wiki_store.write_page(category="topic", title="Why System Crashes", content=..., summary=...)
|
|
```
|
|
- **Return early** — no graph traversal needed.
|
|
|
|
**If the wiki answer is incomplete or missing:** proceed to Step 3.
|
|
|
|
---
|
|
|
|
## Step 3: Find Seed Nodes
|
|
|
|
Call `index_store.search(query)` with the original query string.
|
|
|
|
This returns node IDs matching entity names or keywords.
|
|
|
|
If no seed nodes are found:
|
|
- Try searching with individual keywords from Step 1.
|
|
- If still no results, return an empty subgraph: "No relevant entities found."
|
|
|
|
---
|
|
|
|
## Step 4: Expand the Graph (BFS)
|
|
|
|
Call `retrieval_engine.retrieve(seed_node_ids, depth=2)`.
|
|
|
|
BFS from seed nodes:
|
|
- **Depth 1**: direct neighbors
|
|
- **Depth 2**: neighbors of neighbors
|
|
|
|
Rules:
|
|
- Only traverse edges with confidence ≥ MIN_CONFIDENCE (from config.py)
|
|
- Do NOT traverse beyond depth 2
|
|
- Collect all visited node IDs
|
|
|
|
---
|
|
|
|
## Step 5: Prune Nodes
|
|
|
|
- Limit total nodes to MAX_NODES (from config.py)
|
|
- Prioritize:
|
|
1. Seed nodes (always include)
|
|
2. Nodes at depth 1
|
|
3. Nodes at depth 2 (as space allows)
|
|
- Remove nodes only weakly connected (edge confidence < MIN_CONFIDENCE)
|
|
|
|
---
|
|
|
|
## Step 6: Build the Subgraph with Provenance
|
|
|
|
For a standard query, call:
|
|
|
|
```python
|
|
subgraph = skill.query(query)
|
|
# Returns: {"nodes": {node_id: {name, type, source_document, source_chunks}},
|
|
# "edges": [{source, target, type, confidence, source_document, supporting_text, chunk_id}]}
|
|
```
|
|
|
|
For queries requiring evidence (citations, fact-checking), call:
|
|
|
|
```python
|
|
result = skill.query_with_evidence(query)
|
|
# Returns:
|
|
# {
|
|
# "query": str,
|
|
# "subgraph": {"nodes": {...}, "edges": [...]},
|
|
# "supporting_documents": [
|
|
# {
|
|
# "doc_id": str,
|
|
# "doc_title": str,
|
|
# "supporting_chunks": [{"chunk_id": str, "text": str}, ...]
|
|
# }
|
|
# ],
|
|
# "evidence_chain": "memory leak --[causes]--> system crash"
|
|
# }
|
|
```
|
|
|
|
---
|
|
|
|
## Step 7: Return Structured Context
|
|
|
|
Return the result with:
|
|
- **Subgraph**: nodes + edges (the graph answer)
|
|
- **Supporting documents**: source chunks that prove each relation
|
|
- **Evidence chain**: human-readable path summary
|
|
- **Wiki references**: links to relevant wiki pages found in Step 2
|
|
|
|
**If valuable, file the answer back into the wiki:**
|
|
|
|
```python
|
|
wiki_store.write_page(
|
|
category="topic",
|
|
title=query,
|
|
content=f"# {query}\n\n**Evidence chain:** {result['evidence_chain']}\n\n...",
|
|
summary="...",
|
|
)
|
|
```
|
|
|
|
This way, future queries on the same topic find the answer instantly in the wiki.
|
|
|
|
---
|
|
|
|
## Rules
|
|
|
|
- NEVER fabricate nodes or edges not present in the graph
|
|
- NEVER traverse deeper than depth 2
|
|
- ALWAYS check the wiki before the graph (wiki-first)
|
|
- Always include seed nodes in the result, even if they have no edges
|
|
- Prefer edges with higher confidence when pruning
|
|
- File valuable answers back into the wiki as topic pages
|
|
- Return an empty subgraph (not an error) if no relevant nodes are found
|