mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-12 11:15:56 +00:00
# RETRIEVER Spans

## Purpose

RETRIEVER spans represent document/context retrieval operations (vector DB queries, semantic search, keyword search).

## Required Attributes

| Attribute | Type | Description | Required |
|-----------|------|-------------|----------|
| `openinference.span.kind` | String | Must be "RETRIEVER" | Yes |

## Attribute Reference

### Query

| Attribute | Type | Description |
|-----------|------|-------------|
| `input.value` | String | Search query text |

### Document Schema

| Attribute Pattern | Type | Description |
|-------------------|------|-------------|
| `retrieval.documents.{i}.document.id` | String | Unique document identifier |
| `retrieval.documents.{i}.document.content` | String | Document text content |
| `retrieval.documents.{i}.document.score` | Float | Relevance score: a similarity (often 0-1) or a distance (lower means closer), depending on the retriever |
| `retrieval.documents.{i}.document.metadata` | String (JSON) | Document metadata |

### Flattening Pattern for Documents

Documents are flattened using zero-indexed notation:

```
retrieval.documents.0.document.id
retrieval.documents.0.document.content
retrieval.documents.0.document.score
retrieval.documents.1.document.id
retrieval.documents.1.document.content
retrieval.documents.1.document.score
...
```
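The flattening pattern above can be generated mechanically from a list of results. The following is a minimal Python sketch, assuming each retrieved document arrives as a plain dict with hypothetical `id`, `content`, and `score` keys; `flatten_documents` is an illustrative name, not part of any library.

```python
from typing import Any


def flatten_documents(documents: list[dict[str, Any]]) -> dict[str, Any]:
    """Flatten retrieved documents into zero-indexed span attribute keys.

    Assumes each input dict has hypothetical "id", "content", and "score" keys.
    """
    attributes: dict[str, Any] = {}
    # enumerate() numbers documents 0, 1, 2, ... in the order they were returned
    for i, doc in enumerate(documents):
        prefix = f"retrieval.documents.{i}.document"
        attributes[f"{prefix}.id"] = doc["id"]
        attributes[f"{prefix}.content"] = doc["content"]
        attributes[f"{prefix}.score"] = float(doc["score"])
    return attributes


attrs = flatten_documents([
    {"id": "doc_123", "content": "Machine learning is a method of data analysis...", "score": 0.92},
    {"id": "doc_456", "content": "Machine learning algorithms learn patterns from data...", "score": 0.87},
])
```

Using `enumerate` ties each index directly to the document's position in the result list, so no separate counter can drift out of sync.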

### Document Metadata

Common metadata fields (stored as a JSON string):

```json
{
  "source": "knowledge_base.pdf",
  "page": 42,
  "section": "Introduction",
  "author": "Jane Doe",
  "created_at": "2024-01-15",
  "url": "https://example.com/doc",
  "chunk_id": "chunk_123"
}
```

**Example with metadata:**

```json
{
  "retrieval.documents.0.document.id": "doc_123",
  "retrieval.documents.0.document.content": "Machine learning is a method of data analysis...",
  "retrieval.documents.0.document.score": 0.92,
  "retrieval.documents.0.document.metadata": "{\"source\": \"ml_textbook.pdf\", \"page\": 15, \"chapter\": \"Introduction\"}"
}
```
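The escaped metadata string in the example above is what Python's standard `json.dumps` produces for an ordinary dict with its default separators. This sketch assumes the metadata starts out as a dict; `metadata_attribute` is a hypothetical helper name. OpenTelemetry attribute values are limited to primitives and arrays of primitives, which is why the dict is serialized rather than attached directly.

```python
import json
from typing import Any


def metadata_attribute(index: int, metadata: dict[str, Any]) -> dict[str, str]:
    """Serialize one document's metadata into its flattened JSON-string attribute."""
    key = f"retrieval.documents.{index}.document.metadata"
    # Attribute values cannot be nested mappings, so the dict is stored
    # as a JSON string instead.
    return {key: json.dumps(metadata)}


attr = metadata_attribute(0, {"source": "ml_textbook.pdf", "page": 15, "chapter": "Introduction"})
# A consumer can turn the string back into a dict with json.loads().
restored = json.loads(attr["retrieval.documents.0.document.metadata"])
```

Round-tripping through `json.loads` recovers the original types (for example, `page` comes back as an integer), so readers of the span do not need a custom parser.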

### Ordering

Documents are ordered by index (0, 1, 2, ...). Typically:

- Index 0 = best match (highest similarity score, or lowest distance)
- Index 1 = second-best match
- etc.

Preserve the retrieval order in your flattened attributes.
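Since indices should mirror the retriever's own ranking, flattening code should not re-sort results. If you want to confirm that the scores you are about to record really are best-first, a small check like the one below can help while debugging. It is a hypothetical helper, not part of OpenInference; the `higher_is_better` flag is an assumption covering retrievers that report distances instead of similarities.

```python
def is_best_first(scores: list[float], higher_is_better: bool = True) -> bool:
    """Return True if scores are already in best-first order.

    Pass higher_is_better=False when the score is a distance, where smaller
    values mean closer matches.
    """
    pairs = list(zip(scores, scores[1:]))
    if higher_is_better:
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)


# Similarity scores listed in index order
print(is_best_first([0.92, 0.87, 0.81]))  # True
```

A `False` result usually means either the retriever's ordering differs from its scores or the score convention was misidentified; in both cases the fix belongs upstream, not in the attribute indices.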

### Large Document Handling

For very long documents:

- Consider truncating `document.content` to the first N characters
- Store the full content in a separate document store
- Use `document.id` to reference the full content

## Examples

### Basic Vector Search

```json
{
  "openinference.span.kind": "RETRIEVER",
  "input.value": "What is machine learning?",
  "retrieval.documents.0.document.id": "doc_123",
  "retrieval.documents.0.document.content": "Machine learning is a subset of artificial intelligence...",
  "retrieval.documents.0.document.score": 0.92,
  "retrieval.documents.0.document.metadata": "{\"source\": \"textbook.pdf\", \"page\": 42}",
  "retrieval.documents.1.document.id": "doc_456",
  "retrieval.documents.1.document.content": "Machine learning algorithms learn patterns from data...",
  "retrieval.documents.1.document.score": 0.87,
  "retrieval.documents.1.document.metadata": "{\"source\": \"article.html\", \"author\": \"Jane Doe\"}",
  "retrieval.documents.2.document.id": "doc_789",
  "retrieval.documents.2.document.content": "Supervised learning is a type of machine learning...",
  "retrieval.documents.2.document.score": 0.81,
  "retrieval.documents.2.document.metadata": "{\"source\": \"wiki.org\"}",
  "metadata.retriever_type": "vector_search",
  "metadata.vector_db": "pinecone",
  "metadata.top_k": 3
}
```
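The attribute set in this example can be assembled from raw retriever output in one pass. The Python sketch below is illustrative only: `build_retriever_attributes` is a hypothetical name, and the input shape (dicts with `id`, `content`, `score`, and an optional `metadata` dict) is an assumption. Retriever-level settings are written as flat `metadata.<name>` keys to match the example.

```python
import json
from typing import Any, Optional


def build_retriever_attributes(
    query: str,
    documents: list[dict[str, Any]],
    extra_metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a flat attribute dict shaped like the Basic Vector Search example."""
    attributes: dict[str, Any] = {
        "openinference.span.kind": "RETRIEVER",
        "input.value": query,
    }
    for i, doc in enumerate(documents):  # keep the retriever's order
        prefix = f"retrieval.documents.{i}.document"
        attributes[f"{prefix}.id"] = doc["id"]
        attributes[f"{prefix}.content"] = doc["content"]
        attributes[f"{prefix}.score"] = float(doc["score"])
        if doc.get("metadata"):
            # Per-document metadata is stored as a JSON string
            attributes[f"{prefix}.metadata"] = json.dumps(doc["metadata"])
    # Retriever-level settings become flat "metadata.<name>" keys
    for name, value in (extra_metadata or {}).items():
        attributes[f"metadata.{name}"] = value
    return attributes


attrs = build_retriever_attributes(
    "What is machine learning?",
    [
        {"id": "doc_123", "content": "Machine learning is a subset of artificial intelligence...",
         "score": 0.92, "metadata": {"source": "textbook.pdf", "page": 42}},
        {"id": "doc_456", "content": "Machine learning algorithms learn patterns from data...",
         "score": 0.87, "metadata": {"source": "article.html", "author": "Jane Doe"}},
        {"id": "doc_789", "content": "Supervised learning is a type of machine learning...",
         "score": 0.81, "metadata": {"source": "wiki.org"}},
    ],
    extra_metadata={"retriever_type": "vector_search", "vector_db": "pinecone", "top_k": 3},
)
```

With the OpenTelemetry Python API, a dict like this could be attached to an existing span via `span.set_attributes(attrs)`; creating and ending the span is outside the scope of this sketch.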