awesome-copilot/plugins/phoenix/skills/phoenix-tracing/references/annotations-overview.md

# Annotations Overview

Annotations allow you to add human or automated feedback to traces, spans, documents, and sessions. Annotations are essential for evaluation, quality assessment, and building training datasets.

## Annotation Types

Phoenix supports four types of annotations:

| Type                    | Target                           | Purpose                                  | Example Use Case                 |
| ----------------------- | -------------------------------- | ---------------------------------------- | -------------------------------- |
| **Span Annotation**     | Individual span                  | Feedback on a specific operation         | "This LLM response was accurate" |
| **Document Annotation** | Document within a RETRIEVER span | Feedback on retrieved document relevance | "This document was not helpful"  |
| **Trace Annotation**    | Entire trace                     | Feedback on end-to-end interaction       | "User was satisfied with result" |
| **Session Annotation**  | User session                     | Feedback on multi-turn conversation      | "Session ended successfully"     |

## Annotation Fields

Every annotation has these fields:

### Required Fields

| Field     | Type   | Description                                                                   |
| --------- | ------ | ----------------------------------------------------------------------------- |
| Entity ID | String | ID of the target entity (span_id, trace_id, session_id, or document_position) |
| `name`    | String | Annotation name/label (e.g., "quality", "relevance", "helpfulness")           |

### Result Fields (At Least One Required)

| Field         | Type              | Description                                                       |
| ------------- | ----------------- | ----------------------------------------------------------------- |
| `label`       | String (optional) | Categorical value (e.g., "good", "bad", "relevant", "irrelevant") |
| `score`       | Float (optional)  | Numeric value (typically 0-1, but can be any range)               |
| `explanation` | String (optional) | Free-text explanation of the annotation                           |

**At least one** of `label`, `score`, or `explanation` must be provided.

### Optional Fields

| Field            | Type   | Description                                                                             |
| ---------------- | ------ | --------------------------------------------------------------------------------------- |
| `annotator_kind` | String | Who created this annotation: "HUMAN", "LLM", or "CODE" (default: "HUMAN")               |
| `identifier`     | String | Unique identifier for upsert behavior (updates existing if same name+entity+identifier) |
| `metadata`       | Object | Custom metadata as key-value pairs                                                      |

## Annotator Kinds

| Kind    | Description                    | Example                           |
| ------- | ------------------------------ | --------------------------------- |
| `HUMAN` | Manual feedback from a person  | User ratings, expert labels       |
| `LLM`   | Automated feedback from an LLM | GPT-4 evaluating response quality |
| `CODE`  | Automated feedback from code   | Rule-based checks, heuristics     |

## Examples

**Quality Assessment:**

- `quality` - Overall quality (label: good/fair/poor, score: 0-1)
- `correctness` - Factual accuracy (label: correct/incorrect, score: 0-1)
- `helpfulness` - User satisfaction (label: helpful/not_helpful, score: 0-1)

**RAG-Specific:**

- `relevance` - Document relevance to query (label: relevant/irrelevant, score: 0-1)
- `faithfulness` - Answer grounded in context (label: faithful/unfaithful, score: 0-1)

**Safety:**

- `toxicity` - Contains harmful content (score: 0-1)
- `pii_detected` - Contains personally identifiable information (label: yes/no)