mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-12 03:05:55 +00:00
update eval-driven-dev skill (#1352)
* update eval-driven-dev skill * small refinement of skill description * address review, rerun npm start.
This commit is contained in:
255
skills/eval-driven-dev/references/wrap-api.md
Normal file
255
skills/eval-driven-dev/references/wrap-api.md
Normal file
@@ -0,0 +1,255 @@
|
||||
# Wrap API Reference
|
||||
|
||||
> Auto-generated from pixie source code docstrings.
|
||||
> Do not edit by hand — regenerate from the upstream [pixie-qa](https://github.com/yiouli/pixie-qa) source repository.
|
||||
|
||||
`pixie.wrap` — data-oriented observation API.
|
||||
|
||||
`wrap()` observes a data value or callable at a named point in the
|
||||
processing pipeline. Its behavior depends on the active mode:
|
||||
|
||||
- **No-op** (tracing disabled, no eval registry): returns `data` unchanged.
|
||||
- **Tracing** (during `pixie trace`): writes to the trace file and emits an
|
||||
OTel event (via span event if a span is active, or via OTel logger
|
||||
otherwise) and returns `data` unchanged (or wraps a callable so the
|
||||
event fires on call).
|
||||
- **Eval** (eval registry active): injects dependency data for
|
||||
`purpose="input"`, captures output/state for `purpose="output"`/
|
||||
`purpose="state"`.
|
||||
|
||||
---
|
||||
|
||||
## CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
| ----------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `pixie trace --runnable <filepath:ClassName> --input <kwargs.json> --output <file.jsonl>` | Run the Runnable once with kwargs from the JSON file and write a trace file. `--input` is a **file path** (not inline JSON). |
|
||||
| `pixie format <file.jsonl>` | Convert a trace file to a formatted dataset entry template. Shows `entry_kwargs`, `eval_input`, and `eval_output` (the real captured output). |
|
||||
| `pixie trace filter <file.jsonl> --purpose input` | Print only wrap events matching the given purposes. Outputs one JSON line per matching event. |
|
||||
|
||||
---
|
||||
|
||||
## Classes
|
||||
|
||||
### `pixie.Runnable`
|
||||
|
||||
```python
|
||||
class pixie.Runnable(Protocol[T]):
|
||||
@classmethod
|
||||
def create(cls) -> Runnable[Any]: ...
|
||||
async def setup(self) -> None: ...
|
||||
async def run(self, args: T) -> None: ...
|
||||
async def teardown(self) -> None: ...
|
||||
```
|
||||
|
||||
Protocol for structured runnables used by the dataset runner. `T` is a
|
||||
`pydantic.BaseModel` subclass whose fields match the `entry_kwargs` keys
|
||||
in the dataset JSON.
|
||||
|
||||
Lifecycle:
|
||||
|
||||
1. `create()` — class method to construct and return a runnable instance.
|
||||
2. `setup()` — **async**, called **once** before the first `run()` call.
|
||||
Initialize shared resources here (e.g., `TestClient`, database connections).
|
||||
Optional — has a default no-op implementation.
|
||||
3. `run(args)` — **async**, called **concurrently for each dataset entry**
|
||||
(up to 4 entries in parallel). `args` is a validated Pydantic model
|
||||
built from `entry_kwargs`. Invoke the application's real entry point.
|
||||
4. `teardown()` — **async**, called **once** after the last `run()` call.
|
||||
Release any resources acquired in `setup()`.
|
||||
Optional — has a default no-op implementation.
|
||||
|
||||
`setup()` and `teardown()` have default no-op implementations;
|
||||
you only need to override them when shared resources are required.
|
||||
|
||||
**Concurrency**: `run()` is called concurrently via `asyncio.gather`. Your
|
||||
implementation **must be concurrency-safe**. If it uses shared mutable state
|
||||
(e.g., a SQLite connection, an in-memory cache, a file handle), protect it
|
||||
with `asyncio.Semaphore` or `asyncio.Lock`:
|
||||
|
||||
```python
|
||||
class AppRunnable(pixie.Runnable[AppArgs]):
|
||||
_sem: asyncio.Semaphore
|
||||
|
||||
@classmethod
|
||||
def create(cls) -> AppRunnable:
|
||||
inst = cls()
|
||||
inst._sem = asyncio.Semaphore(1) # serialise DB access
|
||||
return inst
|
||||
|
||||
async def run(self, args: AppArgs) -> None:
|
||||
async with self._sem:
|
||||
await call_app(args.message)
|
||||
```
|
||||
|
||||
Common concurrency pitfalls:
|
||||
|
||||
- **SQLite**: not safe for concurrent writes — use `Semaphore(1)` or `aiosqlite` with WAL mode.
|
||||
- **Global mutable state**: module-level dicts/lists modified in `run()` need protection.
|
||||
- **Rate-limited APIs**: add a semaphore to avoid 429 errors.
|
||||
|
||||
**Import resolution**: The project root directory (where `pixie test` / `pixie trace`
|
||||
is invoked) is automatically added to `sys.path` before loading runnables and
|
||||
evaluators. This means your runnable can use normal `import` statements to
|
||||
reference project modules (e.g., `from app import service`).
|
||||
|
||||
**Example**:
|
||||
|
||||
```python
|
||||
# pixie_qa/scripts/run_app.py
|
||||
from __future__ import annotations
|
||||
from pydantic import BaseModel
|
||||
import pixie
|
||||
|
||||
class AppArgs(BaseModel):
|
||||
user_message: str
|
||||
|
||||
class AppRunnable(pixie.Runnable[AppArgs]):
|
||||
@classmethod
|
||||
def create(cls) -> AppRunnable:
|
||||
return cls()
|
||||
|
||||
async def run(self, args: AppArgs) -> None:
|
||||
from myapp import handle_request
|
||||
await handle_request(args.user_message)
|
||||
```
|
||||
|
||||
**Web server example** (using an async HTTP client):
|
||||
|
||||
```python
|
||||
import httpx
|
||||
from pydantic import BaseModel
|
||||
import pixie
|
||||
|
||||
class AppArgs(BaseModel):
|
||||
user_message: str
|
||||
|
||||
class AppRunnable(pixie.Runnable[AppArgs]):
|
||||
_client: httpx.AsyncClient
|
||||
|
||||
@classmethod
|
||||
def create(cls) -> AppRunnable:
|
||||
return cls()
|
||||
|
||||
async def setup(self) -> None:
|
||||
self._client = httpx.AsyncClient(base_url="http://localhost:8000")
|
||||
|
||||
async def run(self, args: AppArgs) -> None:
|
||||
await self._client.post("/chat", json={"message": args.user_message})
|
||||
|
||||
async def teardown(self) -> None:
|
||||
await self._client.aclose()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Functions
|
||||
|
||||
### `pixie.wrap`
|
||||
|
||||
```python
|
||||
pixie.wrap(data: 'T', *, purpose: "Literal['input', 'output', 'state']", name: 'str', description: 'str | None' = None) -> 'T'
|
||||
```
|
||||
|
||||
Observe a data value or data-provider callable at a point in the processing pipeline.
|
||||
|
||||
`data` can be either a plain value or a callable that produces a value.
|
||||
In both cases the return type is `T` — the caller gets back exactly the
|
||||
same type it passed in when in no-op or tracing modes.
|
||||
|
||||
In eval mode with `purpose="input"`, the returned value (or callable) is
|
||||
replaced with the deserialized registry value. When `data` is callable
|
||||
the returned wrapper ignores the original function and returns the injected
|
||||
value on every call; in all other modes the returned callable wraps the
|
||||
original and adds tracing or capture behaviour.
|
||||
|
||||
Args:
|
||||
data: A data value or a data-provider callable.
|
||||
purpose: Classification of the data point: - "input": data from external dependencies (DB records, API responses) - "output": data going out to external systems or users - "state": intermediate state for evaluation (routing decisions, etc.)
|
||||
name: Unique identifier for this data point. Used as the key in the
|
||||
eval registry and in trace logs.
|
||||
description: Optional human-readable description of what this data is.
|
||||
|
||||
Returns:
|
||||
The original data unchanged (tracing / no-op modes), or the
|
||||
registry value (eval mode with purpose="input"). When `data`
|
||||
is callable the return value is also callable.
|
||||
|
||||
---
|
||||
|
||||
## Error Types
|
||||
|
||||
### `WrapRegistryMissError`
|
||||
|
||||
```python
|
||||
WrapRegistryMissError(name: 'str') -> 'None'
|
||||
```
|
||||
|
||||
Raised when a wrap(purpose="input") name is not found in the eval registry.
|
||||
|
||||
### `WrapTypeMismatchError`
|
||||
|
||||
```python
|
||||
WrapTypeMismatchError(name: 'str', expected_type: 'type', actual_type: 'type') -> 'None'
|
||||
```
|
||||
|
||||
Raised when deserialized registry value doesn't match expected type.
|
||||
|
||||
---
|
||||
|
||||
## Trace File Utilities
|
||||
|
||||
Pydantic model for wrap log entries and JSONL loading utilities.
|
||||
|
||||
`WrapLogEntry` is the typed representation of a single `wrap()` event
|
||||
as recorded in a JSONL trace file. Multiple places in the codebase load
|
||||
these objects — the `pixie trace filter` CLI, the dataset loader, and
|
||||
the verification scripts — so they share this single model.
|
||||
|
||||
### `pixie.WrapLogEntry`
|
||||
|
||||
```python
|
||||
pixie.WrapLogEntry(*, type: str = 'wrap', name: str, purpose: str, data: Any, description: str | None = None, trace_id: str | None = None, span_id: str | None = None) -> None
|
||||
```
|
||||
|
||||
A single wrap() event as logged to a JSONL trace file.
|
||||
|
||||
Attributes:
|
||||
type: Always `"wrap"` for wrap events.
|
||||
name: The wrap point name (matches `wrap(name=...)`).
|
||||
purpose: One of `"input"`, `"output"`, `"state"`.
|
||||
data: The serialized data (jsonpickle string).
|
||||
description: Optional human-readable description.
|
||||
trace_id: OTel trace ID (if available).
|
||||
span_id: OTel span ID (if available).
|
||||
|
||||
### `pixie.load_wrap_log_entries`
|
||||
|
||||
```python
|
||||
pixie.load_wrap_log_entries(jsonl_path: 'str | Path') -> 'list[WrapLogEntry]'
|
||||
```
|
||||
|
||||
Load all wrap log entries from a JSONL file.
|
||||
|
||||
Skips non-wrap lines (e.g. `type=llm_span`) and malformed lines.
|
||||
|
||||
Args:
|
||||
jsonl_path: Path to a JSONL trace file.
|
||||
|
||||
Returns:
|
||||
List of :class:`WrapLogEntry` objects.
|
||||
|
||||
### `pixie.filter_by_purpose`
|
||||
|
||||
```python
|
||||
pixie.filter_by_purpose(entries: 'list[WrapLogEntry]', purposes: 'set[str]') -> 'list[WrapLogEntry]'
|
||||
```
|
||||
|
||||
Filter wrap log entries by purpose.
|
||||
|
||||
Args:
|
||||
entries: List of wrap log entries.
|
||||
purposes: Set of purpose values to include.
|
||||
|
||||
Returns:
|
||||
Filtered list.
|
||||
Reference in New Issue
Block a user