Files
awesome-copilot/skills/eval-driven-dev/references/wrap-api.md
Yiou Li 5f59ddb9cf update eval-driven-dev skill (#1352)
* update eval-driven-dev skill

* small refinement of skill description

* address review, rerun npm start.
2026-04-10 11:19:28 +10:00

8.9 KiB

Wrap API Reference

Auto-generated from pixie source code docstrings. Do not edit by hand — regenerate from the upstream pixie-qa source repository.

pixie.wrap — data-oriented observation API.

wrap() observes a data value or callable at a named point in the processing pipeline. Its behavior depends on the active mode:

  • No-op (tracing disabled, no eval registry): returns data unchanged.
  • Tracing (during pixie trace): writes to the trace file and emits an OTel event (via span event if a span is active, or via OTel logger otherwise) and returns data unchanged (or wraps a callable so the event fires on call).
  • Eval (eval registry active): injects dependency data for purpose="input", captures output/state for purpose="output"/ purpose="state".

CLI Commands

Command Description
pixie trace --runnable <filepath:ClassName> --input <kwargs.json> --output <file.jsonl> Run the Runnable once with kwargs from the JSON file and write a trace file. --input is a file path (not inline JSON).
pixie format <file.jsonl> Convert a trace file to a formatted dataset entry template. Shows entry_kwargs, eval_input, and eval_output (the real captured output).
pixie trace filter <file.jsonl> --purpose input Print only wrap events matching the given purposes. Outputs one JSON line per matching event.

Classes

pixie.Runnable

class pixie.Runnable(Protocol[T]):
    @classmethod
    def create(cls) -> Runnable[Any]: ...
    async def setup(self) -> None: ...
    async def run(self, args: T) -> None: ...
    async def teardown(self) -> None: ...

Protocol for structured runnables used by the dataset runner. T is a pydantic.BaseModel subclass whose fields match the entry_kwargs keys in the dataset JSON.

Lifecycle:

  1. create() — class method to construct and return a runnable instance.
  2. setup()async, called once before the first run() call. Initialize shared resources here (e.g., TestClient, database connections). Optional — has a default no-op implementation.
  3. run(args)async, called concurrently for each dataset entry (up to 4 entries in parallel). args is a validated Pydantic model built from entry_kwargs. Invoke the application's real entry point.
  4. teardown()async, called once after the last run() call. Release any resources acquired in setup(). Optional — has a default no-op implementation.

setup() and teardown() have default no-op implementations; you only need to override them when shared resources are required.

Concurrency: run() is called concurrently via asyncio.gather. Your implementation must be concurrency-safe. If it uses shared mutable state (e.g., a SQLite connection, an in-memory cache, a file handle), protect it with asyncio.Semaphore or asyncio.Lock:

class AppRunnable(pixie.Runnable[AppArgs]):
    _sem: asyncio.Semaphore

    @classmethod
    def create(cls) -> AppRunnable:
        inst = cls()
        inst._sem = asyncio.Semaphore(1)  # serialise DB access
        return inst

    async def run(self, args: AppArgs) -> None:
        async with self._sem:
            await call_app(args.message)

Common concurrency pitfalls:

  • SQLite: not safe for concurrent writes — use Semaphore(1) or aiosqlite with WAL mode.
  • Global mutable state: module-level dicts/lists modified in run() need protection.
  • Rate-limited APIs: add a semaphore to avoid 429 errors.

Import resolution: The project root directory (where pixie test / pixie trace is invoked) is automatically added to sys.path before loading runnables and evaluators. This means your runnable can use normal import statements to reference project modules (e.g., from app import service).

Example:

# pixie_qa/scripts/run_app.py
from __future__ import annotations
from pydantic import BaseModel
import pixie

class AppArgs(BaseModel):
    user_message: str

class AppRunnable(pixie.Runnable[AppArgs]):
    @classmethod
    def create(cls) -> AppRunnable:
        return cls()

    async def run(self, args: AppArgs) -> None:
        from myapp import handle_request
        await handle_request(args.user_message)

Web server example (using an async HTTP client):

import httpx
from pydantic import BaseModel
import pixie

class AppArgs(BaseModel):
    user_message: str

class AppRunnable(pixie.Runnable[AppArgs]):
    _client: httpx.AsyncClient

    @classmethod
    def create(cls) -> AppRunnable:
        return cls()

    async def setup(self) -> None:
        self._client = httpx.AsyncClient(base_url="http://localhost:8000")

    async def run(self, args: AppArgs) -> None:
        await self._client.post("/chat", json={"message": args.user_message})

    async def teardown(self) -> None:
        await self._client.aclose()

Functions

pixie.wrap

pixie.wrap(data: 'T', *, purpose: "Literal['input', 'output', 'state']", name: 'str', description: 'str | None' = None) -> 'T'

Observe a data value or data-provider callable at a point in the processing pipeline.

data can be either a plain value or a callable that produces a value. In both cases the return type is T — the caller gets back exactly the same type it passed in when in no-op or tracing modes.

In eval mode with purpose="input", the returned value (or callable) is replaced with the deserialized registry value. When data is callable the returned wrapper ignores the original function and returns the injected value on every call; in all other modes the returned callable wraps the original and adds tracing or capture behaviour.

Args: data: A data value or a data-provider callable. purpose: Classification of the data point: - "input": data from external dependencies (DB records, API responses) - "output": data going out to external systems or users - "state": intermediate state for evaluation (routing decisions, etc.) name: Unique identifier for this data point. Used as the key in the eval registry and in trace logs. description: Optional human-readable description of what this data is.

Returns: The original data unchanged (tracing / no-op modes), or the registry value (eval mode with purpose="input"). When data is callable the return value is also callable.


Error Types

WrapRegistryMissError

WrapRegistryMissError(name: 'str') -> 'None'

Raised when a wrap(purpose="input") name is not found in the eval registry.

WrapTypeMismatchError

WrapTypeMismatchError(name: 'str', expected_type: 'type', actual_type: 'type') -> 'None'

Raised when deserialized registry value doesn't match expected type.


Trace File Utilities

Pydantic model for wrap log entries and JSONL loading utilities.

WrapLogEntry is the typed representation of a single wrap() event as recorded in a JSONL trace file. Multiple places in the codebase load these objects — the pixie trace filter CLI, the dataset loader, and the verification scripts — so they share this single model.

pixie.WrapLogEntry

pixie.WrapLogEntry(*, type: str = 'wrap', name: str, purpose: str, data: Any, description: str | None = None, trace_id: str | None = None, span_id: str | None = None) -> None

A single wrap() event as logged to a JSONL trace file.

Attributes: type: Always "wrap" for wrap events. name: The wrap point name (matches wrap(name=...)). purpose: One of "input", "output", "state". data: The serialized data (jsonpickle string). description: Optional human-readable description. trace_id: OTel trace ID (if available). span_id: OTel span ID (if available).

pixie.load_wrap_log_entries

pixie.load_wrap_log_entries(jsonl_path: 'str | Path') -> 'list[WrapLogEntry]'

Load all wrap log entries from a JSONL file.

Skips non-wrap lines (e.g. type=llm_span) and malformed lines.

Args: jsonl_path: Path to a JSONL trace file.

Returns: List of :class:WrapLogEntry objects.

pixie.filter_by_purpose

pixie.filter_by_purpose(entries: 'list[WrapLogEntry]', purposes: 'set[str]') -> 'list[WrapLogEntry]'

Filter wrap log entries by purpose.

Args: entries: List of wrap log entries. purposes: Set of purpose values to include.

Returns: Filtered list.