# Step 2: Instrument with `wrap` and capture a reference trace > For the full `wrap()` API, the `Runnable` class, and CLI commands, see `wrap-api.md`. **Why this step**: You need to see the actual data flowing through the app before you can build anything. This step adds `wrap()` calls to mark data boundaries, implements a `Runnable` class, captures a reference trace with `pixie trace`, and verifies all eval criteria can be evaluated. This step consolidates three things: (1) data-flow analysis, (2) instrumentation, and (3) writing the runnable. --- ## 2a. Data-flow analysis and `wrap` instrumentation Starting from LLM call sites, trace backwards and forwards through the code to find: - **Entry input**: what the user sends in (via the entry point) - **Dependency input**: data from external systems (databases, APIs, caches) - **App output**: data going out to users or external systems - **Intermediate state**: internal decisions relevant to evaluation (routing, tool calls) For each data point found, **immediately add a `wrap()` call** in the application code: ```python import pixie # External dependency data — value form (result of a DB/API call) profile = pixie.wrap(db.get_profile(user_id), purpose="input", name="customer_profile", description="Customer profile fetched from database") # External dependency data — function form (for lazy evaluation / avoiding the call) history = pixie.wrap(redis.get_history, purpose="input", name="conversation_history", description="Conversation history from Redis")(session_id) # App output — what the user receives response = pixie.wrap(response_text, purpose="output", name="response", description="The assistant's response to the user") # Intermediate state — internal decision relevant to evaluation selected_agent = pixie.wrap(selected_agent, purpose="state", name="routing_decision", description="Which agent was selected to handle this request") ``` ### Rules for wrapping 1. **Wrap at the data boundary** — where data enters or exits the application, not deep inside utility functions 2. **Names must be unique** across the entire application (they are used as registry keys and dataset field names) 3. **Use `lower_snake_case`** for names 4. **Don't wrap LLM call arguments or responses** — those are already captured by OpenInference auto-instrumentation 5. **Don't change the function's interface** — `wrap()` is purely additive, returns the same type ### Value vs. function wrapping ```python # Value form: wrap a data value (result already computed) profile = pixie.wrap(db.get_profile(user_id), purpose="input", name="customer_profile") # Function form: wrap the callable itself — in eval mode the original function # is NOT called; the registry value is returned instead. profile = pixie.wrap(db.get_profile, purpose="input", name="customer_profile")(user_id) ``` Use function form when you want to prevent the external call from happening in eval mode (e.g., the call is expensive, has side-effects, or you simply want a clean injection point). In tracing mode, the function is called normally and the result is logged. ### Coverage check After adding `wrap()` calls, go through each eval criterion from `pixie_qa/02-eval-criteria.md` and verify that every required data point has a corresponding wrap call. If a criterion needs data that isn't captured, add the wrap now — don't defer. ## 2b. Implement the Runnable class The `Runnable` class replaces the plain function from older versions of the skill. It exposes three lifecycle methods: - **`setup()`** — async, called once before any `run()` call; initialize shared resources here (e.g., an async HTTP client, a DB connection, pre-loaded configuration). Optional — has a default no-op. - **`run(args)`** — async, called **concurrently** for each dataset entry (up to 4 in parallel); invoke the app's real entry point with `args` (a validated Pydantic model built from `entry_kwargs`). **Must be concurrency-safe** — see below. - **`teardown()`** — async, called once after all `run()` calls; clean up resources. Optional — has a default no-op. **Import resolution**: The project root is automatically added to `sys.path` when your runnable is loaded, so you can use normal `import` statements (e.g., `from app import service`) — no `sys.path` manipulation needed. Place the class in `pixie_qa/scripts/run_app.py`: ```python # pixie_qa/scripts/run_app.py from __future__ import annotations from pydantic import BaseModel import pixie class AppArgs(BaseModel): user_message: str class AppRunnable(pixie.Runnable[AppArgs]): """Runnable that drives the application for tracing and evaluation. wrap(purpose="input") calls in the app inject dependency data from the test registry automatically. wrap(purpose="output"/"state") calls capture data for evaluation. No manual mocking needed. """ @classmethod def create(cls) -> AppRunnable: return cls() async def run(self, args: AppArgs) -> None: from myapp import handle_request await handle_request(args.user_message) ``` **For web servers**, initialize an async HTTP client in `setup()` and use it in `run()`: ```python import httpx from pydantic import BaseModel import pixie class AppArgs(BaseModel): user_message: str class AppRunnable(pixie.Runnable[AppArgs]): _client: httpx.AsyncClient @classmethod def create(cls) -> AppRunnable: return cls() async def setup(self) -> None: self._client = httpx.AsyncClient(base_url="http://localhost:8000") async def run(self, args: AppArgs) -> None: await self._client.post("/chat", json={"message": args.user_message}) async def teardown(self) -> None: await self._client.aclose() ``` **For FastAPI/Starlette apps** (in-process testing without starting a server), use `httpx.ASGITransport` to run the ASGI app directly. This is faster and avoids port management: ```python import asyncio import httpx from pydantic import BaseModel import pixie class AppArgs(BaseModel): user_message: str class AppRunnable(pixie.Runnable[AppArgs]): _client: httpx.AsyncClient _sem: asyncio.Semaphore @classmethod def create(cls) -> AppRunnable: inst = cls() inst._sem = asyncio.Semaphore(1) # serialise if app uses shared mutable state return inst async def setup(self) -> None: from myapp.main import app # your FastAPI/Starlette app instance # ASGITransport runs the app in-process — no server needed transport = httpx.ASGITransport(app=app) self._client = httpx.AsyncClient(transport=transport, base_url="http://test") async def run(self, args: AppArgs) -> None: async with self._sem: await self._client.post("/chat", json={"message": args.user_message}) async def teardown(self) -> None: await self._client.aclose() ``` Choose the right pattern: - **Direct function call**: when the app exposes a simple async function (no web framework) - **`httpx.AsyncClient` with `base_url`**: when you need to test against a running HTTP server - **`httpx.ASGITransport`**: when the app is FastAPI/Starlette — fastest, no server needed, most reliable for eval **Rules**: - The `run()` method receives a Pydantic model whose fields are populated from the dataset's `entry_kwargs`. Define a `BaseModel` subclass with the fields your app needs. - All lifecycle methods (`setup`, `run`, `teardown`) are **async**. - `run()` must call the app through its real entry point — never bypass request handling. - Place the file at `pixie_qa/scripts/run_app.py` — name the class `AppRunnable` (or anything descriptive). - The dataset's `"runnable"` field references the class: `"pixie_qa/scripts/run_app.py:AppRunnable"`. **Concurrency**: `run()` is called concurrently for multiple dataset entries (up to 4 in parallel). If the app uses shared mutable state — SQLite, file-based DBs, global caches — you must synchronise access: ```python import asyncio class AppRunnable(pixie.Runnable[AppArgs]): _sem: asyncio.Semaphore @classmethod def create(cls) -> AppRunnable: inst = cls() inst._sem = asyncio.Semaphore(1) # serialise DB access return inst async def run(self, args: AppArgs) -> None: async with self._sem: await call_app(args.message) ``` Common concurrency pitfalls: - **SQLite**: `sqlite3` connections are not safe for concurrent async writes. Use `Semaphore(1)` to serialise, or switch to `aiosqlite` with WAL mode. - **Global mutable state**: module-level dicts/lists modified in `run()` need a lock. - **Rate-limited external APIs**: add a semaphore to avoid 429 errors. ## 2c. Capture the reference trace with `pixie trace` Use the `pixie trace` CLI command to run your `Runnable` and capture a trace file. Pass the entry input as a JSON file: ```bash # Create a JSON file with entry kwargs echo '{"user_message": "a realistic sample input"}' > pixie_qa/sample-input.json pixie trace --runnable pixie_qa/scripts/run_app.py:AppRunnable \ --input pixie_qa/sample-input.json \ --output pixie_qa/reference-trace.jsonl ``` The `--input` flag takes a **file path** to a JSON file (not inline JSON). The JSON object keys become the kwargs passed to the Pydantic model. The command calls `AppRunnable.create()`, then `setup()`, then `run(args)` once with the given input, then `teardown()`. The resulting trace is written to the output file. The JSONL trace file will contain one line per `wrap()` event and one line per LLM span: ```jsonl {"type": "kwargs", "value": {"user_message": "What are your hours?"}} {"type": "wrap", "name": "customer_profile", "purpose": "input", "data": {...}, ...} {"type": "llm_span", "request_model": "gpt-4o", "input_messages": [...], ...} {"type": "wrap", "name": "response", "purpose": "output", "data": "Our hours are...", ...} ``` ## 2d. Verify wrap coverage with `pixie format` Run `pixie format` on the trace file to see the data in dataset-entry format. This shows you both the data shapes and what a real app output looks like: ```bash pixie format --input reference-trace.jsonl --output dataset-sample.json ``` The output is a formatted dataset entry template — it contains: - `entry_kwargs`: the exact keys/values for the runnable arguments - `eval_input`: the data for all dependencies (from `wrap(purpose="input")` calls) - `eval_output`: the **actual app output** captured from the trace (this is the real output — use it to understand what the app produces, not as a dataset `eval_output` field) For each eval criterion from `pixie_qa/02-eval-criteria.md`, verify the format output contains the data needed to evaluate it. If a data point is missing, go back and add the `wrap()` call. --- ## Output - `pixie_qa/scripts/run_app.py` — the `Runnable` class - `pixie_qa/reference-trace.jsonl` — the reference trace with all expected wrap events