awesome-copilot/skills/phoenix-cli/SKILL.md
Jim Bennett d79183139a Add Arize and Phoenix LLM observability skills (#1204)
2026-04-02 09:58:55 +11:00


---
name: phoenix-cli
description: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, inspect datasets, and query the GraphQL API. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
license: Apache-2.0
compatibility: Requires Node.js (for npx) or global install of @arizeai/phoenix-cli. Optionally requires jq for JSON processing.
metadata:
  author: arize-ai
  version: "2.0.0"
---
# Phoenix CLI
## Invocation
```bash
px <resource> <action> # if installed globally
npx @arizeai/phoenix-cli <resource> <action> # no install required
```
The CLI uses singular resource commands with subcommands like `list` and `get`:
```bash
px trace list
px trace get <trace-id>
px span list
px dataset list
px dataset get <name>
```
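
If you are not sure whether `px` is installed globally, a small shell function can prefer the global binary and fall back to `npx`. This is a minimal sketch. `phx` is a hypothetical helper name, not a command shipped with `@arizeai/phoenix-cli`.

```bash
# phx: hypothetical wrapper, not part of @arizeai/phoenix-cli.
# Uses a global `px` when one is found, otherwise runs the CLI through npx.
phx() {
  if command -v px >/dev/null 2>&1; then
    px "$@"
  else
    npx @arizeai/phoenix-cli "$@"
  fi
}
```

Usage: `phx trace list --limit 5` behaves the same whether or not the CLI is installed globally.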
## Setup
```bash
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key # if auth is enabled
```
Always use `--format raw --no-progress` when piping to `jq`.
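
To avoid forgetting those flags, you can wrap them in a function. This sketch assumes the flags may be appended after the other arguments, as every example in this file does; `pxj` is a hypothetical name.

```bash
# pxj: hypothetical helper, not part of the CLI.
# Appends the `--format raw --no-progress` flags recommended for jq piping.
pxj() {
  px "$@" --format raw --no-progress
}
```

Usage: `pxj trace list --limit 20 | jq '.[] | select(.status == "ERROR")'`.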
## Traces
```bash
px trace list --limit 20 --format raw --no-progress | jq . # 20 most recent traces
px trace list --last-n-minutes 60 --limit 20 --format raw --no-progress | jq '.[] | select(.status == "ERROR")' # errored traces from the last hour
px trace list --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]' # 5 slowest of the returned traces
px trace get <trace-id> --format raw --no-progress | jq . # one trace with all its spans
px trace get <trace-id> --format raw --no-progress | jq '.spans[] | select(.status_code != "OK")' # spans that are not OK
```
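
The commands above can be chained to go from "which traces failed" to "which spans failed" in one step. This is a sketch assuming `jq` is installed and that list items expose `traceId` and `status` as in the Trace JSON shape below; `failing_spans` is a hypothetical helper.

```bash
# failing_spans: hypothetical helper, not part of the CLI.
# Lists ERROR traces from the last N minutes (default 60), then prints one
# compact JSON line per non-OK span in each of those traces.
failing_spans() {
  local minutes="${1:-60}"
  px trace list --last-n-minutes "$minutes" --limit 20 --format raw --no-progress \
    | jq -r '.[] | select(.status == "ERROR") | .traceId' \
    | while read -r trace_id; do
        px trace get "$trace_id" --format raw --no-progress \
          | jq -c --arg t "$trace_id" \
              '.spans[] | select(.status_code != "OK") | {trace: $t, name, span_kind, status_code}'
      done
}
```

Usage: `failing_spans 30` covers the last 30 minutes.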
## Spans
```bash
px span list --limit 20 # recent spans (table view)
px span list --last-n-minutes 60 --limit 50 # spans from last hour
px span list --span-kind LLM --limit 10 # only LLM spans
px span list --status-code ERROR --limit 20 # only errored spans
px span list --name chat_completion --limit 10 # filter by span name
px span list --trace-id <id> --format raw --no-progress | jq . # all spans for a trace
px span list --include-annotations --limit 10 # include annotation scores
px span list output.json --limit 100 # save to JSON file
px span list --format raw --no-progress | jq '.[] | select(.status_code == "ERROR")'
```
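
For a quick health overview, the raw span list can be grouped by `span_kind` with `jq`. A minimal sketch, assuming the raw output is a JSON array of spans (as the `select` example above implies); `span_summary` is a hypothetical helper.

```bash
# span_summary: hypothetical helper, not part of the CLI.
# Groups the most recent N spans (default 100) by span_kind and reports
# the total count and the number with status_code "ERROR" per kind.
span_summary() {
  px span list --limit "${1:-100}" --format raw --no-progress \
    | jq -c 'group_by(.span_kind) | .[]
      | {span_kind: .[0].span_kind,
         total: length,
         errors: (map(select(.status_code == "ERROR")) | length)}'
}
```

Each output line looks like `{"span_kind":"LLM","total":2,"errors":1}`.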
### Span JSON shape
```
Span
  name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT"|"RERANKER"|"GUARDRAIL"|"EVALUATOR"|"UNKNOWN")
  status_code ("OK"|"ERROR"|"UNSET"), status_message
  context.span_id, context.trace_id, parent_id
  start_time, end_time
  attributes (same keys as the span attributes in the Trace JSON shape below)
  annotations[] (with --include-annotations)
    name, result { score, label, explanation }
```
### Trace JSON shape
```
Trace
  traceId, status ("OK"|"ERROR"), duration (ms), startTime, endTime
  rootSpan — top-level span (parent_id: null)
  spans[]
    name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT")
    status_code ("OK"|"ERROR"), parent_id, context.span_id
    attributes
      input.value, output.value — raw input/output
      llm.model_name, llm.provider
      llm.token_count.prompt/completion/total
      llm.token_count.prompt_details.cache_read
      llm.token_count.completion_details.reasoning
      llm.input_messages.{N}.message.role/content
      llm.output_messages.{N}.message.role/content
      llm.invocation_parameters — JSON string (temperature, etc.)
      exception.message — set if span errored
```
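
Token usage per model can be totalled from a single trace's LLM spans. The shape above does not say whether attribute keys arrive as flat dotted strings (`"llm.token_count.total"`) or as nested objects, so this sketch tries both; treat that as an assumption. `trace_tokens` is a hypothetical helper and requires `jq`.

```bash
# trace_tokens: hypothetical helper, not part of the CLI.
# Sums llm.token_count.total per llm.model_name across the LLM spans of one trace.
# ASSUMPTION: attributes may be flat dotted keys or nested objects; both are tried.
trace_tokens() {
  px trace get "$1" --format raw --no-progress \
    | jq -c '[.spans[] | select(.span_kind == "LLM") | .attributes
              | {model: (.["llm.model_name"] // .llm.model_name // "unknown"),
                 tokens: (.["llm.token_count.total"] // .llm.token_count.total // 0)}]
             | group_by(.model) | .[]
             | {model: .[0].model, total_tokens: (map(.tokens) | add)}'
}
```

Usage: `trace_tokens <trace-id>` prints one line per model, such as `{"model":"gpt-4o","total_tokens":150}`.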
## Sessions
```bash
px session list --limit 10 --format raw --no-progress | jq .
px session list --order asc --format raw --no-progress | jq '.[].session_id'
px session get <session-id> --format raw --no-progress | jq .
px session get <session-id> --include-annotations --format raw --no-progress | jq '.annotations'
```
### Session JSON shape
```
SessionData
  id, session_id, project_id
  start_time, end_time
  traces[]
    id, trace_id, start_time, end_time
SessionAnnotation (with --include-annotations)
  id, name, annotator_kind ("LLM"|"CODE"|"HUMAN"), session_id
  result { label, score, explanation }
  metadata, identifier, source, created_at, updated_at
```
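
A one-line summary per session is often enough to spot unusually long conversations. This sketch assumes the SessionData fields sit at the top level of the `px session get` JSON, which the shape above suggests but does not confirm; `session_summary` is a hypothetical helper.

```bash
# session_summary: hypothetical helper, not part of the CLI.
# Prints a session's id, time range, and number of traces as compact JSON.
session_summary() {
  px session get "$1" --format raw --no-progress \
    | jq -c '{session_id, start_time, end_time, trace_count: (.traces | length)}'
}
```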
## Datasets / Experiments / Prompts
```bash
px dataset list --format raw --no-progress | jq '.[].name'
px dataset get <name> --format raw --no-progress | jq '.examples[] | {input, output: .expected_output}'
px experiment list --dataset <name> --format raw --no-progress | jq '.[] | {id, name, failed_run_count}'
px experiment get <id> --format raw --no-progress | jq '.[] | select(.error != null) | {input, error}'
px prompt list --format raw --no-progress | jq '.[].name'
px prompt get <name> --format text --no-progress # plain text, ideal for piping to AI
```
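
The two experiment commands combine naturally: list the experiments on a dataset that have failed runs, then print each failed run. A sketch using only the fields shown above (`id`, `failed_run_count`, `input`, `error`); `failed_experiment_runs` is a hypothetical helper and requires `jq`.

```bash
# failed_experiment_runs: hypothetical helper, not part of the CLI.
# For each experiment on the dataset with failed_run_count > 0, prints
# one compact JSON line per run whose error is not null.
failed_experiment_runs() {
  px experiment list --dataset "$1" --format raw --no-progress \
    | jq -r '.[] | select(.failed_run_count > 0) | .id' \
    | while read -r exp_id; do
        px experiment get "$exp_id" --format raw --no-progress \
          | jq -c --arg e "$exp_id" '.[] | select(.error != null) | {experiment: $e, input, error}'
      done
}
```

Usage: `failed_experiment_runs <dataset-name>`.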
## GraphQL
Use `px api graphql` for ad-hoc queries not covered by the commands above. The response is wrapped in `{"data": {...}}`, so `jq` paths start with `.data`.
```bash
px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | jq '.data.projects.edges[].node'
px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | jq '.data.datasets.edges[].node'
px api graphql '{ evaluators { edges { node { name kind } } } }' | jq '.data.evaluators.edges[].node'
# Introspect any type
px api graphql '{ __type(name: "Project") { fields { name type { name } } } }' | jq '.data.__type.fields[]'
```
Key root fields: `projects`, `datasets`, `prompts`, `evaluators`, `projectCount`, `datasetCount`, `promptCount`, `evaluatorCount`, `viewer`.
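
GraphQL results can be post-processed like any other JSON output. This sketch reuses the `projects` query shown above and ranks projects by `traceCount`; `top_projects` is a hypothetical helper, and sorting ascending then reversing keeps it safe if a count is ever `null`.

```bash
# top_projects: hypothetical helper, not part of the CLI.
# Prints the N projects (default 5) with the most traces, largest first.
top_projects() {
  px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' \
    | jq -c --argjson n "${1:-5}" \
        '[.data.projects.edges[].node] | sort_by(.traceCount) | reverse | limit($n; .[])'
}
```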
## Docs
Download Phoenix documentation markdown for local use by coding agents.
```bash
px docs fetch # fetch default workflow docs to .px/docs
px docs fetch --workflow tracing # fetch only tracing docs
px docs fetch --workflow tracing --workflow evaluation
px docs fetch --dry-run # preview what would be downloaded
px docs fetch --refresh # clear .px/docs and re-download
px docs fetch --output-dir ./my-docs # custom output directory
```
Key options: `--workflow` (repeatable, values: `tracing`, `evaluation`, `datasets`, `prompts`, `integrations`, `sdk`, `self-hosting`, `all`), `--dry-run`, `--refresh`, `--output-dir` (default `.px/docs`), `--workers` (default 10).
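
In an agent workflow it is usually enough to fetch the docs once and refresh them on demand. A minimal sketch using only the flags listed above; `ensure_px_docs` is a hypothetical helper, and the tracing plus evaluation workflow choice is just an example.

```bash
# ensure_px_docs: hypothetical helper, not part of the CLI.
# Fetches tracing and evaluation docs into the default .px/docs directory
# only if it does not exist yet; pass "refresh" to force a re-download.
ensure_px_docs() {
  if [ "${1:-}" = "refresh" ]; then
    px docs fetch --workflow tracing --workflow evaluation --refresh
  elif [ ! -d .px/docs ]; then
    px docs fetch --workflow tracing --workflow evaluation
  fi
}
```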