mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-12 19:25:55 +00:00
* Add 9 Arize LLM observability skills Add skills for Arize AI platform covering trace export, instrumentation, datasets, experiments, evaluators, AI provider integrations, annotations, prompt optimization, and deep linking to the Arize UI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add 3 Phoenix AI observability skills Add skills for Phoenix (Arize open-source) covering CLI debugging, LLM evaluation workflows, and OpenInference tracing/instrumentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Ignoring intentional bad spelling * Fix CI: remove .DS_Store from generated skills README and add codespell ignore Remove .DS_Store artifact from winmd-api-search asset listing in generated README.skills.md so it matches the CI Linux build output. Add queston to codespell ignore list (intentional misspelling example in arize-dataset skill). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add arize-ax and phoenix plugins Bundle the 9 Arize skills into an arize-ax plugin and the 3 Phoenix skills into a phoenix plugin for easier installation as single packages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix skill folder structures to match source repos Move arize supporting files from references/ to root level and rename phoenix references/ to rules/ to exactly match the original source repository folder structures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fixing file locations * Fixing readme --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
163 lines
6.1 KiB
Markdown
---
name: phoenix-cli
description: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, inspect datasets, and query the GraphQL API. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
license: Apache-2.0
compatibility: Requires Node.js (for npx) or global install of @arizeai/phoenix-cli. Optionally requires jq for JSON processing.
metadata:
  author: arize-ai
  version: "2.0.0"
---

# Phoenix CLI

## Invocation

```bash
px <resource> <action>                          # if installed globally
npx @arizeai/phoenix-cli <resource> <action>    # no install required
```

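If a script has to work both with and without a global install, the two invocation forms above can be wrapped in one function. This is a minimal sketch: `run_px` is a hypothetical name, not part of the CLI, and it assumes `npx` is available whenever `px` is not.

```bash
# Hypothetical wrapper (not part of the Phoenix CLI):
# use `px` when it resolves as a command, otherwise fall back to npx.
run_px() {
  if command -v px >/dev/null 2>&1; then
    px "$@"
  else
    npx @arizeai/phoenix-cli "$@"
  fi
}
```

With this in place, `run_px trace list` behaves like `px trace list` on machines that have the global install and like the `npx` form elsewhere.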
The CLI uses singular resource commands with subcommands like `list` and `get`:

```bash
px trace list
px trace get <trace-id>
px span list
px dataset list
px dataset get <name>
```

## Setup

```bash
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if auth is enabled
```

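A script can check these variables before calling the CLI. The sketch below assumes you prefer to fail fast on a missing `PHOENIX_HOST` or `PHOENIX_PROJECT` rather than rely on whatever defaults the CLI applies; `check_phoenix_env` is a hypothetical helper, and it treats `PHOENIX_API_KEY` as optional because the key is only needed when auth is enabled.

```bash
# Hypothetical pre-flight check (not part of the Phoenix CLI).
# Returns non-zero if PHOENIX_HOST or PHOENIX_PROJECT is unset or empty.
check_phoenix_env() {
  local missing=0 var
  for var in PHOENIX_HOST PHOENIX_PROJECT; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  # The API key only matters when auth is enabled, so just note its absence.
  [ -n "${PHOENIX_API_KEY:-}" ] || echo "note: PHOENIX_API_KEY is not set" >&2
  return "$missing"
}
```

A typical use is `check_phoenix_env || exit 1` at the top of a debugging script.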
Always pass `--format raw --no-progress` when piping to `jq`, so that `jq` receives raw JSON and progress output is suppressed.

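Because the same two flags recur in almost every pipeline, a small shell function can append them automatically. This is a sketch only: `px_json` is a hypothetical name, and it assumes a globally installed `px`.

```bash
# Hypothetical helper (not part of the Phoenix CLI):
# run any px subcommand with the flags recommended for piping to jq.
px_json() {
  px "$@" --format raw --no-progress
}
```

For example, `px_json trace list --limit 20 | jq .` expands to `px trace list --limit 20 --format raw --no-progress | jq .`.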
## Traces

```bash
# List up to 20 traces as JSON
px trace list --limit 20 --format raw --no-progress | jq .
# Traces with ERROR status from the last 60 minutes
px trace list --last-n-minutes 60 --limit 20 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
# Five slowest of the returned traces
px trace list --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
# Full detail for one trace
px trace get <trace-id> --format raw | jq .
# Spans in a trace whose status is not OK
px trace get <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
```

## Spans

```bash
px span list --limit 20                                    # recent spans (table view)
px span list --last-n-minutes 60 --limit 50                # spans from last hour
px span list --span-kind LLM --limit 10                    # only LLM spans
px span list --status-code ERROR --limit 20                # only errored spans
px span list --name chat_completion --limit 10             # filter by span name
px span list --trace-id <id> --format raw --no-progress | jq .   # all spans for a trace
px span list --include-annotations --limit 10              # include annotation scores
px span list output.json --limit 100                       # save to JSON file
px span list --format raw --no-progress | jq '.[] | select(.status_code == "ERROR")'
```

### Span JSON shape

```
Span
  name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT"|"RERANKER"|"GUARDRAIL"|"EVALUATOR"|"UNKNOWN")
  status_code ("OK"|"ERROR"|"UNSET"), status_message
  context.span_id, context.trace_id, parent_id
  start_time, end_time
  attributes (same as the span attributes under "Trace JSON shape" below)
  annotations[] (with --include-annotations)
    name, result { score, label, explanation }
```

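The shape above is enough to build quick summaries with `jq`. The sketch below counts spans and errors per `span_kind`; `span_kind_summary` is a hypothetical helper, and it assumes `px span list --format raw` emits a JSON array of spans, as the `.[]` filters in this document imply.

```bash
# Hypothetical helper: read a JSON array of spans on stdin and
# print one summary object per span_kind.
span_kind_summary() {
  jq -c 'group_by(.span_kind)
         | map({kind: .[0].span_kind,
                total: length,
                errors: (map(select(.status_code == "ERROR")) | length)})'
}
```

Typical use: `px span list --last-n-minutes 60 --format raw --no-progress | span_kind_summary`.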
### Trace JSON shape

```
Trace
  traceId, status ("OK"|"ERROR"), duration (ms), startTime, endTime
  rootSpan - top-level span (parent_id: null)
  spans[]
    name, span_kind ("LLM"|"CHAIN"|"TOOL"|"RETRIEVER"|"EMBEDDING"|"AGENT")
    status_code ("OK"|"ERROR"), parent_id, context.span_id
    attributes
      input.value, output.value - raw input/output
      llm.model_name, llm.provider
      llm.token_count.prompt/completion/total
      llm.token_count.prompt_details.cache_read
      llm.token_count.completion_details.reasoning
      llm.input_messages.{N}.message.role/content
      llm.output_messages.{N}.message.role/content
      llm.invocation_parameters - JSON string (temperature, etc.)
      exception.message - set if span errored
```

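As an illustration of reading span attributes, the sketch below sums `llm.token_count.total` over the LLM spans of one trace. It rests on an assumption the shape listing does not confirm: that `attributes` is a flat object keyed by the dotted names shown (for example `"llm.token_count.total"`). If your output nests these keys instead, the lookup needs adjusting. `trace_token_total` is a hypothetical name.

```bash
# Hypothetical helper: read one trace object on stdin and print the
# summed total token count of its LLM spans.
# ASSUMPTION: attributes are stored under flat dotted keys.
trace_token_total() {
  jq '[.spans[]
       | select(.span_kind == "LLM")
       | .attributes["llm.token_count.total"] // 0]
      | add // 0'
}
```

Typical use: `px trace get <trace-id> --format raw | trace_token_total`.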
## Sessions

```bash
# List up to 10 sessions as JSON
px session list --limit 10 --format raw --no-progress | jq .
# Session IDs, listed in ascending order
px session list --order asc --format raw --no-progress | jq '.[].session_id'
# Full detail for one session
px session get <session-id> --format raw | jq .
# Annotations attached to a session
px session get <session-id> --include-annotations --format raw | jq '.annotations'
```

### Session JSON shape

```
SessionData
  id, session_id, project_id
  start_time, end_time
  traces[]
    id, trace_id, start_time, end_time

SessionAnnotation (with --include-annotations)
  id, name, annotator_kind ("LLM"|"CODE"|"HUMAN"), session_id
  result { label, score, explanation }
  metadata, identifier, source, created_at, updated_at
```

## Datasets / Experiments / Prompts

```bash
# Dataset names
px dataset list --format raw --no-progress | jq '.[].name'
# Input and expected output of each example in a dataset
px dataset get <name> --format raw | jq '.examples[] | {input, output: .expected_output}'
# Experiments on a dataset, with failed-run counts
px experiment list --dataset <name> --format raw --no-progress | jq '.[] | {id, name, failed_run_count}'
# Runs in an experiment that recorded an error
px experiment get <id> --format raw --no-progress | jq '.[] | select(.error != null) | {input, error}'
# Prompt names
px prompt list --format raw --no-progress | jq '.[].name'
px prompt get <name> --format text --no-progress    # plain text, ideal for piping to AI
```

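When a dataset has many experiments, it can help to narrow the list to those with failures before drilling into runs. A minimal sketch, using only the `failed_run_count` field shown above; `failed_experiments` is a hypothetical helper, and a missing count is treated as zero.

```bash
# Hypothetical helper: read the experiment list (JSON array) on stdin and
# keep only experiments with at least one failed run.
failed_experiments() {
  jq -c '.[]
         | select((.failed_run_count // 0) > 0)
         | {id, name, failed_run_count}'
}
```

Typical use: `px experiment list --dataset <name> --format raw --no-progress | failed_experiments`, then pass an `id` from the output to `px experiment get`.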
## GraphQL

Use GraphQL for ad-hoc queries not covered by the commands above. Output is `{"data": {...}}`.

```bash
# Top-level counts
px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
# Projects with trace and token counts
px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | jq '.data.projects.edges[].node'
# Datasets with example and experiment counts
px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | jq '.data.datasets.edges[].node'
# Evaluators and their kind
px api graphql '{ evaluators { edges { node { name kind } } } }' | jq '.data.evaluators.edges[].node'

# Introspect any type
px api graphql '{ __type(name: "Project") { fields { name type { name } } } }' | jq '.data.__type.fields[]'
```

Key root fields: `projects`, `datasets`, `prompts`, `evaluators`, `projectCount`, `datasetCount`, `promptCount`, `evaluatorCount`, `viewer`.

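The list queries above all unwrap the same `.data.<field>.edges[].node` path, so the `jq` step can be factored out. A minimal sketch: `gql_nodes` is a hypothetical helper that takes the root field name as its argument and assumes the response follows the `edges`/`node` layout used in the examples.

```bash
# Hypothetical helper: read a GraphQL response on stdin and print each
# node under .data.<field>.edges[] as one compact JSON object per line.
gql_nodes() {
  local field="$1"
  jq -c --arg field "$field" '.data[$field].edges[].node'
}
```

Typical use: `px api graphql '{ projects { edges { node { name traceCount } } } }' | gql_nodes projects`.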
## Docs

Download Phoenix documentation markdown for local use by coding agents.

```bash
px docs fetch                                # fetch default workflow docs to .px/docs
px docs fetch --workflow tracing             # fetch only tracing docs
px docs fetch --workflow tracing --workflow evaluation
px docs fetch --dry-run                      # preview what would be downloaded
px docs fetch --refresh                      # clear .px/docs and re-download
px docs fetch --output-dir ./my-docs         # custom output directory
```

Key options: `--workflow` (repeatable, values: `tracing`, `evaluation`, `datasets`, `prompts`, `integrations`, `sdk`, `self-hosting`, `all`), `--dry-run`, `--refresh`, `--output-dir` (default `.px/docs`), `--workers` (default 10).
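Since `--workflow` is repeatable, a script that holds the workflow names in a list has to expand them into one flag per name. A minimal sketch, assuming a globally installed `px`; `px_docs_fetch` is a hypothetical wrapper, not a CLI command.

```bash
# Hypothetical wrapper: turn each argument into its own --workflow flag,
# then call `px docs fetch` once with all of them.
px_docs_fetch() {
  local args=() wf
  for wf in "$@"; do
    args+=(--workflow "$wf")
  done
  px docs fetch "${args[@]}"
}
```

For example, `px_docs_fetch tracing evaluation` runs `px docs fetch --workflow tracing --workflow evaluation`.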