mirror of https://github.com/github/awesome-copilot.git synced 2026-04-11 10:45:56 +00:00

Jim Bennett d79183139a Add Arize and Phoenix LLM observability skills (#1204)

2026-04-02 09:58:55 +11:00

Manual Instrumentation (Python)

Add custom spans using decorators or context managers for fine-grained tracing control.

Setup

pip install arize-phoenix-otel

from phoenix.otel import register
tracer_provider = register(project_name="my-app")
tracer = tracer_provider.get_tracer(__name__)

Quick Reference

Span Kind	Decorator	Use Case
CHAIN	`@tracer.chain`	Orchestration, workflows, pipelines
RETRIEVER	`@tracer.retriever`	Vector search, document retrieval
TOOL	`@tracer.tool`	External API calls, function execution
AGENT	`@tracer.agent`	Multi-step reasoning, planning
LLM	`@tracer.llm`	LLM API calls (manual only)
EMBEDDING	`@tracer.embedding`	Embedding generation
RERANKER	`@tracer.reranker`	Document re-ranking
GUARDRAIL	`@tracer.guardrail`	Safety checks, content moderation
EVALUATOR	`@tracer.evaluator`	LLM evaluation, quality checks

Decorator Approach (Recommended)

Use for: Full function instrumentation, automatic I/O capture

@tracer.chain
def rag_pipeline(query: str) -> str:
    docs = retrieve_documents(query)
    ranked = rerank(docs, query)
    return generate_response(ranked, query)

@tracer.retriever
def retrieve_documents(query: str) -> list[dict]:
    results = vector_db.search(query, top_k=5)
    return [{"content": doc.text, "score": doc.score} for doc in results]

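import requests

@tracer.tool
def get_weather(city: str) -> str:
    # Illustrative endpoint; adjust to the weather API you use.
    response = requests.get(f"https://api.weather.com/{city}", timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()["weather"]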

Custom span names:

@tracer.chain(name="rag-pipeline-v2")
def my_workflow(query: str) -> str:
    return process(query)

Context Manager Approach

Use for: Partial function instrumentation, custom attributes, dynamic control

from opentelemetry.trace import Status, StatusCode
import json

def retrieve_with_metadata(query: str):
    with tracer.start_as_current_span(
        "vector_search",
        openinference_span_kind="retriever"
    ) as span:
        span.set_attribute("input.value", query)

        results = vector_db.search(query, top_k=5)

        documents = [
            {
                "document.id": doc.id,
                "document.content": doc.text,
                "document.score": doc.score
            }
            for doc in results
        ]
        span.set_attribute("retrieval.documents", json.dumps(documents))
        span.set_status(Status(StatusCode.OK))

        return documents
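
The example above stores all retrieved documents as a single JSON string. As an alternative sketch, assuming the flattened, indexed attribute layout (keys such as retrieval.documents.0.document.id) that OpenInference instrumentors commonly emit, a small helper can expand the list into one attribute per field. The helper name flatten_documents and the sample documents are hypothetical; verify the exact key layout against the OpenInference semantic conventions before relying on it.

```python
from typing import Any


def flatten_documents(
    documents: list[dict[str, Any]],
    prefix: str = "retrieval.documents",
) -> dict[str, Any]:
    """Expand a list of document dicts into indexed span attributes."""
    attributes: dict[str, Any] = {}
    for index, document in enumerate(documents):
        for key, value in document.items():
            # e.g. "retrieval.documents.0.document.id"
            attributes[f"{prefix}.{index}.{key}"] = value
    return attributes


# Hypothetical sample data, shaped like the documents list built above.
documents = [
    {"document.id": "a1", "document.content": "Paris is in France.", "document.score": 0.92},
    {"document.id": "b2", "document.content": "Berlin is in Germany.", "document.score": 0.81},
]
attributes = flatten_documents(documents)

# With a real span, the result could be attached in one call:
# span.set_attributes(attributes)
```

Flat keys keep every value a primitive type, which is what OpenTelemetry span attributes accept.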

Capturing Input/Output

Always capture I/O for evaluation-ready spans.

Automatic I/O Capture (Decorators)

Decorators automatically capture input arguments and return values:

@tracer.chain
def handle_query(user_input: str) -> str:
    result = agent.generate(user_input)
    return result.text

# Automatically captures:
# - input.value: user_input
# - output.value: result.text
# - input.mime_type / output.mime_type: auto-detected
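
To make "automatic capture" concrete, here is a toy stand-in decorator written with only the standard library. It is not Phoenix's implementation: the RECORDED_SPANS list, the chain function, and the encoding rule (plain text for a lone string, JSON otherwise) are assumptions chosen for illustration. It shows how one decorator can support both the bare form and the name="..." form while recording arguments and return values.

```python
import functools
import inspect
import json
from typing import Any, Callable, Optional

# Hypothetical in-memory sink standing in for a real span exporter.
RECORDED_SPANS: list = []


def _encode(value: Any) -> tuple:
    """Serialize a value and pick a MIME type (assumed rule, for illustration)."""
    if isinstance(value, str):
        return value, "text/plain"
    return json.dumps(value, default=str), "application/json"


def chain(func: Optional[Callable] = None, *, name: Optional[str] = None):
    """Toy decorator that works both as @chain and as @chain(name="...")."""

    def decorate(fn: Callable) -> Callable:
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Map the call's arguments onto the function's parameter names.
            arguments = dict(signature.bind(*args, **kwargs).arguments)
            # A lone argument is recorded as-is; several are recorded as a mapping.
            payload = next(iter(arguments.values())) if len(arguments) == 1 else arguments
            input_value, input_mime = _encode(payload)
            result = fn(*args, **kwargs)
            output_value, output_mime = _encode(result)
            RECORDED_SPANS.append({
                "name": name or fn.__name__,
                "input.value": input_value,
                "input.mime_type": input_mime,
                "output.value": output_value,
                "output.mime_type": output_mime,
            })
            return result

        return wrapper

    # Bare usage passes the function directly; parameterized usage passes None.
    return decorate(func) if func is not None else decorate


@chain
def handle_query(user_input: str) -> str:
    return user_input.upper()


@chain(name="rag-pipeline-v2")
def my_workflow(query: str, top_k: int = 5) -> dict:
    return {"query": query, "top_k": top_k}


handle_query("What is 2+2?")
my_workflow("cats", top_k=3)
```

After these two calls, RECORDED_SPANS holds one record per invocation, keyed by the same attribute names used in this guide.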

Manual I/O Capture (Context Manager)

Use set_input() and set_output() for simple I/O capture:

from opentelemetry.trace import Status, StatusCode

def handle_query(user_input: str) -> str:
    with tracer.start_as_current_span(
        "query.handler",
        openinference_span_kind="chain"
    ) as span:
        span.set_input(user_input)

        result = agent.generate(user_input)

        span.set_output(result.text)
        span.set_status(Status(StatusCode.OK))

        return result.text

What gets captured:

{
  "input.value": "What is 2+2?",
  "input.mime_type": "text/plain",
  "output.value": "2+2 equals 4.",
  "output.mime_type": "text/plain"
}

Why this matters:

Phoenix evaluators require input.value and output.value.
The Phoenix UI displays I/O prominently, which speeds up debugging.
Captured I/O can be exported to build fine-tuning datasets.
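
Since evaluators depend on both keys, it can help to check captured attributes before running an evaluation. The following is a minimal sketch over a plain dict of span attributes; the helper name missing_io_keys and the sample dicts are hypothetical and not part of Phoenix.

```python
REQUIRED_IO_KEYS = ("input.value", "output.value")


def missing_io_keys(attributes: dict) -> list:
    """Return the required I/O keys that are absent or empty."""
    return [key for key in REQUIRED_IO_KEYS if attributes.get(key) in (None, "")]


# Hypothetical span attribute dicts for illustration.
complete_span = {
    "input.value": "What is 2+2?",
    "input.mime_type": "text/plain",
    "output.value": "2+2 equals 4.",
    "output.mime_type": "text/plain",
}
partial_span = {"input.value": "What is 2+2?"}

ready = missing_io_keys(complete_span)
gaps = missing_io_keys(partial_span)
```

An empty result means the span carries both values; any returned key names the attribute still to be set with set_input() or set_output().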

Custom I/O with Additional Metadata

Use set_attribute() for custom attributes alongside I/O:

from opentelemetry.trace import Status, StatusCode

def process_query(query: str):
    with tracer.start_as_current_span(
        "query.process",
        openinference_span_kind="chain"
    ) as span:
        # Standard I/O
        span.set_input(query)

        # Custom metadata
        span.set_attribute("input.length", len(query))

        result = llm.generate(query)

        # Standard output
        span.set_output(result.text)

        # Custom metadata
        span.set_attribute("output.tokens", result.usage.total_tokens)
        span.set_status(Status(StatusCode.OK))

        return result