chore: publish from staged

2026-04-13 03:35:55 +00:00 · 2026-04-09 06:26:21 +00:00
parent 017f31f495
commit a68b190031
467 changed files with 97527 additions and 276 deletions
--- a/plugins/arize-ax/skills/arize-prompt-optimization/SKILL.md
+++ b/plugins/arize-ax/skills/arize-prompt-optimization/SKILL.md
@@ -0,0 +1,450 @@
+---
+name: arize-prompt-optimization
+description: "INVOKE THIS SKILL when optimizing, improving, or debugging LLM prompts using production trace data, evaluations, and annotations. Covers extracting prompts from spans, gathering performance signal, and running a data-driven optimization loop using the ax CLI."
+---
+
+# Arize Prompt Optimization Skill
+
+## Concepts
+
+### Where Prompts Live in Trace Data
+
+LLM applications emit spans following OpenInference semantic conventions. Prompts are stored in different span attributes depending on the span kind and instrumentation:
+
+| Column | What it contains | When to use |
+|--------|-----------------|-------------|
+| `attributes.llm.input_messages` | Structured chat messages (system, user, assistant, tool) in role-based format | **Primary source** for chat-based LLM prompts |
+| `attributes.llm.input_messages.roles` | Array of roles: `system`, `user`, `assistant`, `tool` | Extract individual message roles |
+| `attributes.llm.input_messages.contents` | Array of message content strings | Extract message text |
+| `attributes.input.value` | Serialized prompt or user question (generic, all span kinds) | Fallback when structured messages are not available |
+| `attributes.llm.prompt_template.template` | Template with `{variable}` placeholders (e.g., `"Answer {question} using {context}"`) | When the app uses prompt templates |
+| `attributes.llm.prompt_template.variables` | Template variable values (JSON object) | See what values were substituted into the template |
+| `attributes.output.value` | Model response text | See what the LLM produced |
+| `attributes.llm.output_messages` | Structured model output (including tool calls) | Inspect tool-calling responses |
+
+### Finding Prompts by Span Kind
+
+- **LLM span** (`attributes.openinference.span.kind = 'LLM'`): Check `attributes.llm.input_messages` for structured chat messages, OR `attributes.input.value` for a serialized prompt. Check `attributes.llm.prompt_template.template` for the template.
+- **Chain/Agent span**: `attributes.input.value` contains the user's question. The actual LLM prompt lives on **child LLM spans** -- navigate down the trace tree.
+- **Tool span**: `attributes.input.value` has tool input, `attributes.output.value` has tool result. Not typically where prompts live.
+
+### Performance Signal Columns
+
+These columns carry the feedback data used for optimization:
+
+| Column pattern | Source | What it tells you |
+|---------------|--------|-------------------|
+| `annotation.<name>.label` | Human reviewers | Categorical grade (e.g., `correct`, `incorrect`, `partial`) |
+| `annotation.<name>.score` | Human reviewers | Numeric quality score (e.g., 0.0 - 1.0) |
+| `annotation.<name>.text` | Human reviewers | Freeform explanation of the grade |
+| `eval.<name>.label` | LLM-as-judge evals | Automated categorical assessment |
+| `eval.<name>.score` | LLM-as-judge evals | Automated numeric score |
+| `eval.<name>.explanation` | LLM-as-judge evals | Why the eval gave that score -- **most valuable for optimization** |
+| `attributes.input.value` | Trace data | What went into the LLM |
+| `attributes.output.value` | Trace data | What the LLM produced |
+| `{experiment_name}.output` | Experiment runs | Output from a specific experiment |
+
+## Prerequisites
+
+Proceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.
+
+If an `ax` command fails, troubleshoot based on the error:
+- `command not found` or version error → see references/ax-setup.md
+- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong: check `.env` for `ARIZE_API_KEY` and use it to create/update the profile via references/ax-profiles.md. If `.env` has no key either, ask the user for their Arize API key (https://app.arize.com/admin > API Keys)
+- Space ID unknown → check `.env` for `ARIZE_SPACE_ID`, or run `ax spaces list -o json`, or ask the user
+- Project unclear → check `.env` for `ARIZE_DEFAULT_PROJECT`, or ask, or run `ax projects list -o json --limit 100` and present as selectable options
+- LLM provider call fails (missing OPENAI_API_KEY / ANTHROPIC_API_KEY) → check `.env`, load if present, otherwise ask the user
+
+## Phase 1: Extract the Current Prompt
+
+### Find LLM spans containing prompts
+
+```bash
+# List LLM spans (where prompts live)
+ax spans list PROJECT_ID --filter "attributes.openinference.span.kind = 'LLM'" --limit 10
+
+# Filter by model
+ax spans list PROJECT_ID --filter "attributes.llm.model_name = 'gpt-4o'" --limit 10
+
+# Filter by span name (e.g., a specific LLM call)
+ax spans list PROJECT_ID --filter "name = 'ChatCompletion'" --limit 10
+```
+
+### Export a trace to inspect prompt structure
+
+```bash
+# Export all spans in a trace
+ax spans export --trace-id TRACE_ID --project PROJECT_ID
+
+# Export a single span
+ax spans export --span-id SPAN_ID --project PROJECT_ID
+```
+
+### Extract prompts from exported JSON
+
+```bash
+# Extract structured chat messages (system + user + assistant)
+jq '.[0] | {
+  messages: .attributes.llm.input_messages,
+  model: .attributes.llm.model_name
+}' trace_*/spans.json
+
+# Extract the system prompt specifically
+jq '[.[] | select(.attributes.llm.input_messages.roles[]? == "system")] | .[0].attributes.llm.input_messages' trace_*/spans.json
+
+# Extract prompt template and variables
+jq '.[0].attributes.llm.prompt_template' trace_*/spans.json
+
+# Extract from input.value (fallback for non-structured prompts)
+jq '.[0].attributes.input.value' trace_*/spans.json
+```
+
+### Reconstruct the prompt as messages
+
+Once you have the span data, reconstruct the prompt as a messages array:
+
+```json
+[
+  {"role": "system", "content": "You are a helpful assistant that..."},
+  {"role": "user", "content": "Given {input}, answer the question: {question}"}
+]
+```
+
+If the span has `attributes.llm.prompt_template.template`, the prompt uses variables. Preserve these placeholders (`{variable}` or `{{variable}}`) -- they are substituted at runtime.
+
+## Phase 2: Gather Performance Data
+
+### From traces (production feedback)
+
+```bash
+# Find error spans -- these indicate prompt failures
+ax spans list PROJECT_ID \
+  --filter "status_code = 'ERROR' AND attributes.openinference.span.kind = 'LLM'" \
+  --limit 20
+
+# Find spans with low eval scores
+ax spans list PROJECT_ID \
+  --filter "annotation.correctness.label = 'incorrect'" \
+  --limit 20
+
+# Find spans with high latency (may indicate overly complex prompts)
+ax spans list PROJECT_ID \
+  --filter "attributes.openinference.span.kind = 'LLM' AND latency_ms > 10000" \
+  --limit 20
+
+# Export error traces for detailed inspection
+ax spans export --trace-id TRACE_ID --project PROJECT_ID
+```
+
+### From datasets and experiments
+
+```bash
+# Export a dataset (ground truth examples)
+ax datasets export DATASET_ID
+# -> dataset_*/examples.json
+
+# Export experiment results (what the LLM produced)
+ax experiments export EXPERIMENT_ID
+# -> experiment_*/runs.json
+```
+
+### Merge dataset + experiment for analysis
+
+Join the two files by `example_id` to see inputs alongside outputs and evaluations:
+
+```bash
+# Count examples and runs
+jq 'length' dataset_*/examples.json
+jq 'length' experiment_*/runs.json
+
+# View a single joined record
+jq -s '
+  .[0] as $dataset |
+  .[1][0] as $run |
+  ($dataset[] | select(.id == $run.example_id)) as $example |
+  {
+    input: $example,
+    output: $run.output,
+    evaluations: $run.evaluations
+  }
+' dataset_*/examples.json experiment_*/runs.json
+
+# Find failed examples (where eval score < threshold)
+jq '[.[] | select(.evaluations.correctness.score < 0.5)]' experiment_*/runs.json
+```
+
+### Identify what to optimize
+
+Look for patterns across failures:
+
+1. **Compare outputs to ground truth**: Where does the LLM output differ from expected?
+2. **Read eval explanations**: `eval.*.explanation` tells you WHY something failed
+3. **Check annotation text**: Human feedback describes specific issues
+4. **Look for verbosity mismatches**: If outputs are too long/short vs ground truth
+5. **Check format compliance**: Are outputs in the expected format?
+
+## Phase 3: Optimize the Prompt
+
+### The Optimization Meta-Prompt
+
+Use this template to generate an improved version of the prompt. Fill in the three placeholders and send it to your LLM (GPT-4o, Claude, etc.):
+
+````
+You are an expert in prompt optimization. Given the original baseline prompt
+and the associated performance data (inputs, outputs, evaluation labels, and
+explanations), generate a revised version that improves results.
+
+ORIGINAL BASELINE PROMPT
+========================
+
+{PASTE_ORIGINAL_PROMPT_HERE}
+
+========================
+
+PERFORMANCE DATA
+================
+
+The following records show how the current prompt performed. Each record
+includes the input, the LLM output, and evaluation feedback:
+
+{PASTE_RECORDS_HERE}
+
+================
+
+HOW TO USE THIS DATA
+
+1. Compare outputs: Look at what the LLM generated vs what was expected
+2. Review eval scores: Check which examples scored poorly and why
+3. Examine annotations: Human feedback shows what worked and what didn't
+4. Identify patterns: Look for common issues across multiple examples
+5. Focus on failures: The rows where the output DIFFERS from the expected
+   value are the ones that need fixing
+
+ALIGNMENT STRATEGY
+
+- If outputs have extra text or reasoning not present in the ground truth,
+  remove instructions that encourage explanation or verbose reasoning
+- If outputs are missing information, add instructions to include it
+- If outputs are in the wrong format, add explicit format instructions
+- Focus on the rows where the output differs from the target -- these are
+  the failures to fix
+
+RULES
+
+Maintain Structure:
+- Use the same template variables as the current prompt ({var} or {{var}})
+- Don't change sections that are already working
+- Preserve the exact return format instructions from the original prompt
+
+Avoid Overfitting:
+- DO NOT copy examples verbatim into the prompt
+- DO NOT quote specific test data outputs exactly
+- INSTEAD: Extract the ESSENCE of what makes good vs bad outputs
+- INSTEAD: Add general guidelines and principles
+- INSTEAD: If adding few-shot examples, create SYNTHETIC examples that
+  demonstrate the principle, not real data from above
+
+Goal: Create a prompt that generalizes well to new inputs, not one that
+memorizes the test data.
+
+OUTPUT FORMAT
+
+Return the revised prompt as a JSON array of messages:
+
+[
+  {"role": "system", "content": "..."},
+  {"role": "user", "content": "..."}
+]
+
+Also provide a brief reasoning section (bulleted list) explaining:
+- What problems you found
+- How the revised prompt addresses each one
+````
+
+### Preparing the performance data
+
+Format the records as a JSON array before pasting into the template:
+
+```bash
+# From dataset + experiment: join and select relevant columns
+jq -s '
+  .[0] as $ds |
+  [.[1][] | . as $run |
+    ($ds[] | select(.id == $run.example_id)) as $ex |
+    {
+      input: $ex.input,
+      expected: $ex.expected_output,
+      actual_output: $run.output,
+      eval_score: $run.evaluations.correctness.score,
+      eval_label: $run.evaluations.correctness.label,
+      eval_explanation: $run.evaluations.correctness.explanation
+    }
+  ]
+' dataset_*/examples.json experiment_*/runs.json
+
+# From exported spans: extract input/output pairs with annotations
+jq '[.[] | select(.attributes.openinference.span.kind == "LLM") | {
+  input: .attributes.input.value,
+  output: .attributes.output.value,
+  status: .status_code,
+  model: .attributes.llm.model_name
+}]' trace_*/spans.json
+```
+
+### Applying the revised prompt
+
+After the LLM returns the revised messages array:
+
+1. Compare the original and revised prompts side by side
+2. Verify all template variables are preserved
+3. Check that format instructions are intact
+4. Test on a few examples before full deployment
+
+## Phase 4: Iterate
+
+### The optimization loop
+
+```
+1. Extract prompt    -> Phase 1 (once)
+2. Run experiment    -> ax experiments create ...
+3. Export results    -> ax experiments export EXPERIMENT_ID
+4. Analyze failures  -> jq to find low scores
+5. Run meta-prompt   -> Phase 3 with new failure data
+6. Apply revised prompt
+7. Repeat from step 2
+```
+
+### Measure improvement
+
+```bash
+# Compare scores across experiments
+# Experiment A (baseline)
+jq '[.[] | .evaluations.correctness.score] | add / length' experiment_a/runs.json
+
+# Experiment B (optimized)
+jq '[.[] | .evaluations.correctness.score] | add / length' experiment_b/runs.json
+
+# Find examples that flipped from fail to pass
+jq -s '
+  [.[0][] | select(.evaluations.correctness.label == "incorrect")] as $fails |
+  [.[1][] | select(.evaluations.correctness.label == "correct") |
+    select(.example_id as $id | $fails | any(.example_id == $id))
+  ] | length
+' experiment_a/runs.json experiment_b/runs.json
+```
+
+### A/B compare two prompts
+
+1. Create two experiments against the same dataset, each using a different prompt version
+2. Export both: `ax experiments export EXP_A` and `ax experiments export EXP_B`
+3. Compare average scores, failure rates, and specific example flips
+4. Check for regressions -- examples that passed with prompt A but fail with prompt B
+
+## Prompt Engineering Best Practices
+
+Apply these when writing or revising prompts:
+
+| Technique | When to apply | Example |
+|-----------|--------------|---------|
+| Clear, detailed instructions | Output is vague or off-topic | "Classify the sentiment as exactly one of: positive, negative, neutral" |
+| Instructions at the beginning | Model ignores later instructions | Put the task description before examples |
+| Step-by-step breakdowns | Complex multi-step processes | "First extract entities, then classify each, then summarize" |
+| Specific personas | Need consistent style/tone | "You are a senior financial analyst writing for institutional investors" |
+| Delimiter tokens | Sections blend together | Use `---`, `###`, or XML tags to separate input from instructions |
+| Few-shot examples | Output format needs clarification | Show 2-3 synthetic input/output pairs |
+| Output length specifications | Responses are too long or short | "Respond in exactly 2-3 sentences" |
+| Reasoning instructions | Accuracy is critical | "Think step by step before answering" |
+| "I don't know" guidelines | Hallucination is a risk | "If the answer is not in the provided context, say 'I don't have enough information'" |
+
+### Variable preservation
+
+When optimizing prompts that use template variables:
+
+- **Single braces** (`{variable}`): Python f-string / Jinja style. Most common in Arize.
+- **Double braces** (`{{variable}}`): Mustache style. Used when the framework requires it.
+- Never add or remove variable placeholders during optimization
+- Never rename variables -- the runtime substitution depends on exact names
+- If adding few-shot examples, use literal values, not variable placeholders
+
+## Workflows
+
+### Optimize a prompt from a failing trace
+
+1. Find failing traces:
+   ```bash
+   ax traces list PROJECT_ID --filter "status_code = 'ERROR'" --limit 5
+   ```
+2. Export the trace:
+   ```bash
+   ax spans export --trace-id TRACE_ID --project PROJECT_ID
+   ```
+3. Extract the prompt from the LLM span:
+   ```bash
+   jq '[.[] | select(.attributes.openinference.span.kind == "LLM")][0] | {
+     messages: .attributes.llm.input_messages,
+     template: .attributes.llm.prompt_template,
+     output: .attributes.output.value,
+     error: .attributes.exception.message
+   }' trace_*/spans.json
+   ```
+4. Identify what failed from the error message or output
+5. Fill in the optimization meta-prompt (Phase 3) with the prompt and error context
+6. Apply the revised prompt
+
+### Optimize using a dataset and experiment
+
+1. Find the dataset and experiment:
+   ```bash
+   ax datasets list
+   ax experiments list --dataset-id DATASET_ID
+   ```
+2. Export both:
+   ```bash
+   ax datasets export DATASET_ID
+   ax experiments export EXPERIMENT_ID
+   ```
+3. Prepare the joined data for the meta-prompt
+4. Run the optimization meta-prompt
+5. Create a new experiment with the revised prompt to measure improvement
+
+### Debug a prompt that produces wrong format
+
+1. Export spans where the output format is wrong:
+   ```bash
+   ax spans list PROJECT_ID \
+     --filter "attributes.openinference.span.kind = 'LLM' AND annotation.format.label = 'incorrect'" \
+     --limit 10 -o json > bad_format.json
+   ```
+2. Look at what the LLM is producing vs what was expected
+3. Add explicit format instructions to the prompt (JSON schema, examples, delimiters)
+4. Common fix: add a few-shot example showing the exact desired output format
+
+### Reduce hallucination in a RAG prompt
+
+1. Find traces where the model hallucinated:
+   ```bash
+   ax spans list PROJECT_ID \
+     --filter "annotation.faithfulness.label = 'unfaithful'" \
+     --limit 20
+   ```
+2. Export and inspect the retriever + LLM spans together:
+   ```bash
+   ax spans export --trace-id TRACE_ID --project PROJECT_ID
+   jq '[.[] | {kind: .attributes.openinference.span.kind, name, input: .attributes.input.value, output: .attributes.output.value}]' trace_*/spans.json
+   ```
+3. Check if the retrieved context actually contained the answer
+4. Add grounding instructions to the system prompt: "Only use information from the provided context. If the answer is not in the context, say so."
+
+## Troubleshooting
+
+| Problem | Solution |
+|---------|----------|
+| `ax: command not found` | See references/ax-setup.md |
+| `No profile found` | No profile is configured. See references/ax-profiles.md to create one. |
+| No `input_messages` on span | Check span kind -- Chain/Agent spans store prompts on child LLM spans, not on themselves |
+| Prompt template is `null` | Not all instrumentations emit `prompt_template`. Use `input_messages` or `input.value` instead |
+| Variables lost after optimization | Verify the revised prompt preserves all `{var}` placeholders from the original |
+| Optimization makes things worse | Check for overfitting -- the meta-prompt may have memorized test data. Ensure few-shot examples are synthetic |
+| No eval/annotation columns | Run evaluations first (via Arize UI or SDK), then re-export |
+| Experiment output column not found | The column name is `{experiment_name}.output` -- check exact experiment name via `ax experiments get` |
+| `jq` errors on span JSON | Ensure you're targeting the correct file path (e.g., `trace_*/spans.json`) |
--- a/plugins/arize-ax/skills/arize-prompt-optimization/references/ax-profiles.md
+++ b/plugins/arize-ax/skills/arize-prompt-optimization/references/ax-profiles.md
@@ -0,0 +1,115 @@
+# ax Profile Setup
+
+Consult this when authentication fails (401, missing profile, missing API key). Do NOT run these checks proactively.
+
+Use this when there is no profile, or a profile has incorrect settings (wrong API key, wrong region, etc.).
+
+## 1. Inspect the current state
+
+```bash
+ax profiles show
+```
+
+Look at the output to understand what's configured:
+- `API Key: (not set)` or missing → key needs to be created/updated
+- No profile output or "No profiles found" → no profile exists yet
+- Connected but getting `401 Unauthorized` → key is wrong or expired
+- Connected but wrong endpoint/region → region needs to be updated
+
+## 2. Fix a misconfigured profile
+
+If a profile exists but one or more settings are wrong, patch only what's broken.
+
+**Never pass a raw API key value as a flag.** Always reference it via the `ARIZE_API_KEY` environment variable. If the variable is not already set in the shell, instruct the user to set it first, then run the command:
+
+```bash
+# If ARIZE_API_KEY is already exported in the shell:
+ax profiles update --api-key $ARIZE_API_KEY
+
+# Fix the region (no secret involved — safe to run directly)
+ax profiles update --region us-east-1b
+
+# Fix both at once
+ax profiles update --api-key $ARIZE_API_KEY --region us-east-1b
+```
+
+`update` only changes the fields you specify — all other settings are preserved. If no profile name is given, the active profile is updated.
+
+## 3. Create a new profile
+
+If no profile exists, or if the existing profile needs to point to a completely different setup (different org, different region):
+
+**Always reference the key via `$ARIZE_API_KEY`, never inline a raw value.**
+
+```bash
+# Requires ARIZE_API_KEY to be exported in the shell first
+ax profiles create --api-key $ARIZE_API_KEY
+
+# Create with a region
+ax profiles create --api-key $ARIZE_API_KEY --region us-east-1b
+
+# Create a named profile
+ax profiles create work --api-key $ARIZE_API_KEY --region us-east-1b
+```
+
+To use a named profile with any `ax` command, add `-p NAME`:
+```bash
+ax spans export PROJECT_ID -p work
+```
+
+## 4. Getting the API key
+
+**Never ask the user to paste their API key into the chat. Never log, echo, or display an API key value.**
+
+If `ARIZE_API_KEY` is not already set, instruct the user to export it in their shell:
+
+```bash
+export ARIZE_API_KEY="..."   # user pastes their key here in their own terminal
+```
+
+They can find their key at https://app.arize.com/admin > API Keys. Recommend they create a **scoped service key** (not a personal user key) — service keys are not tied to an individual account and are safer for programmatic use. Keys are space-scoped — make sure they copy the key for the correct space.
+
+Once the user confirms the variable is set, proceed with `ax profiles create --api-key $ARIZE_API_KEY` or `ax profiles update --api-key $ARIZE_API_KEY` as described above.
+
+## 5. Verify
+
+After any create or update:
+
+```bash
+ax profiles show
+```
+
+Confirm the API key and region are correct, then retry the original command.
+
+## Space ID
+
+There is no profile flag for space ID. Save it as an environment variable:
+
+**macOS/Linux** — add to `~/.zshrc` or `~/.bashrc`:
+```bash
+export ARIZE_SPACE_ID="U3BhY2U6..."
+```
+Then `source ~/.zshrc` (or restart terminal).
+
+**Windows (PowerShell):**
+```powershell
+[System.Environment]::SetEnvironmentVariable('ARIZE_SPACE_ID', 'U3BhY2U6...', 'User')
+```
+Restart terminal for it to take effect.
+
+## Save Credentials for Future Use
+
+At the **end of the session**, if the user manually provided any credentials during this conversation **and** those values were NOT already loaded from a saved profile or environment variable, offer to save them.
+
+**Skip this entirely if:**
+- The API key was already loaded from an existing profile or `ARIZE_API_KEY` env var
+- The space ID was already set via `ARIZE_SPACE_ID` env var
+- The user only used base64 project IDs (no space ID was needed)
+
+**How to offer:** Use **AskQuestion**: *"Would you like to save your Arize credentials so you don't have to enter them next time?"* with options `"Yes, save them"` / `"No thanks"`.
+
+**If the user says yes:**
+
+1. **API key** — Run `ax profiles show` to check the current state. Then run `ax profiles create --api-key $ARIZE_API_KEY` or `ax profiles update --api-key $ARIZE_API_KEY` (the key must already be exported as an env var — never pass a raw key value).
+
+2. **Space ID** — See the Space ID section above to persist it as an environment variable.
--- a/plugins/arize-ax/skills/arize-prompt-optimization/references/ax-setup.md
+++ b/plugins/arize-ax/skills/arize-prompt-optimization/references/ax-setup.md
@@ -0,0 +1,38 @@
+# ax CLI — Troubleshooting
+
+Consult this only when an `ax` command fails. Do NOT run these checks proactively.
+
+## Check version first
+
+If `ax` is installed (not `command not found`), always run `ax --version` before investigating further. The version must be `0.8.0` or higher — many errors are caused by an outdated install. If the version is too old, see **Version too old** below.
+
+## `ax: command not found`
+
+**macOS/Linux:**
+1. Check common locations: `~/.local/bin/ax`, `~/Library/Python/*/bin/ax`
+2. Install: `uv tool install arize-ax-cli` (preferred), `pipx install arize-ax-cli`, or `pip install arize-ax-cli`
+3. Add to PATH if needed: `export PATH="$HOME/.local/bin:$PATH"`
+
+**Windows (PowerShell):**
+1. Check: `Get-Command ax` or `where.exe ax`
+2. Common locations: `%APPDATA%\Python\Scripts\ax.exe`, `%LOCALAPPDATA%\Programs\Python\Python*\Scripts\ax.exe`
+3. Install: `pip install arize-ax-cli`
+4. Add to PATH: `$env:PATH = "$env:APPDATA\Python\Scripts;$env:PATH"`
+
+## Version too old (below 0.8.0)
+
+Upgrade: `uv tool install --force --reinstall arize-ax-cli`, `pipx upgrade arize-ax-cli`, or `pip install --upgrade arize-ax-cli`
+
+## SSL/certificate error
+
+- macOS: `export SSL_CERT_FILE=/etc/ssl/cert.pem`
+- Linux: `export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`
+- Fallback: `export SSL_CERT_FILE=$(python -c "import certifi; print(certifi.where())")`
+
+## Subcommand not recognized
+
+Upgrade ax (see above) or use the closest available alternative.
+
+## Still failing
+
+Stop and ask the user for help.