chore: publish from staged

This commit is contained in:
github-actions[bot]
2026-04-13 01:02:36 +00:00
parent e37cd3123f
commit 2f4953242f
467 changed files with 97528 additions and 276 deletions

View File

@@ -0,0 +1,40 @@
# Evaluators: Overview
When and how to build automated evaluators.
## Decision Framework
```
Should I Build an Evaluator?
Can I fix it with a prompt change?
YES → Fix the prompt first
NO → Is this a recurring issue?
YES → Build evaluator
NO → Add to watchlist
```
**Don't automate prematurely.** Many issues are simple prompt fixes.
## Evaluator Requirements
1. **Clear criteria** - Specific, not "Is it good?"
2. **Labeled test set** - 100+ examples with human labels
3. **Measured accuracy** - Know TPR/TNR before deploying
## Evaluator Lifecycle
1. **Discover** - Error analysis reveals pattern
2. **Design** - Define criteria and test cases
3. **Implement** - Build code or LLM evaluator
4. **Calibrate** - Validate against human labels
5. **Deploy** - Add to experiment/CI pipeline
6. **Monitor** - Track accuracy over time
7. **Maintain** - Update as product evolves
## What NOT to Automate
- **Rare issues** - <5 instances? Watchlist, don't build
- **Quick fixes** - Fixable by prompt change? Fix it
- **Evolving criteria** - Stabilize definition first