mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-13 11:45:56 +00:00
chore: publish from staged
@@ -0,0 +1,92 @@
# Production: Overview

CI/CD evals and production monitoring are complementary approaches: CI/CD evals gate a deployment, while production monitoring watches live traffic after it ships.

## Two Evaluation Modes

| Aspect | CI/CD Evals | Production Monitoring |
| ------ | ----------- | -------------------- |
| **When** | Pre-deployment | Post-deployment, ongoing |
| **Data** | Fixed dataset | Sampled traffic |
| **Goal** | Prevent regression | Detect drift |
| **Response** | Block deploy | Alert & analyze |

## CI/CD Evaluations

```python
# Fast, deterministic checks
ci_evaluators = [
    has_required_format,
    no_pii_leak,
    safety_check,
    regression_test_suite,
]

# Small but representative dataset (~100 examples)
run_experiment(ci_dataset, task, ci_evaluators)
```

Set a threshold per check and block the deploy when any score falls below it: regression=0.95, safety=1.0, format=0.98.
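
One way the threshold check could be wired into a CI step is sketched below. It assumes each evaluator's results have already been aggregated into a single score between 0 and 1; the keys `regression`, `safety`, and `format` and the helper names `failing_checks` and `gate` are illustrative and not part of any library.

```python
from typing import Mapping

# Thresholds taken from the text; the key names are assumptions.
THRESHOLDS = {"regression": 0.95, "safety": 1.0, "format": 0.98}


def failing_checks(
    scores: Mapping[str, float],
    thresholds: Mapping[str, float] = THRESHOLDS,
) -> dict[str, tuple[float, float]]:
    """Return {check: (score, threshold)} for every check below its threshold.

    A check that is missing from `scores` is treated as 0.0, so a skipped
    evaluator fails the gate instead of passing silently.
    """
    failures = {}
    for name, minimum in thresholds.items():
        score = scores.get(name, 0.0)
        if score < minimum:
            failures[name] = (score, minimum)
    return failures


def gate(scores: Mapping[str, float]) -> int:
    """Return a process exit code: 0 to allow the deploy, 1 to block it."""
    failures = failing_checks(scores)
    for name, (score, minimum) in sorted(failures.items()):
        print(f"FAIL {name}: {score:.3f} < {minimum:.2f}")
    return 1 if failures else 0
```

In a pipeline, the last line of the eval script would typically be `sys.exit(gate(scores))`, so that a non-zero exit code stops the deploy job.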

## Production Monitoring

### Python

```python
from datetime import datetime, timedelta

from phoenix.client import Client

client = Client()

# Sample recent traces (last hour)
traces = client.traces.get_traces(
    project_identifier="my-app",
    start_time=datetime.now() - timedelta(hours=1),
    include_spans=True,
    limit=100,
)

# Run evaluators on sampled traffic.
# run_evaluators_async, production_evaluators, and alert_on_failure are
# application-defined placeholders.
for trace in traces:
    results = run_evaluators_async(trace, production_evaluators)
    if any(r["score"] < 0.5 for r in results):
        alert_on_failure(trace, results)
```

### TypeScript

```typescript
import { getTraces } from "@arizeai/phoenix-client/traces";
import { getSpans } from "@arizeai/phoenix-client/spans";

// Sample recent traces (last hour)
const { traces } = await getTraces({
  project: { projectName: "my-app" },
  startTime: new Date(Date.now() - 60 * 60 * 1000),
  includeSpans: true,
  limit: 100,
});

// Or sample spans directly for evaluation
const { spans } = await getSpans({
  project: { projectName: "my-app" },
  startTime: new Date(Date.now() - 60 * 60 * 1000),
  limit: 100,
});

// Run evaluators on sampled traffic
for (const span of spans) {
  const results = await runEvaluators(span, productionEvaluators);
  if (results.some((r) => r.score < 0.5)) {
    await alertOnFailure(span, results);
  }
}
```

Prioritize what you sample: errors first, then traces with negative feedback, then a random sample.
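
The priority order could be applied to a batch of fetched traces roughly as follows. This is a minimal sketch that assumes each trace has been reduced to a plain dict; the `has_error` and `feedback` fields and the `prioritize` helper are hypothetical, so map them to whatever your traces actually expose.

```python
import random
from typing import Any, Optional


def prioritize(
    traces: list[dict[str, Any]],
    budget: int,
    seed: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Pick up to `budget` traces: errors, then negative feedback, then random."""
    errors = [t for t in traces if t.get("has_error")]
    negative = [
        t for t in traces
        if not t.get("has_error") and t.get("feedback") == "negative"
    ]
    rest = [
        t for t in traces
        if not t.get("has_error") and t.get("feedback") != "negative"
    ]
    # Shuffle only the remainder so the random sample is unbiased while
    # errors and negative feedback keep their place at the front.
    rng = random.Random(seed)
    rng.shuffle(rest)
    return (errors + negative + rest)[:budget]
```

With a fixed evaluation budget, this spends it on the traces most likely to reveal a problem and still keeps some random coverage when errors and complaints are rare.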

## Feedback Loop

```
Production finds failure → Error analysis → Add to CI dataset → Prevents future regression
```
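
The "Add to CI dataset" step might look like the sketch below. It assumes the CI dataset is kept as a local JSONL file and that each example is a dict with a string `input` key; both are assumptions for illustration, and `add_failures_to_ci_dataset` is a hypothetical helper. If the dataset lives in an eval platform instead, the same de-duplication idea applies before uploading.

```python
import json
from pathlib import Path
from typing import Any, Iterable


def add_failures_to_ci_dataset(
    failures: Iterable[dict[str, Any]],
    dataset_path: str,
) -> int:
    """Append analyzed production failures to a JSONL dataset.

    Examples whose `input` is already in the file are skipped, so running
    the job repeatedly does not bloat the dataset. Returns the number of
    examples appended.
    """
    path = Path(dataset_path)
    seen = set()
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    seen.add(json.loads(line)["input"])

    added = 0
    with path.open("a", encoding="utf-8") as f:
        for example in failures:
            if example["input"] in seen:
                continue
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
            seen.add(example["input"])
            added += 1
    return added
```

Only add a failure after error analysis has settled on the expected behavior; otherwise the CI suite would encode the bug instead of guarding against it.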