mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-13 03:35:55 +00:00
chore: publish from staged
This commit is contained in:
@@ -0,0 +1,58 @@
|
||||
# Model Selection
|
||||
|
||||
Error analysis first, model changes last.
|
||||
|
||||
## Decision Tree
|
||||
|
||||
```
|
||||
Performance Issue?
|
||||
│
|
||||
▼
|
||||
Error analysis suggests model problem?
|
||||
NO → Fix prompts, retrieval, tools
|
||||
YES → Is it a capability gap?
|
||||
YES → Consider model change
|
||||
NO → Fix the actual problem
|
||||
```
|
||||
|
||||
## Judge Model Selection
|
||||
|
||||
| Principle | Action |
|
||||
| --------- | ------ |
|
||||
| Start capable | Use gpt-4o first |
|
||||
| Optimize later | Test cheaper after criteria stable |
|
||||
| Same model OK | Judge does different task |
|
||||
|
||||
```python
|
||||
# Start with capable model
|
||||
judge = ClassificationEvaluator(
|
||||
llm=LLM(provider="openai", model="gpt-4o"),
|
||||
...
|
||||
)
|
||||
|
||||
# After validation, test cheaper
|
||||
judge_cheap = ClassificationEvaluator(
|
||||
llm=LLM(provider="openai", model="gpt-4o-mini"),
|
||||
...
|
||||
)
|
||||
# Compare TPR/TNR on same test set
|
||||
```
|
||||
|
||||
## Don't Model Shop
|
||||
|
||||
```python
|
||||
# BAD
|
||||
for model in ["gpt-4o", "claude-3", "gemini-pro"]:
|
||||
results = run_experiment(dataset, task, model)
|
||||
|
||||
# GOOD
|
||||
failures = analyze_errors(results)
|
||||
# "Ignores context" → Fix prompt
|
||||
# "Can't do math" → Maybe try better model
|
||||
```
|
||||
|
||||
## When Model Change Is Warranted
|
||||
|
||||
- Failures persist after prompt optimization
|
||||
- Capability gaps (reasoning, math, code)
|
||||
- Error analysis confirms model limitation
|
||||
Reference in New Issue
Block a user