agent-evals/NOTES.md
Evan Reichard cd6eea2a7b docs: add model rankings and observations
- pi-qwen-coder-next-80b: Near one-shot success with UI advantages, minor routing issues
- pi-glm4.7-flash: Dependency issues, legacy packages, average UI
- pi-devstral-small-2: Runtime issues
2026-02-05 19:45:55 -05:00

### Rankings
1. eval/pi-qwen-coder-next-80b
   - Almost one-shot
   - Nicest UI
   - Invalid route (first fix commit) - very simple fix
   - Content not displaying (second fix commit) - simple fix
2. eval/pi-glm4.7-flash
   - Frontend wouldn't run on the first try
   - Had to downgrade a bunch of deps to get it running
   - Used legacy packages
   - UI is meh
3. eval/pi-devstral-small-2
   - Wouldn't run