agent-evals/NOTES.md
Evan Reichard cd6eea2a7b docs: add model rankings and observations
- pi-qwen-coder-next-80b: Near one-shot success with UI advantages, minor routing issues
- pi-glm4.7-flash: Dependency issues, legacy packages, average UI
- pi-devstral-small-2: Runtime issues
2026-02-05 19:45:55 -05:00

Rankings

  1. eval/pi-qwen-coder-next-80b
    • Almost one-shot
    • Nicest UI
    • Invalid route (first fix commit) - very simple fix
    • Not displaying content (second fix commit) - simple fix
  2. eval/pi-glm4.7-flash
    • Frontend wouldn't run on the first try
    • Had to downgrade a bunch of deps to get it running
    • Used legacy packages
    • UI is meh
  3. eval/pi-devstral-small-2
    • Wouldn't run