agent-evals/NOTES.md
2026-02-06 12:53:40 -05:00

Agentic Tools

Grading

Purely opinion-based, but informed by:

  • One-shot performance
  • Follow-up fixes
  • UI design

Rankings

Overall

  1. eval/pi-kimi-k2.5
  2. eval/pi-qwen-coder-next-80b
  3. eval/pi-glm4.7
  4. eval/pi-glm4.7-flash

Parameters: < 100B (Local)

  1. eval/pi-qwen-coder-next-80b
    • UI: ?/10
    • Fixes:
      • Invalid route (first fix commit) - very simple fix
      • Not displaying content (second fix commit) - simple fix
  2. eval/pi-glm4.7-flash
    • UI: ?/10
    • Fixes:
      • Frontend wouldn't run on the first try
      • Had to downgrade several dependencies to get it running
      • Used legacy packages
  3. eval/pi-devstral-small-2
    • UI: 2/10
    • Fixes:
      • ?

Parameters: > 100B (Hosted)

  1. eval/pi-kimi-k2.5
    • UI: 9/10
    • Fixes:
      • Files List
      • Editor Scrolling
      • Duplicate .md
  2. eval/pi-glm4.7
    • UI: 7/10
    • Fixes:
      • Files List
      • Markdown Styling
      • Delete Failure