agent-evals/RESULTS.md
2026-02-06 17:22:05 -05:00

Description

Agentic Tools

Grading

Purely opinion-based, but informed by:

  • One-shot performance
  • Follow-up fixes
  • UI design

Rankings

Overall

  1. eval/pi-kimi-k2.5
  2. eval/pi-qwen3-coder-next-80b
  3. eval/pi-glm4.7

Parameters: < 100B (Local)

  1. eval/pi-qwen3-coder-next-80b
    • UI: 8/10
    • Fixes:
      • New File
      • Markdown Styling

Parameters: > 100B (Hosted)

  1. eval/pi-kimi-k2.5
    • UI: 9/10
    • Fixes:
      • Files List
      • Editor Scrolling
      • Duplicate .md Name
  2. eval/pi-glm4.7
    • UI: 7/10
    • Fixes:
      • Files List
      • Markdown Styling
      • Delete Failure