agent-evals/LEADERBOARD.md
2026-02-06 19:35:36 -05:00

Leaderboard

Overall Rankings

  1. pi - Kimi K2.5 - UI: 9/10

    • Fixes: Files List, Editor Scrolling, Duplicate .md Name
    • Screenshot: ![pi-kimi-k2.5](../../../raw/branch/eval/pi-kimi-k2.5/screenshot.png)
  2. pi - Qwen3 Coder Next (80B) - UI: 8/10

    • Fixes: New File, Markdown Styling
    • Screenshot: ![pi-qwen3-coder-next-80b](../../../raw/branch/eval/pi-qwen3-coder-next-80b/screenshot.png)
  3. pi - GLM4.7 - UI: 7/10

    • Fixes: Files List, Markdown Styling, Delete Failure
    • Screenshot: ![pi-glm4.7](../../../raw/branch/eval/pi-glm4.7/screenshot.png)

By Parameters

< 100B (Local): Qwen3 Coder Next (80B)
> 100B (Hosted): Kimi K2.5