Description
Agentic Tools
Grading
Purely opinion-based, but judged on:
- One-shot performance
- Follow up fixes
- UI design
Rankings
Overall
- eval/pi-kimi-k2.5
- eval/pi-qwen-coder-next-80b
- eval/pi-glm4.7
- eval/pi-glm4.7-flash
Parameters: < 100B (Local)
- eval/pi-qwen-coder-next-80b
  - UI: ?/10
  - Fixes:
    - Invalid route (first fix commit) - very simple fix
    - Not displaying content (second fix commit) - simple fix
- eval/pi-glm4.7-flash
  - UI: ?/10
  - Fixes:
    - FE wouldn't run on first try
    - Had to downgrade a bunch of deps so it would run
    - Used legacy packages
- eval/pi-devstral-small-2
  - UI: 2/10
  - Fixes:
    - ?
Parameters: > 100B (Hosted)
- eval/pi-kimi-k2.5
  - UI: 9/10
  - Fixes:
    - Files List
    - Editor Scrolling
    - Duplicate .md
- eval/pi-glm4.7
  - UI: 7/10
  - Fixes:
    - Files List
    - Markdown Styling
    - Delete Failure