agent-evals/RESULTS.md
2026-02-06 17:22:05 -05:00


## Description
## Agentic Tools
- [pi](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent)
- [opencode](https://github.com/anomalyco/opencode)
- [claude-code](https://github.com/anthropics/claude-code)
## Grading
Purely opinion-based, but informed by:
- One-shot performance
- Follow up fixes
- UI design
## Rankings
### Overall
1. eval/pi-kimi-k2.5
2. eval/pi-qwen3-coder-next-80b
3. eval/pi-glm4.7
### Parameters: < 100B (Local)
1. eval/pi-qwen3-coder-next-80b
   - UI: 8/10
   - Fixes:
     - New File
     - Markdown Styling
### Parameters: > 100B (Hosted)
1. eval/pi-kimi-k2.5
   - UI: 9/10
   - Fixes:
     - Files List
     - Editor Scrolling
     - Duplicate `.md` Name
2. eval/pi-glm4.7
   - UI: 7/10
   - Fixes:
     - Files List
     - Markdown Styling
     - Delete Failure