chore: update results
14
NOTES.md
@@ -1,14 +0,0 @@
### Rankings

1. eval/pi-qwen-coder-next-80b
   - Almost one-shot
   - Nicest UI
   - Invalid route (first fix commit): very simple fix
   - Not displaying content (second fix commit): simple fix
2. eval/pi-glm4.7-flash
   - Frontend wouldn't run on the first try
   - Had to downgrade a bunch of deps so it would run
   - Used legacy packages
   - UI is meh
3. eval/pi-devstral-small-2
   - Wouldn't run
45
RESULTS.md
Normal file
@@ -0,0 +1,45 @@
## Description

## Agentic Tools

- [pi](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent)
- [opencode](https://github.com/anomalyco/opencode)
- [claude-code](https://github.com/anthropics/claude-code)

## Grading

Purely opinion-based, judged on:
- One-shot performance
- Follow-up fixes
- UI design

## Rankings

### Overall

1. eval/pi-kimi-k2.5
2. eval/pi-qwen3-coder-next-80b
3. eval/pi-glm4.7

### Parameters: < 100B (Local)

1. eval/pi-qwen3-coder-next-80b
   - UI: 8/10
   - Fixes:
     - New File
     - Markdown Styling

### Parameters: > 100B (Hosted)

1. eval/pi-kimi-k2.5
   - UI: 9/10
   - Fixes:
     - Files List
     - Editor Scrolling
     - Duplicate `.md` Name
2. eval/pi-glm4.7
   - UI: 7/10
   - Fixes:
     - Files List
     - Markdown Styling
     - Delete Failure