# LLM Evaluation Framework

Evaluate different LLMs and agentic coding tools (opencode, Claude Code, etc.) in controlled environments using git branches.

## Setup

```bash
# Direnv
direnv allow

# Development shell
nix develop
```

## Running Evaluations

1. **Start an evaluation:**
   ```bash
   ./scripts/start-eval.sh <eval-name>
   ```
   This creates a new orphan branch `eval/<eval-name>`, sets up the flake environment, and starts opencode.
   Example: `./scripts/start-eval.sh opencode-glm47`

2. **Run the evaluation:**
   - Set up prompts/tasks
   - Let the LLM work through the task

3. **Finish the evaluation:**
   ```bash
   git checkout main
   ```
   Everything committed on the branch stays in `eval/<eval-name>`. Commit any remaining changes before switching back, since uncommitted work is not part of the branch.
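The start and finish steps above can be sketched as two shell functions. This is a hypothetical reconstruction, not the contents of `scripts/start-eval.sh`: the names `start_eval`, `finish_eval`, and `EVAL_NO_LAUNCH` are made up for illustration, and it assumes the flake files are committed on `main` and that the agent is launched with `nix develop --command opencode`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not the repository's start-eval.sh).
# Source this file, then call start_eval <eval-name> / finish_eval.

start_eval() {
  local name="${1:?usage: start_eval <eval-name>}"
  local branch="eval/${name}"

  # Refuse to reuse an existing evaluation branch.
  if git show-ref --verify --quiet "refs/heads/${branch}"; then
    echo "error: ${branch} already exists" >&2
    return 1
  fi

  # --orphan creates a branch with no parent commits. The index still holds
  # the files checked out from main (flake.nix, flake.lock, .envrc, ...), so
  # the first commit snapshots them as the root of the eval's own history.
  git checkout -q --orphan "${branch}" || return
  git commit -q -m "eval: start ${name}" || return

  # EVAL_NO_LAUNCH is a made-up switch for dry runs and tests.
  if [[ -z "${EVAL_NO_LAUNCH:-}" ]]; then
    # Assumption: the agent runs inside the flake's development shell.
    nix develop --command opencode
  fi
}

finish_eval() {
  local branch
  branch="$(git rev-parse --abbrev-ref HEAD)" || return
  if [[ "${branch}" != eval/* ]]; then
    echo "error: not on an eval/ branch (currently ${branch})" >&2
    return 1
  fi

  # Stage everything, including new files, and commit only if something changed.
  git add -A || return
  if ! git diff --cached --quiet; then
    git commit -q -m "eval: final state of ${branch#eval/}" || return
  fi
  git checkout -q main
}
```

Usage: source the file, run `start_eval opencode-glm47`, and call `finish_eval` once the agent is done. Committing in `finish_eval` is a design choice that guarantees leftover changes end up on the eval branch instead of following you back to `main`; note that `git add -A` also picks up any untracked files not covered by `.gitignore`.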

## Managing Evaluations

- **List all evaluations:** `git branch --list 'eval/*'`
- **View an evaluation:** `git checkout eval/<eval-name>`
- **Compare evaluations:** `git diff eval/foo eval/bar`
- **Delete an evaluation:** `git branch -D eval/<eval-name>`
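For a quick overview of all evaluations, a small helper can summarize each `eval/*` branch. The `list_evals` name and its tab-separated output (branch, commit count, date of the last commit) are illustrative choices, not part of this repository. It uses `git for-each-ref` to enumerate branches rather than parsing the human-oriented output of `git branch`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints one line per eval/* branch as
# <branch> TAB <commit count> TAB <date of last commit>.
list_evals() {
  local ref count last
  # for-each-ref matches every branch under refs/heads/eval/.
  git for-each-ref --format='%(refname:short)' refs/heads/eval/ |
    while IFS= read -r ref; do
      count="$(git rev-list --count "${ref}")"
      last="$(git log -1 --date=short --format=%cd "${ref}")"
      printf '%s\t%s\t%s\n' "${ref}" "${count}" "${last}"
    done
}
```

Because each evaluation is an orphan branch, the commit count reflects only that evaluation's own history.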

## Structure

```
eval/
├── flake.nix
├── flake.lock
├── .envrc
├── scripts/
│   └── start-eval.sh
└── README.md
```

Each evaluation lives on its own orphan branch in the repository, with a git history unrelated to `main`.