feat(llama-swap): add --default-chat-template-kwargs to vLLM 3090 configs
Sync all three Qwen3.6 27B vLLM configs (tools-text, long-text, long-vision) with club-3090 83bf73d. Adds disable-thinking flag and introduces upstream hash tracking comments for future syncs. Update update-vllm-3090-configs skill to use hash-based skip logic.
This commit is contained in:
@@ -19,6 +19,15 @@ Local config keys:
|
||||
- `vllm-qwen3.6-27b-long-text`
|
||||
- `vllm-qwen3.6-27b-long-vision`
|
||||
|
||||
## Hash Tracking
|
||||
|
||||
Each config entry stores an upstream commit hash comment:
|
||||
`# Upstream: club-3090 <hash> (<date>) - <compose-file>`
|
||||
|
||||
When comparing, first extract stored hashes. If a config's hash matches
|
||||
upstream HEAD, skip it (report "already synced"). Only full-diff configs
|
||||
whose hash differs. Update the hash comment when edits are applied.
|
||||
|
||||
## Upstream References
|
||||
|
||||
Compare against `club-3090` master:
|
||||
@@ -37,25 +46,26 @@ git clone https://github.com/noonghunna/club-3090 _scratch/club-3090 2>/dev/null
|
||||
## Required Workflow
|
||||
|
||||
1. Fetch/update upstream refs under `_scratch/club-3090` or fetch the raw files.
|
||||
2. Compare upstream compose files to the three local llama-swap entries. Translate docker-compose semantics into the existing `docker run`/llama-swap format.
|
||||
3. Compare upstream `scripts/setup.sh` Genesis pin to local `GENESIS_PIN` in `setup-qwen36-vllm.sh`.
|
||||
4. Check upstream compose volumes/entrypoint for sidecar patches. If patches are added, removed, renamed, or invoked differently, update both:
|
||||
2. Extract stored upstream hashes from `# Upstream: club-3090 ...` comments in config.nix. Skip any config whose hash matches upstream HEAD (report "already synced").
|
||||
3. Compare upstream compose files to the remaining local llama-swap entries. Translate docker-compose semantics into the existing `docker run`/llama-swap format.
|
||||
4. Compare upstream `scripts/setup.sh` Genesis pin to local `GENESIS_PIN` in `setup-qwen36-vllm.sh`.
|
||||
5. Check upstream compose volumes/entrypoint for sidecar patches. If patches are added, removed, renamed, or invoked differently, update both:
|
||||
- runtime mounts and `python3 /patches/...` calls in `config.nix`
|
||||
- download/install logic and summary in `setup-qwen36-vllm.sh`
|
||||
5. Ignore these diffs unless the user explicitly asks otherwise:
|
||||
6. Ignore these diffs unless the user explicitly asks otherwise:
|
||||
- `shm_size` / shm-related compose settings
|
||||
- local timing patch `patch_timings_07351e088.py` and its mount/invocation
|
||||
- model served-name differences caused by llama-swap `${MODEL_ID}`
|
||||
- `HUGGING_FACE_HUB_TOKEN`; keep local CUDA device/env choices
|
||||
- upstream relative paths vs local `/mnt/ssd/vLLM/...` paths
|
||||
- docker-compose format vs local llama-swap/Nix format
|
||||
6. Before editing, present:
|
||||
7. Before editing, present:
|
||||
- upstream files/commit checked
|
||||
- meaningful diffs found
|
||||
- ignored diffs
|
||||
- exact planned local changes
|
||||
Then wait for explicit user approval.
|
||||
7. After approval, edit minimally and validate:
|
||||
8. After approval, edit minimally and update the `# Upstream: club-3090 ...` hash comments. Validate:
|
||||
- `bash -n modules/nixos/services/llama-swap/setup-qwen36-vllm.sh`
|
||||
- `nix-instantiate --parse modules/nixos/services/llama-swap/config.nix`
|
||||
8. Summarize changed files and any remaining upstream differences.
|
||||
9. Summarize changed files and any remaining upstream differences.
|
||||
|
||||
Reference in New Issue
Block a user