Sync all three vLLM model configs from club-3090 master (ae4846f). Update to Genesis v7.65 full PROD env set with new patches. Update docker image to nightly-7a1eb8ac. Add torch_compile and triton cache dirs. Add agent setup guide (AGENTS.md). Add 'evan' API key to llama-swap sops secrets.
llama-swap Module — Agent Guide
Syncing vLLM Configs from club-3090
The three vLLM model configs in `config.nix` (`vllm-qwen3.6-27b-long-text`, `vllm-qwen3.6-27b-long-vision`, `vllm-qwen3.6-27b-tools-text`) are derived from the club-3090 repo's Docker Compose files. Each config block has a `Synced from:` comment with the commit hash it was last aligned to.
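To see which commit each block was last aligned to, the markers can be listed with `grep`. This is a minimal sketch: `list_sync_markers` is a hypothetical helper name, and it assumes the marker text is exactly `Synced from:`.

```shell
# List every "Synced from:" comment in a file, prefixed with its line number.
# Assumption: the marker text is exactly "Synced from:"; whatever follows it
# (hash, date) is printed as-is and not parsed.
list_sync_markers() {
  grep -n 'Synced from:' "$1"
}
```

Running `list_sync_markers config.nix` before a sync gives a quick view of whether all three blocks point at the same upstream commit.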
Source Files
The upstream compose files live at https://github.com/noonghunna/club-3090 under models/qwen3.6-27b/vllm/compose/:
| config.nix model ID | Compose file |
|---|---|
| vllm-qwen3.6-27b-long-text | docker-compose.long-text.yml |
| vllm-qwen3.6-27b-long-vision | docker-compose.long-vision.yml |
| vllm-qwen3.6-27b-tools-text | docker-compose.tools-text.yml |
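The naming pattern in the table is regular enough to compute. The sketch below derives the upstream path from a model ID; `compose_path_for` is a hypothetical helper, and it assumes every model ID keeps the `vllm-qwen3.6-27b-<variant>` shape shown above.

```shell
# Map a config.nix model ID to its upstream compose file path.
# Assumption: IDs follow "vllm-qwen3.6-27b-<variant>" and the compose file
# is named "docker-compose.<variant>.yml", as in the table above.
compose_path_for() {
  model_id="$1"
  prefix="vllm-qwen3.6-27b-"
  case "$model_id" in
    "$prefix"*) variant="${model_id#"$prefix"}" ;;
    *) echo "unknown model ID: $model_id" >&2; return 1 ;;
  esac
  printf 'models/qwen3.6-27b/vllm/compose/docker-compose.%s.yml\n' "$variant"
}
```

For example, `compose_path_for vllm-qwen3.6-27b-long-vision` prints the path of `docker-compose.long-vision.yml` relative to the repo root. An ID that does not match the prefix returns a non-zero status instead of guessing.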
Sync Process
- Fetch the latest compose files from https://github.com/noonghunna/club-3090 (master branch) and note the HEAD commit hash.
- Diff each compose file against the current `config.nix` block. The mapping is:
  - Compose `command:` args → Nix `vllmCmd` string (the `exec vllm serve ...` block)
  - Compose `environment:` → Nix docker `-e` flags
  - Compose `volumes:` → Nix docker `-v` flags
  - Compose `image:` → Nix docker image tag at the end of the `docker run` command
  - Compose `entrypoint:` → Nix `vllmCmd` preamble (the `set -e; pip install ...; python3 ...` lines before `exec vllm serve`)
- Apply changes to `config.nix`. Key things to watch:
  - `--max-model-len` and `--gpu-memory-utilization`: these change across versions
  - Genesis env vars: the full set grows frequently; add new ones, remove deprecated ones
  - Sidecar patches: old patches get absorbed into Genesis; drop them from entrypoint + volume mounts
  - Docker image tag: update when the compose files move to a new nightly
- Keep `patch_timings_07351e088.py`: this is our own patch, not from club-3090. Always retain it in the entrypoint and volume mounts.
- Update the `Synced from:` comment on each config block with the new commit hash and date.
- Update `setup-qwen36-vllm.sh` if the upstream `patches/` directory changed (new patches added, old ones removed). The setup script downloads sidecar patches and creates cache directories.
- Verify syntax: `nix-instantiate --parse config.nix`
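The `environment:` to `-e` mapping is the part of the diff that changes most often, so it may help to generate the flags rather than copy them by hand. The following is a rough sketch, not part of this module: `compose_env_to_flags` is a hypothetical helper, and it assumes the compose files use the list form (`- KEY=value`) under `environment:`. The map form (`KEY: value`) and quoted values are not handled.

```shell
# Print one "-e KEY=value" docker flag per entry under "environment:".
# Assumptions: list-form entries ("- KEY=value"); the block ends at the first
# line that is neither a list item, blank, nor a comment; values are emitted
# verbatim with no unquoting.
compose_env_to_flags() {
  awk '
    /^[ \t]*environment:[ \t]*$/ { in_env = 1; next }
    in_env && /^[ \t]*-[ \t]+/ {
      sub(/^[ \t]*-[ \t]+/, "")
      print "-e " $0
      next
    }
    in_env && /^[ \t]*(#.*)?$/ { next }   # skip blank and comment lines
    { in_env = 0 }
  ' "$1"
}
```

The output of `compose_env_to_flags docker-compose.long-text.yml` can then be compared against the `-e` flags in the matching `config.nix` block. The locally pinned variables will show up as differences, which is expected.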
Structural Notes
- `config.nix` uses Nix string interpolation. Newlines in `vllmCmd` are flattened to spaces via `builtins.replaceStrings` before passing to `docker run -c`.
- We pin `CUDA_VISIBLE_DEVICES=0` and `CUDA_DEVICE_ORDER=PCI_BUS_ID` (not in compose files) because the host has multiple GPUs and llama-swap's concurrency matrix manages GPU assignment.
- Volume mounts use `/mnt/ssd/vLLM/` paths (Models, Patches, Cache); these match what `setup-qwen36-vllm.sh` creates.
- The `patches/` subdirectory in this module contains our custom timings patch and its source `.patch` file; it is unrelated to club-3090's `patches/` dir.