nix/modules/nixos/services/llama-swap
Evan Reichard 88308602c8 feat(llama-swap): add concurrent model matrix for CUDA0/CUDA1
Allow one CUDA0 and one CUDA1 model to run simultaneously. Dual-GPU
models (using -ts splits) are excluded from the matrix, so loading one
evicts everything else. vLLM Docker models get evict_cost=50 to
discourage eviction, since their cold starts are slow.
2026-05-01 16:50:28 -04:00
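The matrix described in the commit message maps naturally onto llama-swap's group configuration, where `swap` controls whether a group's members displace each other and `exclusive` controls whether loading a member evicts other groups. The sketch below is an illustration of that idea, not the repository's actual config: model names, paths, and device flags are invented, and evict_cost appears only in the commit message, so it may be a local extension rather than an upstream llama-swap option.

```yaml
# Hypothetical llama-swap config sketching the CUDA0/CUDA1 matrix.
models:
  "model-cuda0":
    # Illustrative single-GPU model pinned to CUDA0.
    cmd: llama-server --port ${PORT} -m /models/a.gguf
  "model-cuda1":
    # Illustrative single-GPU model pinned to CUDA1.
    cmd: llama-server --port ${PORT} -m /models/b.gguf
  "model-dual":
    # Split across both GPUs with -ts; excluded from the matrix.
    cmd: llama-server --port ${PORT} -m /models/c.gguf -ts 1,1

groups:
  "cuda0":
    swap: true        # at most one CUDA0 model at a time
    exclusive: false  # a CUDA1 model may stay loaded alongside it
    members: ["model-cuda0"]
  "cuda1":
    swap: true
    exclusive: false
    members: ["model-cuda1"]
  "dual":
    swap: true
    exclusive: true   # loading a dual-GPU model evicts everything else
    members: ["model-dual"]
```

With this layout, one member of "cuda0" and one member of "cuda1" can be resident concurrently, while any "dual" member forces a full eviction, matching the behavior the commit describes.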