nix/modules/nixos/services/llama-swap/config.nix
Evan Reichard 88308602c8 feat(llama-swap): add concurrent model matrix for CUDA0/CUDA1
Allow one CUDA0 and one CUDA1 model to run simultaneously. Dual-GPU
models (using -ts splits) are excluded from the matrix so they evict
everything when loaded. vLLM docker models get evict_cost=50 to
discourage eviction due to slow cold starts.
2026-05-01 16:50:28 -04:00
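
The eviction rule described in the commit message can be sketched as a small model. This is only an illustration of the stated policy, not llama-swap's implementation or its configuration schema: the `Model` type, the `slot` field, and `models_to_evict` are hypothetical names, and the `evict_cost` weighting is left out because the message does not say how it is applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    # Hypothetical field: "CUDA0" or "CUDA1" for a single-GPU model,
    # None for a dual-GPU model (one started with a -ts split).
    slot: Optional[str]


def models_to_evict(running: List[Model], incoming: Model) -> List[Model]:
    """Return the running models that must be unloaded before `incoming` starts."""
    if incoming in running:
        # Already loaded, so nothing has to be evicted.
        return []
    if incoming.slot is None:
        # Dual-GPU models are outside the matrix: they evict everything.
        return list(running)
    # A single-GPU model displaces whatever occupies its GPU, plus any
    # dual-GPU model, since that one spans both devices.
    return [m for m in running if m.slot is None or m.slot == incoming.slot]
```

With one model on CUDA0 and one on CUDA1 running, loading a second CUDA0 model evicts only the first CUDA0 model, while loading a dual-GPU model evicts both.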

25 KiB