nix/modules/nixos/services/llama-swap/config.nix at 88308602c8171d96e2464281f2a3cad04cca81fd

evan/nix

Evan Reichard 88308602c8 feat(llama-swap): add concurrent model matrix for CUDA0/CUDA1

Allow one CUDA0 and one CUDA1 model to run simultaneously. Dual-GPU
models (using -ts splits) are excluded from the matrix so they evict
everything when loaded. vLLM docker models get evict_cost=50 to
discourage eviction due to slow cold starts.

2026-05-01 16:50:28 -04:00

25 KiB
