nix/modules/nixos/services/llama-swap
Evan Reichard 88308602c8 feat(llama-swap): add concurrent model matrix for CUDA0/CUDA1
Allow one CUDA0 and one CUDA1 model to run simultaneously. Dual-GPU
models (using -ts splits) are excluded from the matrix, so loading one
evicts everything else. vLLM Docker models get evict_cost=50 to
discourage eviction, since their cold starts are slow.
2026-05-01 16:50:28 -04:00
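The matrix described in the commit message maps naturally onto llama-swap's group configuration, where `swap` controls whether a group's members displace each other and `exclusive` controls whether loading a member evicts other groups. The sketch below is an illustration of that idea, not the repository's actual config: model names, paths, and device flags are invented, and evict_cost appears only in the commit message, so it may be a local extension rather than an upstream llama-swap option.

```yaml
# Hypothetical llama-swap config sketching the CUDA0/CUDA1 matrix.
models:
  "model-cuda0":
    # Illustrative single-GPU model pinned to CUDA0.
    cmd: llama-server --port ${PORT} -m /models/a.gguf
  "model-cuda1":
    # Illustrative single-GPU model pinned to CUDA1.
    cmd: llama-server --port ${PORT} -m /models/b.gguf
  "model-dual":
    # Split across both GPUs with -ts; excluded from the matrix.
    cmd: llama-server --port ${PORT} -m /models/c.gguf -ts 1,1

groups:
  "cuda0":
    swap: true        # at most one CUDA0 model at a time
    exclusive: false  # a CUDA1 model may stay loaded alongside it
    members: ["model-cuda0"]
  "cuda1":
    swap: true
    exclusive: false
    members: ["model-cuda1"]
  "dual":
    swap: true
    exclusive: true   # loading a dual-GPU model evicts everything else
    members: ["model-dual"]
```

With this layout, one member of "cuda0" and one member of "cuda1" can be resident concurrently, while any "dual" member forces a full eviction, matching the behavior the commit describes.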