9 Commits

SHA1 Message Date
328bb6e1db feat(llama-swap): add ik-llama-cpp package and Qwen3.6-27B MTP config
Add ikawrakow/ik_llama.cpp as a new package with CUDA/Vulkan support,
enabling MTP (Multi-Token Prediction) and IQ4_KS quantization. Wire it
into llama-swap with a new 'ik-qwen3.6-27b-iq4ks-thinking' model config
and 'iq36' alias. Also add a chat template download to the vLLM setup
script and include the binary on lin-va-desktop.
2026-05-12 16:19:34 -04:00
ecad94aab3 fix(llama-swap): update vllm timings patch 2026-05-11 09:40:13 -04:00
37b0fae7e2 fix(llama-swap): sync qwen vllm 3090 configs 2026-05-09 10:16:32 -04:00
d3ccbda958 refactor(llama-swap): update Genesis to v7.69 and upgrade vLLM nightly image
- Bump Genesis pin from 2db18df to 7b9fd319
- Upgrade vllm/vllm-openai nightly from 7a1eb8ac to 01d4d1ad
- Remove standalone boot-time patches now folded into Genesis (patch_tolist_cudagraph,
  patch_inputs_embeds_optional, patch_workspace_lock_disable, patch_pn25_genesis_register_fix,
  patch_pn30_dst_shaped_temp_fix, patch_pr40798_workspace)
- Reorganize environment variables across all vLLM compose configs
- Add new Genesis optimizations: P100, P101, P103, P15B, P38B, PN59 streaming GDN
2026-05-05 15:50:37 -04:00
8d45977154 chore(llama-swap): update Genesis to v7.69, add cliff 2 optimizations
- Bump Genesis pin from fc89395 to 2db18df (v7.69)
- Add PN32 GDN chunked prefill and PN34 workspace lock relax env vars
- Replace patch_workspace_lock_disable with patch_inputs_embeds_optional
- Remove setup-time PN25/PN30 patches (folded into v7.69 natively)
- Switch patch base URL to v7.69-cliff2-test branch
- Lower GPU memory utilization to 0.93 for long-text variant
- Remove python3 from preflight check prerequisites
- Add printing service to lin-va-thinkpad
2026-05-02 13:48:41 -04:00
40114f438f feat(llama-swap): sync vLLM configs from club-3090, add evan API key
Sync all three vLLM model configs from club-3090 master (ae4846f).
Update to Genesis v7.65 full PROD env set with new patches.
Update docker image to nightly-7a1eb8ac. Add torch_compile and
triton cache dirs. Add agent setup guide (AGENTS.md).

Add 'evan' API key to llama-swap sops secrets.
2026-05-02 08:27:47 -04:00
ab63211a75 wip 2026-05-01 14:36:36 -04:00
561f10d2a7 fix: timing & vllm 2026-05-01 13:09:28 -04:00
74ff71803b feat: vllm yay 2026-05-01 10:38:43 -04:00