Commit Graph

33 Commits

SHA1 Message Date
7c1519881a refactor(llama-swap): generate sops secrets from apiKeys list 2026-05-02 15:48:15 -04:00
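
The subject line alone doesn't show the mechanism; a minimal sops-nix sketch of deriving one secret per apiKeys entry (the key list, secrets-file location, and owner below are assumptions, not this repo's actual values):

```nix
{ lib, ... }:

let
  # Hypothetical key list; the real apiKeys list lives in this repo.
  apiKeys = [ "evan" ];
in
{
  # One sops-nix secret per key instead of hand-written entries.
  sops.secrets = lib.listToAttrs (map (name: {
    name = "llama-swap/api-keys/${name}";
    value = {
      sopsFile = ./secrets.yaml; # assumed secrets file location
      owner = "llama-swap";      # assumed service user
    };
  }) apiKeys);
}
```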
8d45977154 chore(llama-swap): update Genesis to v7.69, add cliff 2 optimizations
- Bump Genesis pin from fc89395 to 2db18df (v7.69)
- Add PN32 GDN chunked prefill and PN34 workspace lock relax env vars
- Replace patch_workspace_lock_disable with patch_inputs_embeds_optional
- Remove setup-time PN25/PN30 patches (folded into v7.69 natively)
- Switch patch base URL to v7.69-cliff2-test branch
- Lower GPU memory utilization to 0.93 for long-text variant
- Remove python3 from preflight check prerequisites
- Add printing service to lin-va-thinkpad
2026-05-02 13:48:41 -04:00
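
The PN32/PN34 variables are described but not named above; a sketch against llama-swap's per-model env list, with loudly placeholder variable names, since the real ones belong to the private Genesis fork:

```nix
{
  models."glm-long-text" = {  # placeholder model name
    env = [
      # Placeholder names: the commit only says "PN32 GDN chunked
      # prefill" and "PN34 workspace lock relax"; the real variables
      # are defined by the Genesis fork, not shown in this log.
      "GENESIS_PN32_GDN_CHUNKED_PREFILL=1"
      "GENESIS_PN34_WORKSPACE_LOCK_RELAX=1"
    ];
    # From the commit: lower GPU memory utilization for the
    # long-text variant.
    cmd = ''
      vllm serve /models/placeholder --gpu-memory-utilization 0.93
    '';
  };
}
```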
40114f438f feat(llama-swap): sync vLLM configs from club-3090, add evan API key
Sync all three vLLM model configs from club-3090 master (ae4846f).
Update to Genesis v7.65 full PROD env set with new patches.
Update docker image to nightly-7a1eb8ac. Add torch_compile and
triton cache dirs. Add agent setup guide (AGENTS.md).

Add 'evan' API key to llama-swap sops secrets.
2026-05-02 08:27:47 -04:00
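
One way to picture the cache-dir change: persistent volume mounts plus the standard TORCHINDUCTOR_CACHE_DIR and TRITON_CACHE_DIR variables on the docker entry. Only the image tag comes from the commit; the registry path, mount points, and model name are assumptions:

```nix
{
  models."qwen-vllm" = {  # placeholder name
    cmd = ''
      docker run --rm --gpus all -p ''${PORT}:8000 \
        -e TORCHINDUCTOR_CACHE_DIR=/cache/torch_compile \
        -e TRITON_CACHE_DIR=/cache/triton \
        -v /var/cache/vllm/torch_compile:/cache/torch_compile \
        -v /var/cache/vllm/triton:/cache/triton \
        example.registry/genesis-vllm:nightly-7a1eb8ac \
        --port 8000
    '';
  };
}
```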
e4d40d89d9 feat: add api keys to llama-swap 2026-05-01 22:12:51 -04:00
43a1d66e6b add 9b vision 2026-05-01 21:51:06 -04:00
1283b7cdef add fim 4b + 9b 2026-05-01 21:42:50 -04:00
09fdff4908 refactor(llama-swap): reorganize models by GPU hardware section 2026-05-01 21:08:53 -04:00
88308602c8 feat(llama-swap): add concurrent model matrix for CUDA0/CUDA1
Allow one CUDA0 and one CUDA1 model to run simultaneously. Dual-GPU
models (using -ts splits) are excluded from the matrix so they evict
everything when loaded. vLLM docker models get evict_cost=50 to
discourage eviction due to slow cold starts.
2026-05-01 16:50:28 -04:00
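
This maps naturally onto llama-swap group settings (swap, exclusive, members), assuming the Nix attrset is serialized to llama-swap's config.yaml. Model names are placeholders; the evict_cost knob mentioned above looks fork-specific and is omitted here:

```nix
{
  groups = {
    cuda0 = {
      swap = true;       # at most one CUDA0 model loaded at a time
      exclusive = false; # may run alongside a cuda1 member
      members = [ "model-a-cuda0" "model-b-cuda0" ];
    };
    cuda1 = {
      swap = true;
      exclusive = false;
      members = [ "model-c-cuda1" ];
    };
    "dual-gpu" = {
      swap = true;
      exclusive = true;  # -ts split models evict everything else
      members = [ "big-model-ts" ];
    };
  };
}
```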
1812d2ea03 feat(pi): add vision model support
Extract a shared hasType helper for model filtering and add
vision (text + image) input capability to compatible models.
Also tag two llama-swap models with the vision type.
2026-05-01 15:03:26 -04:00
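
The pi codebase's language isn't visible from this log, so here is the same hasType idea as a Nix predicate, with assumed field names:

```nix
let
  # Shared predicate: does this model carry the given type tag?
  hasType = type: model: builtins.elem type (model.metadata.type or [ ]);

  models = [
    { name = "qwen-vl";    metadata.type = [ "coding" "vision" ]; }
    { name = "qwen-coder"; metadata.type = [ "coding" ]; }
  ];
in
builtins.filter (hasType "vision") models
# => keeps only "qwen-vl"
```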
ab63211a75 wip 2026-05-01 14:36:36 -04:00
561f10d2a7 fix: timing & vllm 2026-05-01 13:09:28 -04:00
a3b2efa5bb feat: vllm timings patch 2026-05-01 10:57:59 -04:00
74ff71803b feat: vllm yay 2026-05-01 10:38:43 -04:00
75eba8703f add: vllm base 3.6 27b 2026-04-30 21:47:29 -04:00
990b6a4392 feat: vllm 2026-04-30 20:04:58 -04:00
93e2247a30 chore(nixos/llama-swap): remove synthetic peer and tune local model args 2026-04-30 11:43:04 -04:00
976edab339 config(llama-swap): enable preserve_thinking in chat template kwargs 2026-04-30 07:45:57 -04:00
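
A sketch of what enabling this could look like via llama-server's --chat-template-kwargs JSON flag (present in recent llama.cpp; the model path is assumed):

```nix
{
  models."qwen3.6-27b-thinking".cmd = ''
    llama-server --port ''${PORT} \
      -m Qwen3.6-27B-Thinking.gguf \
      --chat-template-kwargs '{"preserve_thinking": true}'
  '';
}
```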
1070642635 feat(llama-swap): add qwen3.6-27b-thinking model 2026-04-22 13:01:38 -04:00
ceff33273d feat(llama-swap): add Qwen3.6-35B-A3B model configuration 2026-04-16 16:20:00 -04:00
90eed0378e chore(llama): update llama-cpp and remove deprecated models
- Upgrade llama-cpp from version 8196 to 8229
- Remove GPT OSS CSEC (20B) - Thinking model config
- Remove Qwen3 Next (80B) - Instruct model config
- Increment -ncmoe parameter for Qwen3 Coder Next (80B) model
2026-03-07 08:55:00 -05:00
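
For reference, -ncmoe is llama.cpp's --n-cpu-moe, which keeps the MoE expert weights of the first N layers on the CPU. A sketch with a placeholder value, since the commit only says the parameter was incremented:

```nix
{
  # 32 is a placeholder; the actual incremented value isn't in the log.
  models."qwen3-coder-next-80b".cmd = ''
    llama-server --port ''${PORT} \
      -m Qwen3-Coder-Next-80B.gguf \
      -ncmoe 32
  '';
}
```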
1bce17c5f9 chore(llm): update llama-cpp, llama-swap and switch to qwen3.5-27b-thinking
- Bump llama-cpp from version 8157 to 8196
- Bump llama-swap from version 192 to 197
- Switch default assistant model from qwen3-coder-next-80b to qwen3.5-27b-thinking
- Remove glm-4-32b-instruct model configuration
- Update qwen3.5-27b-thinking config:
  - Use bartowski quantization (IQ4_XS) instead of unsloth
  - Increase context window from 131k to 196k
  - Add cache type settings (q8_0) and CUDA device
- Add 1password-cli to home-manager programs
- Fix typo: 'dispay' -> 'display' in llm-config.lua
2026-03-05 07:32:57 -05:00
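
The config bullets translate directly into llama-server flags; everything below except the GGUF filename comes from the commit, assuming "196k" means 196608 tokens:

```nix
{
  # IQ4_XS quant, 196k context, q8_0 KV cache, pinned CUDA device.
  models."qwen3.5-27b-thinking".cmd = ''
    llama-server --port ''${PORT} \
      -m Qwen3.5-27B-Thinking-IQ4_XS.gguf \
      -c 196608 \
      --cache-type-k q8_0 --cache-type-v q8_0 \
      --device CUDA0
  '';
}
```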
bf7cc81a61 feat: add coding model filtering and CUDA acceleration
- claude: filter model completion to coding/synthetic models only
- llama-swap: update model to IQ4_XS and add CUDA device selection
2026-02-26 15:47:25 -05:00
7649de4472 feat(llama-swap): add Qwen3.5 models and update model configurations
- Add Qwen3.5-35B-A3B and Qwen3.5-27B thinking model configs
- Remove deprecated Qwen2.5-Coder-7B model
- Update synthetic models list with new HF endpoints
2026-02-24 19:52:42 -05:00
d685773604 refactor: update llm model configurations and add AI agent guidelines
- Update nvim to use qwen3-coder-next-80b-instruct model
- Add AGENTS.md with AI agent best practices for timeout and file writing
- Update pi config to include agent guidelines
- Refactor llama-swap: remove old models, update quantizations, add tensor splits,
  remove GGML_CUDA_ENABLE_UNIFIED_MEMORY flags, and simplify configuration
2026-02-06 21:28:31 -05:00
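
A sketch of a dual-GPU tensor split; only the use of -ts/--tensor-split comes from the commit, while the ratio, device list, and model are placeholders:

```nix
{
  models."large-dual-gpu".cmd = ''
    llama-server --port ''${PORT} \
      -m large-model.gguf \
      --device CUDA0,CUDA1 \
      --tensor-split 1,1
  '';
}
```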
ec15ebb262 refactor(terminal): filter models by coding type
Change opencode and pi model filtering to use 'coding' type instead of
more generic 'text-generation' type. Update llama-swap model configs to
include 'coding' in metadata type list for relevant models (deepseek-coder,
qwen-coder, mistral, codellama, llama3-8b-instruct-q5).
2026-02-06 08:33:01 -05:00
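
A sketch of the tagging side of this change; the field layout is an assumption and may live only in this repo's Nix layer rather than in llama-swap's own schema:

```nix
{
  models."qwen-coder" = {
    # 'coding' added so opencode/pi filters select this model.
    metadata.type = [ "text-generation" "coding" ];
    cmd = ''
      llama-server --port ''${PORT} -m qwen-coder.gguf
    '';
  };
}
```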
682b7d8b4b feat(llama-swap): add Qwen3 Coder Next 80B model configuration
Add new model "Qwen3 Coder Next (80B) - Instruct" with 262144 context window
and optimized parameters for coding tasks. Uses CUDA unified memory support.
2026-02-03 20:53:47 -05:00
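
GGML_CUDA_ENABLE_UNIFIED_MEMORY is a real ggml/llama.cpp environment variable; a sketch wiring it and the 262144-token context into a model entry (the model path is assumed):

```nix
{
  models."qwen3-coder-next-80b-instruct" = {
    # Allows CUDA allocations to spill to host memory.
    env = [ "GGML_CUDA_ENABLE_UNIFIED_MEMORY=1" ];
    cmd = ''
      llama-server --port ''${PORT} \
        -m Qwen3-Coder-Next-80B-Instruct.gguf \
        -c 262144
    '';
  };
}
```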
7080727dce chore: update llama-cpp to b7898 and opencode to v1.1.48
- Update llama-cpp from b7867 to b7898
- Update opencode from v1.1.12 to v1.1.48 with improved build process:
  - Replace custom bundle script with official script/build.ts
  - Add shell completion support
  - Add version check testing
  - Simplify node_modules handling
- Update llama-swap service config with new llama.cpp options
- Clarify opencode agent testing workflow in developer and reviewer configs
2026-02-03 20:33:14 -05:00
0dca9802e6 feat(llama-swap): increase context window and add GPU configuration
- Increase context window from 80k to 202,752 tokens
- Add repeat penalty parameter (1.0)
- Enable CUDA device for GPU acceleration
2026-01-29 21:18:34 -05:00
72ddbb288e chore: update dependencies and add OpenCode commit command
- Update llama.cpp from b7789 to b7867
- Update llama-swap from v182 to v186
- Add OpenCode conventional commit command configuration
- Add moonshotai Kimi-K2.5 model to llama-swap
2026-01-29 21:01:56 -05:00
abcbc3250a fix: printer 2026-01-23 18:32:10 -05:00
58fa210eb5 update llamacpp 2026-01-20 20:20:50 -05:00
408151b2ec chore: various improvements & refactor 2026-01-17 09:28:38 -05:00
c8f5e744d0 chore(cleanup): sops, opencode, etc 2026-01-11 22:19:31 -05:00