Commit Graph

33 Commits

SHA1 Message Date
7c1519881a refactor(llama-swap): generate sops secrets from apiKeys list 2026-05-02 15:48:15 -04:00
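
The subject line alone doesn't show the mechanism; a minimal sops-nix sketch of deriving one secret per apiKeys entry (the key list, secrets-file location, and owner below are assumptions, not this repo's actual values):

```nix
{ lib, ... }:

let
  # Hypothetical key list; the real apiKeys list lives in this repo.
  apiKeys = [ "evan" ];
in
{
  # One sops-nix secret per key instead of hand-written entries.
  sops.secrets = lib.listToAttrs (map (name: {
    name = "llama-swap/api-keys/${name}";
    value = {
      sopsFile = ./secrets.yaml; # assumed secrets file location
      owner = "llama-swap";      # assumed service user
    };
  }) apiKeys);
}
```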
8d45977154 chore(llama-swap): update Genesis to v7.69, add cliff 2 optimizations
- Bump Genesis pin from fc89395 to 2db18df (v7.69)
- Add PN32 GDN chunked prefill and PN34 workspace lock relax env vars
- Replace patch_workspace_lock_disable with patch_inputs_embeds_optional
- Remove setup-time PN25/PN30 patches (folded into v7.69 natively)
- Switch patch base URL to v7.69-cliff2-test branch
- Lower GPU memory utilization to 0.93 for long-text variant
- Remove python3 from preflight check prerequisites
- Add printing service to lin-va-thinkpad
2026-05-02 13:48:41 -04:00
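
The PN32/PN34 variables are described but not named above; a sketch against llama-swap's per-model env list, with loudly placeholder variable names, since the real ones belong to the private Genesis fork:

```nix
{
  models."glm-long-text" = {  # placeholder model name
    env = [
      # Placeholder names: the commit only says "PN32 GDN chunked
      # prefill" and "PN34 workspace lock relax"; the real variables
      # are defined by the Genesis fork, not shown in this log.
      "GENESIS_PN32_GDN_CHUNKED_PREFILL=1"
      "GENESIS_PN34_WORKSPACE_LOCK_RELAX=1"
    ];
    # From the commit: lower GPU memory utilization for the
    # long-text variant.
    cmd = ''
      vllm serve /models/placeholder --gpu-memory-utilization 0.93
    '';
  };
}
```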
40114f438f feat(llama-swap): sync vLLM configs from club-3090, add evan API key
Sync all three vLLM model configs from club-3090 master (ae4846f).
Update to Genesis v7.65 full PROD env set with new patches.
Update docker image to nightly-7a1eb8ac. Add torch_compile and
triton cache dirs. Add agent setup guide (AGENTS.md).

Add 'evan' API key to llama-swap sops secrets.
2026-05-02 08:27:47 -04:00
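
One way to picture the cache-dir change: persistent volume mounts plus the standard TORCHINDUCTOR_CACHE_DIR and TRITON_CACHE_DIR variables on the docker entry. Only the image tag comes from the commit; the registry path, mount points, and model name are assumptions:

```nix
{
  models."qwen-vllm" = {  # placeholder name
    cmd = ''
      docker run --rm --gpus all -p ''${PORT}:8000 \
        -e TORCHINDUCTOR_CACHE_DIR=/cache/torch_compile \
        -e TRITON_CACHE_DIR=/cache/triton \
        -v /var/cache/vllm/torch_compile:/cache/torch_compile \
        -v /var/cache/vllm/triton:/cache/triton \
        example.registry/genesis-vllm:nightly-7a1eb8ac \
        --port 8000
    '';
  };
}
```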
e4d40d89d9 feat: add api keys to llama-swap 2026-05-01 22:12:51 -04:00
43a1d66e6b add 9b vision 2026-05-01 21:51:06 -04:00
1283b7cdef add fim 4b + 9b 2026-05-01 21:42:50 -04:00
09fdff4908 refactor(llama-swap): reorganize models by GPU hardware section 2026-05-01 21:08:53 -04:00
88308602c8 feat(llama-swap): add concurrent model matrix for CUDA0/CUDA1
Allow one CUDA0 and one CUDA1 model to run simultaneously. Dual-GPU
models (using -ts splits) are excluded from the matrix so they evict
everything when loaded. vLLM docker models get evict_cost=50 to
discourage eviction due to slow cold starts.
2026-05-01 16:50:28 -04:00
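
This maps naturally onto llama-swap group settings (swap, exclusive, members), assuming the Nix attrset is serialized to llama-swap's config.yaml. Model names are placeholders; the evict_cost knob mentioned above looks fork-specific and is omitted here:

```nix
{
  groups = {
    cuda0 = {
      swap = true;       # at most one CUDA0 model loaded at a time
      exclusive = false; # may run alongside a cuda1 member
      members = [ "model-a-cuda0" "model-b-cuda0" ];
    };
    cuda1 = {
      swap = true;
      exclusive = false;
      members = [ "model-c-cuda1" ];
    };
    "dual-gpu" = {
      swap = true;
      exclusive = true;  # -ts split models evict everything else
      members = [ "big-model-ts" ];
    };
  };
}
```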
1812d2ea03 feat(pi): add vision model support
Extract a shared hasType helper for model filtering and add
vision (text + image) input capability to compatible models.
Also tag two llama-swap models with the vision type.
2026-05-01 15:03:26 -04:00
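
The pi codebase's language isn't visible from this log, so here is the same hasType idea as a Nix predicate, with assumed field names:

```nix
let
  # Shared predicate: does this model carry the given type tag?
  hasType = type: model: builtins.elem type (model.metadata.type or [ ]);

  models = [
    { name = "qwen-vl";    metadata.type = [ "coding" "vision" ]; }
    { name = "qwen-coder"; metadata.type = [ "coding" ]; }
  ];
in
builtins.filter (hasType "vision") models
# => keeps only "qwen-vl"
```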
ab63211a75 wip 2026-05-01 14:36:36 -04:00
561f10d2a7 fix: timing & vllm 2026-05-01 13:09:28 -04:00
a3b2efa5bb feat: vllm timings patch 2026-05-01 10:57:59 -04:00
74ff71803b feat: vllm yay 2026-05-01 10:38:43 -04:00
75eba8703f add: vllm base 3.6 27b 2026-04-30 21:47:29 -04:00
990b6a4392 feat: vllm 2026-04-30 20:04:58 -04:00
93e2247a30 chore(nixos/llama-swap): remove synthetic peer and tune local model args 2026-04-30 11:43:04 -04:00
976edab339 config(llama-swap): enable preserve_thinking in chat template kwargs 2026-04-30 07:45:57 -04:00
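
A sketch of what enabling this could look like via llama-server's --chat-template-kwargs JSON flag (present in recent llama.cpp; the model path is assumed):

```nix
{
  models."qwen3.6-27b-thinking".cmd = ''
    llama-server --port ''${PORT} \
      -m Qwen3.6-27B-Thinking.gguf \
      --chat-template-kwargs '{"preserve_thinking": true}'
  '';
}
```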
1070642635 feat(llama-swap): add qwen3.6-27b-thinking model 2026-04-22 13:01:38 -04:00
ceff33273d feat(llama-swap): add Qwen3.6-35B-A3B model configuration 2026-04-16 16:20:00 -04:00
90eed0378e chore(llama): update llama-cpp and remove deprecated models
- Upgrade llama-cpp from version 8196 to 8229
- Remove GPT OSS CSEC (20B) - Thinking model config
- Remove Qwen3 Next (80B) - Instruct model config
- Increment -ncmoe parameter for Qwen3 Coder Next (80B) model
2026-03-07 08:55:00 -05:00
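
For reference, -ncmoe is llama.cpp's --n-cpu-moe, which keeps the MoE expert weights of the first N layers on the CPU. A sketch with a placeholder value, since the commit only says the parameter was incremented:

```nix
{
  # 32 is a placeholder; the actual incremented value isn't in the log.
  models."qwen3-coder-next-80b".cmd = ''
    llama-server --port ''${PORT} \
      -m Qwen3-Coder-Next-80B.gguf \
      -ncmoe 32
  '';
}
```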
1bce17c5f9 chore(llm): update llama-cpp, llama-swap and switch to qwen3.5-27b-thinking
- Bump llama-cpp from version 8157 to 8196
- Bump llama-swap from version 192 to 197
- Switch default assistant model from qwen3-coder-next-80b to qwen3.5-27b-thinking
- Remove glm-4-32b-instruct model configuration
- Update qwen3.5-27b-thinking config:
  - Use bartowski quantization (IQ4_XS) instead of unsloth
  - Increase context window from 131k to 196k
  - Add cache type settings (q8_0) and CUDA device
- Add 1password-cli to home-manager programs
- Fix typo: 'dispay' -> 'display' in llm-config.lua
2026-03-05 07:32:57 -05:00
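
The config bullets translate directly into llama-server flags; everything below except the GGUF filename comes from the commit, assuming "196k" means 196608 tokens:

```nix
{
  # IQ4_XS quant, 196k context, q8_0 KV cache, pinned CUDA device.
  models."qwen3.5-27b-thinking".cmd = ''
    llama-server --port ''${PORT} \
      -m Qwen3.5-27B-Thinking-IQ4_XS.gguf \
      -c 196608 \
      --cache-type-k q8_0 --cache-type-v q8_0 \
      --device CUDA0
  '';
}
```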
bf7cc81a61 feat: add coding model filtering and CUDA acceleration
- claude: filter model completion to coding/synthetic models only
- llama-swap: update model to IQ4_XS and add CUDA device selection
2026-02-26 15:47:25 -05:00
7649de4472 feat(llama-swap): add Qwen3.5 models and update model configurations
- Add Qwen3.5-35B-A3B and Qwen3.5-27B thinking model configs
- Remove deprecated Qwen2.5-Coder-7B model
- Update synthetic models list with new HF endpoints
2026-02-24 19:52:42 -05:00
d685773604 refactor: update llm model configurations and add AI agent guidelines
- Update nvim to use qwen3-coder-next-80b-instruct model
- Add AGENTS.md with AI agent best practices for timeout and file writing
- Update pi config to include agent guidelines
- Refactor llama-swap: remove old models, update quantizations, add tensor splits,
  remove GGML_CUDA_ENABLE_UNIFIED_MEMORY flags, and simplify configuration
2026-02-06 21:28:31 -05:00
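
A sketch of a dual-GPU tensor split; only the use of -ts/--tensor-split comes from the commit, while the ratio, device list, and model are placeholders:

```nix
{
  models."large-dual-gpu".cmd = ''
    llama-server --port ''${PORT} \
      -m large-model.gguf \
      --device CUDA0,CUDA1 \
      --tensor-split 1,1
  '';
}
```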
ec15ebb262 refactor(terminal): filter models by coding type
Change opencode and pi model filtering to use 'coding' type instead of
more generic 'text-generation' type. Update llama-swap model configs to
include 'coding' in metadata type list for relevant models (deepseek-coder,
qwen-coder, mistral, codellama, llama3-8b-instruct-q5).
2026-02-06 08:33:01 -05:00
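
A sketch of the tagging side of this change; the field layout is an assumption and may live only in this repo's Nix layer rather than in llama-swap's own schema:

```nix
{
  models."qwen-coder" = {
    # 'coding' added so opencode/pi filters select this model.
    metadata.type = [ "text-generation" "coding" ];
    cmd = ''
      llama-server --port ''${PORT} -m qwen-coder.gguf
    '';
  };
}
```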
682b7d8b4b feat(llama-swap): add Qwen3 Coder Next 80B model configuration
Add new model "Qwen3 Coder Next (80B) - Instruct" with 262144 context window
and optimized parameters for coding tasks. Uses CUDA unified memory support.
2026-02-03 20:53:47 -05:00
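
GGML_CUDA_ENABLE_UNIFIED_MEMORY is a real ggml/llama.cpp environment variable; a sketch wiring it and the 262144-token context into a model entry (the model path is assumed):

```nix
{
  models."qwen3-coder-next-80b-instruct" = {
    # Allows CUDA allocations to spill to host memory.
    env = [ "GGML_CUDA_ENABLE_UNIFIED_MEMORY=1" ];
    cmd = ''
      llama-server --port ''${PORT} \
        -m Qwen3-Coder-Next-80B-Instruct.gguf \
        -c 262144
    '';
  };
}
```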
7080727dce chore: update llama-cpp to b7898 and opencode to v1.1.48
- Update llama-cpp from b7867 to b7898
- Update opencode from v1.1.12 to v1.1.48 with improved build process:
  - Replace custom bundle script with official script/build.ts
  - Add shell completion support
  - Add version check testing
  - Simplify node_modules handling
- Update llama-swap service config with new llama.cpp options
- Clarify opencode agent testing workflow in developer and reviewer configs
2026-02-03 20:33:14 -05:00
0dca9802e6 feat(llama-swap): increase context window and add GPU configuration
- Increase context window from 80k to 202,752 tokens
- Add repeat penalty parameter (1.0)
- Enable CUDA device for GPU acceleration
2026-01-29 21:18:34 -05:00
72ddbb288e chore: update dependencies and add OpenCode commit command
- Update llama.cpp from b7789 to b7867
- Update llama-swap from v182 to v186
- Add OpenCode conventional commit command configuration
- Add moonshotai Kimi-K2.5 model to llama-swap
2026-01-29 21:01:56 -05:00
abcbc3250a fix: printer 2026-01-23 18:32:10 -05:00
58fa210eb5 update llamacpp 2026-01-20 20:20:50 -05:00
408151b2ec chore: various improvements & refactor 2026-01-17 09:28:38 -05:00
c8f5e744d0 chore(cleanup): sops, opencode, etc 2026-01-11 22:19:31 -05:00