- Restructure floating input: dominant textarea with compact bottom
toolbar (model badge, thinking toggle, attach, send/stop).
- Size the model badge to the current selection (not the widest option)
via a layered transparent select, truncating on overflow as a fallback.
- Auto-expand the conversation sidebar on desktop and slide chat
content right when open instead of overlaying.
- Add per-request thinking toggle (brain icon, default on, persisted
in localStorage) sending chat_template_kwargs.enable_thinking.
- Always disable thinking for title summarization.
- Generate chat titles before the main response so the SSE stream does
not stay open past visible completion and the KV cache is not
invalidated between turns.
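
A minimal sketch of how the thinking toggle could map onto the request body. Only `chat_template_kwargs.enable_thinking` and the "always off for titles" rule come from this change; the `chatRequest` type, its other fields, and `buildRequest` are hypothetical names used for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a hypothetical request shape. Only the
// chat_template_kwargs.enable_thinking field is taken from the change above.
type chatRequest struct {
	Model              string         `json:"model"`
	Prompt             string         `json:"prompt"`
	ChatTemplateKwargs map[string]any `json:"chat_template_kwargs"`
}

// buildRequest applies the user's per-request toggle, except for title
// summarization, where thinking is always disabled.
func buildRequest(model, prompt string, thinking, forTitle bool) chatRequest {
	return chatRequest{
		Model:  model,
		Prompt: prompt,
		ChatTemplateKwargs: map[string]any{
			"enable_thinking": thinking && !forTitle,
		},
	}
}

func main() {
	body, err := json.Marshal(buildRequest("example-model", "Hello", true, false))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Forcing the flag off for titles keeps the summarization call short regardless of what the user has toggled.
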
- Remove `api_endpoint` from Settings model and settings UI
- Add `--llm-endpoint` / `AETHERA_LLM_ENDPOINT` and `--llm-key` /
`AETHERA_LLM_KEY` CLI flags (endpoint is required)
- Update client constructor to accept API key parameter
- Update tests and documentation to reflect new configuration approach
BREAKING CHANGE: The LLM endpoint and key are now configured via the
`--llm-endpoint` / `--llm-key` CLI flags or the `AETHERA_LLM_ENDPOINT` /
`AETHERA_LLM_KEY` environment variables instead of the Settings page.
The endpoint is required.
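
One way the flag and environment-variable pairing could be wired with the standard `flag` package is sketched below. The flag and variable names are from this change; `llmConfig`, `parseLLMConfig`, the flag-set name, and the precedence (flag overrides environment) are assumptions, not the project's actual code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// llmConfig is a hypothetical holder for the two settings.
type llmConfig struct {
	Endpoint string
	Key      string
}

// parseLLMConfig reads --llm-endpoint and --llm-key, using the matching
// environment variables as defaults. getenv is injected so the lookup can
// be replaced in tests. Assumption: an explicit flag wins over the env var.
func parseLLMConfig(args []string, getenv func(string) string) (llmConfig, error) {
	fs := flag.NewFlagSet("aethera", flag.ContinueOnError)
	endpoint := fs.String("llm-endpoint", getenv("AETHERA_LLM_ENDPOINT"), "LLM API endpoint (required)")
	key := fs.String("llm-key", getenv("AETHERA_LLM_KEY"), "LLM API key")
	if err := fs.Parse(args); err != nil {
		return llmConfig{}, err
	}
	if *endpoint == "" {
		return llmConfig{}, errors.New("--llm-endpoint or AETHERA_LLM_ENDPOINT is required")
	}
	return llmConfig{Endpoint: *endpoint, Key: *key}, nil
}

func main() {
	// Demo arguments; a real binary would pass os.Args[1:].
	cfg, err := parseLLMConfig([]string{"--llm-endpoint", "http://localhost:8080/v1"}, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("endpoint:", cfg.Endpoint)
}
```
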
Add vision/multimodal support to chat, allowing users to send images
alongside or instead of text prompts. Images are transmitted and persisted
as base64 data URLs.
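
For reference, the base64 data URL format mentioned above can be produced as in the following sketch. It only illustrates the string format; `toDataURL` is a hypothetical helper, and sniffing the MIME type with `http.DetectContentType` is an assumption about one reasonable approach, not a description of where or how the application encodes images.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// toDataURL encodes raw image bytes as "data:<mime>;base64,<payload>".
// The MIME type is sniffed from the leading bytes of the data.
func toDataURL(raw []byte) string {
	mime := http.DetectContentType(raw)
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(raw)
}

func main() {
	// The 8-byte PNG signature is enough for content sniffing.
	png := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}
	fmt.Println(toDataURL(png)) // data:image/png;base64,iVBORw0KGgo=
}
```
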
Backend:
- Add Images []string to Message struct for persistence
- Add Images []string to GenerateTextRequest with relaxed validation
- Build multimodal user messages using OpenAI SDK content parts
- Pass images through from handlers to client
- Deep-copy Images slice in message cloning
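
The deep copy matters because a plain struct copy shares the slice's backing array. A minimal sketch follows: `Images []string` is from this change, while the `Role` and `Content` fields and the `Clone` method name are assumed for illustration.

```go
package main

import "fmt"

// Message mirrors the persisted struct only loosely; Role and Content are
// assumed fields, Images is the field added by this change.
type Message struct {
	Role    string
	Content string
	Images  []string
}

// Clone returns a copy whose Images slice has its own backing array, so
// edits to the clone cannot leak into the original.
func (m Message) Clone() Message {
	c := m
	if m.Images != nil {
		c.Images = make([]string, len(m.Images))
		copy(c.Images, m.Images)
	}
	return c
}

func main() {
	orig := Message{Role: "user", Content: "What is this?", Images: []string{"data:image/png;base64,AAAA"}}
	dup := orig.Clone()
	dup.Images[0] = "changed"
	fmt.Println(orig.Images[0]) // data:image/png;base64,AAAA
}
```
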
Frontend:
- Add images?: string[] to Message and GenerateTextRequest types
- Add image selection state and file input handler
- Add camera icon button, hidden file input, and image preview strip
- Render images in user message bubbles
- Pass images through to GenerateTextRequest
Tests:
- Add TestSendMessageWithImage for vision model testing
Add a cache-busting query parameter to the stream fetch URL so each
tab gets a unique request and the browser cannot reuse an in-flight
response. Remove the redundant Transfer-Encoding header, which Go sets
automatically.