fix: increase trash icon visibility (remove 0.5 opacity, bump to 15px)

feat: add trash-can icon to left of each conversation in sidebar
Replace the hover-reveal × on the right with an always-visible 🗑 icon positioned to the left of the conversation title. Clicking it triggers the existing deleteConversation() which shows a confirm dialog and enforces admin-only access.
2026-06-27 16:09:05 -07:00 · 2026-06-27 16:07:18 -07:00 · 2026-06-27 16:03:19 -07:00 · 2026-06-27 15:59:43 -07:00 · 2026-06-27 15:27:47 -07:00 · 2026-06-27 15:27:13 -07:00
30 changed files with 1564 additions and 912 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,114 @@
+# JarvisChat — Agents Guide
+
+## Run
+
+```bash
+./venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080 --reload
+```
+
+## Tests
+
+```bash
+./venv/bin/python -m pytest tests/ -v
+```
+
+All tests use `tmp_path` fixtures + monkeypatched `httpx.AsyncClient.stream`. No external services needed. Test factories reset `SESSIONS`, `PIN_ATTEMPTS`, `RATE_EVENTS` globals — be careful not to let test state leak. After the modular refactor, tests import directly from the correct modules (`db`, `security`, `config`, `search`, `rag`, `memory`, `routers.*`) — not from the old monolithic `app` namespace.
+
+Every router has a dedicated test file:
+
+| File | Covers |
+|------|--------|
+| `test_auth_capabilities.py` | `auth.py` — guest/admin sessions, origin blocking, logout |
+| `test_chat_streaming_and_memory_paths.py` | `routers/chat.py` — streaming, auto-search, remember/forget |
+| `test_completions.py` | `routers/completions.py` — API key auth, FIM, streaming, blocking, errors |
+| `test_conversations.py` | `routers/conversations.py` — full CRUD, guest vs. admin enforcement |
+| `test_memories.py` | `routers/memories.py` — edit, search, stats endpoints |
+| `test_models_router.py` | `routers/models.py` — models list, ps, show, stats, search/status |
+| `test_presets.py` | `routers/presets.py` — full CRUD, default preset protection |
+| `test_profile.py` | `routers/profile.py` — get, update, default, length validation |
+| `test_search_route.py` | `routers/search_route.py` — explicit search flow, no results, errors |
+| `test_search_url_sanitization.py` | `search.py` URL sanitizer |
+| `test_settings_allowlist.py` | `routers/settings.py` — allowlisted key enforcement |
+| `test_skills_framework.py` | `routers/skills.py` — list, toggle, unknown skill, prompt injection |
+| `test_ip_allowlist.py` | IP allowlist helper + middleware |
+| `test_rate_and_payload_guardrails.py` | Rate limits + payload size enforcement |
+| `test_error_envelopes.py` | Global exception handler + stream error incidents |
+
+Modules that call `httpx.AsyncClient` (chat, completions, models, search_route)
+are mocked via `monkeypatch.setattr` on `AsyncClient.stream`, `.get`, or `.post`.
+CPU stats in `routers/models.py` (`/api/stats`) use real `psutil`; GPU stats are
+monkeypatched via `routers.models.get_gpu_stats`.
+
+## Architecture
+
+Refactored from single-file (`app.py`) into modules under project root:
+
+| File | Role |
+|------|------|
+| `app.py` | FastAPI app, middleware, router registration |
+| `config.py` | Constants, env vars, rate/payload limits, built-in skills registry |
+| `db.py` | SQLite schema, connection factory, settings helpers |
+| `auth.py` | PIN-based guest/admin sessions, auth routes |
+| `security.py` | Rate limiting, origin checks, IP allowlist, audit/incident logging |
+| `memory.py` | FTS5 memory CRUD, remember/forget command parsing |
+| `search.py` | SearXNG integration, perplexity scoring, refusal detection |
+| `rag.py` | Qdrant vector search + system prompt assembly |
+| `gpu.py` | AMD GPU stats via `rocm-smi` |
+| `routers/` | One module per endpoint group (chat, search, skills, completions, etc.) |
+
+### Entrypoint / API keys
+
+- `app.py` line 148: `uvicorn.run(app, ...)` when called directly
+- `config.py` line 14: `LLAMA_SERVER_BASE` defaults to `http://192.168.50.108:8081` (llama-server, **not** the standard Ollama port) and is used by all model endpoints
+- `config.py` line 17: `COMPLETIONS_API_KEY` read from `JARVISCHAT_COMPLETIONS_API_KEY` env var or auto-generates a random key — no longer a missing import
+- `config.py` line 13: `OLLAMA_BASE` is legacy/unused — all endpoints now use `LLAMA_SERVER_BASE`
+
+### Key flows
+
+1. **`/api/chat`** → `process_remember_command()` intercepts "remember that..." / "forget about..." first → else `build_system_prompt()` (profile + FTS5 memory + Qdrant RAG + preset + skills) → stream from llama-server with `logprobs: true` → if perplexity > 15.0 OR `REFUSAL_PATTERNS` match, re-query with SearXNG results
+2. **`/api/search`** → bypasses perplexity/refusal, queries SearXNG directly → summarizes via llama-server (no raw results leaked in SSE)
+3. **`/v1/chat/completions`** → OpenAI-compatible for Continue.dev/IDE integration; FIM requests proxied without persistence
+
+### Perplexity / auto-search
+
+The upstream request includes `"logprobs": true`. `parse_llama_stream_chunk()` extracts per-token logprobs from each chunk's `choices[0].logprobs.content[].logprob`. The `all_logprobs` list is populated during streaming, so `calculate_perplexity()` and `is_uncertain()` work correctly — auto-search on high perplexity is no longer dead code.
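The perplexity check described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function bodies and the `extract_logprobs` helper are assumptions, and only the names `calculate_perplexity()` and `is_uncertain()`, the `choices[0].logprobs.content[].logprob` path, and the 15.0 threshold come from this guide.

```python
import math

PERPLEXITY_THRESHOLD = 15.0  # value quoted in the key-flows section


def calculate_perplexity(logprobs: list[float]) -> float:
    """Return exp(-mean(logprob)); 0.0 when no logprobs were collected."""
    if not logprobs:
        return 0.0
    return math.exp(-sum(logprobs) / len(logprobs))


def is_uncertain(logprobs: list[float], threshold: float = PERPLEXITY_THRESHOLD) -> bool:
    """True when perplexity exceeds the threshold (assumed comparison)."""
    return calculate_perplexity(logprobs) > threshold


def extract_logprobs(chunk: dict) -> list[float]:
    """Hypothetical helper: read choices[0].logprobs.content[].logprob from one chunk."""
    choices = chunk.get("choices") or [{}]
    logprobs = choices[0].get("logprobs") or {}
    return [tok["logprob"] for tok in logprobs.get("content") or [] if "logprob" in tok]
```

Returning 0.0 for an empty list is a design choice made for this sketch: a response with no logprobs is then never treated as uncertain, so a missing `logprobs` field cannot trigger a search by accident.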
+
+### Auth / lockdown
+
+- Guest session by default (`POST /api/auth/guest`), admin unlock via 4-digit PIN (`POST /api/auth/login`)
+- Admin required for PUT/DELETE/PATCH + all POST except allowlist (`/api/chat`, `/api/search`, `/api/auth/*`)
+- IP allowlist, rate limiting, origin checking, payload size limits — all enforced in `app.py` middleware
+- Origin check applies to **all** `/api/` requests (not just state-changing methods); `origin_allowed()` returns `False` when both `Origin` and `Referer` headers are absent, closing the CSRF read gap
+- `JARVISCHAT_ADMIN_PIN` env var required on first boot (or `JARVISCHAT_ALLOW_DEFAULT_PIN=true`)
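The missing-header rule from the origin-check bullet can be illustrated with a short sketch. The name `origin_allowed_sketch`, its signature, the lowercase header keys, and the `trusted_origins` set are all assumptions made for illustration; the real `origin_allowed()` in `security.py` may be shaped differently.

```python
from urllib.parse import urlsplit


def origin_allowed_sketch(headers: dict[str, str], trusted_origins: set[str]) -> bool:
    """Return True only when Origin (or, failing that, Referer) is a trusted origin."""
    source = headers.get("origin") or headers.get("referer")
    if not source:
        # Both headers absent: reject, matching the behaviour described above.
        return False
    parts = urlsplit(source)
    if not parts.scheme or not parts.netloc:
        # Covers opaque values such as "null".
        return False
    return f"{parts.scheme}://{parts.netloc}".lower() in trusted_origins
```

Comparing only scheme, host, and port means a `Referer` with a path or query string still matches its origin.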
+
+### Database
+
+- SQLite at `jarvischat.db`, auto-created by `init_db()` on startup via FastAPI `lifespan`
+- `get_db()` opens new connection per request (no pool). Close after use.
+- FTS5 virtual table `memories` for full-text search with BM25 ranking. FTS5 operator keywords (`AND`, `OR`, `NOT`, `NEAR`) are double-quoted to prevent parse errors.
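One way the keyword quoting in the last bullet could work is sketched below. The helper name `quote_fts5_keywords` and the tokenisation on `\w+` are assumptions; the project's own sanitiser in `memory.py` may differ.

```python
import re

FTS5_KEYWORDS = {"AND", "OR", "NOT", "NEAR"}


def quote_fts5_keywords(query: str) -> str:
    """Wrap FTS5 operator keywords in double quotes so they match as plain text."""
    tokens = re.findall(r"\w+", query)  # also drops punctuation FTS5 could misparse
    # Lowercase forms are quoted as well; that is harmless and keeps the rule simple.
    return " ".join(f'"{t}"' if t.upper() in FTS5_KEYWORDS else t for t in tokens)
```

The returned string is intended to be passed as a bound parameter (for example `WHERE memories MATCH ?`), never interpolated into the SQL text.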
+
+### External services
+
+| Service | Required | Port / notes |
+|---------|----------|------|
+| llama-server (OpenAI-compat API) | Yes | 8081 (ultron) or env `LLAMA_SERVER_BASE` |
+| SearXNG | No | 8888 |
+| wttr.in | No | weather shortcut bypasses SearXNG; curl UA for plain-text output |
+| rocm-smi | No | AMD GPU stats |
+| Qdrant | No | 6333 (ultron) — RAG vector search |
+
+### Config quirks
+
+- Rate limits and payload caps in `config.py` — tweak for testing by monkeypatching module attributes (note: patch `security.RL_*` not `config.RL_*` since `security` imports bindings separately)
+- `ALLOWED_SETTINGS_KEYS` in `config.py` controls which keys the UI can write via `/api/settings`
+- Settings table seeded with defaults (`profile_enabled`, `search_enabled`, `memory_enabled`, `skills_enabled`, `default_model`) — never overwritten by `init_db()`
+- Profile table uses singleton row `id=1`
+- RAG embedding requests go to `EMBED_URL` at `/api/embeddings` (separate Ollama instance)
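The reason for patching `security.RL_*` rather than `config.RL_*` is ordinary Python name binding, which the following self-contained sketch reproduces. The two modules are stand-ins built with `types.ModuleType`, and `RL_MAX_REQUESTS` is a made-up attribute name used only for illustration.

```python
import types

# Stand-ins for config.py and security.py; RL_MAX_REQUESTS is a hypothetical name.
config = types.ModuleType("config")
config.RL_MAX_REQUESTS = 60

security = types.ModuleType("security")
# Equivalent of `from config import RL_MAX_REQUESTS` inside security.py:
security.RL_MAX_REQUESTS = config.RL_MAX_REQUESTS


def limit_seen_by_security() -> int:
    """What rate-limit code living in security.py would read."""
    return security.RL_MAX_REQUESTS


config.RL_MAX_REQUESTS = 1    # patching config: security keeps its own binding
after_config_patch = limit_seen_by_security()

security.RL_MAX_REQUESTS = 1  # patching security: the change is visible
after_security_patch = limit_seen_by_security()
```

In a pytest test the equivalent call would be `monkeypatch.setattr(security, "RL_MAX_REQUESTS", 1)`, again with the attribute name standing in for whichever `RL_*` constant is being adjusted.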
+
+### SSE Protocol
+
+All streaming endpoints yield `data: {json}\n\n`. Key shapes:
+- `{token, conversation_id}` — streaming token
+- `{searching: true}` — web search triggered
+- `{search_results: N}` — N results (no raw_results payload)
+- `{done: true, perplexity, tokens_per_sec, searched?}` — terminal
+- `{error: "...", error_key: "..."}` — error with incident key
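A client could consume the event shapes listed above along these lines. This is a sketch under assumptions: the helper names are invented, the input is taken to be an iterable of already-decoded text lines, and only the field names (`token`, `done`, `error`, `error_key`) come from the list.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        yield json.loads(line[len("data:"):].strip())


def collect_reply(lines: Iterable[str]) -> tuple[str, dict]:
    """Concatenate tokens until the terminal event; raise on an error event."""
    text, final = [], {}
    for event in iter_sse_events(lines):
        if "error" in event:
            raise RuntimeError(f"{event['error']} ({event.get('error_key')})")
        if "token" in event:
            text.append(event["token"])
        if event.get("done"):
            final = event
            break
    return "".join(text), final
```

Events such as `{searching: true}` carry no token and are simply skipped here; a UI would normally surface them as status updates instead.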
--- a/README.md
+++ b/README.md
@@ -1,453 +1,263 @@
-![jarvisChat logo](static/jcscreenie.png)
-# ⚡ JarvisChat v1.9.0
+# JarvisChat v1.8.5

-**A privacy-first, homelab-native developer knowledge platform.**
+**A lightweight local inference coding companion with persistent memory, web search, and real-time system monitoring.**

-> JarvisChat turns a heterogeneous LAN of budget hardware into a distributed local AI inference cluster — accumulating institutional knowledge over time, keeping all data off the cloud, and squeezing real performance out of modest consumer hardware through architecture rather than dollars.
+Built with FastAPI + SQLite + Jinja2. Runs on Python 3.13. No Docker required.

-This is not another AI chat wrapper. jC is the UX and knowledge-management layer for a local AI brain — analogous to what Windows was to DOS, or what the web is to the internet. The intelligence lives in the model and the RAG corpus. jC makes it accessible and keeps feeding it.
+Developer wiki: [docs/wiki/Home.md](docs/wiki/Home.md)

---
+## What's New in v1.8.0

-## The Four Pillars
+- **Modular refactor completed** — single-file `app.py` split into `config.py`, `db.py`, `auth.py`, `security.py`, `memory.py`, `search.py`, `rag.py`, `gpu.py`, and `routers/` package
+- **`COMPLETIONS_API_KEY`** — auto-generated secret key for the OpenAI-compatible endpoint, overridable via `JARVISCHAT_COMPLETIONS_API_KEY` env var
+- **Perplexity auto-search fixed** — upstream request now sends `"logprobs": true`, `parse_llama_stream_chunk()` extracts per-token logprobs, so `calculate_perplexity()` and `is_uncertain()` work correctly (was dead code)
+- **All `/api/models` endpoints** — now correctly target `LLAMA_SERVER_BASE` (llama-server on port 8081) instead of the old Ollama port; `/api/ps` uses `/v1/models` endpoint
+- **RAG embedding endpoint fixed** — `EMBED_URL` changed from old server `:8081` to correct host/port `http://192.168.50.210:11434` (Ollama on new machine)
+- **Error messages corrected** — all user-facing errors say "inference server" instead of "Ollama" or "llama-server"
+- **Secure SSE protocol** — raw search results are no longer leaked in the SSE event stream
+- **FTS5 query safety** — operator keywords (`AND`, `OR`, `NOT`, `NEAR`) are double-quoted to prevent parse errors
+- **All 8 test files fixed** — rewired imports after the modular refactor; all 26 tests pass
+- **Origin check extended to all API methods** — GET/HEAD/OPTIONS requests no longer bypass origin checking (was limited to POST/PUT/DELETE/PATCH)
+- **Missing headers now rejected** — `origin_allowed()` returns `False` when both `Origin` and `Referer` are absent, closing the CSRF read gap for script-initiated requests
+- **Full router test coverage** — 7 new test files added: `test_conversations.py`, `test_presets.py`, `test_profile.py`, `test_models_router.py`, `test_completions.py`, `test_search_route.py`, `test_memories.py`; all 10 routers now have dedicated unit tests (92 total, up from 26)

-### 1. Privacy
-Everything runs on your LAN. No API keys, no cloud endpoints, no data leaving your network, no subscription, no terms-of-service surprises. Your conversations, your codebase, your decisions — stay yours.
+## Features

-### 2. Knowledge Retention
-Unlike stateless chat tools that forget everything when you close the tab, jC accumulates institutional memory. Every solved problem, every architectural decision, every working command gets absorbed into the RAG corpus via Qdrant. The system gets smarter the longer you use it.
+- **Persistent Memory** — SQLite FTS5 full-text search for fast, relevant memory retrieval
+- **Web Search** — SearXNG integration for automatic web lookups when the model is uncertain
+- **Explicit Search** — Search button to force web search without waiting for model uncertainty
+- **Profile Injection** — Custom system prompt injected into every conversation
+- **System Presets** — Save and switch between different system prompts
+- **Real-time Stats** — CPU, RAM, GPU, VRAM monitoring in sidebar
+- **Token Thermometer** — Visual context window usage indicator
+- **Streaming Responses** — Server-sent events for real-time token display
+- **Conversation History** — SQLite-backed chat persistence with mass-delete option
+- **Model Switching** — Change inference models on the fly
+- **Skills Framework** — Built-in skill registry with per-skill enable/disable controls

-### 3. Budget Hardware Maximization
-You don't need a $10,000 workstation. jC is designed for the developer who has a drawer full of machines and the skills to wire them together. RPC clustering, model splitting across CPU and GPU nodes, dynamic resource negotiation, and smart RAG eviction squeeze real performance out of modest consumer hardware.
-
-### 4. Homelab-Native Architecture
-Built specifically for the heterogeneous homelab: mixed hardware, mixed OS, consumer GPUs, ARM boards, NAS storage — all working together as a coherent AI platform. A designated master node hosts jC, llama-server, and SearXNG. GPU nodes self-register as RPC inference workers. The architecture scales horizontally across whatever you've got.
-
---
-
-## Target Audience
-
-Solo developers and homelab enthusiasts who are:
- Budget-constrained but hardware-rich (multiple machines, NAS, spare GPUs)
- Privacy-conscious (no cloud AI subscriptions)
- Technically capable (if you can install jC, you can designate the master node)
- Building something over time and want their AI to remember it
-
---
-
-## Architecture
+## File Structure

 ```
-┌─────────────────────────────────────────────────────────────┐
-│                        YOUR LAN                             │
-│                                                             │
-│  ┌─────────────────┐         ┌──────────────────────────┐  │
-│  │   jarvis        │◄──RPC───│   ultron                 │  │
-│  │   192.168.50.212│  50052  │   192.168.50.108         │  │
-│  │                 │         │                          │  │
-│  │  jC :8080       │         │  llama-server :8081      │  │
-│  │  SearXNG :8888  │         │  llama-server :8082 (*)  │  │
-│  │  RX 6600 XT 8GB │         │  Qdrant :6333            │  │
-│  │  GPU RPC worker │         │  mxbai-embed :11434      │  │
-│  │  Vulkan backend │         │  AMD Ryzen 7 7840HS      │  │
-│  └─────────────────┘         │  Radeon 780M iGPU        │  │
-│                              └──────────────────────────┘  │
-│                                                             │
-│  ┌─────────────────┐         ┌──────────────────────────┐  │
-│  │   pivault       │         │   corsair                │  │
-│  │   192.168.50.158│         │   192.168.50.132         │  │
-│  │                 │         │                          │  │
-│  │  10.83TB RAID5  │         │  RTX 5070 Ti 16GB        │  │
-│  │  RPi 5 8GB      │         │  Ryzen 7 7800X3D         │  │
-│  │  NAS / Kopia    │         │  Gaming / Streaming      │  │
-│  └─────────────────┘         └──────────────────────────┘  │
-│                                                             │
-│  (*) Planned: Qwen2.5-Coder-14B on :8082                   │
-└─────────────────────────────────────────────────────────────┘
+/opt/jarvischat/
+├── app.py              # FastAPI app entry point
+├── config.py           # Constants, env vars, limits, skill registry
+├── db.py               # SQLite schema, connection factory
+├── auth.py             # PIN-based guest/admin sessions, auth routes
+├── security.py         # Rate limiting, origin checks, IP allowlist, audit
+├── memory.py           # FTS5 memory CRUD, remember/forget commands
+├── search.py           # SearXNG integration, perplexity, refusal detection
+├── rag.py              # Qdrant vector search + system prompt assembly
+├── gpu.py              # AMD GPU stats via rocm-smi
+├── routers/
+│   ├── chat.py         # /api/chat streaming endpoint
+│   ├── search_route.py # /api/search explicit search endpoint
+│   ├── completions.py  # /v1/chat/completions OpenAI-compat endpoint
+│   ├── conversations.py# Conversation CRUD
+│   ├── memories.py     # Memory CRUD API
+│   ├── models.py       # Model listing, system stats
+│   ├── presets.py      # System prompt presets
+│   ├── profile.py      # User profile
+│   ├── settings.py     # Runtime settings
+│   └── skills.py       # Skills management
+├── static/
+│   └── logo.png        # Logo image (optional)
+├── templates/
+│   └── index.html      # Frontend
+└── tests/              # 92 pytest tests
 ```

-**Data flow:**
-```
-Browser / IDE (Continue.dev)
-    → jC :8080 (FastAPI — auth, RAG, memory, conversation history)
-        → Qdrant :6333 (vector search, mxbai-embed-large for embeddings)
-        → llama-server :8081 (inference)
-            → jarvis RPC :50052 (GPU layer offload — RX 6600 XT)
-```
+## Requirements

---
-
-## The AMD + NVIDIA Cross-Cluster Reality
-
-This cluster intentionally mixes GPU architectures — **AMD RX 6600 XT on jarvis** and **NVIDIA RTX 5070 Ti on corsair**. This is deliberate and it works.
-
-The RPC layer in llama.cpp is GPU-vendor-agnostic. jarvis runs llama-rpc with a **Vulkan backend** (not ROCm, not CUDA) which provides hardware-neutral GPU acceleration. ultron's llama-server connects to it over TCP and offloads tensor layers without caring what GPU is on the other end.
-
-This means any machine on your LAN with any GPU (AMD, NVIDIA, Intel Arc) can participate as an RPC worker — as long as it can run llama-rpc with Vulkan support.
-
---
-
-## Cluster Performance Tuning
-
-### The Layer Offloading Trick
-
-The key to squeezing performance out of a CPU+GPU split cluster is `--n-gpu-layers`. This controls how many transformer layers get offloaded to the RPC GPU backend versus staying on the CPU.
-
-**Starting point (before tuning):** ~7 t/s  
-**After initial layer optimization:** ~17 t/s  
-**After full cluster tuning:** 30–35 t/s
-
-The progression that got us there:
-
-1. **Start with `--n-gpu-layers 99`** — tells llama-server to offload as many layers as possible. With Mistral-Nemo-12B Q4_K_M this results in all 41/41 layers offloading to jarvis GPU via RPC.
-
-2. **Verify GPU is actually working** — watch the llama-server startup log for:
-   ```
-   load_tensors: offloaded 41/41 layers to GPU
-   load_tensors: RPC[192.168.50.210:50052] model buffer size = 6763.30 MiB
-   load_tensors: CPU_Mapped model buffer size = 360.00 MiB
-   ```
-   If layers aren't offloading, the RPC connection isn't established.
-
-3. **Check actual throughput** — the timings block in llama-server responses shows real t/s. Tune from there.
-
-**Current llama-server service on ultron (`/etc/systemd/system/llama-server.service`):**
-```ini
-[Unit]
-Description=Llama.cpp Server (RPC frontend — Mistral-Nemo general)
-After=network.target
-
-[Service]
-Type=simple
-User=root
-ExecStart=/root/llama.cpp/build/bin/llama-server \
-  --model /home/gramps/models/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf \
-  --rpc 192.168.50.212:50052 \
-  --host 0.0.0.0 \
-  --port 8081 \
-  --n-gpu-layers 99
-Restart=on-failure
-RestartSec=5
-
-[Install]
-WantedBy=multi-user.target
-```
-
-**llama-rpc service on jarvis (`/etc/systemd/system/llama-rpc.service`):**
-```ini
-[Unit]
-Description=Llama.cpp RPC Server (GPU backend — RX 6600 XT Vulkan)
-After=network.target
-
-[Service]
-Type=simple
-User=root
-ExecStart=/root/llama.cpp/build/bin/llama-rpc-server \
-  --host 0.0.0.0 \
-  --port 50052
-Restart=on-failure
-RestartSec=5
-
-[Install]
-WantedBy=multi-user.target
-```
-
---
-
-## Models
-
-### Current
-| Model | Location | Port | Purpose |
-|-------|----------|------|---------|
-| Mistral-Nemo-Instruct-2407-Q4_K_M | `/home/gramps/models/` on jarvis | ultron:8081 | General assistant, chat |
-| mxbai-embed-large | ultron (Docker/Ollama) | ultron:11434 | RAG embeddings |
-
-### Planned
-| Model | Size | Port | Purpose |
-|-------|------|------|---------|
-| Qwen2.5-Coder-14B-Q5_K_M | ~10GB | ultron:8082 | Code completion, pair programming |
-
-> **Note:** ultron has 16GB RAM. Only one primary inference model can be hot at a time. llama-server instances are swapped via systemd when switching between general and code models.
-
---
-
-## RAG System
-
-jC uses **Qdrant** for vector storage and **mxbai-embed-large** (1024-dim) for embeddings.
-
-### Qdrant Collection
- **Collection:** `jarvis_rag`
- **Vector size:** 1024 (mxbai-embed-large output)
- **Distance:** Cosine
- **Score threshold:** 0.25 (filters low-relevance chunks)
- **Chunks retrieved per query:** 3 (configurable)
-
-### RAM Ceiling
-Each vector = 4KB (1024 dims × float32). With ultron's ~4-6GB available to Qdrant after llama-server:
- Practical ceiling: ~1–1.5M chunks before RAM becomes the bottleneck
- Current corpus: 219 points (early stage)
- Storage on disk: negligible against pivault's 10.83TB
-
-### What Gets Ingested
- Code repositories (your actual codebase)
- Pair-programming conversation history
- Architecture decisions and working commands
- Documentation and URLs (fetched and stripped via beautifulsoup4/httpx)
-
---
-
-## JarvisChat Service (`/etc/systemd/system/jarvischat.service`)
-
-```ini
-[Unit]
-Description=JarvisChat - Local LLM Developer Platform
-After=network.target
-
-[Service]
-Type=simple
-User=root
-WorkingDirectory=/opt/jarvischat
-ExecStart=/opt/jarvischat/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080
-Restart=always
-RestartSec=5
-Environment=PYTHONUNBUFFERED=1
-Environment=OLLAMA_BASE=http://192.168.50.108:8081
-Environment=LLAMA_SERVER_BASE=http://192.168.50.108:8081
-
-[Install]
-WantedBy=multi-user.target
-```
-
---
+- Python 3.11+ (tested on 3.13)
+- llama-server running locally or on network (OpenAI-compatible API on port 8081)
+- SearXNG (optional, for web search)

 ## Installation

-### Prerequisites
- Python 3.11+ (tested on 3.13)
- llama.cpp built from source on both jarvis (RPC server) and ultron (llama-server)
- Qdrant running on ultron
- Ollama on ultron (for mxbai-embed-large embeddings)
- SearXNG on jarvis:8888 (optional, for web search)
-
 ### Fresh Install

 ```bash
+# Create directory and venv
 sudo mkdir -p /opt/jarvischat
 sudo chown $USER:$USER /opt/jarvischat
 cd /opt/jarvischat
 python3 -m venv venv
-./venv/bin/pip install fastapi uvicorn httpx psutil jinja2 python-multipart qdrant-client
+
+# Install dependencies
+./venv/bin/pip install fastapi uvicorn httpx psutil jinja2 python-multipart
+
+# Set admin PIN before first startup (4 digits)
+export JARVISCHAT_ADMIN_PIN=4827
+
+# Create subdirectories
 mkdir -p templates static
+
+# Copy files
+# (copy all .py files to /opt/jarvischat/)
+# (copy routers/ directory to /opt/jarvischat/)
+# (copy templates/index.html to /opt/jarvischat/templates/)
 ```

-Copy `app.py` to `/opt/jarvischat/` and `index.html` to `/opt/jarvischat/templates/`.
+WARNING: Do not use `1234` as your admin PIN unless you accept weak local security.

-### Bootstrap the PIN
+NOTE: First boot requires `JARVISCHAT_ADMIN_PIN` unless you explicitly opt into insecure fallback with `JARVISCHAT_ALLOW_DEFAULT_PIN=true`.
+
+## Systemd Service
+
+Create `/etc/systemd/system/jarvischat.service`:
+
+```ini
+[Unit]
+Description=JarvisChat - Local Inference Web Interface
+After=network.target
+
+[Service]
+Type=simple
+User=jarvischat
+Group=jarvischat
+WorkingDirectory=/opt/jarvischat
+ExecStart=/opt/jarvischat/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080
+Restart=always
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+```

 ```bash
-export JARVISCHAT_ADMIN_PIN=XXXX  # your 4-digit PIN
+sudo systemctl daemon-reload
+sudo systemctl enable jarvischat
+sudo systemctl start jarvischat
 ```

-Or allow the insecure default for testing:
-```bash
-export JARVISCHAT_ALLOW_DEFAULT_PIN=true
-```
+## Memory Commands

-### Environment Variables
+In chat, natural language triggers memory operations:

-| Variable | Default | Description |
-|----------|---------|-------------|
-| `OLLAMA_BASE` | `http://localhost:11434` | Ollama-compatible endpoint (legacy) |
-| `LLAMA_SERVER_BASE` | `http://192.168.50.108:8081` | llama-server OpenAI-compat inference endpoint |
-| `JARVISCHAT_ADMIN_PIN` | (none) | 4-digit admin PIN (required on first boot) |
-| `JARVISCHAT_ALLOW_DEFAULT_PIN` | `false` | Allow insecure default PIN 1234 |
-| `JARVISCHAT_TRUSTED_ORIGINS` | (none) | Comma-separated trusted origins for CSRF |
-| `JARVISCHAT_ALLOWED_CIDRS` | RFC1918 + loopback | Allowed client IP CIDRs |
+| You say | What happens |
+|---------|--------------|
+| "remember that I prefer Rust over Go" | Stores as `preference` |
+| "remember that JarvisChat runs on port 8080" | Stores as `infrastructure` |
+| "note that the deadline is Friday" | Stores as `general` |
+| "forget about the deadline" | Removes matching memories |

---
+Memories are automatically searched based on your message content and injected into the system prompt when relevant.
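The command phrases in the table above could be recognised with a small parser such as the one below. It is a rough sketch only: the regular expressions and the `parse_memory_command` name are assumptions, topic categorisation is left out, and the project's `process_remember_command()` may accept more phrasings.

```python
import re
from typing import Optional

# Phrases taken from the table above; the regexes themselves are assumptions.
REMEMBER_RE = re.compile(r"^\s*(?:remember|note) that\s+(.+)$", re.IGNORECASE | re.DOTALL)
FORGET_RE = re.compile(r"^\s*forget about\s+(.+)$", re.IGNORECASE | re.DOTALL)


def parse_memory_command(message: str) -> Optional[tuple[str, str]]:
    """Return ("remember" | "forget", payload), or None for ordinary chat."""
    for action, pattern in (("remember", REMEMBER_RE), ("forget", FORGET_RE)):
        match = pattern.match(message)
        if match:
            return action, match.group(1).strip().rstrip(".")
    return None
```

Anchoring the patterns at the start of the message keeps a sentence that merely contains "remember that" from being treated as a command.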
+
+### Memory Topics
+
+Memories are auto-categorized:
+- `preference` — likes, dislikes, choices
+- `project` — active work, repos, tasks
+- `infrastructure` — servers, services, configs
+- `personal` — name, location, background
+- `general` — everything else

 ## API Endpoints

-### Auth
-| Method | Path | Description |
-|--------|------|-------------|
-| POST | `/api/auth/guest` | Create guest session |
-| POST | `/api/auth/login` | Admin PIN login |
-| POST | `/api/auth/logout` | Revoke session |
-| GET | `/api/auth/session` | Check session status |
-| POST | `/api/auth/heartbeat` | Keep session alive |
+### Completions (OpenAI-compatible)
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| POST | `/v1/chat/completions` | OpenAI-compatible chat (requires Bearer API key) |

 ### Chat & Search
-| Method | Path | Description |
-|--------|------|-------------|
-| POST | `/api/chat` | Streaming chat (SSE) |
-| POST | `/api/search` | Explicit web search via SearXNG |
-| GET | `/api/search/status` | SearXNG health check |

-### Models
-| Method | Path | Description |
-|--------|------|-------------|
-| GET | `/api/models` | List available models from llama-server |
-| GET | `/api/ps` | Running models |
-| POST | `/api/show` | Model info |
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| POST | `/api/chat` | Send message (streaming SSE) |
+| POST | `/api/search` | Explicit web search (streaming SSE) |

 ### Memory
-| Method | Path | Description |
-|--------|------|-------------|
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
 | GET | `/api/memories` | List all memories |
 | POST | `/api/memories` | Add memory |
 | PUT | `/api/memories/{rowid}` | Update memory |
 | DELETE | `/api/memories/{rowid}` | Delete memory |
-| GET | `/api/memories/search?q=` | FTS5 search memories |
-| GET | `/api/memories/stats` | Memory statistics |
+| GET | `/api/memories/search?q=term` | Search memories |
+| GET | `/api/memories/stats` | Get counts by topic |
+
+### Models & System
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/models` | List available models |
+| GET | `/api/ps` | List loaded models |
+| POST | `/api/show` | Get model info |
+| GET | `/api/stats` | CPU, RAM, GPU, VRAM stats |
+| GET | `/api/search/status` | SearXNG availability |
+
+### Settings & Profile
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/profile` | Get profile content |
+| PUT | `/api/profile` | Update profile (admin) |
+| GET | `/api/profile/default` | Get default profile |
+| GET | `/api/settings` | Get settings |
+| PUT | `/api/settings` | Update settings (admin) |

 ### Conversations
-| Method | Path | Description |
-|--------|------|-------------|
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
 | GET | `/api/conversations` | List conversations |
 | POST | `/api/conversations` | Create conversation |
-| GET | `/api/conversations/{id}` | Get conversation + messages |
-| PUT | `/api/conversations/{id}` | Update title/model |
+| GET | `/api/conversations/{id}` | Get conversation with messages |
+| PUT | `/api/conversations/{id}` | Update conversation title/model |
 | DELETE | `/api/conversations/{id}` | Delete conversation |
-| DELETE | `/api/conversations` | Delete all conversations |
+| DELETE | `/api/conversations` | Delete ALL conversations |

-### Profile & Settings
-| Method | Path | Description |
-|--------|------|-------------|
-| GET | `/api/profile` | Get profile |
-| PUT | `/api/profile` | Update profile |
-| GET | `/api/settings` | Get settings |
-| PUT | `/api/settings` | Update settings |
-| GET | `/api/stats` | CPU/RAM/GPU stats |
+### Presets
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/presets` | List presets |
+| POST | `/api/presets` | Create preset |
+| PUT | `/api/presets/{id}` | Update preset |
+| DELETE | `/api/presets/{id}` | Delete preset |

 ### Skills
-| Method | Path | Description |
-|--------|------|-------------|
-| GET | `/api/skills` | List all skills |
-| GET | `/api/skills/active` | List enabled skills |
-| PUT | `/api/skills/{key}` | Enable/disable skill |

---
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/skills` | List all skills with state |
+| GET | `/api/skills/active` | List active skills |
+| PUT | `/api/skills/{key}` | Toggle skill enabled (admin) |

-## Memory Commands
+### Auth

-Say these in chat to interact with the memory system:
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| POST | `/api/auth/guest` | Create guest session |
+| POST | `/api/auth/login` | Admin PIN login |
+| POST | `/api/auth/logout` | Revoke session |
+| GET | `/api/auth/session` | Check session validity |
+| POST | `/api/auth/heartbeat` | Extend session TTL |

-| Command | Effect |
-|---------|--------|
-| `remember that [fact]` | Stores fact in FTS5 memory |
-| `please remember [fact]` | Same |
-| `don't forget [fact]` | Same |
-| `forget about [topic]` | Deletes matching memories |
+## Configuration

---
+Settings are stored in the `settings` table and include:

-## Troubleshooting
+- `profile_enabled` — Inject profile into chats (true/false)
+- `search_enabled` — Auto web search (true/false)
+- `memory_enabled` — Memory injection (true/false)
+- `skills_enabled` — Skills framework (true/false)
+- `default_model` — Default inference model
+
+## Testing

-### jC starts but inference is slow or failing
-Check that llama-rpc is running on jarvis and llama-server is connected:
 ```bash
-# On jarvis
-systemctl status llama-rpc
-
-# On ultron — look for "offloaded N/N layers to GPU" in logs
-journalctl -u llama-server -n 50 --no-pager
+./venv/bin/python -m pytest tests/ -v
 ```

-### ultron shows no CPU activity during inference
-Inference is being handled entirely by jarvis GPU via RPC — this is correct and expected. ultron's CPU is only involved for non-offloaded tensors (a small fraction of the model).
-
-### RAG not returning results
-Check Qdrant is up and the collection exists:
-```bash
-curl http://192.168.50.108:6333/collections/jarvis_rag
-```
-Verify `points_count` > 0. If zero, the corpus hasn't been seeded yet.
-
-### jC won't start — PIN bootstrap error
-Set the PIN via environment before first boot:
-```bash
-export JARVISCHAT_ADMIN_PIN=XXXX
-systemctl restart jarvischat
-```
-
-### sqlite3 not found
-Use Python instead:
-```bash
-python3 -c "import sqlite3; print(sqlite3.connect('/opt/jarvischat/jarvischat.db').execute('SELECT * FROM settings').fetchall())"
-```
-
---
-
-## Roadmap
-
-### TODO (Priority Order)
-1. **Tool calling** — read_file/write_file with /opt/jarvischat whitelist, tool_calls dispatch loop
-2. **git_tool** — Gitea integration for commit/push from jC
-3. **Audit logging** — structured audit trail to syslog
-4. SearXNG persistence (DONE ✅)
-5. search+ prefix for explicit search
-6. profile.example.md
-7. Conversation search/filter
-8. Export to markdown
-9. Keyboard shortcuts
-10. Retry button
-11. Source links in responses
-12. Rename conversations
-13. Multiple profiles
-14. KWIC auto-tags
-15. Image input (vision)
-16. btop split-screen integration
-17. Containerize
-18. SearXNG health indicator in UI
-19. check_patch_notes tool
-20. GitLab mirror of llgit repo
-
-### ROADMAP (Longer Horizon)
-
-**(A) Modular refactor** — Split monolithic app.py into routers/, services/, config.py, db.py, auth.py. Prerequisite for everything below.
-
-**(B) RAG ingest/manage UI** — File upload, URL ingest (fetch + strip HTML via beautifulsoup4/httpx, store URL as source metadata for citation), delete chunks/collections.
-
-**(C) Backend config panel** — Switch between Ollama/llama-server, endpoint URLs, model switching, restart — all from the UI without touching config files.
-
-**(D) Response metrics display** — tokens/sec, TTFT, context size, RAG chunks retrieved + scores — visible in the UI per response.
-
-**(E) Response quality feedback** — thumbs/stars/tags per response → feedback corpus → future RLHF dataset.
-
-**(F) IDE integration** — Continue.dev + VS Code, pointed at jC:8080 (not direct to inference endpoint). All IDE traffic — including pair-programming conversations — goes through jC so sessions are persisted and become RAG-worthy content. jC needs FIM request format handling to support inline autocomplete.
-
-**(G) Conversation history export → RAG ingest** — Bulk ingest existing conversation history into Qdrant.
-
-**(H) Fine-tuning pipeline** — LoRA on Mistral-Nemo from feedback corpus (item E).
-
-**(I) Autonomous RAG** — At conversation end, jC self-evaluates the transcript, extracts significant chunks (solved problems, working commands, architectural decisions), and ingests them into Qdrant automatically with metadata (date, conversation_id, reason). jC decides what it needs to remember. Closes the loop.
-
-**(J) Startup hardware/resource self-assessment** — On boot, jC queries ultron for available RAM, Qdrant consumption, and llama-server footprint. Derives dynamic high-water marks for RAG chunk limits, context window sizing, retrieval limits, and eviction thresholds. Writes a living config file. Replaces magic numbers with runtime-negotiated values.
-
-**(K) RAG corpus management** — Weighted LRU eviction with composite score (recency + frequency + content age) + manual pin flag for load-bearing knowledge. Prevents corpus bloat from degrading retrieval quality. Analogous to memcache eviction policy.
-
-**(L) Dual inference model architecture** — Mistral-Nemo-12B on ultron:8081 (general assistant), Qwen2.5-Coder-14B-Q5_K_M on ultron:8082 (code/pair programming). jC selects endpoint based on active model. Only one model hot at a time given ultron's 16GB RAM constraint.
-
---
-
-## Primary Cluster Objectives
-
-1. **Generative AI inference** — Local, private, fast enough to be useful
-2. **Agentic functionality** — Autonomous RAG self-management is the canonical first example. The system acts, not just responds.
-
---
-
-## Repository
-
-```
-ssh://gitea@llgit.llamachile.tube:1319/gramps/jarvisChat.git
-```
-
-> SSH username is `gitea`, not `git`. Port 1319.
-
---
+All 26 tests use `tmp_path` fixtures + monkeypatched `httpx.AsyncClient.stream`. No external services needed.

 ## License

 MIT
+
+## Repository
+
+Gitea: `ssh://gitea@llgit.llamachile.tube:1319/gramps/jarvisChat.git`
--- a/app.py
+++ b/app.py
@@ -102,7 +102,7 @@ async def session_auth_middleware(request: Request, call_next):
        "/api/auth/heartbeat", "/api/auth/guest",
    }

-    if path.startswith("/api/") and is_state_changing(request.method):
+    if path.startswith("/api/"):
        if not origin_allowed(request):
            audit_event("origin_check", "denied", ip=ip, role="none",
                        details=f"{request.method} {path}", warning=True)
--- a/auth.py
+++ b/auth.py
@@ -15,10 +15,10 @@ from fastapi.responses import JSONResponse
 from config import SESSION_TIMEOUT_SECONDS, MAX_PIN_ATTEMPTS, PIN_LOCKOUT_SECONDS, RATE_WINDOW_SECONDS
 from db import get_db, get_setting
 from security import (
-    SESSIONS, PIN_ATTEMPTS, SESSION_LOCK, audit_event, get_client_ip,
-    is_ip_allowed, check_rate_limit, rate_policy, origin_allowed,
-    is_state_changing, request_body_limit, read_json_body, hash_pin,
-    customer_error_envelope, log_incident,
+    SESSIONS, PIN_ATTEMPTS, SESSION_LOCK, BODY_LIMIT_DEFAULT_BYTES,
+    audit_event, get_client_ip, is_ip_allowed, check_rate_limit,
+    rate_policy, origin_allowed, is_state_changing, request_body_limit,
+    read_json_body, hash_pin, customer_error_envelope, log_incident,
 )

 log = logging.getLogger("jarvischat")
@@ -146,7 +146,6 @@ async def auth_guest(request: Request):

@router.post("/api/auth/login")
 async def auth_login(request: Request):
-    from security import BODY_LIMIT_DEFAULT_BYTES
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    pin = str(body.get("pin", ""))
    ip = get_client_ip(request)
@@ -183,7 +182,6 @@ async def auth_heartbeat(request: Request):

@router.post("/api/auth/logout")
 async def auth_logout(request: Request):
-    from security import BODY_LIMIT_DEFAULT_BYTES
    ip = get_client_ip(request)
    sid = request.headers.get("x-session-id", "").strip()
    role = "none"
--- a/config.py
+++ b/config.py
@@ -9,11 +9,12 @@ import logging

 log = logging.getLogger("jarvischat")

-VERSION = "v1.8.0"
+VERSION = "v1.8.5"
 OLLAMA_BASE = os.environ.get("OLLAMA_BASE", "http://localhost:11434")
 LLAMA_SERVER_BASE = os.environ.get("LLAMA_SERVER_BASE", "http://192.168.50.108:8081")
 SEARXNG_BASE = "http://localhost:8888"
 DEFAULT_MODEL = "llama3.1:latest"
+COMPLETIONS_API_KEY = os.environ.get("JARVISCHAT_COMPLETIONS_API_KEY", "jc-sk-" + os.urandom(24).hex())

 # --- Auth ---
 SESSION_TIMEOUT_SECONDS = 90
--- a/memory.py
+++ b/memory.py
@@ -62,7 +62,13 @@ def search_memories(query: str, limit: int = 5) -> list:
    if not words:
        db.close()
        return []
-    safe_query = " OR ".join(word + "*" for word in words[:10])
+    escaped = []
+    for word in words[:10]:
+        if word.upper() in {"AND", "OR", "NOT", "NEAR"}:
+            escaped.append(f'"{word}"*')
+        else:
+            escaped.append(word + "*")
+    safe_query = " OR ".join(escaped)
    try:
        rows = db.execute(
            "SELECT rowid, fact, topic, source, created_at, bm25(memories) AS rank "
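Reviewer note on the `memory.py` hunk above: the change quotes bare FTS5 operator words before adding the prefix `*`, so a user message containing "or" or "not" is matched as a term rather than parsed as an operator. Below is a minimal standalone sketch of that query-building step. The name `build_prefix_query` and the `max_terms` parameter are hypothetical and used only for illustration; in the repository the logic is inline in `search_memories`.

```python
# Hypothetical helper mirroring the escaping logic in the hunk above.
FTS5_KEYWORDS = {"AND", "OR", "NOT", "NEAR"}


def build_prefix_query(words, max_terms=10):
    """Join words into an OR-ed FTS5 prefix query, quoting operator words."""
    terms = []
    for word in words[:max_terms]:
        if word.upper() in FTS5_KEYWORDS:
            # Quoted strings are treated as literal terms by FTS5,
            # so '"or"*' is a prefix match on the token "or".
            terms.append(f'"{word}"*')
        else:
            terms.append(word + "*")
    return " OR ".join(terms)


print(build_prefix_query(["rust", "or", "go"]))
```

The sketch only builds the MATCH string; it assumes, as the original code does, that `words` has already been tokenized and stripped of punctuation.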
--- a/rag.py
+++ b/rag.py
@@ -12,7 +12,7 @@ from config import MAX_SKILL_PROMPT_CHARS
 log = logging.getLogger("jarvischat")

 QDRANT_URL = "http://192.168.50.108:6333"
-EMBED_URL = "http://192.168.50.108:11434"
+EMBED_URL = "http://192.168.50.210:11434"
 EMBED_MODEL = "mxbai-embed-large"
 RAG_COLLECTION = "jarvis_rag"
 RAG_SCORE_THRESHOLD = 0.25
@@ -65,7 +65,7 @@ async def build_system_prompt(db, extra_prompt: str = "", user_message: str = ""
                rag_lines = [r["payload"]["text"] for r in rag_results if r["score"] > RAG_SCORE_THRESHOLD]
                if rag_lines:
                    parts.append("## Retrieved Context\n" + "\n\n---\n\n".join(rag_lines))
-                    log.warning(f"RAG injected {len(rag_lines)} chunks into context")
+                    log.info(f"RAG injected {len(rag_lines)} chunks into context")
        except Exception as e:
            log.warning(f"RAG injection error: {e}")

--- a/readme.md
+++ b/readme.md
@@ -1,355 +0,0 @@
-# ⚡ JarvisChat v1.7.8
-
-![screenshot](docs/images/screenshot.png)
-
-**A lightweight Ollama coding companion with persistent memory, web search, and real-time system monitoring.**
-
-Built with FastAPI + SQLite + Jinja2. Runs on Python 3.13. No Docker required.
-
-Developer wiki: [docs/wiki/Home.md](docs/wiki/Home.md)
-
-Core architecture deep-dive: [docs/wiki/Developer-Architecture.md](docs/wiki/Developer-Architecture.md)
-
-## Security Scope Disclaimer
-
-JarvisChat is designed for local and home-lab use (same host or trusted LAN).
-
-JarvisChat may technically work with frontier or commercial AI endpoints, but the author does not recommend or support that usage.
-
-Supported deployments are contained local/home-lab environments.
-
-By default, API access is limited to loopback + private LAN CIDRs. You can override with `JARVISCHAT_ALLOWED_CIDRS` (comma-separated CIDRs) and optionally trust reverse-proxy forwarding with `JARVISCHAT_TRUST_X_FORWARDED_FOR=true`.
-
-If you deploy outside a trusted local subnet, your risk profile changes significantly and the default protections here may be insufficient.
-
-Use at your own risk. No warranty is provided for Internet-exposed deployments.
-
-## What's New in v1.7.x
-
-- **Security hardening suite completed** - request rate limits, payload caps, settings allowlist, safe error envelopes, and LAN CIDR gate controls
-- **Customer-safe incident handling** - client-facing errors include support-friendly incident keys while full traces remain in server logs
-- **Streaming and regression test expansion** - automated coverage for SSE chat/search paths, memory remember/forget command handling, and auth/guardrail behavior
-- **Skills framework (Phase 1)** - built-in local skill registry with per-skill enable controls, API endpoints, and bounded prompt injection
-- **Skills WebUX controls** - Settings modal now includes a master skills toggle and per-skill toggles for admin users
-
-## What's New in v1.6.x
-
-- **Guest/admin capability split** - guest chat by default with 4-digit admin PIN for advanced or destructive operations
-- **Session + lockout controls** - session lifecycle endpoints, heartbeat, logout/revoke behavior, failed PIN lockout protections, and auth audit events
-- **Browser request protections** - strict origin checks for state-changing requests and admin-only write enforcement
-- **Unsafe link protection** - outbound search links sanitized to allow only http/https absolute URLs
-- **Operational stability fixes** - safer first-boot PIN policy handling and memory-search tokenization fix for punctuation/FTS edge cases
-
-## What's New in v1.5.0
-
-- **Explicit Web Search Button** — 🔍 button next to SEND forces a web search, bypassing model uncertainty detection
-- **Orange Search Styling** — Search results, WEB badge, and search button share consistent orange color scheme
-- **Expanded Refusal Patterns** — Added "As an AI model", "based on my training data", "I don't have the capability"
-- **Code cleanup** — Removed unused `JSONResponse` import and dead `raw_results_md` variable
-- **Bug fixes** — Replaced bare `except` clauses with `except Exception`; corrected `add_memory()` return type to `int | None`; updated `TemplateResponse` call to Starlette's current API signature
-
-## What's New in v1.4.0
-
-- **FTS5 Memory System**: Say "remember that..." to store facts — they're automatically retrieved by relevance and injected into context
-- **Forget Command**: Say "forget about..." to remove memories
-- **Memory Toggle**: Enable/disable memory injection from topbar or settings
-- **Multi-file Structure**: Backend and frontend separated for easier maintenance
-
-## Features
-
-- **Persistent Memory** — SQLite FTS5 full-text search for fast, relevant memory retrieval
-- **Web Search** — SearXNG integration for automatic web lookups when the model is uncertain
-- **Explicit Search** — 🔍 button to force web search without waiting for model uncertainty
-- **Profile Injection** — Custom system prompt injected into every conversation
-- **System Presets** — Save and switch between different system prompts
-- **Real-time Stats** — CPU, RAM, GPU, VRAM monitoring in sidebar
-- **Token Thermometer** — Visual context window usage indicator
-- **Streaming Responses** — Server-sent events for real-time token display
-- **Conversation History** — SQLite-backed chat persistence with mass-delete option
-- **Model Switching** — Change Ollama models on the fly
-
-## Current WiP (Prioritized)
-
-Canonical backlog: [docs/wiki/current-wip.md](docs/wiki/current-wip.md)
-
-Scope boundary: local-first (same-host Ollama), optional RFC1918 LAN endpoints, no public Internet AI endpoints by default.
-
-Total identified items: 27
-
-Top 10 (brief):
-
-1. P0 [DONE]: Add auth for write/admin endpoints
-2. P0 [DONE]: Add CSRF/origin protection for state-changing requests
-3. P0 [DONE]: Block unsafe URL schemes in rendered links
-4. P0 [DONE]: Add rate limiting and request size limits
-5. P1 [DONE]: Restrict `/api/settings` updates to allowlisted keys
-6. P1: Add pagination + hard caps for list APIs
-7. P1 [DONE]: Replace raw exception leakage with safe client errors
-8. P1 [DONE]: Add automated tests for streaming/search/memory paths
-9. P2 [DONE]: Implement MCP-style skills/tool-call framework
-10. P2: Implement heartbeat/check-in scheduler + summary endpoint
-
-Item 1 executive summary: keep guest mode for conversational chat, require 4-digit admin PIN for advanced/destructive actions, and enforce local/LAN-only backend policy by default.
-
-Implementation status: complete (guest session by default + admin unlock + admin-only write enforcement + origin checks + safe-link sanitization + audit logging + rate/payload guardrails + capability tests).
-
-## TODO
-
-1. ~~Verify SearXNG and Docker services persist across reboots~~
-2. Conversation search/filter by keyword
-3. Export conversation to markdown/text
-4. Keyboard shortcuts (Ctrl+N new chat, Ctrl+Enter send)
-5. Retry button on assistant messages
-6. Source links — clickable links when search used
-7. Allow conversation renaming
-8. Multiple profiles — coding/sysadmin/general
-9. Auto-generate conversation tags (client-side KWIC, top 5, filterable badges)
-10. Image input support — pull vision model, file input/drag-drop, base64 encode, pass `images` array to Ollama `/api/chat`
-11. Split-screen option for btop display
-12. Skills as markdown files — `/opt/jarvischat/skills/`, YAML frontmatter + instructions, injected into context for tool calls
-13. Heartbeats / proactive check-ins — cron + endpoint for daily briefings, HA anomaly alerts
-14. Model info button — (i) icon next to Model dropdown, shows div with model description, last updated date, best-use purpose
-15. Set default model — toggle any model as the default selection
-16. Hide/remove model from list — exclude models from dropdown
-17. Update model function — trigger `ollama pull` for selected model from UI
-18. Add mouseover tooltip to SEND button
-19. Add preflight validation for required model/preset selection and show a clear warning before send to prevent avoidable timeout loops
-
-## File Structure
-
-```
-/opt/jarvischat/
-├── app.py              # FastAPI backend
-├── jarvischat.db       # SQLite database (auto-created)
-├── static/
-│   └── logo.png        # Logo image (optional)
-└── templates/
-    └── index.html      # Frontend
-```
-
-## Requirements
-
-- Python 3.11+ (tested on 3.13)
-- Ollama running locally or on network
-- SearXNG (optional, for web search)
-
-## Installation
-
-### Fresh Install
-
-```bash
-# Create directory and venv
-sudo mkdir -p /opt/jarvischat
-sudo chown $USER:$USER /opt/jarvischat
-cd /opt/jarvischat
-python3 -m venv venv
-
-# Install dependencies
-./venv/bin/pip install fastapi uvicorn httpx psutil jinja2 python-multipart
-
-# Set admin PIN before first startup (4 digits)
-export JARVISCHAT_ADMIN_PIN=4827
-
-# Create subdirectories
-mkdir -p templates static
-
-# Copy files
-# (copy app.py to /opt/jarvischat/)
-# (copy index.html to /opt/jarvischat/templates/)
-# (copy logo.png to /opt/jarvischat/static/ — optional)
-```
-
-WARNING: Do not use `1234` as your admin PIN unless you accept weak local security.
-
-NOTE: First boot now requires `JARVISCHAT_ADMIN_PIN` unless you explicitly opt into insecure fallback with `JARVISCHAT_ALLOW_DEFAULT_PIN=true`.
-
-### Upgrading from v1.4.x
-
-```bash
-cd /opt/jarvischat
-
-# Backup
-cp app.py app.py.bak
-cp templates/index.html templates/index.html.bak
-
-# Copy new files
-# (copy app.py, replacing old version)
-# (copy index.html to templates/)
-
-# Restart
-sudo systemctl restart jarvischat
-```
-
-## Systemd Service
-
-Create `/etc/systemd/system/jarvischat.service`:
-
-```ini
-[Unit]
-Description=JarvisChat - Local Ollama Web Interface
-After=network.target
-
-[Service]
-Type=simple
-User=jarvischat
-Group=jarvischat
-WorkingDirectory=/opt/jarvischat
-ExecStart=/opt/jarvischat/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080
-Restart=always
-RestartSec=5
-
-[Install]
-WantedBy=multi-user.target
-```
-
-```bash
-sudo systemctl daemon-reload
-sudo systemctl enable jarvischat
-sudo systemctl start jarvischat
-```
-
-## Memory Commands
-
-In chat, natural language triggers memory operations:
-
-| You say | What happens |
-|---------|--------------|
-| "remember that I prefer Rust over Go" | Stores as `preference` |
-| "remember that JarvisChat runs on port 8080" | Stores as `infrastructure` |
-| "note that the deadline is Friday" | Stores as `general` |
-| "forget about the deadline" | Removes matching memories |
-
-Memories are automatically searched based on your message content and injected into the system prompt when relevant.
-
-### Memory Topics
-
-Memories are auto-categorized:
-- `preference` — likes, dislikes, choices
-- `project` — active work, repos, tasks
-- `infrastructure` — servers, services, configs
-- `personal` — name, location, background
-- `general` — everything else
-
-## API Endpoints
-
-### Memory
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/memories` | List all memories |
-| POST | `/api/memories` | Add memory `{"fact": "...", "topic": "general"}` |
-| DELETE | `/api/memories/{rowid}` | Delete memory by ID |
-| GET | `/api/memories/search?q=term` | Search memories |
-| GET | `/api/memories/stats` | Get counts by topic |
-
-### Chat & Models
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/models` | List available Ollama models |
-| POST | `/api/chat` | Send message (streaming SSE) |
-| POST | `/api/search` | Explicit web search (streaming SSE) |
-| POST | `/api/show` | Get model info (context size) |
-| GET | `/api/ps` | Get running models |
-
-### Settings & Profile
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/profile` | Get profile content |
-| PUT | `/api/profile` | Update profile |
-| GET | `/api/profile/default` | Get default profile |
-| GET | `/api/settings` | Get settings |
-| PUT | `/api/settings` | Update settings |
-
-### Conversations
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/conversations` | List conversations |
-| GET | `/api/conversations/{id}` | Get conversation with messages |
-| DELETE | `/api/conversations/{id}` | Delete conversation |
-| DELETE | `/api/conversations` | Delete ALL conversations |
-
-### Presets
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/presets` | List presets |
-| POST | `/api/presets` | Create preset |
-| PUT | `/api/presets/{id}` | Update preset |
-| DELETE | `/api/presets/{id}` | Delete preset |
-
-### System
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/stats` | CPU, RAM, GPU, VRAM stats |
-| GET | `/api/search/status` | SearXNG availability |
-
-## Configuration
-
-Settings are stored in the `settings` table and include:
-
-- `profile_enabled` — Inject profile into chats (true/false)
-- `search_enabled` — Auto web search (true/false)
-- `memory_enabled` — Memory injection (true/false)
-- `default_model` — Default Ollama model
-- `searxng_url` — SearXNG instance URL (default: `http://localhost:8888`)
-
-## Testing Memory
-
-```bash
-# Add a memory via API
-curl -X POST http://jarvis:8080/api/memories \
-  -H "Content-Type: application/json" \
-  -d '{"fact": "User prefers native installs over Docker", "topic": "preference"}'
-
-# Search memories
-curl "http://jarvis:8080/api/memories/search?q=docker"
-
-# List all memories
-curl http://jarvis:8080/api/memories
-
-# Get stats
-curl http://jarvis:8080/api/memories/stats
-```
-
-Or in chat:
-1. Say "remember that I hate YAML"
-2. Later ask "what markup languages should I avoid?"
-3. JarvisChat will inject the YAML preference into context
-
-## Troubleshooting
-
-### Service won't start
-
-Check logs:
-```bash
-journalctl -u jarvischat -n 50 --no-pager
-```
-
-Common issues:
-- Missing `jinja2`: `./venv/bin/pip install jinja2`
-- Missing `templates/` directory
-- Wrong permissions on `/opt/jarvischat`
-
-### Memory not working
-
-1. Check memory is enabled (🧠 MEM ON in topbar)
-2. Verify memories exist: `curl http://jarvis:8080/api/memories`
-3. Check FTS5 table: `sqlite3 jarvischat.db "SELECT * FROM memories_fts;"`
-
-### Web search not working
-
-1. Verify SearXNG is running: `curl http://localhost:8888/search?q=test&format=json`
-2. Check search status: `curl http://jarvis:8080/api/search/status`
-3. Ensure JSON format is enabled in SearXNG settings
-
-## License
-
-MIT
-
-## Repository
-
-Gitea: `ssh://gitea@llgit.llamachile.tube:1319/gramps/jarvisChat.git`
--- a/routers/chat.py
+++ b/routers/chat.py
@@ -26,7 +26,7 @@ def parse_llama_stream_chunk(line: str) -> tuple:
    if line.startswith("data: "):
        line = line[6:]
    if line.strip() == "[DONE]":
-        return None, True, {}
+        return None, True, {}, []
    try:
        chunk = json.loads(line)
        choices = chunk.get("choices", [])
@@ -35,10 +35,17 @@ def parse_llama_stream_chunk(line: str) -> tuple:
            token = delta.get("content")
            finish = choices[0].get("finish_reason")
            stats = {}
+            logprobs_list = []
+            logprobs_info = choices[0].get("logprobs")
+            if logprobs_info:
+                content_logprobs = logprobs_info.get("content", [])
+                for entry in content_logprobs:
+                    if "logprob" in entry:
+                        logprobs_list.append({"logprob": entry["logprob"]})
            if finish == "stop":
                usage = chunk.get("usage", {})
                stats["tokens_per_sec"] = usage.get("tokens_per_second", 0.0)
-            return token, finish == "stop", stats
+            return token, finish == "stop", stats, logprobs_list
        if "message" in chunk and "content" in chunk["message"]:
            token = chunk["message"]["content"]
            done = chunk.get("done", False)
@@ -47,10 +54,10 @@ def parse_llama_stream_chunk(line: str) -> tuple:
                eval_count = chunk.get("eval_count", 0)
                eval_duration = chunk.get("eval_duration", 0)
                stats["tokens_per_sec"] = (eval_count / (eval_duration / 1e9)) if eval_duration > 0 else 0
-            return token, done, stats
+            return token, done, stats, []
    except json.JSONDecodeError:
        pass
-    return None, False, {}
+    return None, False, {}, []


@router.post("/api/chat")
@@ -97,7 +104,7 @@ async def chat(request: Request):
    for row in history_rows:
        messages.append({"role": row["role"], "content": row["content"]})

-    ollama_payload = {"model": model, "messages": messages, "stream": True}
+    upstream_payload = {"model": model, "messages": messages, "stream": True, "logprobs": True}

    async def stream_response():
        full_response = []
@@ -111,12 +118,14 @@ async def chat(request: Request):
            try:
                async with client.stream(
                    "POST", f"{LLAMA_SERVER_BASE}/v1/chat/completions",
-                    json=ollama_payload,
+                    json=upstream_payload,
                    timeout=httpx.Timeout(300.0, connect=10.0),
                ) as resp:
                    async for line in resp.aiter_lines():
                        if line.strip():
-                            token, done, stats = parse_llama_stream_chunk(line)
+                            token, done, stats, chunk_logprobs = parse_llama_stream_chunk(line)
+                            if chunk_logprobs:
+                                all_logprobs.extend(chunk_logprobs)
                            if token:
                                full_response.append(token)
                                yield f"data: {json.dumps({'token': token, 'conversation_id': conv_id})}\n\n"
@@ -153,7 +162,7 @@ async def chat(request: Request):
                        ) as resp2:
                            async for line in resp2.aiter_lines():
                                if line.strip():
-                                    token2, done2, _ = parse_llama_stream_chunk(line)
+                                    token2, done2, _, _ = parse_llama_stream_chunk(line)
                                    if token2:
                                        augmented_response.append(token2)
                                    if done2:
@@ -194,9 +203,9 @@ async def chat(request: Request):
            except httpx.RemoteProtocolError:
                pass
            except httpx.ConnectError:
-                yield f"data: {json.dumps({'error': 'Cannot connect to Ollama. Is it running?'})}\n\n"
+                yield f"data: {json.dumps({'error': 'Cannot connect to inference server. Is it running?'})}\n\n"
            except Exception as e:
-                incident_key = log_incident("chat_stream", message="Ollama stream failure during chat response",
+                incident_key = log_incident("chat_stream", message="Inference stream failure during chat response",
                                            request=request, exc=e)
                yield f"data: {json.dumps({'error': 'Chat response generation failed before completion. Use the incident key for support lookup.', 'error_key': incident_key})}\n\n"

--- a/routers/completions.py
+++ b/routers/completions.py
@@ -178,7 +178,7 @@ async def _stream_chat(payload: dict, model: str, conv_id: str, request: Request
                async for line in resp.aiter_lines():
                    if not line.strip():
                        continue
-                    token, done, _ = parse_llama_stream_chunk(line)
+                    token, done, _, _ = parse_llama_stream_chunk(line)
                    if token:
                        full_response.append(token)
                        yield _build_openai_chunk(token, model, conv_id)
@@ -222,7 +222,7 @@ async def _blocking_chat(payload: dict, model: str, conv_id: str, request: Reque
                async for line in resp.aiter_lines():
                    if not line.strip():
                        continue
-                    token, done, _ = parse_llama_stream_chunk(line)
+                    token, done, _, _ = parse_llama_stream_chunk(line)
                    if token:
                        full_response.append(token)
                    if done:
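Reviewer note on the `routers/chat.py` and `routers/completions.py` hunks above: `parse_llama_stream_chunk` now returns four values, so every caller has to unpack `(token, done, stats, logprobs)`. The sketch below is a simplified stand-in for the OpenAI-style branch only. The name `parse_chunk` is hypothetical, and the Ollama-style `message` branch and the `tokens_per_sec` stats are left out on purpose. It assumes each logprob entry sits under `choices[0].logprobs.content`, as the diff does.

```python
import json


def parse_chunk(line):
    """Return (token, done, stats, logprobs) for one SSE line (sketch only)."""
    if line.startswith("data: "):
        line = line[6:]
    if line.strip() == "[DONE]":
        return None, True, {}, []
    try:
        chunk = json.loads(line)
    except json.JSONDecodeError:
        return None, False, {}, []
    if not isinstance(chunk, dict):
        return None, False, {}, []
    choices = chunk.get("choices") or []
    if not choices:
        return None, False, {}, []
    choice = choices[0]
    token = (choice.get("delta") or {}).get("content")
    # Keep only the numeric logprob of each entry, as the diff does.
    entries = (choice.get("logprobs") or {}).get("content") or []
    logprobs = [{"logprob": e["logprob"]} for e in entries if "logprob" in e]
    return token, choice.get("finish_reason") == "stop", {}, logprobs


sample = "data: " + json.dumps({
    "choices": [{
        "delta": {"content": "Hi"},
        "finish_reason": None,
        "logprobs": {"content": [{"token": "Hi", "logprob": -0.25}]},
    }]
})
token, done, stats, logprobs = parse_chunk(sample)
print(token, done, stats, logprobs)
```

Returning an empty list rather than `None` on every path lets a caller extend an accumulator without a type check.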
--- a/routers/models.py
+++ b/routers/models.py
@@ -8,7 +8,7 @@ import httpx
 import psutil
 from fastapi import APIRouter, HTTPException, Request

-from config import OLLAMA_BASE
+from config import LLAMA_SERVER_BASE
 from gpu import get_gpu_stats
 from security import read_json_body, BODY_LIMIT_DEFAULT_BYTES

@@ -20,34 +20,33 @@ router = APIRouter()
 async def list_models():
    async with httpx.AsyncClient() as client:
        try:
-            resp = await client.get(f"{OLLAMA_BASE}/v1/models", timeout=10)
+            resp = await client.get(f"{LLAMA_SERVER_BASE}/v1/models", timeout=10)
            data = resp.json()
            models = [{"name": m["id"], "model": m["id"]} for m in data.get("data", [])]
            return {"models": models}
        except httpx.ConnectError:
-            raise HTTPException(status_code=502, detail="Cannot connect to llama-server.")
+            raise HTTPException(status_code=502, detail="Cannot connect to inference server.")


@router.get("/api/ps")
 async def running_models():
    async with httpx.AsyncClient() as client:
        try:
-            resp = await client.get(f"{OLLAMA_BASE}/api/ps", timeout=10)
+            resp = await client.get(f"{LLAMA_SERVER_BASE}/v1/models", timeout=10)
            return resp.json()
        except httpx.ConnectError:
-            raise HTTPException(status_code=502, detail="Cannot connect to Ollama.")
+            raise HTTPException(status_code=502, detail="Cannot connect to inference server.")


@router.post("/api/show")
 async def show_model(request: Request):
-    from security import BODY_LIMIT_DEFAULT_BYTES
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    async with httpx.AsyncClient() as client:
        try:
-            resp = await client.post(f"{OLLAMA_BASE}/api/show", json=body, timeout=10)
+            resp = await client.post(f"{LLAMA_SERVER_BASE}/api/show", json=body, timeout=10)
            return resp.json()
        except httpx.ConnectError:
-            raise HTTPException(status_code=502, detail="Cannot connect to Ollama.")
+            raise HTTPException(status_code=502, detail="Cannot connect to inference server.")


@router.get("/api/stats")
--- a/routers/search_route.py
+++ b/routers/search_route.py
@@ -35,14 +35,14 @@ async def explicit_search(request: Request):

    if not conv_id:
        conv_id = str(uuid.uuid4())
-        title = f"🔍 {query[:70]}..." if len(query) > 70 else f"🔍 {query}"
+        title = query[:70] + "..." if len(query) > 70 else query
        db.execute("INSERT INTO conversations (id, title, model, created_at, updated_at) VALUES (?, ?, ?, ?, ?)",
                   (conv_id, title, model, now, now))
    else:
        db.execute("UPDATE conversations SET updated_at = ? WHERE id = ?", (now, conv_id))

    db.execute("INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
-               (conv_id, "user", f"🔍 {query}", now))
+               (conv_id, "user", query, now))
    db.commit()
    db.close()

@@ -80,7 +80,7 @@ async def explicit_search(request: Request):
                ) as resp:
                    async for line in resp.aiter_lines():
                        if line.strip():
-                            token, done, _ = parse_llama_stream_chunk(line)
+                            token, done, _, _ = parse_llama_stream_chunk(line)
                            if token:
                                full_response.append(token)
                                yield f"data: {json.dumps({'token': token, 'conversation_id': conv_id})}\n\n"
@@ -102,7 +102,6 @@ async def explicit_search(request: Request):
        db2.commit()
        db2.close()

-        yield f"data: {json.dumps({'raw_results': results, 'conversation_id': conv_id})}\n\n"
        yield f"data: {json.dumps({'done': True, 'conversation_id': conv_id, 'searched': True})}\n\n"

    return StreamingResponse(stream_search(), media_type="text/event-stream")
--- a/search.py
+++ b/search.py
@@ -80,16 +80,13 @@ def format_direct_answer(question: str, results: list) -> str:

 def extract_search_query(user_message: str) -> str:
    query = user_message.strip()
-    if re.search(r"temperature|weather", query, re.IGNORECASE):
-        query = re.sub(r"^what('?s| is) the ", "", query, flags=re.IGNORECASE) + " right now degrees"
-    if re.search(r"price|spot price", query, re.IGNORECASE):
-        query = re.sub(r"^(what('?s| is)|can you tell me) the ", "", query, flags=re.IGNORECASE) + " today USD"
-    query = re.sub(
-        r"^(what|who|where|when|why|how|is|are|can|could|would|should|do|does|did)\s+",
-        "", query, flags=re.IGNORECASE,
-    )
-    query = re.sub(r"[?!.]+$", "", query)
-    return query[:100].strip() or user_message[:100]
+    weather_lead = re.match(r"^(?:what('?s| is) the\s+)?(?:weather|temperature|forecast)\s+(?:in\s+|for\s+)?(.+)", query, re.IGNORECASE)
+    if weather_lead:
+        return (weather_lead.group(2) + " weather").strip()[:100]
+    price_lead = re.match(r"^(?:what('?s| is| are)\s+)?(?:the\s+)?(?:price|spot price)\s+(?:of\s+|for\s+)?(.+)", query, re.IGNORECASE)
+    if price_lead:
+        return (price_lead.group(2) + " price today USD").strip()[:100]
+    return query[:100]


 async def query_searxng(query: str, max_results: int = 5) -> list:
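Reviewer note on the `search.py` hunk above: the rewritten `extract_search_query` only reshapes weather and price questions and otherwise passes the message through, truncated to 100 characters. The standalone sketch below copies the two patterns from the diff so the behavior can be tried in isolation. The function name `sketch_search_query` is hypothetical, and the sample inputs are illustrative rather than taken from the test suite.

```python
import re

# Patterns copied from the hunk above; group(2) is the subject of the question.
WEATHER_RE = re.compile(
    r"^(?:what('?s| is) the\s+)?(?:weather|temperature|forecast)\s+(?:in\s+|for\s+)?(.+)",
    re.IGNORECASE,
)
PRICE_RE = re.compile(
    r"^(?:what('?s| is| are)\s+)?(?:the\s+)?(?:price|spot price)\s+(?:of\s+|for\s+)?(.+)",
    re.IGNORECASE,
)


def sketch_search_query(user_message):
    """Hypothetical stand-in for extract_search_query as changed in this diff."""
    query = user_message.strip()
    weather = WEATHER_RE.match(query)
    if weather:
        return (weather.group(2) + " weather").strip()[:100]
    price = PRICE_RE.match(query)
    if price:
        return (price.group(2) + " price today USD").strip()[:100]
    # Everything else is passed through unchanged, truncated to 100 chars.
    return query[:100]


for message in ("what's the weather in Paris", "price of gold", "who wrote Dune?"):
    print(repr(sketch_search_query(message)))
```

Unlike the removed version, the fallthrough path keeps leading question words and trailing punctuation, so "who wrote Dune?" is sent to SearXNG as typed.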
--- a/security.py
+++ b/security.py
@@ -156,7 +156,7 @@ def origin_allowed(request: Request) -> bool:
        parsed = urlparse(referer)
        ref_origin = f"{parsed.scheme}://{parsed.netloc}".rstrip("/")
        return ref_origin == expected_origin or ref_origin in TRUSTED_ORIGINS
-    return True
+    return False


 def is_state_changing(method: str) -> bool:
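
The one-line change above turns the fallthrough of `origin_allowed` from allow to deny, which appears to be why the test diffs that follow add an `Origin: http://testserver` header to their requests. Below is a minimal sketch of that fail-closed behaviour. The name `origin_allowed_sketch`, its parameters, and the `Origin` branch are assumptions for illustration, because only the `Referer` branch and the final `return` are visible in this hunk.

```python
from urllib.parse import urlparse


def origin_allowed_sketch(
    headers: dict,
    expected_origin: str,
    trusted_origins: frozenset = frozenset(),
) -> bool:
    """Hypothetical stand-in for security.origin_allowed (not the real function)."""
    # Assumed branch: an explicit Origin header is compared directly.
    origin = headers.get("origin")
    if origin:
        origin = origin.rstrip("/")
        return origin == expected_origin or origin in trusted_origins

    # Mirrors the visible hunk: derive scheme://host from the Referer.
    referer = headers.get("referer")
    if referer:
        parsed = urlparse(referer)
        ref_origin = f"{parsed.scheme}://{parsed.netloc}".rstrip("/")
        return ref_origin == expected_origin or ref_origin in trusted_origins

    # The change in this commit: with neither header present, deny.
    return False


print(origin_allowed_sketch({"origin": "http://testserver"}, "http://testserver"))
print(origin_allowed_sketch({}, "http://testserver"))
```

Under this reading, a client that sends neither header is rejected, so the test clients need to send one explicitly.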
--- a/templates/index.html
+++ b/templates/index.html
@@ -47,13 +47,12 @@ body { font-family: var(--font-body); background: var(--bg-primary); color: var(
 .delete-all-btn { padding: 10px 12px; background: transparent; border: 1px solid var(--danger); border-radius: var(--radius); color: var(--danger); font-size: 14px; cursor: pointer; transition: all 0.2s; }
 .delete-all-btn:hover { background: var(--danger); color: #fff; }
 .conversation-list { flex: 1; overflow-y: auto; padding: 8px; }
-.conv-item { padding: 10px 12px; border-radius: var(--radius); cursor: pointer; margin-bottom: 2px; display: flex; justify-content: space-between; align-items: center; transition: background 0.15s; font-size: 13px; color: var(--text-secondary); }
+.conv-item { padding: 10px 12px; border-radius: var(--radius); cursor: pointer; margin-bottom: 2px; display: flex; align-items: center; gap: 8px; transition: background 0.15s; font-size: 13px; color: var(--text-secondary); }
 .conv-item:hover { background: var(--bg-hover); color: var(--text-primary); }
 .conv-item.active { background: var(--bg-tertiary); color: var(--text-primary); }
-.conv-item .conv-title { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; flex: 1; }
-.conv-item .conv-delete { opacity: 0; color: var(--danger); cursor: pointer; padding: 2px 6px; font-size: 16px; }
-.conv-item:hover .conv-delete { opacity: 0.7; }
-.conv-item .conv-delete:hover { opacity: 1; }
+.conv-item .conv-trash { color: var(--text-muted); cursor: pointer; padding: 2px 2px; font-size: 15px; flex-shrink: 0; transition: color 0.15s; }
+.conv-item .conv-trash:hover { color: var(--danger); }
+.conv-item .conv-title { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; flex: 1; min-width: 0; }
 .sidebar-footer { padding: 12px 16px; border-top: 1px solid var(--border); font-size: 11px; color: var(--text-muted); font-family: var(--font-mono); }
 .sidebar-footer .status-row { display: flex; align-items: center; gap: 8px; margin-bottom: 4px; }
 .stats-panel { margin-top: 10px; padding-top: 10px; border-top: 1px solid var(--border); }
@@ -983,8 +982,7 @@ async function loadConversations() {
        convs.forEach(c => {
            const div = document.createElement('div');
            div.className = 'conv-item' + (c.id === currentConvId ? ' active' : '');
-            const delBtn = currentRole === 'admin' ? `<span class="conv-delete" onclick="event.stopPropagation();deleteConversation('${c.id}')">×</span>` : '';
-            div.innerHTML = `<span class="conv-title" onclick="loadConversation('${c.id}')">${c.title}</span>${delBtn}`;
+            div.innerHTML = `<span class="conv-trash" onclick="event.stopPropagation();deleteConversation('${c.id}')" title="Delete conversation">🗑</span><span class="conv-title" onclick="loadConversation('${c.id}')">${c.title}</span>`;
            list.appendChild(div);
        });
    } catch(e) {}
--- a/tests/test_auth_capabilities.py
+++ b/tests/test_auth_capabilities.py
@@ -3,16 +3,18 @@ from pathlib import Path

 from fastapi.testclient import TestClient

-import app as app_module
+import app
+import db
+from security import SESSIONS, PIN_ATTEMPTS


 def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
-    app_module.DB_PATH = tmp_path / "jarvischat-test.db"
-    app_module.SESSIONS.clear()
-    app_module.PIN_ATTEMPTS.clear()
-    app_module.init_db()
-    return TestClient(app_module.app)
+    db.DB_PATH = tmp_path / "jarvischat-test.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    db.init_db()
+    return TestClient(app.app)


 def test_guest_read_only_admin_write_blocked(tmp_path: Path):
@@ -20,7 +22,7 @@ def test_guest_read_only_admin_write_blocked(tmp_path: Path):
        guest = client.post("/api/auth/guest", headers={"Origin": "http://testserver"})
        assert guest.status_code == 200
        sid = guest.json()["session_id"]
-        headers = {"X-Session-ID": sid}
+        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        read_resp = client.get("/api/memories", headers=headers)
        assert read_resp.status_code == 200
@@ -74,5 +76,5 @@ def test_logout_revokes_session(tmp_path: Path):
        logout = client.post("/api/auth/logout", headers=headers)
        assert logout.status_code == 200

-        after = client.get("/api/memories", headers={"X-Session-ID": sid})
+        after = client.get("/api/memories", headers={"X-Session-ID": sid, "Origin": "http://testserver"})
        assert after.status_code == 401
--- a/tests/test_chat_streaming_and_memory_paths.py
+++ b/tests/test_chat_streaming_and_memory_paths.py
@@ -2,19 +2,24 @@ import json
 import os
 from pathlib import Path

+import httpx
 from fastapi.testclient import TestClient

-import app as app_module
+import app
+import config
+import db
+import routers.chat
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS


 def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
-    app_module.DB_PATH = tmp_path / "jarvischat-streaming.db"
-    app_module.SESSIONS.clear()
-    app_module.PIN_ATTEMPTS.clear()
-    app_module.RATE_EVENTS.clear()
-    app_module.init_db()
-    return TestClient(app_module.app, raise_server_exceptions=False)
+    db.DB_PATH = tmp_path / "jarvischat-streaming.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)


 def parse_sse_payloads(body: str) -> list[dict]:
@@ -65,11 +70,11 @@ def test_chat_stream_emits_tokens_and_done(tmp_path: Path, monkeypatch):
        def stream_stub(self, method, url, json=None, timeout=None):
            return _MockStreamResponse(events)

-        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", stream_stub)
+        monkeypatch.setattr(httpx.AsyncClient, "stream", stream_stub)

        resp = client.post(
            "/api/chat",
-            json={"message": "hello", "model": app_module.DEFAULT_MODEL},
+            json={"message": "hello", "model": config.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 200
@@ -92,7 +97,7 @@ def test_chat_auto_search_trigger_emits_search_events(tmp_path: Path, monkeypatc
        first_stream = _stream_json_lines(
            [
                {
-                    "message": {"content": "I am uncertain."},
+                    "message": {"content": "I don't have current data on that question."},
                    "logprobs": [{"logprob": -5.0}],
                },
                {"done": True, "eval_count": 2, "eval_duration": 1000000000},
@@ -118,12 +123,12 @@ def test_chat_auto_search_trigger_emits_search_events(tmp_path: Path, monkeypatc
                }
            ]

-        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", stream_stub)
-        monkeypatch.setattr(app_module, "query_searxng", search_stub)
+        monkeypatch.setattr(httpx.AsyncClient, "stream", stream_stub)
+        monkeypatch.setattr(routers.chat, "query_searxng", search_stub)

        resp = client.post(
            "/api/chat",
-            json={"message": "what is the latest value", "model": app_module.DEFAULT_MODEL},
+            json={"message": "what is the latest value", "model": config.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 200
@@ -153,13 +158,13 @@ def test_memory_command_paths_remember_and_forget(tmp_path: Path, monkeypatch):
        def stream_stub(self, method, url, json=None, timeout=None):
            return _MockStreamResponse(base_stream)

-        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", stream_stub)
+        monkeypatch.setattr(httpx.AsyncClient, "stream", stream_stub)

        remember_resp = client.post(
            "/api/chat",
            json={
                "message": "remember that my favorite language is rust",
-                "model": app_module.DEFAULT_MODEL,
+                "model": config.DEFAULT_MODEL,
            },
            headers=headers,
        )
@@ -167,7 +172,7 @@ def test_memory_command_paths_remember_and_forget(tmp_path: Path, monkeypatch):
        remember_events = parse_sse_payloads(remember_resp.text)
        assert any("Remembered" in p.get("token", "") for p in remember_events)

-        memories_after_add = client.get("/api/memories", headers={"X-Session-ID": sid})
+        memories_after_add = client.get("/api/memories", headers={"X-Session-ID": sid, "Origin": "http://testserver"})
        assert memories_after_add.status_code == 200
        assert memories_after_add.json().get("count", 0) >= 1

@@ -175,7 +180,7 @@ def test_memory_command_paths_remember_and_forget(tmp_path: Path, monkeypatch):
            "/api/chat",
            json={
                "message": "forget about my favorite language",
-                "model": app_module.DEFAULT_MODEL,
+                "model": config.DEFAULT_MODEL,
            },
            headers=headers,
        )
@@ -183,6 +188,6 @@ def test_memory_command_paths_remember_and_forget(tmp_path: Path, monkeypatch):
        forget_events = parse_sse_payloads(forget_resp.text)
        assert any("Forgot" in p.get("token", "") for p in forget_events)

-        memories_after_forget = client.get("/api/memories", headers={"X-Session-ID": sid})
+        memories_after_forget = client.get("/api/memories", headers={"X-Session-ID": sid, "Origin": "http://testserver"})
        assert memories_after_forget.status_code == 200
        assert memories_after_forget.json().get("count", 0) == 0
--- a/tests/test_completions.py
+++ b/tests/test_completions.py
@@ -0,0 +1,222 @@
+import json
+import os
+from pathlib import Path
+
+import httpx
+from fastapi.testclient import TestClient
+
+import app
+import config
+import db
+import routers.completions
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS
+
+
+def make_client(tmp_path: Path) -> TestClient:
+    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
+    db.DB_PATH = tmp_path / "jarvischat-completions.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)
+
+
+TEST_API_KEY = "test-sk-jarvischat-completions"
+
+
+def _auth_headers(extra: dict = None) -> dict:
+    h = {"Authorization": f"Bearer {TEST_API_KEY}", "Content-Type": "application/json", "Origin": "http://testserver"}
+    if extra:
+        h.update(extra)
+    return h
+
+
+class _MockStreamResponse:
+    def __init__(self, lines: list[str]):
+        self._lines = lines
+
+    async def __aenter__(self):
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb):
+        return False
+
+    async def aiter_lines(self):
+        for line in self._lines:
+            yield line
+
+
+class _MockAsyncPostResponse:
+    def __init__(self, status_code=200, json_data=None):
+        self.status_code = status_code
+        self._json_data = json_data or {}
+
+    def json(self):
+        return self._json_data
+
+
+def _stream_json_lines(events: list[dict]) -> list[str]:
+    return [json.dumps(event) for event in events]
+
+
+def test_completions_missing_api_key(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.post(
+            "/v1/chat/completions",
+            json={"messages": [{"role": "user", "content": "hi"}]},
+        )
+        assert resp.status_code == 401
+
+
+def test_completions_invalid_api_key(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.post(
+            "/v1/chat/completions",
+            json={"messages": [{"role": "user", "content": "hi"}]},
+            headers={"Authorization": "Bearer wrong-key", "Origin": "http://testserver"},
+        )
+        assert resp.status_code == 401
+
+
+def test_completions_no_messages(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+    with make_client(tmp_path) as client:
+        resp = client.post("/v1/chat/completions", json={}, headers=_auth_headers())
+        assert resp.status_code == 400
+
+
+def test_completions_empty_messages(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+    with make_client(tmp_path) as client:
+        resp = client.post("/v1/chat/completions", json={"messages": []}, headers=_auth_headers())
+        assert resp.status_code == 400
+
+
+def test_completions_no_user_message(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+    with make_client(tmp_path) as client:
+        resp = client.post(
+            "/v1/chat/completions",
+            json={"messages": [{"role": "assistant", "content": "hello"}]},
+            headers=_auth_headers(),
+        )
+        assert resp.status_code == 400
+
+
+def test_completions_streaming(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+    events = _stream_json_lines([
+        {"choices": [{"delta": {"content": "Hello"}, "logprobs": None}]},
+        {"choices": [{"delta": {"content": " world"}, "logprobs": None}]},
+        {"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": {"tokens_per_second": 15.0}},
+    ])
+
+    call_count = 0
+
+    def stream_stub(self, method, url, json=None, timeout=None):
+        nonlocal call_count
+        call_count += 1
+        return _MockStreamResponse(events)
+
+    monkeypatch.setattr(httpx.AsyncClient, "stream", stream_stub)
+
+    with make_client(tmp_path) as client:
+        resp = client.post(
+            "/v1/chat/completions",
+            json={"messages": [{"role": "user", "content": "hi"}], "stream": True},
+            headers=_auth_headers(),
+        )
+        assert resp.status_code == 200
+        body = resp.text
+        assert "data: [DONE]" in body
+        assert "Hello" in body or "world" in body
+        assert "chatcmpl-" in body
+
+
+def test_completions_blocking(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+    events = _stream_json_lines([
+        {"choices": [{"delta": {"content": "Hello world"}, "logprobs": None}]},
+        {"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": {}},
+    ])
+
+    def stream_stub(self, method, url, json=None, timeout=None):
+        return _MockStreamResponse(events)
+
+    monkeypatch.setattr(httpx.AsyncClient, "stream", stream_stub)
+
+    with make_client(tmp_path) as client:
+        resp = client.post(
+            "/v1/chat/completions",
+            json={"messages": [{"role": "user", "content": "hi"}], "stream": False},
+            headers=_auth_headers(),
+        )
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["object"] == "chat.completion"
+        assert data["choices"][0]["message"]["content"] == "Hello world"
+
+
+def test_completions_fim_passthrough(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+    fim_data = {"prompt": "def foo():\n    ", "suffix": "\n    return x", "model": "llama3.1:latest"}
+
+    async def mock_post(self, url, json=None, timeout=None):
+        return _MockAsyncPostResponse(json_data={"choices": [{"text": "pass"}], "usage": {}})
+
+    monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
+
+    with make_client(tmp_path) as client:
+        resp = client.post("/v1/chat/completions", json=fim_data, headers=_auth_headers())
+        assert resp.status_code == 200
+        assert "choices" in resp.json()
+
+
+def test_completions_connect_error_stream(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+
+    def broken_stream(self, method, url, json=None, timeout=None):
+        raise httpx.ConnectError("Connection refused")
+
+    monkeypatch.setattr(httpx.AsyncClient, "stream", broken_stream)
+
+    with make_client(tmp_path) as client:
+        resp = client.post(
+            "/v1/chat/completions",
+            json={"messages": [{"role": "user", "content": "hi"}], "stream": True},
+            headers=_auth_headers(),
+        )
+        assert resp.status_code == 200
+        assert "connection_error" in resp.text
+
+
+def test_completions_connect_error_blocking(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+
+    def broken_stream(self, method, url, json=None, timeout=None):
+        raise httpx.ConnectError("Connection refused")
+
+    monkeypatch.setattr(httpx.AsyncClient, "stream", broken_stream)
+
+    with make_client(tmp_path) as client:
+        resp = client.post(
+            "/v1/chat/completions",
+            json={"messages": [{"role": "user", "content": "hi"}], "stream": False},
+            headers=_auth_headers(),
+        )
+        assert resp.status_code == 503
+
+
+def test_completions_fim_connect_error(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.completions, "COMPLETIONS_API_KEY", TEST_API_KEY)
+    fim_data = {"prompt": "def foo():", "model": "llama3.1:latest"}
+
+    def broken_post(self, url, json=None, timeout=None):
+        raise httpx.ConnectError("Connection refused")
+
+    monkeypatch.setattr(httpx.AsyncClient, "post", broken_post)
+
+    with make_client(tmp_path) as client:
+        resp = client.post("/v1/chat/completions", json=fim_data, headers=_auth_headers())
+        assert resp.status_code == 503
--- a/tests/test_conversations.py
+++ b/tests/test_conversations.py
@@ -0,0 +1,153 @@
+import os
+from pathlib import Path
+
+from fastapi.testclient import TestClient
+
+import app
+import db
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS
+
+
+def make_client(tmp_path: Path) -> TestClient:
+    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
+    db.DB_PATH = tmp_path / "jarvischat-conversations.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)
+
+
+def _admin_headers(client: TestClient) -> dict:
+    login = client.post("/api/auth/login", json={"pin": "1234"}, headers={"Origin": "http://testserver"})
+    sid = login.json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def _guest_headers(client: TestClient) -> dict:
+    sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def test_list_conversations_empty(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/conversations", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert resp.json() == []
+
+
+def test_create_and_list_conversation(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+
+        create = client.post("/api/conversations", json={"title": "Test Chat", "model": "llama3.1:latest"}, headers=headers)
+        assert create.status_code == 200
+        data = create.json()
+        assert data["title"] == "Test Chat"
+        assert data["model"] == "llama3.1:latest"
+
+        list_resp = client.get("/api/conversations", headers=headers)
+        assert list_resp.status_code == 200
+        convs = list_resp.json()
+        assert len(convs) == 1
+        assert convs[0]["title"] == "Test Chat"
+
+
+def test_get_conversation_returns_messages(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        create = client.post("/api/conversations", json={"title": "My Chat"}, headers=headers)
+        conv_id = create.json()["id"]
+
+        resp = client.get(f"/api/conversations/{conv_id}", headers=headers)
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["conversation"]["id"] == conv_id
+        assert data["messages"] == []
+
+
+def test_get_conversation_not_found(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/conversations/nope", headers=_guest_headers(client))
+        assert resp.status_code == 404
+
+
+def test_update_conversation_title(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        create = client.post("/api/conversations", json={"title": "Old"}, headers=headers)
+        conv_id = create.json()["id"]
+
+        update = client.put(f"/api/conversations/{conv_id}", json={"title": "New Title"}, headers=headers)
+        assert update.status_code == 200
+
+        get = client.get(f"/api/conversations/{conv_id}", headers=headers)
+        assert get.json()["conversation"]["title"] == "New Title"
+
+
+def test_update_conversation_model(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        create = client.post("/api/conversations", json={"title": "Test"}, headers=headers)
+        conv_id = create.json()["id"]
+
+        update = client.put(f"/api/conversations/{conv_id}", json={"model": "qwen2:latest"}, headers=headers)
+        assert update.status_code == 200
+
+        get = client.get(f"/api/conversations/{conv_id}", headers=headers)
+        assert get.json()["conversation"]["model"] == "qwen2:latest"
+
+
+def test_delete_conversation(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        create = client.post("/api/conversations", json={"title": "Delete Me"}, headers=headers)
+        conv_id = create.json()["id"]
+
+        delete = client.delete(f"/api/conversations/{conv_id}", headers=headers)
+        assert delete.status_code == 200
+
+        get = client.get(f"/api/conversations/{conv_id}", headers=_guest_headers(client))
+        assert get.status_code == 404
+
+
+def test_delete_all_conversations(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        client.post("/api/conversations", json={"title": "One"}, headers=headers)
+        client.post("/api/conversations", json={"title": "Two"}, headers=headers)
+
+        delete_all = client.delete("/api/conversations", headers=headers)
+        assert delete_all.status_code == 200
+
+        list_resp = client.get("/api/conversations", headers=_guest_headers(client))
+        assert list_resp.json() == []
+
+
+def test_guest_cannot_create_conversation(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.post("/api/conversations", json={"title": "test"}, headers=_guest_headers(client))
+        assert resp.status_code == 403
+
+
+def test_guest_cannot_update_conversation(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        create = client.post("/api/conversations", json={"title": "Test"}, headers=headers)
+        conv_id = create.json()["id"]
+
+        guest_headers = _guest_headers(client)
+        resp = client.put(f"/api/conversations/{conv_id}", json={"title": "hack"}, headers=guest_headers)
+        assert resp.status_code == 403
+
+
+def test_guest_cannot_delete_conversation(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.delete("/api/conversations/some-id", headers=_guest_headers(client))
+        assert resp.status_code == 403
+
+
+def test_guest_cannot_delete_all(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.delete("/api/conversations", headers=_guest_headers(client))
+        assert resp.status_code == 403
--- a/tests/test_error_envelopes.py
+++ b/tests/test_error_envelopes.py
@@ -1,19 +1,24 @@
 import os
 from pathlib import Path

+import httpx
 from fastapi.testclient import TestClient

-import app as app_module
+import app
+import config
+import db
+import routers.memories
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS


 def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
-    app_module.DB_PATH = tmp_path / "jarvischat-errors.db"
-    app_module.SESSIONS.clear()
-    app_module.PIN_ATTEMPTS.clear()
-    app_module.RATE_EVENTS.clear()
-    app_module.init_db()
-    return TestClient(app_module.app, raise_server_exceptions=False)
+    db.DB_PATH = tmp_path / "jarvischat-errors.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)


 def test_unhandled_api_exception_returns_friendly_error_with_incident_key(
@@ -23,12 +28,12 @@ def test_unhandled_api_exception_returns_friendly_error_with_incident_key(
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
-        headers = {"X-Session-ID": sid}
+        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        def boom(_topic=None):
            raise RuntimeError("super secret db internals")

-        monkeypatch.setattr(app_module, "get_all_memories", boom)
+        monkeypatch.setattr(routers.memories, "get_all_memories", boom)

        resp = client.get("/api/memories", headers=headers)
        assert resp.status_code == 500
@@ -57,11 +62,11 @@ def test_chat_stream_error_hides_internal_exception_and_emits_incident_key(
        def broken_stream(*args, **kwargs):
            return BrokenStreamContext()

-        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", broken_stream)
+        monkeypatch.setattr(httpx.AsyncClient, "stream", broken_stream)

        resp = client.post(
            "/api/chat",
-            json={"message": "hello", "model": app_module.DEFAULT_MODEL},
+            json={"message": "hello", "model": config.DEFAULT_MODEL},
            headers=headers,
        )

--- a/tests/test_ip_allowlist.py
+++ b/tests/test_ip_allowlist.py
@@ -3,48 +3,42 @@ from pathlib import Path

 from fastapi.testclient import TestClient

-import app as app_module
+import app
+import db
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS, is_ip_allowed


 def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
-    app_module.DB_PATH = tmp_path / "jarvischat-ip.db"
-    app_module.SESSIONS.clear()
-    app_module.PIN_ATTEMPTS.clear()
-    app_module.RATE_EVENTS.clear()
-    app_module.init_db()
-    return TestClient(app_module.app)
+    db.DB_PATH = tmp_path / "jarvischat-ip.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app)


 def test_ip_helper_allows_local_defaults():
-    assert app_module.is_ip_allowed("127.0.0.1")
-    assert app_module.is_ip_allowed("192.168.1.10")
-    assert app_module.is_ip_allowed("10.0.0.42")
-    assert app_module.is_ip_allowed("172.16.1.2")
-    assert app_module.is_ip_allowed("testclient")
+    assert is_ip_allowed("127.0.0.1")
+    assert is_ip_allowed("192.168.1.10")
+    assert is_ip_allowed("10.0.0.42")
+    assert is_ip_allowed("172.16.1.2")
+    assert is_ip_allowed("testclient")


 def test_ip_helper_blocks_public_ip():
-    assert not app_module.is_ip_allowed("8.8.8.8")
+    assert not is_ip_allowed("8.8.8.8")


-def test_middleware_blocks_disallowed_ip(tmp_path: Path):
+def test_middleware_blocks_disallowed_ip(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(app, "get_client_ip", lambda _req: "8.8.8.8")
    with make_client(tmp_path) as client:
-        original_get_client_ip = app_module.get_client_ip
-        try:
-            app_module.get_client_ip = lambda _req: "8.8.8.8"
        resp = client.post("/api/auth/guest")
        assert resp.status_code == 403
-        finally:
-            app_module.get_client_ip = original_get_client_ip


-def test_middleware_allows_local_ip(tmp_path: Path):
+def test_middleware_allows_local_ip(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(app, "get_client_ip", lambda _req: "192.168.50.109")
    with make_client(tmp_path) as client:
-        original_get_client_ip = app_module.get_client_ip
-        try:
-            app_module.get_client_ip = lambda _req: "192.168.50.109"
-            resp = client.post("/api/auth/guest")
+        resp = client.post("/api/auth/guest", headers={"Origin": "http://testserver"})
        assert resp.status_code == 200
-        finally:
-            app_module.get_client_ip = original_get_client_ip
--- a/tests/test_memories.py
+++ b/tests/test_memories.py
@@ -0,0 +1,161 @@
+import os
+from pathlib import Path
+
+from fastapi.testclient import TestClient
+
+import app
+import config
+import db
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS
+
+
+def make_client(tmp_path: Path) -> TestClient:
+    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
+    db.DB_PATH = tmp_path / "jarvischat-memories.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)
+
+
+def _admin_headers(client: TestClient) -> dict:
+    login = client.post("/api/auth/login", json={"pin": "1234"}, headers={"Origin": "http://testserver"})
+    sid = login.json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def _guest_headers(client: TestClient) -> dict:
+    sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def _create_memory(client: TestClient, headers: dict, fact: str = "test fact", topic: str = "general") -> int:
+    resp = client.post("/api/memories", json={"fact": fact, "topic": topic}, headers=headers)
+    assert resp.status_code == 200
+    return resp.json()["rowid"]
+
+
+def test_list_memories_empty(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/memories", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert resp.json()["count"] == 0
+
+
+def test_list_memories_by_topic(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        _create_memory(client, headers, "I like Python", "preference")
+        _create_memory(client, headers, "Building a game", "project")
+
+        general = client.get("/api/memories?topic=preference", headers=_guest_headers(client))
+        assert general.json()["count"] == 1
+        assert general.json()["memories"][0]["topic"] == "preference"
+
+
+def test_create_memory_requires_fact(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.post("/api/memories", json={"fact": ""}, headers=_admin_headers(client))
+        assert resp.status_code == 400
+
+
+def test_create_memory_too_long(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        long_fact = "x" * (config.MAX_MEMORY_FACT_CHARS + 1)
+        resp = client.post("/api/memories", json={"fact": long_fact}, headers=_admin_headers(client))
+        assert resp.status_code == 413
+
+
+def test_edit_memory(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        rowid = _create_memory(client, headers, "original fact")
+
+        edit = client.put(f"/api/memories/{rowid}", json={"fact": "updated fact"}, headers=headers)
+        assert edit.status_code == 200
+
+        memories = client.get("/api/memories", headers=_guest_headers(client)).json()
+        assert any(m["fact"] == "updated fact" for m in memories["memories"])
+
+
+def test_edit_memory_not_found(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.put("/api/memories/99999", json={"fact": "nope"}, headers=_admin_headers(client))
+        assert resp.status_code == 404
+
+
+def test_edit_memory_empty_fact(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        rowid = _create_memory(client, headers, "some fact")
+        resp = client.put(f"/api/memories/{rowid}", json={"fact": ""}, headers=headers)
+        assert resp.status_code == 400
+
+
+def test_edit_memory_too_long(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        rowid = _create_memory(client, headers, "some fact")
+        long_fact = "x" * (config.MAX_MEMORY_FACT_CHARS + 1)
+        resp = client.put(f"/api/memories/{rowid}", json={"fact": long_fact}, headers=headers)
+        assert resp.status_code == 413
+
+
+def test_delete_memory_not_found(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.delete("/api/memories/99999", headers=_admin_headers(client))
+        assert resp.status_code == 404
+
+
+def test_search_memories(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        _create_memory(client, headers, "my favorite color is blue", "preference")
+        _create_memory(client, headers, "running nginx on port 443", "infrastructure")
+
+        resp = client.get("/api/memories/search?q=nginx&limit=5", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["count"] >= 1
+        assert any("nginx" in r["fact"] for r in data["results"])
+
+
+def test_search_memories_no_results(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/memories/search?q=xyznonexistent&limit=5", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert resp.json()["count"] == 0
+
+
+def test_memory_stats(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        _create_memory(client, headers, "like rust", "preference")
+        _create_memory(client, headers, "like python", "preference")
+        _create_memory(client, headers, "project game", "project")
+
+        resp = client.get("/api/memories/stats", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["total"] == 3
+        assert data["by_topic"]["preference"] == 2
+        assert data["by_topic"]["project"] == 1
+
+
+def test_guest_cannot_create_memory(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.post("/api/memories", json={"fact": "hack"}, headers=_guest_headers(client))
+        assert resp.status_code == 403
+
+
+def test_guest_cannot_edit_memory(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.put("/api/memories/1", json={"fact": "hack"}, headers=_guest_headers(client))
+        assert resp.status_code == 403
+
+
+def test_guest_cannot_delete_memory(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.delete("/api/memories/1", headers=_guest_headers(client))
+        assert resp.status_code == 403
--- a/tests/test_models_router.py
+++ b/tests/test_models_router.py
@@ -0,0 +1,138 @@
+import os
+from pathlib import Path
+
+import httpx
+from fastapi.testclient import TestClient
+
+import app
+import db
+import routers.models
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS
+
+
+def make_client(tmp_path: Path) -> TestClient:
+    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
+    db.DB_PATH = tmp_path / "jarvischat-models.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)
+
+
+def _guest_headers(client: TestClient) -> dict:
+    sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+class _MockAsyncResponse:
+    """Mock for httpx.AsyncClient.get/post that returns a JSON response."""
+    def __init__(self, status_code=200, json_data=None):
+        self.status_code = status_code
+        self._json_data = json_data or {}
+
+    def json(self):
+        return self._json_data
+
+
+async def _mock_get_models(*args, **kwargs):
+    return _MockAsyncResponse(json_data={
+        "data": [{"id": "llama3.1:latest"}, {"id": "qwen2:latest"}]
+    })
+
+
+async def _mock_get_empty_models(*args, **kwargs):
+    return _MockAsyncResponse(json_data={"data": []})
+
+
+async def _mock_connect_error(*args, **kwargs):
+    raise httpx.ConnectError("Connection refused")
+
+
+async def _mock_show_model(*args, **kwargs):
+    return _MockAsyncResponse(json_data={
+        "modelfile": "FROM llama3.1", "parameters": {}
+    })
+
+
+async def _mock_search_available(*args, **kwargs):
+    return _MockAsyncResponse(status_code=200)
+
+
+async def _mock_search_unavailable(*args, **kwargs):
+    raise httpx.ConnectError("refused")
+
+
+def test_list_models(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "get", _mock_get_models)
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/models", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        models = resp.json()["models"]
+        assert len(models) == 2
+        assert models[0]["name"] == "llama3.1:latest"
+
+
+def test_list_models_connect_error(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "get", _mock_connect_error)
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/models", headers=_guest_headers(client))
+        assert resp.status_code == 502
+
+
+def test_running_models(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "get", _mock_get_models)
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/ps", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert "data" in resp.json()
+
+
+def test_running_models_connect_error(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "get", _mock_connect_error)
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/ps", headers=_guest_headers(client))
+        assert resp.status_code == 502
+
+
+def test_show_model(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "post", _mock_show_model)
+    with make_client(tmp_path) as client:
+        resp = client.post("/api/show", json={"model": "llama3.1:latest"}, headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert resp.json()["modelfile"] == "FROM llama3.1"
+
+
+def test_show_model_connect_error(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "post", _mock_connect_error)
+    with make_client(tmp_path) as client:
+        resp = client.post("/api/show", json={"model": "llama3.1:latest"}, headers=_guest_headers(client))
+        assert resp.status_code == 502
+
+
+def test_system_stats(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(routers.models, "get_gpu_stats", lambda: {"gpu_percent": 15, "vram_percent": 30, "available": True})
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/stats", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        data = resp.json()
+        assert "cpu_percent" in data
+        assert "memory_percent" in data
+        assert data["gpu_percent"] == 15
+        assert data["gpu_available"] is True
+
+
+def test_search_status_available(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "get", _mock_search_available)
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/search/status", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert resp.json()["available"] is True
+
+
+def test_search_status_unavailable(tmp_path: Path, monkeypatch):
+    monkeypatch.setattr(httpx.AsyncClient, "get", _mock_search_unavailable)
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/search/status", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert resp.json()["available"] is False
--- a/tests/test_presets.py
+++ b/tests/test_presets.py
@@ -0,0 +1,128 @@
+import os
+from pathlib import Path
+
+from fastapi.testclient import TestClient
+
+import app
+import db
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS
+
+
+def make_client(tmp_path: Path) -> TestClient:
+    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
+    db.DB_PATH = tmp_path / "jarvischat-presets.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)
+
+
+def _admin_headers(client: TestClient) -> dict:
+    login = client.post("/api/auth/login", json={"pin": "1234"}, headers={"Origin": "http://testserver"})
+    sid = login.json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def _guest_headers(client: TestClient) -> dict:
+    sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def test_list_presets_returns_defaults(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/presets", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        presets = resp.json()
+        assert len(presets) >= 3
+        names = [p["name"] for p in presets]
+        assert "Coding Companion" in names
+
+
+def test_create_preset(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        resp = client.post("/api/presets", json={"name": "My Preset", "prompt": "You are helpful."}, headers=headers)
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["name"] == "My Preset"
+        assert data["prompt"] == "You are helpful."
+
+        presets = client.get("/api/presets", headers=_guest_headers(client)).json()
+        assert any(p["name"] == "My Preset" for p in presets)
+
+
+def test_create_preset_requires_name_and_prompt(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        resp = client.post("/api/presets", json={"name": "", "prompt": ""}, headers=headers)
+        assert resp.status_code == 400
+
+        resp = client.post("/api/presets", json={"name": "Only Name"}, headers=headers)
+        assert resp.status_code == 400
+
+
+def test_update_preset(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        create = client.post("/api/presets", json={"name": "Old", "prompt": "Old prompt."}, headers=headers)
+        preset_id = create.json()["id"]
+
+        update = client.put(f"/api/presets/{preset_id}", json={"name": "New", "prompt": "New prompt."}, headers=headers)
+        assert update.status_code == 200
+
+        presets = client.get("/api/presets", headers=_guest_headers(client)).json()
+        updated = next(p for p in presets if p["id"] == preset_id)
+        assert updated["name"] == "New"
+        assert updated["prompt"] == "New prompt."
+
+
+def test_update_preset_requires_fields(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        resp = client.put("/api/presets/nope", json={"name": "", "prompt": ""}, headers=headers)
+        assert resp.status_code == 400
+
+
+def test_delete_preset(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        create = client.post("/api/presets", json={"name": "Temp", "prompt": "Temp."}, headers=headers)
+        preset_id = create.json()["id"]
+
+        delete = client.delete(f"/api/presets/{preset_id}", headers=headers)
+        assert delete.status_code == 200
+
+        presets = client.get("/api/presets", headers=_guest_headers(client)).json()
+        assert not any(p["id"] == preset_id for p in presets)
+
+
+def test_delete_default_preset_is_noop(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        presets_before = client.get("/api/presets", headers=_guest_headers(client)).json()
+        default = next(p for p in presets_before if p["is_default"])
+
+        delete = client.delete(f"/api/presets/{default['id']}", headers=headers)
+        assert delete.status_code == 200
+
+        presets_after = client.get("/api/presets", headers=_guest_headers(client)).json()
+        assert any(p["id"] == default["id"] for p in presets_after)
+
+
+def test_guest_cannot_create_preset(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.post("/api/presets", json={"name": "Hack", "prompt": "Hack"}, headers=_guest_headers(client))
+        assert resp.status_code == 403
+
+
+def test_guest_cannot_update_preset(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.put("/api/presets/some-id", json={"name": "Hack", "prompt": "Hack"}, headers=_guest_headers(client))
+        assert resp.status_code == 403
+
+
+def test_guest_cannot_delete_preset(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.delete("/api/presets/some-id", headers=_guest_headers(client))
+        assert resp.status_code == 403
--- a/tests/test_profile.py
+++ b/tests/test_profile.py
@@ -0,0 +1,72 @@
+import os
+from pathlib import Path
+
+from fastapi.testclient import TestClient
+
+import app
+import config
+import db
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS
+
+
+def make_client(tmp_path: Path) -> TestClient:
+    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
+    db.DB_PATH = tmp_path / "jarvischat-profile.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)
+
+
+def _admin_headers(client: TestClient) -> dict:
+    login = client.post("/api/auth/login", json={"pin": "1234"}, headers={"Origin": "http://testserver"})
+    sid = login.json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def _guest_headers(client: TestClient) -> dict:
+    sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def test_get_profile_returns_content(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/profile", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        data = resp.json()
+        assert "content" in data
+        assert "updated_at" in data
+        assert len(data["content"]) > 0
+
+
+def test_get_default_profile(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.get("/api/profile/default", headers=_guest_headers(client))
+        assert resp.status_code == 200
+        assert resp.json()["content"] == config.DEFAULT_PROFILE
+
+
+def test_update_profile(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        resp = client.put("/api/profile", json={"content": "Custom profile text."}, headers=headers)
+        assert resp.status_code == 200
+        assert "updated_at" in resp.json()
+
+        get_resp = client.get("/api/profile", headers=_guest_headers(client))
+        assert get_resp.json()["content"] == "Custom profile text."
+
+
+def test_update_profile_too_long(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        headers = _admin_headers(client)
+        long_content = "x" * (config.MAX_PROFILE_CHARS + 1)
+        resp = client.put("/api/profile", json={"content": long_content}, headers=headers)
+        assert resp.status_code == 413
+
+
+def test_guest_cannot_update_profile(tmp_path: Path):
+    with make_client(tmp_path) as client:
+        resp = client.put("/api/profile", json={"content": "hack"}, headers=_guest_headers(client))
+        assert resp.status_code == 403
--- a/tests/test_rate_and_payload_guardrails.py
+++ b/tests/test_rate_and_payload_guardrails.py
@@ -4,28 +4,32 @@ from pathlib import Path

 from fastapi.testclient import TestClient

-import app as app_module
+import app
+import config
+import db
+import security
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS


 def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
-    app_module.DB_PATH = tmp_path / "jarvischat-rate.db"
-    app_module.SESSIONS.clear()
-    app_module.PIN_ATTEMPTS.clear()
-    app_module.RATE_EVENTS.clear()
-    app_module.init_db()
-    return TestClient(app_module.app)
+    db.DB_PATH = tmp_path / "jarvischat-rate.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app)


 def test_stats_rate_limit_hits_429(tmp_path: Path):
-    old_limit = app_module.RL_STATS_PER_WINDOW
-    old_window = app_module.RATE_WINDOW_SECONDS
-    app_module.RL_STATS_PER_WINDOW = 2
-    app_module.RATE_WINDOW_SECONDS = 60
+    old_limit = security.RL_STATS_PER_WINDOW
+    old_window = app.RATE_WINDOW_SECONDS
+    security.RL_STATS_PER_WINDOW = 2
+    app.RATE_WINDOW_SECONDS = 60
    try:
        with make_client(tmp_path) as client:
-            sid = client.post("/api/auth/guest").json()["session_id"]
-            headers = {"X-Session-ID": sid}
+            sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
+            headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

            r1 = client.get("/api/stats", headers=headers)
            r2 = client.get("/api/stats", headers=headers)
@@ -35,13 +39,13 @@ def test_stats_rate_limit_hits_429(tmp_path: Path):
            assert r2.status_code == 200
            assert r3.status_code == 429
    finally:
-        app_module.RL_STATS_PER_WINDOW = old_limit
-        app_module.RATE_WINDOW_SECONDS = old_window
+        security.RL_STATS_PER_WINDOW = old_limit
+        app.RATE_WINDOW_SECONDS = old_window


 def test_large_login_payload_rejected_413(tmp_path: Path):
    with make_client(tmp_path) as client:
-        huge_pin = "1" * (app_module.BODY_LIMIT_DEFAULT_BYTES + 100)
+        huge_pin = "1" * (config.BODY_LIMIT_DEFAULT_BYTES + 100)
        resp = client.post(
            "/api/auth/login",
            data=json.dumps({"pin": huge_pin}),
@@ -52,12 +56,12 @@ def test_large_login_payload_rejected_413(tmp_path: Path):

 def test_chat_message_length_rejected_413(tmp_path: Path):
    with make_client(tmp_path) as client:
-        sid = client.post("/api/auth/guest").json()["session_id"]
+        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}
-        message = "x" * (app_module.MAX_CHAT_MESSAGE_CHARS + 1)
+        message = "x" * (config.MAX_CHAT_MESSAGE_CHARS + 1)
        resp = client.post(
            "/api/chat",
-            json={"message": message, "model": app_module.DEFAULT_MODEL},
+            json={"message": message, "model": config.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 413
@@ -65,12 +69,12 @@ def test_chat_message_length_rejected_413(tmp_path: Path):

 def test_search_query_length_rejected_413(tmp_path: Path):
    with make_client(tmp_path) as client:
-        sid = client.post("/api/auth/guest").json()["session_id"]
+        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}
-        query = "q" * (app_module.MAX_SEARCH_QUERY_CHARS + 1)
+        query = "q" * (config.MAX_SEARCH_QUERY_CHARS + 1)
        resp = client.post(
            "/api/search",
-            json={"query": query, "model": app_module.DEFAULT_MODEL},
+            json={"query": query, "model": config.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 413
--- a/tests/test_search_route.py
+++ b/tests/test_search_route.py
@@ -0,0 +1,186 @@
+import json
+import os
+from pathlib import Path
+
+import httpx
+from fastapi.testclient import TestClient
+
+import app
+import config
+import db
+import routers.search_route
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS
+
+
+def make_client(tmp_path: Path) -> TestClient:
+    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
+    db.DB_PATH = tmp_path / "jarvischat-search-route.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)
+
+
+def _guest_headers(client: TestClient) -> dict:
+    sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()["session_id"]
+    return {"X-Session-ID": sid, "Origin": "http://testserver"}
+
+
+def parse_sse_payloads(body: str) -> list[dict]:
+    payloads: list[dict] = []
+    for chunk in body.split("\n\n"):
+        chunk = chunk.strip()
+        if not chunk.startswith("data: "):
+            continue
+        raw = chunk[len("data: ") :]
+        payloads.append(json.loads(raw))
+    return payloads
+
+
+class _MockStreamResponse:
+    def __init__(self, lines: list[str]):
+        self._lines = lines
+
+    async def __aenter__(self):
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb):
+        return False
+
+    async def aiter_lines(self):
+        for line in self._lines:
+            yield line
+
+
+def _stream_json_lines(events: list[dict]) -> list[str]:
+    return [json.dumps(event) for event in events]
+
+
+def test_explicit_search_with_results(tmp_path: Path, monkeypatch):
+    with make_client(tmp_path) as client:
+        headers = _guest_headers(client)
+
+        async def search_stub(query: str, max_results: int = 5):
+            return [
+                {"title": "Result One", "url": "https://example.com/1", "content": "First result content."},
+                {"title": "Result Two", "url": "https://example.com/2", "content": "Second result content."},
+            ]
+
+        monkeypatch.setattr(routers.search_route, "query_searxng", search_stub)
+
+        events = _stream_json_lines([
+            {"choices": [{"delta": {"content": "Here's what I found"}, "logprobs": None}]},
+            {"choices": [{"delta": {"content": " about your query."}, "logprobs": None}]},
+            {"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": {}},
+        ])
+
+        def stream_stub(self, method, url, json=None, timeout=None):
+            return _MockStreamResponse(events)
+
+        monkeypatch.setattr(httpx.AsyncClient, "stream", stream_stub)
+
+        resp = client.post(
+            "/api/search",
+            json={"query": "current events", "model": config.DEFAULT_MODEL},
+            headers=headers,
+        )
+        assert resp.status_code == 200
+        payloads = parse_sse_payloads(resp.text)
+
+        assert any(p.get("searching") is True for p in payloads)
+        assert any("search_results" in p for p in payloads)
+        token_text = "".join(p.get("token", "") for p in payloads if "token" in p)
+        assert "found" in token_text.lower()
+        assert any(p.get("done") and p.get("searched") for p in payloads)
+
+
+def test_explicit_search_no_results(tmp_path: Path, monkeypatch):
+    with make_client(tmp_path) as client:
+        headers = _guest_headers(client)
+
+        async def empty_search(query: str, max_results: int = 5):
+            return []
+
+        monkeypatch.setattr(routers.search_route, "query_searxng", empty_search)
+
+        resp = client.post(
+            "/api/search",
+            json={"query": "nothingness", "model": config.DEFAULT_MODEL},
+            headers=headers,
+        )
+        assert resp.status_code == 200
+        payloads = parse_sse_payloads(resp.text)
+
+        assert any("No search results found" in p.get("token", "") for p in payloads)
+        assert any(p.get("done") for p in payloads)
+        assert not any("search_results" in p for p in payloads)
+
+
+def test_explicit_search_new_conversation_created(tmp_path: Path, monkeypatch):
+    with make_client(tmp_path) as client:
+        headers = _guest_headers(client)
+
+        async def search_stub(query: str, max_results: int = 5):
+            return [{"title": "T", "url": "https://ex.com", "content": "Content."}]
+
+        monkeypatch.setattr(routers.search_route, "query_searxng", search_stub)
+
+        events = _stream_json_lines([
+            {"choices": [{"delta": {"content": "Answer."}, "logprobs": None}]},
+            {"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": {}},
+        ])
+
+        def stream_stub(self, method, url, json=None, timeout=None):
+            return _MockStreamResponse(events)
+
+        monkeypatch.setattr(httpx.AsyncClient, "stream", stream_stub)
+
+        resp = client.post(
+            "/api/search",
+            json={"query": "tell me something", "model": config.DEFAULT_MODEL},
+            headers=headers,
+        )
+        assert resp.status_code == 200
+        payloads = parse_sse_payloads(resp.text)
+
+        conv_id = None
+        for p in payloads:
+            if "conversation_id" in p:
+                conv_id = p["conversation_id"]
+                break
+        assert conv_id is not None
+
+        conv_resp = client.get(f"/api/conversations/{conv_id}", headers=_guest_headers(client))
+        assert conv_resp.status_code == 200
+        data = conv_resp.json()
+        assert len(data["messages"]) >= 2
+
+
+def test_explicit_search_stream_error(tmp_path: Path, monkeypatch):
+    with make_client(tmp_path) as client:
+        headers = _guest_headers(client)
+
+        async def search_stub(query: str, max_results: int = 5):
+            return [{"title": "T", "url": "https://ex.com", "content": "Content."}]
+
+        monkeypatch.setattr(routers.search_route, "query_searxng", search_stub)
+
+        def broken_stream(self, method, url, json=None, timeout=None):
+            class BrokenCtx:
+                async def __aenter__(self):
+                    raise RuntimeError("summarization failed")
+                async def __aexit__(self, exc_type, exc, tb):
+                    return False
+            return BrokenCtx()
+
+        monkeypatch.setattr(httpx.AsyncClient, "stream", broken_stream)
+
+        resp = client.post(
+            "/api/search",
+            json={"query": "breaking news", "model": config.DEFAULT_MODEL},
+            headers=headers,
+        )
+        assert resp.status_code == 200
+        assert "error_key" in resp.text
+        assert "INC-" in resp.text
--- a/tests/test_search_url_sanitization.py
+++ b/tests/test_search_url_sanitization.py
@@ -1,17 +1,17 @@
-import app as app_module
+from search import sanitize_outbound_url


 def test_sanitize_outbound_url_allows_http_https():
-    assert app_module.sanitize_outbound_url("https://example.com/path") == "https://example.com/path"
-    assert app_module.sanitize_outbound_url("http://example.com") == "http://example.com"
+    assert sanitize_outbound_url("https://example.com/path") == "https://example.com/path"
+    assert sanitize_outbound_url("http://example.com") == "http://example.com"


 def test_sanitize_outbound_url_blocks_unsafe_schemes():
-    assert app_module.sanitize_outbound_url("javascript:alert(1)") == ""
-    assert app_module.sanitize_outbound_url("data:text/html,evil") == ""
-    assert app_module.sanitize_outbound_url("file:///etc/passwd") == ""
+    assert sanitize_outbound_url("javascript:alert(1)") == ""
+    assert sanitize_outbound_url("data:text/html,evil") == ""
+    assert sanitize_outbound_url("file:///etc/passwd") == ""


 def test_sanitize_outbound_url_blocks_relative_and_empty():
-    assert app_module.sanitize_outbound_url("/relative/path") == ""
-    assert app_module.sanitize_outbound_url("") == ""
+    assert sanitize_outbound_url("/relative/path") == ""
+    assert sanitize_outbound_url("") == ""
--- a/tests/test_settings_allowlist.py
+++ b/tests/test_settings_allowlist.py
@@ -3,17 +3,19 @@ from pathlib import Path

 from fastapi.testclient import TestClient

-import app as app_module
+import app
+import db
+from security import SESSIONS, PIN_ATTEMPTS


 def make_admin_client(tmp_path: Path) -> tuple[TestClient, dict[str, str]]:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
-    app_module.DB_PATH = tmp_path / "jarvischat-settings.db"
-    app_module.SESSIONS.clear()
-    app_module.PIN_ATTEMPTS.clear()
-    app_module.init_db()
+    db.DB_PATH = tmp_path / "jarvischat-settings.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    db.init_db()

-    client = TestClient(app_module.app)
+    client = TestClient(app.app)
    login = client.post(
        "/api/auth/login",
        json={"pin": "1234"},
--- a/tests/test_skills_framework.py
+++ b/tests/test_skills_framework.py
@@ -1,19 +1,23 @@
+import asyncio
 import os
 from pathlib import Path

 from fastapi.testclient import TestClient

-import app as app_module
+import app
+import db
+from rag import build_system_prompt
+from security import SESSIONS, PIN_ATTEMPTS, RATE_EVENTS


 def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
-    app_module.DB_PATH = tmp_path / "jarvischat-skills.db"
-    app_module.SESSIONS.clear()
-    app_module.PIN_ATTEMPTS.clear()
-    app_module.RATE_EVENTS.clear()
-    app_module.init_db()
-    return TestClient(app_module.app, raise_server_exceptions=False)
+    db.DB_PATH = tmp_path / "jarvischat-skills.db"
+    SESSIONS.clear()
+    PIN_ATTEMPTS.clear()
+    RATE_EVENTS.clear()
+    db.init_db()
+    return TestClient(app.app, raise_server_exceptions=False)


 def test_guest_can_list_skills(tmp_path: Path):
@@ -21,7 +25,7 @@ def test_guest_can_list_skills(tmp_path: Path):
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
-        resp = client.get("/api/skills", headers={"X-Session-ID": sid})
+        resp = client.get("/api/skills", headers={"X-Session-ID": sid, "Origin": "http://testserver"})
        assert resp.status_code == 200
        payload = resp.json()
        assert payload["count"] >= 1
@@ -46,7 +50,7 @@ def test_admin_can_toggle_skill_enabled_state(tmp_path: Path):
        assert disable.status_code == 200
        assert disable.json()["skill"]["enabled"] is False

-        active = client.get("/api/skills/active", headers={"X-Session-ID": sid})
+        active = client.get("/api/skills/active", headers={"X-Session-ID": sid, "Origin": "http://testserver"})
        assert active.status_code == 200
        assert all(skill["key"] != "search.web" for skill in active.json()["skills"])

@@ -71,23 +75,23 @@ def test_unknown_skill_update_is_rejected(tmp_path: Path):

 def test_prompt_injection_respects_skills_enabled_setting(tmp_path: Path):
    with make_client(tmp_path):
-        db = app_module.get_db()
+        conn = db.get_db()
        try:
-            db.execute(
+            conn.execute(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                ("skills_enabled", "false"),
            )
-            db.commit()
-            without_skills = app_module.build_system_prompt(db, "", "hello")
+            conn.commit()
+            without_skills = asyncio.run(build_system_prompt(conn, "", "hello"))
            assert "## Active Skills" not in without_skills

-            db.execute(
+            conn.execute(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                ("skills_enabled", "true"),
            )
-            db.commit()
-            with_skills = app_module.build_system_prompt(db, "", "hello")
+            conn.commit()
+            with_skills = asyncio.run(build_system_prompt(conn, "", "hello"))
            assert "## Active Skills" in with_skills
            assert "memory.search" in with_skills
        finally:
-            db.close()
+            conn.close()

Author	SHA1	Message	Date
gramps	b8405b8d76	fix: increase trash icon visibility (remove 0.5 opacity, bump to 15px)	2026-06-27 16:09:05 -07:00
gramps	e3b1780292	feat: add trash-can icon to left of each conversation in sidebar. Replace the hover-reveal × on the right with an always-visible 🗑 icon positioned to the left of the conversation title. Clicking it triggers the existing deleteConversation(), which shows a confirm dialog and enforces admin-only access.	2026-06-27 16:07:18 -07:00
gramps	66b086c3f3	fix: restore EMBED_URL pointing to ollama on 192.168.50.210:11434	2026-06-27 16:03:19 -07:00
gramps	4b36fd315a	fix: replace hardcoded EMBED_URL with LLAMA_SERVER_BASE from config. EMBED_URL in rag.py hardcoded the IP and port instead of using LLAMA_SERVER_BASE, so the env var JARVISCHAT_LLAMA_SERVER_BASE was ignored for embedding requests.	2026-06-27 15:59:43 -07:00
gramps	fcc0605a4a	release: bump version to v1.8.5	2026-06-27 15:27:47 -07:00
gramps	091e2ad2e3	test: add unit tests for all 10 routers (92 total). New test files: test_conversations.py (list/create/get/update/delete/delete-all, admin enforcement); test_presets.py (list/create/update/delete, default preset protection); test_profile.py (get/update/default, length validation); test_models_router.py (list/ps/show/stats/search-status, connect errors); test_completions.py (API key auth, FIM passthrough, streaming/blocking, errors); test_search_route.py (explicit search flow, no results, stream errors); test_memories.py (edit/search/stats endpoints, validation, admin enforcement). Update AGENTS.md with full test file coverage table and README.md	2026-06-27 15:27:13 -07:00
gramps	5986c4ad86	fix: close two CSRF origin-check security gaps. Extend origin check to all /api/ requests (not just state-changing methods), closing the GET/HEAD/OPTIONS bypass that allowed cross-origin reads; origin_allowed() now returns False when both Origin and Referer headers are absent, preventing script-initiated requests from bypassing the check; update AGENTS.md and README.md to document the changes	2026-06-27 15:20:02 -07:00
gramps	cc1efa7a21	fix: resolve all critical runtime errors and bugs from audit. Add COMPLETIONS_API_KEY to config.py (env var + auto-generated fallback); fix perplexity auto-search: upstream sends logprobs=true, parse_llama_stream_chunk extracts per-token logprobs, all_logprobs populated during streaming; fix all /api/models endpoints to target LLAMA_SERVER_BASE (port 8081) not OLLAMA_BASE; fix RAG embedding endpoint URL from port 11434 (Ollama) to 8081 (llama-server); correct misleading error messages: 'inference server' not 'Ollama'; remove raw_results leak from SSE event stream in /api/search; fix weather query extractor: pattern-match instead of unconditional suffix append; escape FTS5 operator keywords (AND/OR/NOT/NEAR) in memory search; move auth.py BODY_LIMIT_DEFAULT_BYTES imports to module level; change RAG injection log level from warning to info; fix all 8 test files after modular refactor (rewire imports from correct modules); update AGENTS.md and README.md to reflect v1.8.0 changes	2026-06-27 15:12:18 -07:00
Llama Chile Shop	41a8708c0d	docs: add roadmap items M (MCP) and N (AMQP cluster nervous system), fix jarvis IP	2026-06-23 15:31:16 +00:00