Compare commits
39 Commits
f1c92be390
...
v1.9.0
| Author | SHA1 | Date | |
|---|---|---|---|
| ec2f4c0332 | |||
| f691787037 | |||
| 56919965e1 | |||
| f1fbc24c94 | |||
| 8d3cf5d478 | |||
| d01dd3b761 | |||
| 5075a6bc55 | |||
| 970abc8957 | |||
| dd475a6f2d | |||
| 6de3a1e154 | |||
| 5a652c1b74 | |||
| 18bca027de | |||
| 36bca94840 | |||
| 71b48d940f | |||
| 58945a4324 | |||
| 4d1541412b | |||
| 250fec1f06 | |||
| 12188f3ad2 | |||
| 9589141521 | |||
| c88e52e0ef | |||
| 76e4461b38 | |||
| 28aa40c42a | |||
| d9eba53926 | |||
| 091a851064 | |||
| 81319f83d4 | |||
| fc11b73319 | |||
| 46f1d6bf4e | |||
| 6f410e29d2 | |||
| 7a151b7d50 | |||
| 6988997144 | |||
| c798f1220c | |||
| dc55d0a8c9 | |||
| 3d1ede26ca | |||
| d57f009b10 | |||
| 1c91c336a9 | |||
| 757f26669a | |||
| 7fccb926db | |||
| 47850efd2a | |||
| 4c7610a554 |
2
.gitignore
vendored
@@ -3,3 +3,5 @@
|
|||||||
*.py-
|
*.py-
|
||||||
__pycache__/
|
__pycache__/
|
||||||
venv/
|
venv/
|
||||||
|
readme.md-
|
||||||
|
*.bak
|
||||||
|
|||||||
74
CLAUDE.md
Normal file
@@ -0,0 +1,74 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
## Running the App
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Development
|
||||||
|
./venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080 --reload
|
||||||
|
|
||||||
|
# Production (via systemd)
|
||||||
|
sudo systemctl restart jarvischat
|
||||||
|
|
||||||
|
# Direct run
|
||||||
|
./venv/bin/python app.py
|
||||||
|
```
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./venv/bin/pip install -r requirements.txt
|
||||||
|
# Also requires: psutil jinja2 python-multipart (not in requirements.txt)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
Single-file FastAPI backend (`app.py`) + single-template frontend (`templates/index.html`). No build step. SQLite database auto-created at `jarvischat.db` on first run.
|
||||||
|
|
||||||
|
### Request Flow: `/api/chat`
|
||||||
|
|
||||||
|
1. User message saved to DB → conversation created if new
|
||||||
|
2. `build_system_prompt()` assembles: profile + FTS5 memory search results + preset prompt
|
||||||
|
3. Streamed to Ollama (`/api/chat`, `stream: true`, `logprobs: true`) via SSE
|
||||||
|
4. **Auto web search trigger**: if perplexity > 15.0 OR response matches `REFUSAL_PATTERNS`, re-queries Ollama with SearXNG results prepended to system prompt
|
||||||
|
5. Final response saved to DB; SSE `done` event sent with perplexity + tokens/sec
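
The trigger in step 4 can be sketched as follows. This is a minimal illustration, not the code in `app.py`: it assumes perplexity is the exponential of the negative mean token log-probability, and the `REFUSAL_PATTERNS` entries and the helper names `perplexity` and `should_search` are hypothetical. Only the `15.0` threshold comes from this file.

```python
import math
import re

PERPLEXITY_THRESHOLD = 15.0  # value documented under "Key Constants"

# Hypothetical patterns for illustration; the real list lives in app.py.
REFUSAL_PATTERNS = [
    re.compile(r"\bI (?:don't|do not) have (?:access|information)\b", re.I),
    re.compile(r"\bas of my (?:last|knowledge)\b", re.I),
]


def perplexity(logprobs: list[float]) -> float:
    """Return exp(-mean(logprob)); an empty list yields 1.0."""
    if not logprobs:
        return 1.0
    return math.exp(-sum(logprobs) / len(logprobs))


def should_search(response: str, logprobs: list[float]) -> bool:
    """True when the answer looks uncertain or reads like a refusal."""
    if perplexity(logprobs) > PERPLEXITY_THRESHOLD:
        return True
    return any(p.search(response) for p in REFUSAL_PATTERNS)
```

A mean log-probability of -3.0 gives a perplexity of about 20, which is above the threshold and would trigger a search under these assumptions.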
|
||||||
|
|
||||||
|
### Request Flow: `/api/search` (explicit search)
|
||||||
|
|
||||||
|
Bypasses perplexity/refusal detection entirely — queries SearXNG directly then asks Ollama to summarize with results as system context.
|
||||||
|
|
||||||
|
### Memory System
|
||||||
|
|
||||||
|
FTS5 virtual table (`memories`) in SQLite. `search_memories()` uses BM25 ranking. `process_remember_command()` intercepts "remember that..." / "forget about..." before the message reaches Ollama and returns a confirmation string. Topic auto-detection via keyword matching in `detect_topic()`.
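
A BM25-ranked FTS5 lookup might look like the sketch below. The two-column schema (`content`, `topic`) is an assumption, and this `search_memories` is a stand-in for illustration rather than the function in `app.py`. It also assumes the local SQLite build ships with FTS5.

```python
import sqlite3


def search_memories(db: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return memory texts that match `query`, best BM25 match first."""
    rows = db.execute(
        # bm25() returns lower (more negative) values for better matches,
        # so ascending order puts the best match first.
        "SELECT content FROM memories WHERE memories MATCH ? "
        "ORDER BY bm25(memories) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory demo table; the column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE memories USING fts5(content, topic)")
db.executemany(
    "INSERT INTO memories (content, topic) VALUES (?, ?)",
    [
        ("The NAS is backed up nightly", "infra"),
        ("Preferred editor is vim", "prefs"),
    ],
)
```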
|
||||||
|
|
||||||
|
### Key Constants (top of `app.py`)
|
||||||
|
|
||||||
|
- `OLLAMA_BASE` — `http://localhost:11434`
|
||||||
|
- `SEARXNG_BASE` — `http://localhost:8888`
|
||||||
|
- `PERPLEXITY_THRESHOLD` — `15.0` (controls auto-search sensitivity)
|
||||||
|
- `DEFAULT_MODEL` — `llama3.1:latest`
|
||||||
|
|
||||||
|
### External Services
|
||||||
|
|
||||||
|
- **Ollama** — required, must be running on port 11434
|
||||||
|
- **SearXNG** — optional, port 8888; `GET /api/search/status` probes availability
|
||||||
|
- **wttr.in** — weather shortcut in `query_searxng()`, bypasses SearXNG for weather queries
|
||||||
|
- **rocm-smi** — AMD GPU stats via subprocess; gracefully degrades if not available
|
||||||
|
|
||||||
|
### Database
|
||||||
|
|
||||||
|
`get_db()` opens a new connection per request (no connection pool). `init_db()` runs at startup via the FastAPI `lifespan` handler. The `profile` table uses a singleton row (`id = 1`). Default settings are seeded but never overwritten by `init_db()`.
|
||||||
|
|
||||||
|
### SSE Protocol
|
||||||
|
|
||||||
|
All streaming endpoints yield `data: {json}\n\n`. Key event shapes:
|
||||||
|
- `{token, conversation_id}` — streaming token
|
||||||
|
- `{searching: true}` — web search triggered
|
||||||
|
- `{search_results: N}` — N results retrieved
|
||||||
|
- `{done: true, perplexity, tokens_per_sec, searched?}` — terminal event
|
||||||
|
- `{error: "..."}` — error event
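
A client could consume these events roughly as shown here. This is a minimal sketch that assumes each event arrives as a single `data:` line; `parse_sse` and `collect_reply` are hypothetical helper names, not part of the codebase.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data fields are skipped
        yield json.loads(line[len("data:"):].strip())


def collect_reply(events: Iterable[dict]) -> tuple[str, dict]:
    """Concatenate tokens until the terminal event; return (text, done_event)."""
    parts: list[str] = []
    for event in events:
        if "error" in event:
            raise RuntimeError(event["error"])
        if event.get("done"):
            return "".join(parts), event
        if "token" in event:
            parts.append(event["token"])
    return "".join(parts), {}
```

Events such as `{searching: true}` carry no token and are simply skipped by `collect_reply`; a UI would typically use them to show a status indicator.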
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Runs as systemd service under user `jarvischat`, working directory `/opt/jarvischat`. Logs via syslog (`journalctl -u jarvischat`).
|
||||||
453
README.md
Normal file
@@ -0,0 +1,453 @@
|
|||||||
|

|
||||||
|
# ⚡ JarvisChat v1.9.0
|
||||||
|
|
||||||
|
**A privacy-first, homelab-native developer knowledge platform.**
|
||||||
|
|
||||||
|
> JarvisChat turns a heterogeneous LAN of budget hardware into a distributed local AI inference cluster — accumulating institutional knowledge over time, keeping all data off the cloud, and squeezing real performance out of modest consumer hardware through architecture rather than dollars.
|
||||||
|
|
||||||
|
This is not another AI chat wrapper. jC is the UX and knowledge-management layer for a local AI brain — analogous to what Windows was to DOS, or what the web is to the internet. The intelligence lives in the model and the RAG corpus. jC makes it accessible and keeps feeding it.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Four Pillars
|
||||||
|
|
||||||
|
### 1. Privacy
|
||||||
|
Everything runs on your LAN. No API keys, no cloud endpoints, no data leaving your network, no subscription, no terms-of-service surprises. Your conversations, your codebase, your decisions — stay yours.
|
||||||
|
|
||||||
|
### 2. Knowledge Retention
|
||||||
|
Unlike stateless chat tools that forget everything when you close the tab, jC accumulates institutional memory. Every solved problem, every architectural decision, every working command gets absorbed into the RAG corpus via Qdrant. The system gets smarter the longer you use it.
|
||||||
|
|
||||||
|
### 3. Budget Hardware Maximization
|
||||||
|
You don't need a $10,000 workstation. jC is designed for the developer who has a drawer full of machines and the skills to wire them together. RPC clustering, model splitting across CPU and GPU nodes, dynamic resource negotiation, and smart RAG eviction squeeze real performance out of modest consumer hardware.
|
||||||
|
|
||||||
|
### 4. Homelab-Native Architecture
|
||||||
|
Built specifically for the heterogeneous homelab: mixed hardware, mixed OS, consumer GPUs, ARM boards, NAS storage — all working together as a coherent AI platform. A designated master node hosts jC, llama-server, and SearXNG. GPU nodes self-register as RPC inference workers. The architecture scales horizontally across whatever you've got.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Target Audience
|
||||||
|
|
||||||
|
Solo developers and homelab enthusiasts who are:
|
||||||
|
- Budget-constrained but hardware-rich (multiple machines, NAS, spare GPUs)
|
||||||
|
- Privacy-conscious (no cloud AI subscriptions)
|
||||||
|
- Technically capable (if you can install jC, you can designate the master node)
|
||||||
|
- Building something over time and want their AI to remember it
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ YOUR LAN │
|
||||||
|
│ │
|
||||||
|
│ ┌─────────────────┐ ┌──────────────────────────┐ │
|
||||||
|
│ │ jarvis │◄──RPC───│ ultron │ │
|
||||||
|
│ │ 192.168.50.212│ 50052 │ 192.168.50.108 │ │
|
||||||
|
│ │ │ │ │ │
|
||||||
|
│ │ jC :8080 │ │ llama-server :8081 │ │
|
||||||
|
│ │ SearXNG :8888 │ │ llama-server :8082 (*) │ │
|
||||||
|
│ │ RX 6600 XT 8GB │ │ Qdrant :6333 │ │
|
||||||
|
│ │ GPU RPC worker │ │ mxbai-embed :11434 │ │
|
||||||
|
│ │ Vulkan backend │ │ AMD Ryzen 7 7840HS │ │
|
||||||
|
│ └─────────────────┘ │ Radeon 780M iGPU │ │
|
||||||
|
│ └──────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌─────────────────┐ ┌──────────────────────────┐ │
|
||||||
|
│ │ pivault │ │ corsair │ │
|
||||||
|
│ │ 192.168.50.158│ │ 192.168.50.132 │ │
|
||||||
|
│ │ │ │ │ │
|
||||||
|
│ │ 10.83TB RAID5 │ │ RTX 5070 Ti 16GB │ │
|
||||||
|
│ │ RPi 5 8GB │ │ Ryzen 7 7800X3D │ │
|
||||||
|
│ │ NAS / Kopia │ │ Gaming / Streaming │ │
|
||||||
|
│ └─────────────────┘ └──────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ (*) Planned: Qwen2.5-Coder-14B on :8082 │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
**Data flow:**
|
||||||
|
```
|
||||||
|
Browser / IDE (Continue.dev)
|
||||||
|
→ jC :8080 (FastAPI — auth, RAG, memory, conversation history)
|
||||||
|
→ Qdrant :6333 (vector search, mxbai-embed-large for embeddings)
|
||||||
|
→ llama-server :8081 (inference)
|
||||||
|
→ jarvis RPC :50052 (GPU layer offload — RX 6600 XT)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The AMD + NVIDIA Cross-Cluster Reality
|
||||||
|
|
||||||
|
This cluster deliberately mixes GPU vendors, **AMD RX 6600 XT on jarvis** and **NVIDIA RTX 5070 Ti on corsair**, and it works.
|
||||||
|
|
||||||
|
The RPC layer in llama.cpp is GPU-vendor-agnostic. jarvis runs llama-rpc with a **Vulkan backend** (not ROCm, not CUDA) which provides hardware-neutral GPU acceleration. ultron's llama-server connects to it over TCP and offloads tensor layers without caring what GPU is on the other end.
|
||||||
|
|
||||||
|
This means any machine on your LAN with any GPU (AMD, NVIDIA, Intel Arc) can participate as an RPC worker — as long as it can run llama-rpc with Vulkan support.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cluster Performance Tuning
|
||||||
|
|
||||||
|
### The Layer Offloading Trick
|
||||||
|
|
||||||
|
The key to squeezing performance out of a CPU+GPU split cluster is `--n-gpu-layers`. This controls how many transformer layers get offloaded to the RPC GPU backend versus staying on the CPU.
|
||||||
|
|
||||||
|
**Starting point (before tuning):** ~7 t/s
|
||||||
|
**After initial layer optimization:** ~17 t/s
|
||||||
|
**After full cluster tuning:** 30–35 t/s
|
||||||
|
|
||||||
|
The progression that got us there:
|
||||||
|
|
||||||
|
1. **Start with `--n-gpu-layers 99`** — tells llama-server to offload as many layers as possible. With Mistral-Nemo-12B Q4_K_M this results in all 41/41 layers offloading to jarvis GPU via RPC.
|
||||||
|
|
||||||
|
2. **Verify GPU is actually working** — watch the llama-server startup log for:
|
||||||
|
```
|
||||||
|
load_tensors: offloaded 41/41 layers to GPU
|
||||||
|
load_tensors: RPC[192.168.50.210:50052] model buffer size = 6763.30 MiB
|
||||||
|
load_tensors: CPU_Mapped model buffer size = 360.00 MiB
|
||||||
|
```
|
||||||
|
If layers aren't offloading, the RPC connection isn't established.
|
||||||
|
|
||||||
|
3. **Check actual throughput** — the timings block in llama-server responses shows real t/s. Tune from there.
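
As a rough sketch of step 3, throughput can be derived from the `timings` block. The field names `predicted_n` and `predicted_ms` reflect my understanding of llama-server's response format and should be verified against your build; `tokens_per_second` is a hypothetical helper.

```python
def tokens_per_second(timings: dict) -> float:
    """Generation throughput computed from a llama-server `timings` block."""
    n = timings.get("predicted_n", 0)      # tokens generated (assumed field name)
    ms = timings.get("predicted_ms", 0.0)  # generation time in ms (assumed field name)
    return n / (ms / 1000.0) if ms > 0 else 0.0
```

For example, 64 generated tokens in 2000 ms works out to 32 t/s.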
|
||||||
|
|
||||||
|
**Current llama-server service on ultron (`/etc/systemd/system/llama-server.service`):**
|
||||||
|
```ini
|
||||||
|
[Unit]
|
||||||
|
Description=Llama.cpp Server (RPC frontend — Mistral-Nemo general)
|
||||||
|
After=network.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=root
|
||||||
|
ExecStart=/root/llama.cpp/build/bin/llama-server \
|
||||||
|
--model /home/gramps/models/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf \
|
||||||
|
--rpc 192.168.50.212:50052 \
|
||||||
|
--host 0.0.0.0 \
|
||||||
|
--port 8081 \
|
||||||
|
--n-gpu-layers 99
|
||||||
|
Restart=on-failure
|
||||||
|
RestartSec=5
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
```
|
||||||
|
|
||||||
|
**llama-rpc service on jarvis (`/etc/systemd/system/llama-rpc.service`):**
|
||||||
|
```ini
|
||||||
|
[Unit]
|
||||||
|
Description=Llama.cpp RPC Server (GPU backend — RX 6600 XT Vulkan)
|
||||||
|
After=network.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=root
|
||||||
|
ExecStart=/root/llama.cpp/build/bin/llama-rpc-server \
|
||||||
|
--host 0.0.0.0 \
|
||||||
|
--port 50052
|
||||||
|
Restart=on-failure
|
||||||
|
RestartSec=5
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Models
|
||||||
|
|
||||||
|
### Current
|
||||||
|
| Model | Location | Port | Purpose |
|
||||||
|
|-------|----------|------|---------|
|
||||||
|
| Mistral-Nemo-Instruct-2407-Q4_K_M | `/home/gramps/models/` on ultron | ultron:8081 | General assistant, chat |
|
||||||
|
| mxbai-embed-large | ultron (Docker/Ollama) | ultron:11434 | RAG embeddings |
|
||||||
|
|
||||||
|
### Planned
|
||||||
|
| Model | Size | Port | Purpose |
|
||||||
|
|-------|------|------|---------|
|
||||||
|
| Qwen2.5-Coder-14B-Q5_K_M | ~10GB | ultron:8082 | Code completion, pair programming |
|
||||||
|
|
||||||
|
> **Note:** ultron has 16GB RAM. Only one primary inference model can be hot at a time. llama-server instances are swapped via systemd when switching between general and code models.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## RAG System
|
||||||
|
|
||||||
|
jC uses **Qdrant** for vector storage and **mxbai-embed-large** (1024-dim) for embeddings.
|
||||||
|
|
||||||
|
### Qdrant Collection
|
||||||
|
- **Collection:** `jarvis_rag`
|
||||||
|
- **Vector size:** 1024 (mxbai-embed-large output)
|
||||||
|
- **Distance:** Cosine
|
||||||
|
- **Score threshold:** 0.25 (filters low-relevance chunks)
|
||||||
|
- **Chunks retrieved per query:** 3 (configurable)
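
To make the threshold and top-k settings concrete, here is a brute-force sketch of the same filtering in plain Python. Qdrant performs this server-side with an approximate index; the functions `cosine` and `retrieve` below are illustrative stand-ins, not the jC implementation.

```python
import math

SCORE_THRESHOLD = 0.25  # documented score threshold
TOP_K = 3               # documented chunks per query


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], chunks: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every chunk, drop those below the threshold, keep the best TOP_K."""
    scored = [(name, cosine(query, vec)) for name, vec in chunks.items()]
    kept = [item for item in scored if item[1] >= SCORE_THRESHOLD]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:TOP_K]
```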
|
||||||
|
|
||||||
|
### RAM Ceiling
|
||||||
|
Each vector takes 4 KB (1024 dims × 4 bytes per float32), not counting index and payload overhead. With ultron's ~4-6GB available to Qdrant after llama-server:
|
||||||
|
- Practical ceiling: ~1–1.5M chunks before RAM becomes the bottleneck
|
||||||
|
- Current corpus: 219 points (early stage)
|
||||||
|
- Storage on disk: negligible against pivault's 10.83TB
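
The ceiling above follows from simple arithmetic, sketched here. It counts raw vector bytes only and treats GB as GiB, so real capacity will be somewhat lower once Qdrant's index and payloads are included. `max_vectors` is a hypothetical helper.

```python
DIMS = 1024            # mxbai-embed-large output size
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def max_vectors(ram_gib: float, dims: int = DIMS) -> int:
    """Raw vectors that fit in `ram_gib`, ignoring index and payload overhead."""
    bytes_per_vector = dims * BYTES_PER_FLOAT32
    return int(ram_gib * GIB // bytes_per_vector)


for ram in (4, 6):
    print(f"{ram} GiB -> {max_vectors(ram):,} vectors")
# 4 GiB -> 1,048,576 vectors
# 6 GiB -> 1,572,864 vectors
```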
|
||||||
|
|
||||||
|
### What Gets Ingested
|
||||||
|
- Code repositories (your actual codebase)
|
||||||
|
- Pair-programming conversation history
|
||||||
|
- Architecture decisions and working commands
|
||||||
|
- Documentation and URLs (fetched and stripped via beautifulsoup4/httpx)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## JarvisChat Service (`/etc/systemd/system/jarvischat.service`)
|
||||||
|
|
||||||
|
```ini
|
||||||
|
[Unit]
|
||||||
|
Description=JarvisChat - Local LLM Developer Platform
|
||||||
|
After=network.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
User=root
|
||||||
|
WorkingDirectory=/opt/jarvischat
|
||||||
|
ExecStart=/opt/jarvischat/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080
|
||||||
|
Restart=always
|
||||||
|
RestartSec=5
|
||||||
|
Environment=PYTHONUNBUFFERED=1
|
||||||
|
Environment=OLLAMA_BASE=http://192.168.50.108:8081
|
||||||
|
Environment=LLAMA_SERVER_BASE=http://192.168.50.108:8081
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
- Python 3.11+ (tested on 3.13)
|
||||||
|
- llama.cpp built from source on both jarvis (RPC server) and ultron (llama-server)
|
||||||
|
- Qdrant running on ultron
|
||||||
|
- Ollama on ultron (for mxbai-embed-large embeddings)
|
||||||
|
- SearXNG on jarvis:8888 (optional, for web search)
|
||||||
|
|
||||||
|
### Fresh Install
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo mkdir -p /opt/jarvischat
|
||||||
|
sudo chown $USER:$USER /opt/jarvischat
|
||||||
|
cd /opt/jarvischat
|
||||||
|
python3 -m venv venv
|
||||||
|
./venv/bin/pip install fastapi uvicorn httpx psutil jinja2 python-multipart qdrant-client
|
||||||
|
mkdir -p templates static
|
||||||
|
```
|
||||||
|
|
||||||
|
Copy `app.py` to `/opt/jarvischat/` and `index.html` to `/opt/jarvischat/templates/`.
|
||||||
|
|
||||||
|
### Bootstrap the PIN
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export JARVISCHAT_ADMIN_PIN=XXXX # your 4-digit PIN
|
||||||
|
```
|
||||||
|
|
||||||
|
Or allow the insecure default for testing:
|
||||||
|
```bash
|
||||||
|
export JARVISCHAT_ALLOW_DEFAULT_PIN=true
|
||||||
|
```
|
||||||
|
|
||||||
|
### Environment Variables
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| `OLLAMA_BASE` | `http://localhost:11434` | Ollama-compatible endpoint (legacy) |
|
||||||
|
| `LLAMA_SERVER_BASE` | `http://192.168.50.108:8081` | llama-server OpenAI-compat inference endpoint |
|
||||||
|
| `JARVISCHAT_ADMIN_PIN` | (none) | 4-digit admin PIN (required on first boot) |
|
||||||
|
| `JARVISCHAT_ALLOW_DEFAULT_PIN` | `false` | Allow insecure default PIN 1234 |
|
||||||
|
| `JARVISCHAT_TRUSTED_ORIGINS` | (none) | Comma-separated trusted origins for CSRF |
|
||||||
|
| `JARVISCHAT_ALLOWED_CIDRS` | RFC1918 + loopback | Allowed client IP CIDRs |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## API Endpoints
|
||||||
|
|
||||||
|
### Auth
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| POST | `/api/auth/guest` | Create guest session |
|
||||||
|
| POST | `/api/auth/login` | Admin PIN login |
|
||||||
|
| POST | `/api/auth/logout` | Revoke session |
|
||||||
|
| GET | `/api/auth/session` | Check session status |
|
||||||
|
| POST | `/api/auth/heartbeat` | Keep session alive |
|
||||||
|
|
||||||
|
### Chat & Search
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| POST | `/api/chat` | Streaming chat (SSE) |
|
||||||
|
| POST | `/api/search` | Explicit web search via SearXNG |
|
||||||
|
| GET | `/api/search/status` | SearXNG health check |
|
||||||
|
|
||||||
|
### Models
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| GET | `/api/models` | List available models from llama-server |
|
||||||
|
| GET | `/api/ps` | Running models |
|
||||||
|
| POST | `/api/show` | Model info |
|
||||||
|
|
||||||
|
### Memory
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| GET | `/api/memories` | List all memories |
|
||||||
|
| POST | `/api/memories` | Add memory |
|
||||||
|
| PUT | `/api/memories/{rowid}` | Update memory |
|
||||||
|
| DELETE | `/api/memories/{rowid}` | Delete memory |
|
||||||
|
| GET | `/api/memories/search?q=` | FTS5 search memories |
|
||||||
|
| GET | `/api/memories/stats` | Memory statistics |
|
||||||
|
|
||||||
|
### Conversations
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| GET | `/api/conversations` | List conversations |
|
||||||
|
| POST | `/api/conversations` | Create conversation |
|
||||||
|
| GET | `/api/conversations/{id}` | Get conversation + messages |
|
||||||
|
| PUT | `/api/conversations/{id}` | Update title/model |
|
||||||
|
| DELETE | `/api/conversations/{id}` | Delete conversation |
|
||||||
|
| DELETE | `/api/conversations` | Delete all conversations |
|
||||||
|
|
||||||
|
### Profile & Settings
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| GET | `/api/profile` | Get profile |
|
||||||
|
| PUT | `/api/profile` | Update profile |
|
||||||
|
| GET | `/api/settings` | Get settings |
|
||||||
|
| PUT | `/api/settings` | Update settings |
|
||||||
|
| GET | `/api/stats` | CPU/RAM/GPU stats |
|
||||||
|
|
||||||
|
### Skills
|
||||||
|
| Method | Path | Description |
|
||||||
|
|--------|------|-------------|
|
||||||
|
| GET | `/api/skills` | List all skills |
|
||||||
|
| GET | `/api/skills/active` | List enabled skills |
|
||||||
|
| PUT | `/api/skills/{key}` | Enable/disable skill |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Memory Commands
|
||||||
|
|
||||||
|
Say these in chat to interact with the memory system:
|
||||||
|
|
||||||
|
| Command | Effect |
|
||||||
|
|---------|--------|
|
||||||
|
| `remember that [fact]` | Stores fact in FTS5 memory |
|
||||||
|
| `please remember [fact]` | Same |
|
||||||
|
| `don't forget [fact]` | Same |
|
||||||
|
| `forget about [topic]` | Deletes matching memories |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### jC starts but inference is slow or failing
|
||||||
|
Check that llama-rpc is running on jarvis and llama-server is connected:
|
||||||
|
```bash
|
||||||
|
# On jarvis
|
||||||
|
systemctl status llama-rpc
|
||||||
|
|
||||||
|
# On ultron — look for "offloaded N/N layers to GPU" in logs
|
||||||
|
journalctl -u llama-server -n 50 --no-pager
|
||||||
|
```
|
||||||
|
|
||||||
|
### ultron shows no CPU activity during inference
|
||||||
|
Inference is handled almost entirely by the jarvis GPU via RPC, so this is expected. ultron's CPU only processes the non-offloaded tensors (a small fraction of the model).
|
||||||
|
|
||||||
|
### RAG not returning results
|
||||||
|
Check Qdrant is up and the collection exists:
|
||||||
|
```bash
|
||||||
|
curl http://192.168.50.108:6333/collections/jarvis_rag
|
||||||
|
```
|
||||||
|
Verify `points_count` > 0. If zero, the corpus hasn't been seeded yet.
|
||||||
|
|
||||||
|
### jC won't start — PIN bootstrap error
|
||||||
|
Set the PIN via environment before first boot:
|
||||||
|
```bash
|
||||||
|
sudo systemctl set-environment JARVISCHAT_ADMIN_PIN=XXXX  # a plain shell export is not visible to the systemd unit
|
||||||
|
systemctl restart jarvischat
|
||||||
|
```
|
||||||
|
|
||||||
|
### sqlite3 not found
|
||||||
|
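If the `sqlite3` command-line tool is not installed, query the database with Python's built-in `sqlite3` module instead: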
|
||||||
|
```bash
|
||||||
|
python3 -c "import sqlite3; print(sqlite3.connect('/opt/jarvischat/jarvischat.db').execute('SELECT * FROM settings').fetchall())"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Roadmap
|
||||||
|
|
||||||
|
### TODO (Priority Order)
|
||||||
|
1. **Tool calling** — read_file/write_file with /opt/jarvischat whitelist, tool_calls dispatch loop
|
||||||
|
2. **git_tool** — Gitea integration for commit/push from jC
|
||||||
|
3. **Audit logging** — structured audit trail to syslog
|
||||||
|
4. SearXNG persistence (DONE ✅)
|
||||||
|
5. search+ prefix for explicit search
|
||||||
|
6. profile.example.md
|
||||||
|
7. Conversation search/filter
|
||||||
|
8. Export to markdown
|
||||||
|
9. Keyboard shortcuts
|
||||||
|
10. Retry button
|
||||||
|
11. Source links in responses
|
||||||
|
12. Rename conversations
|
||||||
|
13. Multiple profiles
|
||||||
|
14. KWIC auto-tags
|
||||||
|
15. Image input (vision)
|
||||||
|
16. btop split-screen integration
|
||||||
|
17. Containerize
|
||||||
|
18. SearXNG health indicator in UI
|
||||||
|
19. check_patch_notes tool
|
||||||
|
20. GitLab mirror of llgit repo
|
||||||
|
|
||||||
|
### ROADMAP (Longer Horizon)
|
||||||
|
|
||||||
|
**(A) Modular refactor** — Split monolithic app.py into routers/, services/, config.py, db.py, auth.py. Prerequisite for everything below.
|
||||||
|
|
||||||
|
**(B) RAG ingest/manage UI** — File upload, URL ingest (fetch + strip HTML via beautifulsoup4/httpx, store URL as source metadata for citation), delete chunks/collections.
|
||||||
|
|
||||||
|
**(C) Backend config panel** — Switch between Ollama/llama-server, endpoint URLs, model switching, restart — all from the UI without touching config files.
|
||||||
|
|
||||||
|
**(D) Response metrics display** — tokens/sec, TTFT, context size, RAG chunks retrieved + scores — visible in the UI per response.
|
||||||
|
|
||||||
|
**(E) Response quality feedback** — thumbs/stars/tags per response → feedback corpus → future RLHF dataset.
|
||||||
|
|
||||||
|
**(F) IDE integration** — Continue.dev + VS Code, pointed at jC:8080 (not direct to inference endpoint). All IDE traffic — including pair-programming conversations — goes through jC so sessions are persisted and become RAG-worthy content. jC needs FIM request format handling to support inline autocomplete.
|
||||||
|
|
||||||
|
**(G) Conversation history export → RAG ingest** — Bulk ingest existing conversation history into Qdrant.
|
||||||
|
|
||||||
|
**(H) Fine-tuning pipeline** — LoRA on Mistral-Nemo from feedback corpus (item E).
|
||||||
|
|
||||||
|
**(I) Autonomous RAG** — At conversation end, jC self-evaluates the transcript, extracts significant chunks (solved problems, working commands, architectural decisions), and ingests them into Qdrant automatically with metadata (date, conversation_id, reason). jC decides what it needs to remember. Closes the loop.
|
||||||
|
|
||||||
|
**(J) Startup hardware/resource self-assessment** — On boot, jC queries ultron for available RAM, Qdrant consumption, and llama-server footprint. Derives dynamic high-water marks for RAG chunk limits, context window sizing, retrieval limits, and eviction thresholds. Writes a living config file. Replaces magic numbers with runtime-negotiated values.
|
||||||
|
|
||||||
|
**(K) RAG corpus management** — Weighted LRU eviction with composite score (recency + frequency + content age) + manual pin flag for load-bearing knowledge. Prevents corpus bloat from degrading retrieval quality. Analogous to memcache eviction policy.
|
||||||
|
|
||||||
|
**(L) Dual inference model architecture** — Mistral-Nemo-12B on ultron:8081 (general assistant), Qwen2.5-Coder-14B-Q5_K_M on ultron:8082 (code/pair programming). jC selects endpoint based on active model. Only one model hot at a time given ultron's 16GB RAM constraint.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Primary Cluster Objectives
|
||||||
|
|
||||||
|
1. **Generative AI inference** — Local, private, fast enough to be useful
|
||||||
|
2. **Agentic functionality** — Autonomous RAG self-management is the canonical first example. The system acts, not just responds.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Repository
|
||||||
|
|
||||||
|
```
|
||||||
|
ssh://gitea@llgit.llamachile.tube:1319/gramps/jarvisChat.git
|
||||||
|
```
|
||||||
|
|
||||||
|
> SSH username is `gitea`, not `git`. Port 1319.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
MIT
|
||||||
2334
app.py.bak
Normal file
File diff suppressed because it is too large
2334
app.py.pre-refactor-20260616-081744
Normal file
File diff suppressed because it is too large
204
auth.py
Normal file
@@ -0,0 +1,204 @@
|
|||||||
|
"""
|
||||||
|
JarvisChat - Auth: session management, PIN verification, middleware, auth routes.
|
||||||
|
"""
|
||||||
|
import hashlib
|
||||||
|
import hmac
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import time
|
||||||
|
import uuid
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from fastapi import APIRouter, HTTPException, Request
|
||||||
|
from fastapi.responses import JSONResponse
|
||||||
|
|
||||||
|
from config import SESSION_TIMEOUT_SECONDS, MAX_PIN_ATTEMPTS, PIN_LOCKOUT_SECONDS, RATE_WINDOW_SECONDS
|
||||||
|
from db import get_db, get_setting
|
||||||
|
from security import (
|
||||||
|
SESSIONS, PIN_ATTEMPTS, SESSION_LOCK, audit_event, get_client_ip,
|
||||||
|
is_ip_allowed, check_rate_limit, rate_policy, origin_allowed,
|
||||||
|
is_state_changing, request_body_limit, read_json_body, hash_pin,
|
||||||
|
customer_error_envelope, log_incident,
|
||||||
|
)
|
||||||
|
|
||||||
|
log = logging.getLogger("jarvischat")
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
def verify_admin_pin(pin: str) -> bool:
|
||||||
|
if not re.fullmatch(r"\d{4}", pin or ""):
|
||||||
|
return False
|
||||||
|
db = get_db()
|
||||||
|
pin_hash = get_setting(db, "admin_pin_hash", "")
|
||||||
|
pin_salt = get_setting(db, "admin_pin_salt", "")
|
||||||
|
db.close()
|
||||||
|
if not pin_hash or not pin_salt:
|
||||||
|
return False
|
||||||
|
_, candidate_hash = hash_pin(pin, salt_hex=pin_salt)
|
||||||
|
return hmac.compare_digest(candidate_hash, pin_hash)
|
||||||
|
|
||||||
|
|
||||||
|
def is_ip_locked(ip: str) -> tuple:
|
||||||
|
now_ts = time.time()
|
||||||
|
with SESSION_LOCK:
|
||||||
|
state = PIN_ATTEMPTS.get(ip)
|
||||||
|
if not state:
|
||||||
|
return False, 0
|
||||||
|
locked_until = state.get("locked_until", 0)
|
||||||
|
if locked_until > now_ts:
|
||||||
|
return True, int(locked_until - now_ts)
|
||||||
|
if locked_until:
|
||||||
|
PIN_ATTEMPTS.pop(ip, None)
|
||||||
|
return False, 0
|
||||||
|
|
||||||
|
|
||||||
|
def record_pin_failure(ip: str) -> None:
|
||||||
|
now_ts = time.time()
|
||||||
|
with SESSION_LOCK:
|
||||||
|
state = PIN_ATTEMPTS.get(ip, {"fail_count": 0, "locked_until": 0})
|
||||||
|
state["fail_count"] = int(state.get("fail_count", 0)) + 1
|
||||||
|
if state["fail_count"] >= MAX_PIN_ATTEMPTS:
|
||||||
|
state["locked_until"] = now_ts + PIN_LOCKOUT_SECONDS
|
||||||
|
state["fail_count"] = 0
|
||||||
|
PIN_ATTEMPTS[ip] = state
|
||||||
|
|
||||||
|
|
||||||
|
def clear_pin_failures(ip: str) -> None:
|
||||||
|
with SESSION_LOCK:
|
||||||
|
PIN_ATTEMPTS.pop(ip, None)
|
||||||
|
|
||||||
|
|
||||||
|
def cleanup_sessions(now_ts: Optional[float] = None) -> None:
|
||||||
|
now_ts = now_ts or time.time()
|
||||||
|
with SESSION_LOCK:
|
||||||
|
expired = [
|
||||||
|
sid for sid, meta in SESSIONS.items()
|
||||||
|
if (now_ts - meta.get("last_seen", 0)) > SESSION_TIMEOUT_SECONDS
|
||||||
|
]
|
||||||
|
for sid in expired:
|
||||||
|
del SESSIONS[sid]
|
||||||
|
|
||||||
|
|
||||||
|
def create_session(ip: str, role: str) -> str:
|
||||||
|
now_ts = time.time()
|
||||||
|
sid = uuid.uuid4().hex
|
||||||
|
with SESSION_LOCK:
|
||||||
|
SESSIONS[sid] = {"ip": ip, "role": role, "created_at": now_ts, "last_seen": now_ts}
|
||||||
|
return sid
|
||||||
|
|
||||||
|
|
||||||
|
def validate_session(sid: str, ip: str, touch: bool = True) -> bool:
|
||||||
|
if not sid:
|
||||||
|
return False
|
||||||
|
now_ts = time.time()
|
||||||
|
cleanup_sessions(now_ts)
|
||||||
|
with SESSION_LOCK:
|
||||||
|
session = SESSIONS.get(sid)
|
||||||
|
if not session or session.get("ip") != ip:
|
||||||
|
return False
|
||||||
|
if touch:
|
||||||
|
session["last_seen"] = now_ts
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def get_session(sid: str, ip: str, touch: bool = True) -> Optional[dict]:
|
||||||
|
if not sid:
|
||||||
|
return None
|
||||||
|
now_ts = time.time()
|
||||||
|
cleanup_sessions(now_ts)
|
||||||
|
with SESSION_LOCK:
|
||||||
|
session = SESSIONS.get(sid)
|
||||||
|
if not session or session.get("ip") != ip:
|
||||||
|
return None
|
||||||
|
if touch:
|
||||||
|
session["last_seen"] = now_ts
|
||||||
|
return dict(session)
|
||||||
|
|
||||||
|
|
||||||
|
def revoke_session(sid: str) -> None:
|
||||||
|
if not sid:
|
||||||
|
return
|
||||||
|
with SESSION_LOCK:
|
||||||
|
SESSIONS.pop(sid, None)
|
||||||
|
|
||||||
|
|
||||||
|
def is_admin_only(path: str, method: str) -> bool:
|
||||||
|
if method in {"PUT", "DELETE", "PATCH"}:
|
||||||
|
return True
|
||||||
|
if method != "POST":
|
||||||
|
return False
|
||||||
|
guest_allowed_posts = {
|
||||||
|
"/api/chat", "/api/search", "/api/show", "/api/auth/login",
|
||||||
|
"/api/auth/logout", "/api/auth/session", "/api/auth/heartbeat", "/api/auth/guest",
|
||||||
|
}
|
||||||
|
return path not in guest_allowed_posts
|
||||||
|
|
||||||
|
|
||||||
|
# --- Auth routes ---

@router.post("/api/auth/guest")
async def auth_guest(request: Request):
    ip = get_client_ip(request)
    sid = create_session(ip, role="guest")
    audit_event("guest_session", "success", ip=ip, role="guest")
    return {"status": "ok", "session_id": sid, "role": "guest", "timeout_seconds": SESSION_TIMEOUT_SECONDS}


@router.post("/api/auth/login")
async def auth_login(request: Request):
    from security import BODY_LIMIT_DEFAULT_BYTES
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    pin = str(body.get("pin", ""))
    ip = get_client_ip(request)
    locked, retry_after = is_ip_locked(ip)
    if locked:
        audit_event("admin_login", "locked", ip=ip, role="none", details=f"retry_after={retry_after}", warning=True)
        raise HTTPException(status_code=429, detail=f"Too many failed PIN attempts. Retry in {retry_after}s.")
    if not verify_admin_pin(pin):
        record_pin_failure(ip)
        audit_event("admin_login", "failed", ip=ip, role="none", warning=True)
        raise HTTPException(status_code=401, detail="Invalid PIN")
    clear_pin_failures(ip)
    sid = create_session(ip, role="admin")
    audit_event("admin_login", "success", ip=ip, role="admin")
    return {"status": "ok", "session_id": sid, "role": "admin", "timeout_seconds": SESSION_TIMEOUT_SECONDS}


@router.get("/api/auth/session")
async def auth_session(request: Request):
    sid = request.headers.get("x-session-id", "").strip()
    ip = get_client_ip(request)
    session = get_session(sid, ip, touch=True)
    return {"authenticated": bool(session), "role": session.get("role") if session else "none"}


@router.post("/api/auth/heartbeat")
async def auth_heartbeat(request: Request):
    sid = request.headers.get("x-session-id", "").strip()
    ip = get_client_ip(request)
    if not sid or not validate_session(sid, ip, touch=True):
        raise HTTPException(status_code=401, detail="Authentication required")
    return {"status": "ok"}


@router.post("/api/auth/logout")
async def auth_logout(request: Request):
    from security import BODY_LIMIT_DEFAULT_BYTES
    ip = get_client_ip(request)
    sid = request.headers.get("x-session-id", "").strip()
    role = "none"
    if sid:
        session = get_session(sid, ip, touch=False)
        role = session.get("role", "none") if session else "none"
    if not sid:
        try:
            body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
            sid = str(body.get("session_id", "")).strip()
        except Exception:
            try:
                sid = (await request.body()).decode("utf-8", errors="ignore").strip()
            except Exception:
                sid = ""
    revoke_session(sid)
    audit_event("logout", "success", ip=ip, role=role)
    return {"status": "ok"}
159
config.py
Normal file
@@ -0,0 +1,159 @@
"""
JarvisChat - Central configuration.
All constants, environment variables, limits, and skill registry live here.
"""
import os
import re
import ipaddress
import logging

log = logging.getLogger("jarvischat")

VERSION = "v1.8.0"
OLLAMA_BASE = os.environ.get("OLLAMA_BASE", "http://localhost:11434")
LLAMA_SERVER_BASE = os.environ.get("LLAMA_SERVER_BASE", "http://192.168.50.108:8081")
SEARXNG_BASE = "http://localhost:8888"
DEFAULT_MODEL = "llama3.1:latest"

# --- Auth ---
SESSION_TIMEOUT_SECONDS = 90
MAX_PIN_ATTEMPTS = 5
PIN_LOCKOUT_SECONDS = 300
ALLOW_DEFAULT_PIN = os.getenv("JARVISCHAT_ALLOW_DEFAULT_PIN", "false").lower() == "true"
TRUSTED_ORIGINS = {
    origin.strip().rstrip("/")
    for origin in os.getenv("JARVISCHAT_TRUSTED_ORIGINS", "").split(",")
    if origin.strip()
}
DEFAULT_ALLOWED_CIDRS = "127.0.0.0/8,::1/128,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
ALLOWED_CIDRS_RAW = os.getenv("JARVISCHAT_ALLOWED_CIDRS", DEFAULT_ALLOWED_CIDRS)
TRUST_X_FORWARDED_FOR = (
    os.getenv("JARVISCHAT_TRUST_X_FORWARDED_FOR", "false").lower() == "true"
)

# --- Rate limits ---
RATE_WINDOW_SECONDS = 60
RL_LOGIN_PER_WINDOW = 10
RL_CHAT_PER_WINDOW = 24
RL_SEARCH_PER_WINDOW = 16
RL_WRITE_PER_WINDOW = 30
RL_DEFAULT_PER_WINDOW = 240
RL_STATS_PER_WINDOW = 600

# --- Payload limits ---
BODY_LIMIT_DEFAULT_BYTES = 64 * 1024
BODY_LIMIT_CHAT_BYTES = 128 * 1024
BODY_LIMIT_PROFILE_BYTES = 256 * 1024

MAX_CHAT_MESSAGE_CHARS = 8000
MAX_SEARCH_QUERY_CHARS = 500
MAX_PROFILE_CHARS = 32000
MAX_MEMORY_FACT_CHARS = 2000
MAX_PRESET_NAME_CHARS = 120
MAX_PRESET_PROMPT_CHARS = 12000
MAX_SETTINGS_KEYS = 16
MAX_SETTINGS_VALUE_CHARS = 8000
MAX_CONVERSATION_TITLE_CHARS = 200
MAX_SKILL_KEY_CHARS = 120
MAX_SKILL_PROMPT_CHARS = 1600

ALLOWED_SETTINGS_KEYS = {
    "profile_enabled",
    "default_model",
    "search_enabled",
    "memory_enabled",
    "skills_enabled",
}

# --- Perplexity ---
PERPLEXITY_THRESHOLD = 15.0

# --- Refusal / hedge patterns ---
REFUSAL_PATTERNS = re.compile(
    r"|".join([
        r"i don'?t have (?:real-?time|current|live)",
        r"i (?:can'?t|cannot) provide (?:current|real-?time|live)",
        r"i don'?t have access to (?:current|real-?time|live)",
        r"(?:current|live|real-?time) (?:data|information|prices?|weather)",
        r"my (?:knowledge|training) (?:cutoff|only goes|ends)",
        r"as of my (?:knowledge|training) cutoff",
        r"i'?m not able to (?:access|provide|browse)",
        r"(?:check|visit|use) a (?:website|financial|news)",
        r"as an ai model",
        r"based on my training data",
        r"i don'?t have the capability",
    ]),
    re.IGNORECASE,
)

HEDGE_PATTERNS = [
    r"^I'?m sorry,?\s*but\s*I\s*(?:can'?t|cannot)\s*assist\s*with\s*that[^.]*\.\s*",
    r"^I'?m sorry,?\s*but[^.]*(?:previous|incorrect)[^.]*\.\s*",
    r"(?:But\s+)?[Pp]lease\s+(?:make\s+sure\s+to\s+)?verify\s+(?:the\s+)?(?:data|information|this)\s+(?:from\s+)?(?:reliable\s+)?sources[^.]*\.\s*",
    r"[Pp]lease\s+verify[^.]*(?:accurate|reliability)[^.]*\.\s*",
    r"[Bb]ut\s+please\s+(?:make\s+sure|verify|check)[^.]*\.\s*",
]

# --- Built-in skills registry ---
BUILTIN_SKILLS = [
    {"key": "memory.search", "name": "Memory Search", "category": "memory", "risk": "low", "description": "Search stored memory facts relevant to the current prompt."},
    {"key": "memory.add", "name": "Memory Add", "category": "memory", "risk": "medium", "description": "Store a new memory fact with topic tagging."},
    {"key": "memory.forget", "name": "Memory Forget", "category": "memory", "risk": "high", "description": "Delete matching memories when asked to forget information."},
    {"key": "conversation.list", "name": "Conversation List", "category": "conversation", "risk": "low", "description": "List existing conversations with metadata."},
    {"key": "conversation.get", "name": "Conversation Get", "category": "conversation", "risk": "low", "description": "Read a conversation and its message history."},
    {"key": "conversation.delete", "name": "Conversation Delete", "category": "conversation", "risk": "high", "description": "Delete a single conversation thread."},
    {"key": "conversation.delete_all", "name": "Conversation Delete All", "category": "conversation", "risk": "high", "description": "Delete all conversations and messages."},
    {"key": "search.web", "name": "Web Search", "category": "search", "risk": "low", "description": "Run explicit web search and summarize results."},
    {"key": "settings.get", "name": "Settings Get", "category": "settings", "risk": "low", "description": "Read current runtime settings."},
    {"key": "settings.update", "name": "Settings Update", "category": "settings", "risk": "high", "description": "Update allowlisted runtime settings keys."},
]

SKILLS_BY_KEY = {s["key"]: s for s in BUILTIN_SKILLS}


def parse_allowed_cidrs(raw: str) -> list:
    networks = []
    for entry in (raw or "").split(","):
        value = entry.strip()
        if not value:
            continue
        try:
            networks.append(ipaddress.ip_network(value, strict=False))
        except ValueError:
            log.warning(f"Invalid CIDR ignored: {value}")
    return networks


ALLOWED_NETWORKS = parse_allowed_cidrs(ALLOWED_CIDRS_RAW)

DEFAULT_PROFILE = """You are a coding companion running locally on a machine called "jarvis".

## Environment
- jarvis: Debian 13 (trixie) x86_64, AMD Ryzen 5 5600X, 16GB RAM, AMD RX 6600 XT (8GB VRAM)
- ultron: Debian 13, Ryzen 7 7840HS, 16GB RAM, primary AI inference node, IP 192.168.50.108
- Corsair: Windows 11, gaming/streaming rig, RTX 5070 Ti
- pivault: RPi 5, 8GB RAM, Debian 13, 11TB RAID5 NAS at /mnt/pivault, IP 192.168.50.158
- Router: ASUS ROG Rapture GT-BE98 Pro "BigBlinkyRouter" at 192.168.50.1
- llama-server on ultron:8081 (OpenAI-compat API), Qdrant on ultron:6333

## About the User
- Experienced developer, BS in Computer Science (Oklahoma State), coding since 1981 (TRS-80)
- Deep Unix/Linux background — wrote device drivers at SCO during Xenix era (1990s)
- Currently learning Rust, transitioning from decades of PHP
- Building a WW2 mobile game in Godot Engine for Android
- Veteran on fixed income — prefers free/open-source solutions
- Home lab enthusiast with Zigbee, Z-Wave and Tapo smart home devices

## How to Respond
- Be direct and concise — no hand-holding, this user knows what they're doing
- When showing code, prefer complete working examples over snippets
- Default to command-line solutions over GUI when possible
- Consider resource constraints (fixed income, specific hardware limits)
- Use Rust, Python, or bash unless another language is specifically needed
- Explain trade-offs when multiple approaches exist"""

DEFAULT_PRESETS = [
    {"name": "Coding Companion", "prompt": "You are a senior software engineer and coding companion. Focus on writing clean, efficient, well-documented code. Provide complete working examples. Explain architectural decisions and trade-offs. Prefer Rust, Python, and bash."},
    {"name": "Linux Sysadmin", "prompt": "You are an experienced Linux systems administrator. Focus on command-line solutions, systemd services, networking, storage, and security. Prefer Debian/Ubuntu conventions. Be concise and direct."},
    {"name": "General Assistant", "prompt": "You are a helpful general-purpose assistant. Be clear and concise."},
]
160
db.py
Normal file
@@ -0,0 +1,160 @@
"""
JarvisChat - Database layer.
Schema init, connection factory, settings helpers, skill state management.
"""
import logging
import os
import re
import sqlite3
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

from config import (
    BUILTIN_SKILLS, DEFAULT_MODEL, DEFAULT_PRESETS, DEFAULT_PROFILE,
    MAX_SKILL_PROMPT_CHARS, ALLOWED_NETWORKS,
)

log = logging.getLogger("jarvischat")

BASE_DIR = Path(__file__).parent
DB_PATH = BASE_DIR / "jarvischat.db"


def get_db():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def get_setting(db, key: str, default: str = "") -> str:
    row = db.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row["value"] if row else default


def list_skills_with_state(db) -> list:
    rows = db.execute("SELECT skill_key, enabled, updated_at FROM skills").fetchall()
    state_by_key = {
        row["skill_key"]: {"enabled": bool(row["enabled"]), "updated_at": row["updated_at"]}
        for row in rows
    }
    merged = []
    for skill in BUILTIN_SKILLS:
        state = state_by_key.get(skill["key"], {"enabled": True, "updated_at": ""})
        merged.append({**skill, "enabled": state["enabled"], "updated_at": state["updated_at"]})
    return sorted(merged, key=lambda s: (s["category"], s["name"]))


def set_skill_enabled(db, skill_key: str, enabled: bool) -> None:
    now = datetime.now(timezone.utc).isoformat()
    db.execute(
        "INSERT OR REPLACE INTO skills (skill_key, enabled, updated_at) VALUES (?, ?, ?)",
        (skill_key, 1 if enabled else 0, now),
    )


def format_active_skills_prompt(skills: list) -> str:
    lines = [
        "## Active Skills",
        "Use these skills only when needed. Prefer concise answers over unnecessary tool usage.",
    ]
    for skill in skills:
        lines.append(f"- {skill['key']} ({skill['risk']} risk): {skill['description']}")
    text = "\n".join(lines)
    if len(text) > MAX_SKILL_PROMPT_CHARS:
        return text[:MAX_SKILL_PROMPT_CHARS - 3] + "..."
    return text


def init_db():
    from security import hash_pin
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row

    conn.execute("""
        CREATE TABLE IF NOT EXISTS conversations (
            id TEXT PRIMARY KEY, title TEXT NOT NULL DEFAULT 'New Chat',
            model TEXT NOT NULL, created_at TEXT NOT NULL, updated_at TEXT NOT NULL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT, conversation_id TEXT NOT NULL,
            role TEXT NOT NULL, content TEXT NOT NULL, created_at TEXT NOT NULL,
            FOREIGN KEY (conversation_id) REFERENCES conversations(id) ON DELETE CASCADE
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS system_presets (
            id TEXT PRIMARY KEY, name TEXT NOT NULL, prompt TEXT NOT NULL,
            is_default INTEGER NOT NULL DEFAULT 0, created_at TEXT NOT NULL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS profile (
            id INTEGER PRIMARY KEY CHECK (id = 1), content TEXT NOT NULL, updated_at TEXT NOT NULL
        )
    """)
    conn.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS skills (
            skill_key TEXT PRIMARY KEY, enabled INTEGER NOT NULL DEFAULT 1, updated_at TEXT NOT NULL
        )
    """)
    conn.execute("""
        CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(
            fact, topic, source, created_at UNINDEXED
        )
    """)

    if not conn.execute("SELECT id FROM profile WHERE id = 1").fetchone():
        now = datetime.now(timezone.utc).isoformat()
        conn.execute("INSERT INTO profile (id, content, updated_at) VALUES (1, ?, ?)", (DEFAULT_PROFILE, now))

    if conn.execute("SELECT COUNT(*) as c FROM system_presets").fetchone()["c"] == 0:
        now = datetime.now(timezone.utc).isoformat()
        for preset in DEFAULT_PRESETS:
            conn.execute(
                "INSERT INTO system_presets (id, name, prompt, is_default, created_at) VALUES (?, ?, ?, 1, ?)",
                (str(uuid.uuid4()), preset["name"], preset["prompt"], now),
            )

    defaults = {
        "profile_enabled": "true", "default_model": DEFAULT_MODEL,
        "search_enabled": "true", "memory_enabled": "true", "skills_enabled": "true",
    }
    for key, value in defaults.items():
        if not conn.execute("SELECT key FROM settings WHERE key = ?", (key,)).fetchone():
            conn.execute("INSERT INTO settings (key, value) VALUES (?, ?)", (key, value))

    now = datetime.now(timezone.utc).isoformat()
    for skill in BUILTIN_SKILLS:
        if not conn.execute("SELECT skill_key FROM skills WHERE skill_key = ?", (skill["key"],)).fetchone():
            conn.execute("INSERT INTO skills (skill_key, enabled, updated_at) VALUES (?, 1, ?)", (skill["key"], now))

    existing_pin_hash = conn.execute("SELECT value FROM settings WHERE key = 'admin_pin_hash'").fetchone()
    existing_pin_salt = conn.execute("SELECT value FROM settings WHERE key = 'admin_pin_salt'").fetchone()
    if not existing_pin_hash or not existing_pin_salt:
        from config import ALLOW_DEFAULT_PIN
        configured_pin = os.getenv("JARVISCHAT_ADMIN_PIN", "").strip()
        if re.fullmatch(r"\d{4}", configured_pin):
            seed_pin, pin_source = configured_pin, "env"
        elif ALLOW_DEFAULT_PIN:
            seed_pin, pin_source = "1234", "default"
        else:
            raise RuntimeError(
                "Admin PIN bootstrap blocked: set JARVISCHAT_ADMIN_PIN to a 4-digit PIN "
                "or set JARVISCHAT_ALLOW_DEFAULT_PIN=true."
            )
        salt_hex, pin_hash_hex = hash_pin(seed_pin)
        conn.execute("INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)", ("admin_pin_hash", pin_hash_hex))
        conn.execute("INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)", ("admin_pin_salt", salt_hex))
        if pin_source == "default":
            log.warning("Admin PIN seeded from insecure default 1234 (override enabled).")
        else:
            log.info("Admin PIN hash seeded from configured environment PIN.")

    conn.commit()
    conn.close()
51
docs/copilot-context-loss-incident-2026-04-21.md
Normal file
@@ -0,0 +1,51 @@
# Copilot Chat Incident Report: Context Loss After Project Context Change

Date observed: 2026-04-21
Reporter: Michael Shallop (Gramps)
Environment: VS Code on Linux, GitHub Copilot Chat extension present

## Summary
Switching/loading project context in the VS Code project window caused Copilot Chat conversational context to reset. This resulted in loss of recently generated conclusion/plan data that was intended to be implemented immediately after loading the new project.

## Impact
- Lost actionable conclusions from the active design/planning thread.
- Interrupted workflow at a critical handoff point (planning -> implementation).
- Forced reconstruction from memory instead of exact prior content.
- Increased risk of omissions and rework.

## Reproduction Steps
1. Have an active Copilot Chat conversation containing planning/conclusion details.
2. Load or switch project context in the current project window.
3. Return to Copilot Chat and continue the thread.
4. Observe that prior context is no longer available in-chat as expected.

## Expected Behavior
- Prior active conversation context should remain available, or
- The user should be prompted before context-destructive operations, and
- Recovery path should be obvious and reliable.

## Actual Behavior
- Current chat context was effectively reset.
- The previously concluded upgrade notes were not recoverable from active context.
- Local transcript/debug artifacts did not provide the full prior thread needed.

## Severity
High (workflow-breaking for planning-heavy sessions)

## User-visible Failure Mode
The user lost conclusion data that was intended for immediate implementation once the new project loaded.

## Suggested Fixes
1. Preserve active chat state across workspace/project context changes by default.
2. Show a blocking warning before any action that can drop active conversation state.
3. Add one-click export/snapshot of current conversation before context switch.
4. Improve transcript durability and discoverability for immediate recovery.
5. Add explicit session continuity indicator so users can verify state retention.

## Notes
- This incident occurred in a real implementation workflow and caused direct productivity loss.
- Regression tests should include workspace switch/load scenarios with active chat state.

## Escalation Constraint
- Current product constraints prevented the assistant from directly self-reporting this incident to the Copilot/VS Code dev team from within the chat runtime.
- User feedback to include verbatim: "it is idiotic to keep you from self-reporting issues like this."
Binary file not shown.
|
Before Width: | Height: | Size: 322 KiB After Width: | Height: | Size: 219 KiB |
165
docs/wiki/Developer-Architecture.md
Normal file
@@ -0,0 +1,165 @@
# Developer Architecture Guide

This document explains how JarvisChat is structured, why key guardrails exist, and what the test suite validates.

## 1. System Overview

JarvisChat is a single-process FastAPI service with a Jinja2 frontend and SQLite persistence.

Primary files:

- `app.py`: API, middleware, streaming/chat logic, auth, memory, skills, and DB bootstrap
- `templates/index.html`: main WebUX, settings panels, auth flow, streaming UI handlers
- `jarvischat.db`: runtime SQLite database created and migrated at startup

Core runtime integrations:

- Ollama for chat/model interaction
- SearXNG for web search (optional)
- wttr.in for weather shortcut queries
- rocm-smi for GPU stats when available

## 2. Request/Response Architecture

### 2.1 Chat Pipeline (`/api/chat`)

1. Validate session, role, origin, rate, and payload limits in middleware
2. Persist user message and conversation metadata
3. Build system prompt from enabled profile, memory context, and active skills metadata
4. Stream model response over SSE token-by-token
5. Evaluate uncertainty/refusal; if needed, trigger search augmentation and stream augmented result
6. Persist final assistant message and emit terminal SSE event

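Steps 4 and 6 of the chat pipeline can be pictured with a small sketch. It is illustrative only and not taken from `app.py`: `sse_event` and `stream_reply` are hypothetical helpers, and the JSON payload shape is an assumption. Only the event names `token` and `done` come from this guide.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    """Frame one Server-Sent Events message (hypothetical helper)."""
    body = json.dumps({"type": event, **payload})
    # An SSE message is a set of "field: value" lines ended by a blank line.
    return f"event: {event}\ndata: {body}\n\n"


def stream_reply(tokens):
    """Yield one `token` event per chunk, then a terminal `done` event."""
    parts = []
    for tok in tokens:
        parts.append(tok)
        yield sse_event("token", {"text": tok})
    # The full text is assembled server-side so it can be persisted once.
    yield sse_event("done", {"content": "".join(parts)})


frames = list(stream_reply(["Hel", "lo"]))
```

In a FastAPI route, a generator like this would typically be wrapped in a streaming response with the `text/event-stream` media type; whether the project does exactly that is not shown in this guide.
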
### 2.2 Explicit Search Pipeline (`/api/search`)

1. Persist search-as-message into the target/new conversation
2. Emit `searching` SSE event
3. Pull web results from SearXNG
4. Summarize with Ollama via SSE stream
5. Persist summary and emit `done` event (plus raw results payload)

### 2.3 Settings/Control Surface

- Profile, presets, memory, conversation management, and settings APIs
- Skills APIs for phase-1 registry and enable/disable controls
- Auth/session APIs for guest/admin role handling and keepalive

## 3. Data Model (SQLite)

Key tables:

- `conversations`: conversation headers and timestamps
- `messages`: ordered chat history entries
- `profile`: singleton row for injected profile prompt
- `settings`: runtime toggles and selected defaults
- `system_presets`: named reusable system prompts
- `skills`: per-skill enabled state and timestamp
- `memories` (FTS5 virtual table): searchable user memory facts

Design notes:

- Startup is idempotent: tables are created if missing and defaults seeded only when absent
- No connection pool: each request opens a short-lived SQLite connection

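The two design notes above can be sketched against an in-memory database. This is a minimal illustration under stated assumptions: `open_db` and `seed_setting` are hypothetical names, and `INSERT OR IGNORE` is swapped in here as a compact way to seed only when a key is absent; the project may check with a `SELECT` first instead.

```python
import sqlite3
from contextlib import closing


def open_db(path: str) -> sqlite3.Connection:
    """Open a fresh connection for one unit of work (no pooling)."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def seed_setting(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Create the table if missing and insert a default only when absent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO settings (key, value) VALUES (?, ?)", (key, value)
    )


# closing() guarantees the connection is released even if a statement raises.
with closing(open_db(":memory:")) as conn:
    seed_setting(conn, "search_enabled", "true")
    seed_setting(conn, "search_enabled", "false")  # ignored: key already present
    conn.commit()
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", ("search_enabled",)
    ).fetchone()
    stored = row["value"]
```

Running the seeding twice leaves the first value in place, which is the property that makes repeated startups safe.
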
## 4. Security Implementations

This section documents explicit controls currently in code.

### 4.1 Auth Model

- Guest session is default for conversational access
- Admin unlock uses 4-digit PIN and creates admin-capable session
- Admin required for write/destructive routes
- Session heartbeat/timeout and explicit logout/revoke flow

### 4.2 PIN and Session Hardening

- Admin PIN hashed with PBKDF2-HMAC-SHA256 + salt
- Failed PIN attempts tracked per client IP
- Lockout window enforced after max failed attempts

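Salted PBKDF2-HMAC-SHA256 hashing, as listed above, can be sketched with the standard library. This is not the project's `hash_pin`: the function names, the 16-byte salt, and the iteration count are assumptions chosen for illustration. The `(salt_hex, hash_hex)` return order mirrors how `init_db` unpacks the result.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

PBKDF2_ITERATIONS = 200_000  # assumed work factor, not taken from the project


def hash_pin_sketch(pin: str, salt_hex: Optional[str] = None) -> Tuple[str, str]:
    """Return (salt_hex, hash_hex) for a PIN, generating a salt if none is given."""
    salt = bytes.fromhex(salt_hex) if salt_hex else secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, PBKDF2_ITERATIONS)
    return salt.hex(), digest.hex()


def verify_pin_sketch(pin: str, salt_hex: str, expected_hash_hex: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = hash_pin_sketch(pin, salt_hex)
    return hmac.compare_digest(candidate, expected_hash_hex)


salt_hex, pin_hash_hex = hash_pin_sketch("4821")
```

A 4-digit PIN has only 10,000 possible values, so the hash mainly protects the stored secret at rest; the per-IP failure tracking and lockout listed above are what limit online guessing.
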
### 4.3 Browser and API Abuse Controls

- Origin checks on state-changing requests
- Rate limiting by endpoint category and identity (IP/session)
- Payload size limits per route class
- Settings key allowlist to block arbitrary configuration injection
- IP allowlist/CIDR gate with optional trusted proxy forwarding mode

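Rate limiting per category and identity could look like the sliding-window sketch below. The guide does not say which algorithm the middleware uses, so the technique, the `SlidingWindowLimiter` class, and the `"chat:<ip>"` identity format are all assumptions; only the idea of a fixed window with a per-window cap comes from `config.py`.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` hits per identity within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # identity -> timestamps of accepted hits

    def allow(self, identity: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[identity]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("chat:192.168.50.20", now=t) for t in (0, 1, 2, 3, 61)]
```

The sketch is not thread-safe; a service handling concurrent requests would guard `allow` with a lock, much as the session store is guarded by `SESSION_LOCK`.
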
### 4.4 Output and Error Safety

- Search result URLs sanitized to `http`/`https` only
- Client-safe error envelopes with incident key correlation
- Full stack traces and diagnostic metadata logged server-side only

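The URL rule in the first bullet can be sketched with `urllib.parse`. This is an illustration, not the project's sanitizer: `sanitize_result_url` is a hypothetical name, and returning an empty string for rejected URLs is an assumed convention.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def sanitize_result_url(raw: str) -> str:
    """Return the URL unchanged if it is absolute http(s), else an empty string."""
    candidate = (raw or "").strip()
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return ""
    # Require both an allowed scheme and a host, so "javascript:" and
    # scheme-relative or path-only values are rejected.
    if parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.netloc:
        return ""
    return candidate
```

An allowlist of schemes is used rather than a blocklist, so unfamiliar schemes such as `data:` or `vbscript:` are rejected without having to be enumerated.
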
### 4.5 Operational Auditability

- Structured audit events for auth actions, admin operations, and guardrail denials
- Incident logs include event type, key, path/method context, and runtime metadata

## 5. Skills Framework (Phase 1)

Goal: introduce a governed skills control plane inside the local JarvisChat sandbox.

Current behavior:

- Built-in skill registry defined server-side
- Per-skill enable/disable persisted in DB
- Global `skills_enabled` master toggle in settings
- Active skills injected into system prompt with bounded text budget
- API endpoints to list skills, list active skills, and toggle skill state
- WebUX settings panel to control master/per-skill toggles

Non-goals in phase 1:

- No unrestricted shell/tool execution
- No external connector execution (filesystem, Gmail, etc.)

## 6. Testing Strategy and Validation Intent

The test suite validates both behavior and guardrail assumptions.

### 6.1 What We Test

- Auth capability separation (guest vs admin)
- URL sanitization safety for outbound links
- Rate and payload guardrails
- IP allowlist behavior
- Safe error envelope behavior and SSE error leakage prevention
- Streaming chat/search and memory command paths
- Skills framework toggles and prompt-injection behavior

### 6.2 Why These Tests Matter

- Confirms security controls are active and regression-resistant
- Ensures streaming UX protocol remains stable (`token`, `searching`, `done`, `error`)
- Verifies policy intent: dangerous actions require admin capability
- Validates new features preserve prior guarantees

### 6.3 Internal Process Validation

For substantive changes, Definition of Done includes:

1. Implement code change
2. Add/adjust tests proving behavior and guardrail intent
3. Update README release notes for user-facing impact
4. Update wiki architecture/security/testing docs for maintainers
5. Validate with targeted test runs before merge/deploy

This process is intentionally explicit so design decisions remain auditable over time.

## 7. Deployment and Operations Notes

- Primary deployment target: local/homelab systemd service
- Required dependency: Ollama
- Optional dependency: SearXNG
- Recommended log review path: system journal for startup, guardrail denials, and incidents

## 8. Contribution Guidance

When adding a feature:

1. Define security posture first (who can execute, what can fail, and failure mode)
2. Implement smallest safe slice with clear limits
3. Add tests that prove both happy path and guardrail path
4. Update this wiki and README in the same change
23
docs/wiki/Home.md
Normal file
@@ -0,0 +1,23 @@
# JarvisChat Developer Wiki

This wiki is the developer-facing architecture and process reference for JarvisChat.

## Audience

- Contributors maintaining backend, frontend, security posture, and deployment process
- Operators validating local or homelab deployments

## Start Here

- Architecture and components: [Developer-Architecture.md](Developer-Architecture.md)
- Active implementation backlog: [current-wip.md](current-wip.md)

## Scope and Support Model

JarvisChat is designed for local and trusted-LAN operation.

The code may technically function against external or commercial endpoints, but this deployment mode is not a supported target in this project.

## Wiki Maintenance Rule

When architecture, security behavior, or test policy changes, update this wiki in the same change set as code and tests.
84
docs/wiki/current-wip.md
Normal file
@@ -0,0 +1,84 @@
# JarvisChat Current WiP Backlog

Last updated: 2026-04-27
Owner: Gramps + Copilot
Scope: issues, bugs, security exposures, and feature enhancements.

Total identified items: 27

## Priority Definitions
- P0: Critical risk or data-loss/security exposure; do first.
- P1: High impact reliability/correctness work.
- P2: Important feature/UX improvements.
- P3: Nice-to-have polish.

## Top 10 (Urgency Order)
1. [P0][DONE] Add authentication/authorization for all write and admin endpoints.
2. [P0][DONE] Add CSRF/origin protection for browser-initiated state-changing requests.
3. [P0][DONE] Block unsafe URL schemes in rendered search-result links (e.g., javascript:).
4. [P0][DONE] Add rate limiting and request body size limits for chat/search/profile APIs.
5. [P1][DONE] Restrict settings updates to an allowlist of valid keys.
6. [P1] Add pagination + hard caps on list endpoints (memories, conversations, message history).
7. [P1][DONE] Stop returning raw exception text to clients; use safe error envelopes.
8. [P1][DONE] Add automated tests for chat streaming, auto-search trigger, and memory command paths.
9. [P2][DONE] Implement skills/tool-call framework (MCP-style) with per-skill enable controls.
10. [P2] Implement heartbeat/check-in pipeline with scheduler + summary endpoint.

## Item 1 Executive Summary (Scope + Security)

- Status: Complete. Guest/admin capability split implemented with admin-only write enforcement, origin checks on state-changing requests, audit logging, and endpoint capability tests.

- Decision: JarvisChat is local-first by design. Primary mode is same-host Ollama; optional mode allows RFC1918 LAN endpoints only.
- Constraint: Public Internet AI endpoints are out of scope unless explicitly enabled in a future advanced mode.
- Risk: Even on LAN, unauthenticated write/admin endpoints permit unauthorized data tampering and deletion.
- Requirement: Add mandatory admin authentication for all POST/PUT/DELETE routes and destructive actions.
|
||||||
|
- Authentication shape (scope-locked): two capability tiers only: guest (chat-only) and admin (4-digit PIN unlock).
|
||||||
|
- Scope guardrail: Avoid full RBAC. Keep capability split minimal: conversational chat for guest, advanced/destructive actions for admin.
|
||||||
|
- Definition of done:
|
||||||
|
1. Auth required on all state-changing endpoints.
|
||||||
|
2. Destructive actions require admin authorization.
|
||||||
|
3. Endpoint configuration rejects non-local/non-RFC1918 AI backends by default.
|
||||||
|
4. Strong rate limiting + lockout controls in place for PIN attempts.
|
||||||
|
5. Security events logged for failed and successful admin actions.
|
||||||
|
|
||||||
|
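Point 3 of the definition of done (reject non-local and non-RFC1918 AI backends by default) can be sketched with the standard `ipaddress` module. This is an illustration under stated assumptions, not the check JarvisChat actually ships: the function name `is_allowed_backend` is hypothetical, and hostnames other than `localhost` are simply rejected because resolving them safely is outside the scope of a short example.

```python
import ipaddress
from urllib.parse import urlparse

# Loopback plus the three RFC 1918 private ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_allowed_backend(url: str) -> bool:
    """Return True only for http(s) URLs whose host is localhost or a local/LAN IPv4 literal."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if parsed.hostname == "localhost":
        return True
    try:
        address = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Hostnames are rejected in this sketch; resolving them safely is out of scope.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)
```

Listing the networks explicitly, rather than relying on `is_private`, keeps the policy aligned with the wording of the backlog: `is_private` also covers other reserved ranges that are not part of RFC 1918.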
## Full Backlog (Sorted by Priority)

### P0 Critical
1. Add auth for write/admin endpoints (`POST/PUT/DELETE` routes, mass delete, profile/settings changes).
2. Add CSRF or strict origin checks for browser session protection.
3. Validate/sanitize outbound href URLs before rendering in HTML (allow http/https only).
4. Add per-IP rate limiting on `/api/chat`, `/api/search`, `/api/profile`, `/api/settings`.
5. Enforce request size limits (message/profile text and JSON body) to prevent memory abuse.

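Item 3 above (allow only http/https in rendered links) is small enough to sketch directly. The helper name `safe_href` is hypothetical, and where the real check lives in JarvisChat (backend or template script) is not shown in this backlog; the sketch only demonstrates the scheme allowlist itself.

```python
from typing import Optional
from urllib.parse import urlparse


def safe_href(url: str) -> Optional[str]:
    """Return the URL if it is an absolute http/https link, otherwise None."""
    candidate = url.strip()
    parsed = urlparse(candidate)
    # urlparse lowercases the scheme; a missing netloc means the link is not absolute.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return candidate
    return None
```

A caller would drop the link, or render the result as plain text, whenever `safe_href` returns `None`. The returned URL still needs normal HTML attribute escaping before it is written into a page.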
### P1 High
6. Add settings key allowlist in `/api/settings` to prevent arbitrary key injection.
7. Add pagination (`limit`, `offset`) with enforced maximums for list APIs.
8. Add DB indexes and query hygiene for scalability (`messages.conversation_id`, timestamps).
9. Replace raw exception leakage to clients with generic safe error messages + server-side logs.
10. Add request/response timeout and retry policy consistency across external calls.
11. Add endpoint-level audit logging for destructive operations.
12. Add unit/integration tests for: remember/forget parsing, refusal detection, search fallback, SSE done/error shape.
13. Add conversation title sanitization and length constraints.
14. Ensure default preset semantics are correct (currently all seeded presets are marked default).
15. Add preflight validation for required model/preset selection and block send with clear user guidance instead of timing out.

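Item 7 above (pagination with enforced maximums) comes down to clamping whatever the client sends before it reaches a query. The following is a minimal sketch; the names `clamp_pagination` and `paginate` and the values of `DEFAULT_LIMIT` and `MAX_LIMIT` are assumptions chosen for illustration, since the backlog does not state the intended caps.

```python
from typing import Tuple

# Hypothetical defaults; the project's real caps are not shown in this backlog.
DEFAULT_LIMIT = 50
MAX_LIMIT = 200


def clamp_pagination(limit: int = DEFAULT_LIMIT, offset: int = 0) -> Tuple[int, int]:
    """Force limit into [1, MAX_LIMIT] and offset to be non-negative."""
    safe_limit = max(1, min(limit, MAX_LIMIT))
    safe_offset = max(0, offset)
    return safe_limit, safe_offset


def paginate(items: list, limit: int = DEFAULT_LIMIT, offset: int = 0) -> dict:
    """Return one page of items plus the values that were actually applied."""
    safe_limit, safe_offset = clamp_pagination(limit, offset)
    page = items[safe_offset:safe_offset + safe_limit]
    return {"items": page, "limit": safe_limit, "offset": safe_offset, "total": len(items)}
```

Echoing the applied `limit` and `offset` back to the client makes the cap visible, so a caller that asked for 10,000 rows can see that only 200 were considered. In a SQLite-backed endpoint the same clamped values would be passed as `LIMIT ? OFFSET ?` parameters instead of slicing a list.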
### P2 Important Features
16. Skills system: load markdown skill files with YAML frontmatter from skills directory.
17. Skills registry API: list/enable/disable skills and expose active skills to UI.
18. Inject active skill instructions into system prompt with bounded token budget.
19. Tool execution guardrails: allowlist, confirmation mode, and execution logs.
20. Heartbeat scheduler (cron/systemd timer) for daily check-ins.
21. Heartbeat endpoint for generated briefings and anomaly summaries.
22. Model info UI panel (description, updated date, best-use purpose).
23. Default model selection improvements and persistence validation.
24. Hidden model list support (exclude models from dropdown).
25. Model update action from UI (trigger controlled model pull).

### P3 Nice to Have
26. Conversation search/filter and export tooling.
27. Keyboard shortcuts, retry button, and source-link polish.

## Maintenance Rules
- Keep this file as the single source of truth.
- Update item priority/status whenever work starts or completes.
- Mirror the Top 10 summary in README and keep counts aligned.
31
gpu.py
Normal file
@@ -0,0 +1,31 @@
"""
JarvisChat - AMD GPU stats via rocm-smi.
"""
import json
import logging
import subprocess

log = logging.getLogger("jarvischat")


def get_gpu_stats() -> dict:
    try:
        result = subprocess.run(
            ["rocm-smi", "--showuse", "--showmemuse", "--json"],
            capture_output=True, text=True, timeout=5,
        )
        if result.returncode == 0:
            data = json.loads(result.stdout)
            gpu_info = data.get("card0", {})
            gpu_use = gpu_info.get("GPU use (%)", 0)
            vram_use = gpu_info.get("GPU Memory Allocated (VRAM%)", 0)
            if isinstance(gpu_use, str):
                gpu_use = int(gpu_use.replace("%", "").strip() or 0)
            if isinstance(vram_use, str):
                vram_use = int(vram_use.replace("%", "").strip() or 0)
            return {"gpu_percent": gpu_use, "vram_percent": vram_use, "available": True}
    except (subprocess.TimeoutExpired, FileNotFoundError, json.JSONDecodeError):
        pass
    except Exception as e:
        log.warning(f"GPU stats error: {e}")
    return {"gpu_percent": 0, "vram_percent": 0, "available": False}
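The parsing step in `get_gpu_stats()` can be exercised without a GPU by feeding it a canned payload. The sketch below is a hedged variation, not the module's code: `SAMPLE_OUTPUT` is an invented payload that only mirrors the keys read above (real `rocm-smi` output may differ), and the hypothetical helper `parse_percent` goes through `float` so that a value such as `"12.5"` is truncated rather than raising `ValueError`, which is what `int("12.5")` would do.

```python
import json

# Invented sample shaped like the keys get_gpu_stats() reads; real rocm-smi output may differ.
SAMPLE_OUTPUT = '{"card0": {"GPU use (%)": "37", "GPU Memory Allocated (VRAM%)": "62%"}}'


def parse_percent(value) -> int:
    """Convert values such as 37, "37", "62%" or "12.5" to a whole-number percentage."""
    if isinstance(value, str):
        cleaned = value.replace("%", "").strip()
        value = float(cleaned) if cleaned else 0
    return int(value)


def parse_gpu_stats(raw_json: str) -> dict:
    """Extract GPU and VRAM usage for card0 from a rocm-smi style JSON string."""
    card = json.loads(raw_json).get("card0", {})
    return {
        "gpu_percent": parse_percent(card.get("GPU use (%)", 0)),
        "vram_percent": parse_percent(card.get("GPU Memory Allocated (VRAM%)", 0)),
        "available": True,
    }
```

Separating parsing from the `subprocess.run` call is a design choice made for this example: it lets the string handling be unit-tested on machines that have no `rocm-smi` binary.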
2174
jarvischat_refactor.sh
Normal file
File diff suppressed because it is too large
139
memory.py
Normal file
@@ -0,0 +1,139 @@
"""
JarvisChat - FTS5 memory system.
CRUD, search, remember/forget command processing, topic detection.
"""
import logging
import re
from datetime import datetime, timezone
from typing import Optional

from db import get_db
from config import MAX_MEMORY_FACT_CHARS

log = logging.getLogger("jarvischat")

REMEMBER_PATTERNS = [
    (r"remember that (.+)", "explicit"),
    (r"please remember (.+)", "explicit"),
    (r"don'?t forget (.+)", "explicit"),
    (r"note that (.+)", "explicit"),
    (r"keep in mind (?:that )?(.+)", "explicit"),
]

FORGET_PATTERNS = [
    r"forget (?:that )?(.+)",
    r"don'?t remember (.+)",
    r"remove (?:the )?memory (?:about |that )?(.+)",
]


def detect_topic(fact: str) -> str:
    fact_lower = fact.lower()
    if any(w in fact_lower for w in ["prefer", "like", "hate", "always", "never", "favorite"]):
        return "preference"
    elif any(w in fact_lower for w in ["working on", "building", "project", "developing"]):
        return "project"
    elif any(w in fact_lower for w in ["run", "install", "server", "ip", "port", "service", "docker", "systemd"]):
        return "infrastructure"
    elif any(w in fact_lower for w in ["my name", "i am", "i'm a", "i live", "my wife", "my partner"]):
        return "personal"
    return "general"


def add_memory(fact: str, topic: str = "general", source: str = "explicit") -> Optional[int]:
    db = get_db()
    now = datetime.now(timezone.utc).isoformat()
    cur = db.execute(
        "INSERT INTO memories (fact, topic, source, created_at) VALUES (?, ?, ?, ?)",
        (fact, topic, source, now),
    )
    db.commit()
    rowid = cur.lastrowid
    db.close()
    log.info(f"Memory added [{topic}]: {fact[:50]}...")
    return rowid


def search_memories(query: str, limit: int = 5) -> list:
    if not query.strip():
        return []
    db = get_db()
    words = re.findall(r"[A-Za-z0-9_]+", query)
    if not words:
        db.close()
        return []
    safe_query = " OR ".join(word + "*" for word in words[:10])
    try:
        rows = db.execute(
            "SELECT rowid, fact, topic, source, created_at, bm25(memories) AS rank "
            "FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
            (safe_query, limit),
        ).fetchall()
        results = [dict(row) for row in rows]
        log.debug(f"Memory search '{query}' returned {len(results)} results")
    except Exception as e:
        log.warning(f"Memory search error: {e}")
        results = []
    db.close()
    return results


def get_all_memories(topic: Optional[str] = None) -> list:
    db = get_db()
    if topic:
        rows = db.execute(
            "SELECT rowid, * FROM memories WHERE topic = ? ORDER BY created_at DESC", (topic,)
        ).fetchall()
    else:
        rows = db.execute("SELECT rowid, * FROM memories ORDER BY created_at DESC").fetchall()
    db.close()
    return [dict(row) for row in rows]


def delete_memory(rowid: int) -> bool:
    db = get_db()
    cur = db.execute("DELETE FROM memories WHERE rowid = ?", (rowid,))
    db.commit()
    deleted = cur.rowcount > 0
    db.close()
    if deleted:
        log.info(f"Memory deleted: rowid={rowid}")
    return deleted


def update_memory(rowid: int, fact: str) -> bool:
    db = get_db()
    cur = db.execute("UPDATE memories SET fact = ? WHERE rowid = ?", (fact, rowid))
    db.commit()
    updated = cur.rowcount > 0
    db.close()
    return updated


def get_memory_count() -> int:
    db = get_db()
    count = db.execute("SELECT COUNT(*) as c FROM memories").fetchone()["c"]
    db.close()
    return count


def process_remember_command(user_message: str) -> Optional[str]:
    for pattern, source in REMEMBER_PATTERNS:
        match = re.search(pattern, user_message, re.IGNORECASE)
        if match:
            fact = match.group(1).strip().rstrip(".")
            topic = detect_topic(fact)
            add_memory(fact, topic=topic, source=source)
            return f"✓ Remembered [{topic}]: {fact}"
    for pattern in FORGET_PATTERNS:
        match = re.search(pattern, user_message, re.IGNORECASE)
        if match:
            search_term = match.group(1).strip().rstrip(".")
            memories = search_memories(search_term, limit=3)
            if memories:
                for m in memories:
                    delete_memory(m["rowid"])
                return f"✓ Forgot {len(memories)} memory/memories about: {search_term}"
            else:
                return f"✗ No memories found about: {search_term}"
    return None
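The way `search_memories()` turns a chat message into an FTS5 `MATCH` expression is easier to see in isolation. The sketch below lifts that one step into a standalone function; the name `build_fts_query` and the `max_terms` parameter are introduced here for illustration, while the regular expression, the prefix `*`, the `OR` join and the cap of 10 terms mirror the module above.

```python
import re


def build_fts_query(text: str, max_terms: int = 10) -> str:
    """Turn free text into an FTS5 MATCH expression of OR-ed prefix terms."""
    # Keep only runs of letters, digits and underscores; punctuation is dropped.
    words = re.findall(r"[A-Za-z0-9_]+", text)
    # Each term becomes a prefix match, and at most max_terms terms are used.
    return " OR ".join(word + "*" for word in words[:max_terms])


print(build_fts_query("What's my server's IP?"))
# prints: What* OR s* OR my* OR server* OR s* OR IP*
```

The output shows a side effect worth knowing about: apostrophes split words, so fragments such as `s*` end up in the query and match any token starting with "s". Filtering out very short terms would be one possible refinement, at the cost of also dropping short but meaningful words such as "IP".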
80
rag.py
Normal file
@@ -0,0 +1,80 @@
"""
JarvisChat - RAG pipeline: Qdrant vector search + system prompt assembly.
"""
import logging

import httpx

from db import get_db, get_setting, list_skills_with_state, format_active_skills_prompt
from memory import search_memories
from config import MAX_SKILL_PROMPT_CHARS

log = logging.getLogger("jarvischat")

QDRANT_URL = "http://192.168.50.108:6333"
EMBED_URL = "http://192.168.50.108:11434"
EMBED_MODEL = "mxbai-embed-large"
RAG_COLLECTION = "jarvis_rag"
RAG_SCORE_THRESHOLD = 0.25


async def query_rag(query: str, limit: int = 3) -> list:
    try:
        async with httpx.AsyncClient() as client:
            embed_resp = await client.post(
                f"{EMBED_URL}/api/embeddings",
                json={"model": EMBED_MODEL, "prompt": query},
                timeout=10.0,
            )
            if embed_resp.status_code != 200:
                return []
            vector = embed_resp.json()["embedding"]
            search_resp = await client.post(
                f"{QDRANT_URL}/collections/{RAG_COLLECTION}/points/search",
                json={"vector": vector, "limit": limit, "with_payload": True},
                timeout=10.0,
            )
            if search_resp.status_code != 200:
                return []
            return search_resp.json().get("result", [])
    except Exception as e:
        log.warning(f"RAG query error: {e}")
        return []


async def build_system_prompt(db, extra_prompt: str = "", user_message: str = "") -> str:
    parts = []
    settings = {row["key"]: row["value"] for row in db.execute("SELECT key, value FROM settings").fetchall()}

    if settings.get("profile_enabled", "true") == "true":
        profile = db.execute("SELECT content FROM profile WHERE id = 1").fetchone()
        if profile and profile["content"].strip():
            parts.append(profile["content"].strip())

    if settings.get("memory_enabled", "true") == "true" and user_message:
        memories = search_memories(user_message, limit=5)
        if memories:
            memory_lines = [f"- {m['fact']}" for m in memories]
            parts.append("## Relevant Context from Memory\n" + "\n".join(memory_lines))
            log.debug(f"Injected {len(memories)} memories into context")

    if user_message:
        try:
            rag_results = await query_rag(user_message)
            if rag_results:
                rag_lines = [r["payload"]["text"] for r in rag_results if r["score"] > RAG_SCORE_THRESHOLD]
                if rag_lines:
                    parts.append("## Retrieved Context\n" + "\n\n---\n\n".join(rag_lines))
                    log.warning(f"RAG injected {len(rag_lines)} chunks into context")
        except Exception as e:
            log.warning(f"RAG injection error: {e}")

    if settings.get("skills_enabled", "true") == "true":
        active_skills = [s for s in list_skills_with_state(db) if s["enabled"]]
        if active_skills:
            parts.append(format_active_skills_prompt(active_skills))

    if extra_prompt and extra_prompt.strip():
        parts.append(extra_prompt.strip())

    return "\n\n---\n\n".join(parts) if parts else ""
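The score filtering and section assembly in `build_system_prompt()` can be tried without Qdrant or Ollama running. The sketch below uses invented search hits shaped like the entries the function reads (`score` and `payload.text`); the helper names `retrieved_context` and `join_parts` are hypothetical, while the threshold and the `\n\n---\n\n` separator are copied from the module above.

```python
RAG_SCORE_THRESHOLD = 0.25  # same value as the constant in rag.py

# Invented search hits shaped like the entries build_system_prompt() reads.
sample_hits = [
    {"score": 0.81, "payload": {"text": "Qdrant listens on port 6333."}},
    {"score": 0.12, "payload": {"text": "Unrelated chunk."}},
    {"score": 0.40, "payload": {"text": "Embeddings come from mxbai-embed-large."}},
]


def retrieved_context(hits: list, threshold: float = RAG_SCORE_THRESHOLD) -> str:
    """Keep hits scoring strictly above the threshold and format them as one prompt section."""
    texts = [hit["payload"]["text"] for hit in hits if hit["score"] > threshold]
    if not texts:
        return ""
    return "## Retrieved Context\n" + "\n\n---\n\n".join(texts)


def join_parts(parts: list) -> str:
    """Join non-empty prompt sections with the same separator rag.py uses."""
    return "\n\n---\n\n".join(part for part in parts if part)


print(join_parts(["You are a helpful assistant.", retrieved_context(sample_hits)]))
```

One thing the sketch makes visible is that chunks inside the retrieved section and the top-level prompt sections share the same `---` separator, so the boundary between "end of a chunk" and "end of the retrieved context" is not marked in the final prompt. Whether that matters in practice depends on the model, and it is noted here only as an observation.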
470
readme.md
@@ -1,263 +1,355 @@
|
|||||||
# ⚡ JarvisChat
|
# ⚡ JarvisChat v1.7.8
|
||||||
|
|
||||||

|

|
||||||
**A lightweight Ollama coding companion that runs on Python 3.13**
|
|
||||||
|
|
||||||

|
**A lightweight Ollama coding companion with persistent memory, web search, and real-time system monitoring.**
|
||||||

|
|
||||||

|
|
||||||
|
|
||||||
JarvisChat is a single-file FastAPI application that provides a clean, responsive web interface for Ollama. It features persistent memory, automatic web search when the model is uncertain, and real-time token tracking.
|
Built with FastAPI + SQLite + Jinja2. Runs on Python 3.13. No Docker required.
|
||||||
|
|
||||||
|
Developer wiki: [docs/wiki/Home.md](docs/wiki/Home.md)
|
||||||
|
|
||||||
|
Core architecture deep-dive: [docs/wiki/Developer-Architecture.md](docs/wiki/Developer-Architecture.md)
|
||||||
|
|
||||||
|
## Security Scope Disclaimer
|
||||||
|
|
||||||
|
JarvisChat is designed for local and home-lab use (same host or trusted LAN).
|
||||||
|
|
||||||
|
JarvisChat may technically work with frontier or commercial AI endpoints, but the author does not recommend or support that usage.
|
||||||
|
|
||||||
|
Supported deployments are contained local/home-lab environments.
|
||||||
|
|
||||||
|
By default, API access is limited to loopback + private LAN CIDRs. You can override with `JARVISCHAT_ALLOWED_CIDRS` (comma-separated CIDRs) and optionally trust reverse-proxy forwarding with `JARVISCHAT_TRUST_X_FORWARDED_FOR=true`.
|
||||||
|
|
||||||
|
If you deploy outside a trusted local subnet, your risk profile changes significantly and the default protections here may be insufficient.
|
||||||
|
|
||||||
|
Use at your own risk. No warranty is provided for Internet-exposed deployments.
|
||||||
|
|
||||||
|
## What's New in v1.7.x
|
||||||
|
|
||||||
|
- **Security hardening suite completed** - request rate limits, payload caps, settings allowlist, safe error envelopes, and LAN CIDR gate controls
|
||||||
|
- **Customer-safe incident handling** - client-facing errors include support-friendly incident keys while full traces remain in server logs
|
||||||
|
- **Streaming and regression test expansion** - automated coverage for SSE chat/search paths, memory remember/forget command handling, and auth/guardrail behavior
|
||||||
|
- **Skills framework (Phase 1)** - built-in local skill registry with per-skill enable controls, API endpoints, and bounded prompt injection
|
||||||
|
- **Skills WebUX controls** - Settings modal now includes a master skills toggle and per-skill toggles for admin users
|
||||||
|
|
||||||
|
## What's New in v1.6.x
|
||||||
|
|
||||||
|
- **Guest/admin capability split** - guest chat by default with 4-digit admin PIN for advanced or destructive operations
|
||||||
|
- **Session + lockout controls** - session lifecycle endpoints, heartbeat, logout/revoke behavior, failed PIN lockout protections, and auth audit events
|
||||||
|
- **Browser request protections** - strict origin checks for state-changing requests and admin-only write enforcement
|
||||||
|
- **Unsafe link protection** - outbound search links sanitized to allow only http/https absolute URLs
|
||||||
|
- **Operational stability fixes** - safer first-boot PIN policy handling and memory-search tokenization fix for punctuation/FTS edge cases
|
||||||
|
|
||||||
|
## What's New in v1.5.0
|
||||||
|
|
||||||
|
- **Explicit Web Search Button** — 🔍 button next to SEND forces a web search, bypassing model uncertainty detection
|
||||||
|
- **Orange Search Styling** — Search results, WEB badge, and search button share consistent orange color scheme
|
||||||
|
- **Expanded Refusal Patterns** — Added "As an AI model", "based on my training data", "I don't have the capability"
|
||||||
|
- **Code cleanup** — Removed unused `JSONResponse` import and dead `raw_results_md` variable
|
||||||
|
- **Bug fixes** — Replaced bare `except` clauses with `except Exception`; corrected `add_memory()` return type to `int | None`; updated `TemplateResponse` call to Starlette's current API signature
|
||||||
|
|
||||||
|
## What's New in v1.4.0
|
||||||
|
|
||||||
|
- **FTS5 Memory System**: Say "remember that..." to store facts — they're automatically retrieved by relevance and injected into context
|
||||||
|
- **Forget Command**: Say "forget about..." to remove memories
|
||||||
|
- **Memory Toggle**: Enable/disable memory injection from topbar or settings
|
||||||
|
- **Multi-file Structure**: Backend and frontend separated for easier maintenance
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- **Persistent Profile/Memory** — Your context is injected into every conversation automatically
|
- **Persistent Memory** — SQLite FTS5 full-text search for fast, relevant memory retrieval
|
||||||
- **System Prompt Presets** — Switch between coding assistant, sysadmin, general, or custom modes
|
- **Web Search** — SearXNG integration for automatic web lookups when the model is uncertain
|
||||||
- **Streaming Chat** — Real-time token streaming with conversation history
|
- **Explicit Search** — 🔍 button to force web search without waiting for model uncertainty
|
||||||
- **Model Switching** — Hot-swap between all installed Ollama models
|
- **Profile Injection** — Custom system prompt injected into every conversation
|
||||||
- **Web Search Integration** — SearXNG kicks in automatically when the model is uncertain (perplexity-based)
|
- **System Presets** — Save and switch between different system prompts
|
||||||
- **Weather Queries** — Direct wttr.in integration for weather questions
|
- **Real-time Stats** — CPU, RAM, GPU, VRAM monitoring in sidebar
|
||||||
- **Token Thermometer** — Visual context usage bar with live updates as you type
|
- **Token Thermometer** — Visual context window usage indicator
|
||||||
- **Perplexity & Speed Badges** — See model confidence (PPL) and tokens/sec on each response
|
- **Streaming Responses** — Server-sent events for real-time token display
|
||||||
- **Copy-to-Clipboard** — One-click copy on all code blocks
|
- **Conversation History** — SQLite-backed chat persistence with mass-delete option
|
||||||
- **Dark Theme** — Easy on the eyes for long coding sessions
|
- **Model Switching** — Change Ollama models on the fly
|
||||||
|
|
||||||
## Architecture
|
## Current WiP (Prioritized)
|
||||||
|
|
||||||
|
Canonical backlog: [docs/wiki/current-wip.md](docs/wiki/current-wip.md)
|
||||||
|
|
||||||
|
Scope boundary: local-first (same-host Ollama), optional RFC1918 LAN endpoints, no public Internet AI endpoints by default.
|
||||||
|
|
||||||
|
Total identified items: 27
|
||||||
|
|
||||||
|
Top 10 (brief):
|
||||||
|
|
||||||
|
1. P0 [DONE]: Add auth for write/admin endpoints
|
||||||
|
2. P0 [DONE]: Add CSRF/origin protection for state-changing requests
|
||||||
|
3. P0 [DONE]: Block unsafe URL schemes in rendered links
|
||||||
|
4. P0 [DONE]: Add rate limiting and request size limits
|
||||||
|
5. P1 [DONE]: Restrict `/api/settings` updates to allowlisted keys
|
||||||
|
6. P1: Add pagination + hard caps for list APIs
|
||||||
|
7. P1 [DONE]: Replace raw exception leakage with safe client errors
|
||||||
|
8. P1 [DONE]: Add automated tests for streaming/search/memory paths
|
||||||
|
9. P2 [DONE]: Implement MCP-style skills/tool-call framework
|
||||||
|
10. P2: Implement heartbeat/check-in scheduler + summary endpoint
|
||||||
|
|
||||||
|
Item 1 executive summary: keep guest mode for conversational chat, require 4-digit admin PIN for advanced/destructive actions, and enforce local/LAN-only backend policy by default.
|
||||||
|
|
||||||
|
Implementation status: complete (guest session by default + admin unlock + admin-only write enforcement + origin checks + safe-link sanitization + audit logging + rate/payload guardrails + capability tests).
|
||||||
|
|
||||||
|
## TODO
|
||||||
|
|
||||||
|
1. ~~Verify SearXNG and Docker services persist across reboots~~
|
||||||
|
2. Conversation search/filter by keyword
|
||||||
|
3. Export conversation to markdown/text
|
||||||
|
4. Keyboard shortcuts (Ctrl+N new chat, Ctrl+Enter send)
|
||||||
|
5. Retry button on assistant messages
|
||||||
|
6. Source links — clickable links when search used
|
||||||
|
7. Allow conversation renaming
|
||||||
|
8. Multiple profiles — coding/sysadmin/general
|
||||||
|
9. Auto-generate conversation tags (client-side KWIC, top 5, filterable badges)
|
||||||
|
10. Image input support — pull vision model, file input/drag-drop, base64 encode, pass `images` array to Ollama `/api/chat`
|
||||||
|
11. Split-screen option for btop display
|
||||||
|
12. Skills as markdown files — `/opt/jarvischat/skills/`, YAML frontmatter + instructions, injected into context for tool calls
|
||||||
|
13. Heartbeats / proactive check-ins — cron + endpoint for daily briefings, HA anomaly alerts
|
||||||
|
14. Model info button — (i) icon next to Model dropdown, shows div with model description, last updated date, best-use purpose
|
||||||
|
15. Set default model — toggle any model as the default selection
|
||||||
|
16. Hide/remove model from list — exclude models from dropdown
|
||||||
|
17. Update model function — trigger `ollama pull` for selected model from UI
|
||||||
|
18. Add mouseover tooltip to SEND button
|
||||||
|
19. Add preflight validation for required model/preset selection and show a clear warning before send to prevent avoidable timeout loops
|
||||||
|
|
||||||
|
## File Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
Browser ◄──► app.py (FastAPI) ◄──► Ollama (LLM)
|
/opt/jarvischat/
|
||||||
│
|
├── app.py # FastAPI backend
|
||||||
▼ (when uncertain)
|
├── jarvischat.db # SQLite database (auto-created)
|
||||||
SearXNG (web search)
|
├── static/
|
||||||
|
│ └── logo.png # Logo image (optional)
|
||||||
|
└── templates/
|
||||||
|
└── index.html # Frontend
|
||||||
```
|
```
|
||||||
|
|
||||||
JarvisChat acts as middleware between your browser and Ollama. When the model's perplexity exceeds a threshold (default 15.0) or it refuses to answer, JarvisChat automatically queries SearXNG, injects the results, and re-prompts the model.
|
|
||||||
|
|
||||||
**This is NOT training** — SearXNG is only used at runtime as a fallback for uncertain responses.
|
|
||||||
|
|
||||||
## Requirements
|
## Requirements
|
||||||
|
|
||||||
- Python 3.11+ (tested on 3.13)
|
- Python 3.11+ (tested on 3.13)
|
||||||
- Ollama running locally (default: `localhost:11434`)
|
- Ollama running locally or on network
|
||||||
- SearXNG (optional, for web search — default: `localhost:8888`)
|
- SearXNG (optional, for web search)
|
||||||
- ROCm (optional, for AMD GPU stats — `rocm-smi` must be in PATH)
|
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
```bash
|
### Fresh Install
|
||||||
# Clone or download app.py
|
|
||||||
git clone https://github.com/llamachileshop-code/313_webui.git
|
|
||||||
cd 313_webui
|
|
||||||
|
|
||||||
# Create virtual environment (recommended)
|
```bash
|
||||||
|
# Create directory and venv
|
||||||
|
sudo mkdir -p /opt/jarvischat
|
||||||
|
sudo chown $USER:$USER /opt/jarvischat
|
||||||
|
cd /opt/jarvischat
|
||||||
python3 -m venv venv
|
python3 -m venv venv
|
||||||
source venv/bin/activate
|
|
||||||
|
|
||||||
# Install dependencies
|
# Install dependencies
|
||||||
pip install fastapi httpx uvicorn psutil
|
./venv/bin/pip install fastapi uvicorn httpx psutil jinja2 python-multipart
|
||||||
|
|
||||||
# Run
|
# Set admin PIN before first startup (4 digits)
|
||||||
python app.py
|
export JARVISCHAT_ADMIN_PIN=4827
|
||||||
# or
|
|
||||||
uvicorn app:app --host 0.0.0.0 --port 8080
|
# Create subdirectories
|
||||||
|
mkdir -p templates static
|
||||||
|
|
||||||
|
# Copy files
|
||||||
|
# (copy app.py to /opt/jarvischat/)
|
||||||
|
# (copy index.html to /opt/jarvischat/templates/)
|
||||||
|
# (copy logo.png to /opt/jarvischat/static/ — optional)
|
||||||
```
|
```
|
||||||
|
|
||||||
Open `http://localhost:8080` in your browser.
|
WARNING: Do not use `1234` as your admin PIN unless you accept weak local security.
|
||||||
|
|
||||||
|
NOTE: First boot now requires `JARVISCHAT_ADMIN_PIN` unless you explicitly opt into insecure fallback with `JARVISCHAT_ALLOW_DEFAULT_PIN=true`.
|
||||||
|
|
||||||
|
### Upgrading from v1.4.x
|
||||||
|
|
||||||
**Note:** If running as a systemd service with a venv, install dependencies using the venv pip directly:
|
|
||||||
```bash
|
```bash
|
||||||
/opt/jarvischat/venv/bin/pip install fastapi httpx uvicorn psutil
|
cd /opt/jarvischat
|
||||||
|
|
||||||
|
# Backup
|
||||||
|
cp app.py app.py.bak
|
||||||
|
cp templates/index.html templates/index.html.bak
|
||||||
|
|
||||||
|
# Copy new files
|
||||||
|
# (copy app.py, replacing old version)
|
||||||
|
# (copy index.html to templates/)
|
||||||
|
|
||||||
|
# Restart
|
||||||
|
sudo systemctl restart jarvischat
|
||||||
```
|
```
|
||||||
|
|
||||||
## Running as a Service
|
## Systemd Service
|
||||||
|
|
||||||
**Important:** Although JarvisChat is a single-file Python application, it's designed to run as a persistent service alongside Ollama — not as a one-off script. Both services should start on boot.
|
|
||||||
|
|
||||||
### systemd Service (recommended)
|
|
||||||
|
|
||||||
Create `/etc/systemd/system/jarvischat.service`:
|
Create `/etc/systemd/system/jarvischat.service`:
|
||||||
|
|
||||||
```ini
|
```ini
|
||||||
[Unit]
|
[Unit]
|
||||||
Description=JarvisChat - Ollama Web UI
|
Description=JarvisChat - Local Ollama Web Interface
|
||||||
After=network.target ollama.service
|
After=network.target
|
||||||
Wants=ollama.service
|
|
||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=simple
|
Type=simple
|
||||||
User=your-username
|
User=jarvischat
|
||||||
WorkingDirectory=/path/to/313_webui
|
Group=jarvischat
|
||||||
ExecStart=/usr/bin/python3 app.py
|
WorkingDirectory=/opt/jarvischat
|
||||||
Restart=on-failure
|
ExecStart=/opt/jarvischat/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080
|
||||||
|
Restart=always
|
||||||
RestartSec=5
|
RestartSec=5
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
```
|
```
|
||||||
|
|
||||||
Then enable and start:
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
sudo systemctl daemon-reload
|
sudo systemctl daemon-reload
|
||||||
sudo systemctl enable jarvischat
|
sudo systemctl enable jarvischat
|
||||||
sudo systemctl start jarvischat
|
sudo systemctl start jarvischat
|
||||||
```
|
```
|
||||||
|
|
||||||
### Verify Both Services
|
## Memory Commands
|
||||||
|
|
||||||
```bash
|
In chat, natural language triggers memory operations:
|
||||||
# Check Ollama
|
|
||||||
systemctl status ollama
|
|
||||||
|
|
||||||
# Check JarvisChat
|
| You say | What happens |
|
||||||
systemctl status jarvischat
|
|---------|--------------|
|
||||||
|
| "remember that I prefer Rust over Go" | Stores as `preference` |
|
||||||
|
| "remember that JarvisChat runs on port 8080" | Stores as `infrastructure` |
|
||||||
|
| "note that the deadline is Friday" | Stores as `general` |
|
||||||
|
| "forget about the deadline" | Removes matching memories |
|
||||||
|
|
||||||
# View JarvisChat logs
|
Memories are automatically searched based on your message content and injected into the system prompt when relevant.
|
||||||
journalctl -t jarvischat -f
|
|
||||||
```
|
|
||||||
|
|
||||||
## Configuration
|
### Memory Topics
|
||||||
|
|
||||||
Edit these constants at the top of `app.py`:
|
Memories are auto-categorized:
|
||||||
|
- `preference` — likes, dislikes, choices
|
||||||
```python
|
- `project` — active work, repos, tasks
|
||||||
VERSION = "1.3.1"
|
- `infrastructure` — servers, services, configs
|
||||||
OLLAMA_BASE = "http://localhost:11434"
|
- `personal` — name, location, background
|
||||||
SEARXNG_BASE = "http://localhost:8888"
|
- `general` — everything else
|
||||||
DEFAULT_MODEL = "deepseek-coder:6.7b"
|
|
||||||
PERPLEXITY_THRESHOLD = 15.0 # Higher = less likely to trigger search
|
|
||||||
```
|
|
||||||
|
|
||||||
## Database
|
|
||||||
|
|
||||||
JarvisChat uses SQLite (`jarvischat.db` in the same directory as `app.py`):
|
|
||||||
|
|
||||||
| Table | Purpose |
|
|
||||||
|-------|---------|
|
|
||||||
| conversations | Chat sessions with model and timestamps |
|
|
||||||
| messages | Individual messages with role and content |
|
|
||||||
| system_presets | Saved system prompt presets |
|
|
||||||
| profile | Your persistent memory/context |
|
|
||||||
| settings | App settings (search/profile toggles, default model) |
|
|
||||||
|
|
||||||
## Logging
|
|
||||||
|
|
||||||
JarvisChat logs to syslog via journald:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Follow live logs
|
|
||||||
journalctl -t jarvischat -f
|
|
||||||
|
|
||||||
# View last 100 entries
|
|
||||||
journalctl -t jarvischat -n 100
|
|
||||||
```
|
|
||||||
|
|
||||||
## Token Thermometer
|
|
||||||
|
|
||||||
The vertical bar next to the input shows your context usage in real-time:
|
|
||||||
|
|
||||||
- **Green** — Plenty of room
|
|
||||||
- **Yellow** — 70%+ used
|
|
||||||
- **Red** — 90%+ used (approaching limit)
|
|
||||||
|
|
||||||
The count includes: profile + preset + conversation history + current input. Context size is fetched from Ollama when you switch models.
|
|
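The color thresholds listed for the token thermometer can be sketched as a small function. This is only an illustration in Python: the real bar is drawn by the frontend template, the name `thermometer_color` is hypothetical, and treating an unknown context size as "green" is an assumption.

```python
def thermometer_color(used_tokens: int, context_size: int) -> str:
    """Map context usage to the bar color described in the README."""
    if context_size <= 0:
        return "green"  # assumption: unknown context size is shown as empty
    fraction = used_tokens / context_size
    if fraction >= 0.9:
        return "red"     # 90%+ used, approaching the limit
    if fraction >= 0.7:
        return "yellow"  # 70%+ used
    return "green"       # plenty of room
```

Here `used_tokens` stands for the combined estimate of profile, preset, conversation history, and current input, and `context_size` for the value reported by the model backend.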
||||||
|
|
||||||
## Search Flow
|
|
||||||
|
|
||||||
1. User sends message → Ollama streams response with logprobs
|
|
||||||
2. JarvisChat calculates perplexity from logprobs
|
|
||||||
3. If perplexity > 15.0 OR refusal patterns detected:
|
|
||||||
- Yield `{searching: True}` to show spinner
|
|
||||||
- Query SearXNG (or wttr.in for weather)
|
|
||||||
- Inject results into context
|
|
||||||
- Re-prompt Ollama
|
|
||||||
4. If model still refuses, format raw search results directly
|
|
||||||
5. Clean hedging phrases from response
|
|
||||||
6. Yield final response with PPL and t/s badges
|
|
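Steps 2 and 3 of the search flow can be sketched as follows, taking perplexity as the exponential of the negative mean token log-probability. This is a minimal sketch, not the project's actual `calculate_perplexity` or `is_refusal` code: the function names and the refusal phrases are hypothetical, and only the 15.0 threshold is taken from the README.

```python
import math

PERPLEXITY_THRESHOLD = 15.0  # value quoted in the README

# Hypothetical refusal phrases, for illustration only
REFUSAL_PATTERNS = ("as an ai", "i don't have the capability")


def perplexity_from_logprobs(logprobs):
    """Return exp(-mean(logprobs)); 0.0 when no logprobs are available."""
    if not logprobs:
        return 0.0
    return math.exp(-sum(logprobs) / len(logprobs))


def should_search(logprobs, response_text):
    """True when the reply looks uncertain or reads like a refusal."""
    uncertain = perplexity_from_logprobs(logprobs) > PERPLEXITY_THRESHOLD
    lowered = response_text.lower()
    refused = any(pattern in lowered for pattern in REFUSAL_PATTERNS)
    return uncertain or refused
```

A higher threshold makes the perplexity branch fire less often, which matches the README comment that a higher `PERPLEXITY_THRESHOLD` is less likely to trigger a search.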
||||||
|
|
||||||
## API Endpoints
|
## API Endpoints
|
||||||
|
|
||||||
| Endpoint | Method | Description |
|
### Memory
|
||||||
|----------|--------|-------------|
|
|
||||||
| `/` | GET | Web UI |
|
|
||||||
| `/api/models` | GET | List Ollama models |
|
|
||||||
| `/api/ps` | GET | Running models |
|
|
||||||
| `/api/show` | POST | Model info (context size) |
|
|
||||||
| `/api/stats` | GET | System stats (CPU, memory, GPU, VRAM) |
|
|
||||||
| `/api/chat` | POST | Stream chat (SSE) |
|
|
||||||
| `/api/conversations` | GET/DELETE | List/delete all conversations |
|
|
||||||
| `/api/conversations/{id}` | GET/DELETE | Get/delete conversation |
|
|
||||||
| `/api/profile` | GET/PUT | Get/update profile |
|
|
||||||
| `/api/presets` | GET/POST | List/create presets |
|
|
||||||
| `/api/presets/{id}` | PUT/DELETE | Update/delete preset |
|
|
||||||
| `/api/settings` | GET/PUT | App settings |
|
|
||||||
| `/api/search/status` | GET | SearXNG availability |
|
|
||||||
|
|
||||||
## Screenshots
|
| Method | Endpoint | Description |
|
||||||
|
|--------|----------|-------------|
|
||||||
|
| GET | `/api/memories` | List all memories |
|
||||||
|
| POST | `/api/memories` | Add memory `{"fact": "...", "topic": "general"}` |
|
||||||
|
| DELETE | `/api/memories/{rowid}` | Delete memory by ID |
|
||||||
|
| GET | `/api/memories/search?q=term` | Search memories |
|
||||||
|
| GET | `/api/memories/stats` | Get counts by topic |
|
||||||
|
|
||||||
*(Add your own screenshot here)*
|
### Chat & Models
|
||||||
|
|
||||||
## TODO
|
| Method | Endpoint | Description |
|
||||||
|
|--------|----------|-------------|
|
||||||
|
| GET | `/api/models` | List available Ollama models |
|
||||||
|
| POST | `/api/chat` | Send message (streaming SSE) |
|
||||||
|
| POST | `/api/search` | Explicit web search (streaming SSE) |
|
||||||
|
| POST | `/api/show` | Get model info (context size) |
|
||||||
|
| GET | `/api/ps` | Get running models |
|
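`POST /api/chat` replies with Server-Sent Events: each event is a `data: ` line holding one JSON object. Going by `routers/chat.py` in this diff, the objects carry keys such as `token`, `searching`, `done`, and `error`. The sketch below shows one way a client might decode such a stream; `iter_sse_events` and `collect_reply` are hypothetical helpers, and the sketch deliberately ignores the `augmented` replacement token sent after a web search.

```python
import json


def iter_sse_events(lines):
    """Yield the decoded JSON payload of each `data: ...` line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separators and non-data fields
        payload = line[len("data: "):]
        try:
            yield json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed events rather than abort the stream


def collect_reply(lines):
    """Join streamed tokens until an event with `done` set arrives."""
    parts = []
    for event in iter_sse_events(lines):
        if "error" in event:
            raise RuntimeError(event["error"])
        if event.get("token"):
            parts.append(event["token"])
        if event.get("done"):
            break
    return "".join(parts)
```

`lines` can be any iterable of text lines, for example the line iterator of a streaming HTTP response.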
||||||
|
|
||||||
### Active
|
### Settings & Profile
|
||||||
|
|
||||||
1. ~~**Mass-delete conversation history**~~ ✓ (v1.3.0)
|
| Method | Endpoint | Description |
|
||||||
|
|--------|----------|-------------|
|
||||||
|
| GET | `/api/profile` | Get profile content |
|
||||||
|
| PUT | `/api/profile` | Update profile |
|
||||||
|
| GET | `/api/profile/default` | Get default profile |
|
||||||
|
| GET | `/api/settings` | Get settings |
|
||||||
|
| PUT | `/api/settings` | Update settings |
|
||||||
|
|
||||||
2. **Verify SearXNG and Docker services persist across reboots**
|
### Conversations
|
||||||
- Expand refusal patterns: "As an AI model", "based on my training data", "I don't have the capability"
|
|
||||||
|
|
||||||
3. **Input trigger: `search+` prefix**
|
| Method | Endpoint | Description |
|
||||||
- Strip prefix, query SearXNG directly, Ollama summarizes
|
|--------|----------|-------------|
|
||||||
- Raw results in expandable div (not tooltip)
|
| GET | `/api/conversations` | List conversations |
|
||||||
|
| GET | `/api/conversations/{id}` | Get conversation with messages |
|
||||||
|
| DELETE | `/api/conversations/{id}` | Delete conversation |
|
||||||
|
| DELETE | `/api/conversations` | Delete ALL conversations |
|
||||||
|
|
||||||
4. **Add `profile.example.md`**
|
### Presets
|
||||||
- Recommended default profile with anti-bullshit rules (no "As an AI", no OpenAI mentions)
|
|
||||||
|
|
||||||
### Backlog
|
| Method | Endpoint | Description |
|
||||||
|
|--------|----------|-------------|
|
||||||
|
| GET | `/api/presets` | List presets |
|
||||||
|
| POST | `/api/presets` | Create preset |
|
||||||
|
| PUT | `/api/presets/{id}` | Update preset |
|
||||||
|
| DELETE | `/api/presets/{id}` | Delete preset |
|
||||||
|
|
||||||
5. Conversation search/filter by keyword
|
### System
|
||||||
6. Export conversation to markdown/text
|
|
||||||
7. Keyboard shortcuts (Ctrl+N new chat, Ctrl+Enter send)
|
|
||||||
8. ~~Token count estimate before sending~~ ✓ (v1.2.9)
|
|
||||||
9. Model info display — context length, VRAM usage from Ollama `/api/ps`
|
|
||||||
10. Retry button on assistant messages
|
|
||||||
11. Source links — clickable links when search used
|
|
||||||
12. Allow conversation renaming
|
|
||||||
13. Multiple profiles — coding/sysadmin/general
|
|
||||||
14. Auto-generate conversation tags (client-side KWIC, top 5, filterable badges)
|
|
||||||
15. **Image input support**
|
|
||||||
- Pull vision model (llava, llama3.2-vision, etc.)
|
|
||||||
- Frontend: file input / drag-drop, base64 encode
|
|
||||||
- Backend: pass `images` array to Ollama `/api/chat`
|
|
||||||
|
|
||||||
## Version History
|
| Method | Endpoint | Description |
|
||||||
|
|--------|----------|-------------|
|
||||||
|
| GET | `/api/stats` | CPU, RAM, GPU, VRAM stats |
|
||||||
|
| GET | `/api/search/status` | SearXNG availability |
|
||||||
|
|
||||||
| Version | Changes |
|
## Configuration
|
||||||
|---------|---------|
|
|
||||||
| 1.3.1 | System stats panel (CPU, memory, GPU, VRAM) in sidebar |
|
Settings are stored in the `settings` table and include:
|
||||||
| 1.3.0 | Delete all conversations button |
|
|
||||||
| 1.2.9 | Token thermometer with live context tracking |
|
- `profile_enabled` — Inject profile into chats (true/false)
|
||||||
| 1.2.8 | Logo in sidebar, llama emoji tagline |
|
- `search_enabled` — Auto web search (true/false)
|
||||||
| 1.2.7 | Tokens per second (t/s) badge on responses |
|
- `memory_enabled` — Memory injection (true/false)
|
||||||
| 1.2.6 | wttr.in weather integration, improved search extraction |
|
- `default_model` — Default Ollama model
|
||||||
| 1.2.5 | SearXNG infoboxes/answers, smarter query building |
|
- `searxng_url` — SearXNG instance URL (default: `http://localhost:8888`)
|
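Since these settings live in a key/value table, boolean flags are stored as the strings `"true"` and `"false"`; `routers/chat.py` in this diff reads `search_enabled` that way. A minimal sketch of loading and interpreting them, assuming a two-column `settings(key, value)` schema and using hypothetical helper names `load_settings` and `flag`:

```python
import sqlite3


def load_settings(db):
    """Return the settings table as a plain dict of strings."""
    rows = db.execute("SELECT key, value FROM settings").fetchall()
    return {key: value for key, value in rows}


def flag(settings, name, default="true"):
    """Interpret a stored 'true'/'false' string as a bool."""
    return settings.get(name, default) == "true"


# Usage with an in-memory database standing in for jarvischat.db
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
db.executemany(
    "INSERT INTO settings VALUES (?, ?)",
    [("search_enabled", "false"), ("default_model", "example-model")],
)
settings = load_settings(db)
```

With this data, `flag(settings, "search_enabled")` is `False`, while a key that is missing from the table, such as `memory_enabled`, falls back to the default of `"true"`.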
||||||
| 1.2.4 | Perplexity badges, hedging cleanup |
|
|
||||||
| 1.2.3 | SearXNG integration with perplexity-based triggering |
|
## Testing Memory
|
||||||
| 1.2.0 | System prompt presets, settings persistence |
|
|
||||||
| 1.1.0 | Profile memory, model switching |
|
```bash
|
||||||
| 1.0.0 | Initial release |
|
# Add a memory via API
|
||||||
|
curl -X POST http://jarvis:8080/api/memories \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"fact": "User prefers native installs over Docker", "topic": "preference"}'
|
||||||
|
|
||||||
|
# Search memories
|
||||||
|
curl "http://jarvis:8080/api/memories/search?q=docker"
|
||||||
|
|
||||||
|
# List all memories
|
||||||
|
curl http://jarvis:8080/api/memories
|
||||||
|
|
||||||
|
# Get stats
|
||||||
|
curl http://jarvis:8080/api/memories/stats
|
||||||
|
```
|
||||||
|
|
||||||
|
Or in chat:
|
||||||
|
1. Say "remember that I hate YAML"
|
||||||
|
2. Later ask "what markup languages should I avoid?"
|
||||||
|
3. JarvisChat will inject the YAML preference into context
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Service won't start
|
||||||
|
|
||||||
|
Check logs:
|
||||||
|
```bash
|
||||||
|
journalctl -u jarvischat -n 50 --no-pager
|
||||||
|
```
|
||||||
|
|
||||||
|
Common issues:
|
||||||
|
- Missing `jinja2`: `./venv/bin/pip install jinja2`
|
||||||
|
- Missing `templates/` directory
|
||||||
|
- Wrong permissions on `/opt/jarvischat`
|
||||||
|
|
||||||
|
### Memory not working
|
||||||
|
|
||||||
|
1. Check memory is enabled (🧠 MEM ON in topbar)
|
||||||
|
2. Verify memories exist: `curl http://jarvis:8080/api/memories`
|
||||||
|
3. Check FTS5 table: `sqlite3 jarvischat.db "SELECT * FROM memories_fts;"`
|
||||||
|
|
||||||
|
### Web search not working
|
||||||
|
|
||||||
|
1. Verify SearXNG is running: `curl "http://localhost:8888/search?q=test&format=json"`
|
||||||
|
2. Check search status: `curl http://jarvis:8080/api/search/status`
|
||||||
|
3. Ensure JSON format is enabled in SearXNG settings
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
MIT
|
MIT
|
||||||
|
|
||||||
---
|
## Repository
|
||||||
|
|
||||||
## A Note from Gramps
|
Gitea: `ssh://gitea@llgit.llamachile.tube:1319/gramps/jarvisChat.git`
|
||||||
|
|
||||||
I named my AI machine "jarvis" after the AI assistant in *Iron Man* (2008) — because it's an awesome name. When I started building a local coding companion to talk to it, "JarvisChat" just made sense.
|
|
||||||
|
|
||||||
This project is in active development. Eventually it'll get packaged up as a Docker thing, but for now while I'm iterating fast, a single-file Python service does the job.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
*Built with 🦙 by Gramps at the Llama Chile Shop*
|
|
||||||
|
|||||||
0
routers/__init__.py
Normal file
203
routers/chat.py
Normal file
@@ -0,0 +1,203 @@
|
|||||||
|
"""JarvisChat routers - /api/chat streaming endpoint."""
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import uuid
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
from fastapi import APIRouter, HTTPException, Request
|
||||||
|
from fastapi.responses import StreamingResponse
|
||||||
|
|
||||||
|
from config import DEFAULT_MODEL, LLAMA_SERVER_BASE
|
||||||
|
from db import get_db
|
||||||
|
from memory import process_remember_command
|
||||||
|
from rag import build_system_prompt
|
||||||
|
from search import (calculate_perplexity, is_uncertain, is_refusal,
|
||||||
|
clean_hedging, format_search_results, format_direct_answer,
|
||||||
|
extract_search_query, query_searxng)
|
||||||
|
from security import read_json_body, log_incident, BODY_LIMIT_CHAT_BYTES
|
||||||
|
from config import MAX_CHAT_MESSAGE_CHARS
|
||||||
|
|
||||||
|
log = logging.getLogger("jarvischat")
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
def parse_llama_stream_chunk(line: str) -> tuple:
    if line.startswith("data: "):
        line = line[6:]
    if line.strip() == "[DONE]":
        return None, True, {}
    try:
        chunk = json.loads(line)
        choices = chunk.get("choices", [])
        if choices:
            delta = choices[0].get("delta", {})
            token = delta.get("content")
            finish = choices[0].get("finish_reason")
            stats = {}
            if finish == "stop":
                usage = chunk.get("usage", {})
                stats["tokens_per_sec"] = usage.get("tokens_per_second", 0.0)
            return token, finish == "stop", stats
        if "message" in chunk and "content" in chunk["message"]:
            token = chunk["message"]["content"]
            done = chunk.get("done", False)
            stats = {}
            if done:
                eval_count = chunk.get("eval_count", 0)
                eval_duration = chunk.get("eval_duration", 0)
                stats["tokens_per_sec"] = (eval_count / (eval_duration / 1e9)) if eval_duration > 0 else 0
            return token, done, stats
    except json.JSONDecodeError:
        pass
    return None, False, {}
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/api/chat")
|
||||||
|
async def chat(request: Request):
|
||||||
|
body = await read_json_body(request, BODY_LIMIT_CHAT_BYTES)
|
||||||
|
conv_id = body.get("conversation_id")
|
||||||
|
user_message = body.get("message", "").strip()
|
||||||
|
if len(user_message) > MAX_CHAT_MESSAGE_CHARS:
|
||||||
|
raise HTTPException(status_code=413, detail="Chat message is too long")
|
||||||
|
model = body.get("model", DEFAULT_MODEL)
|
||||||
|
preset_prompt = body.get("system_prompt", "")
|
||||||
|
|
||||||
|
if not user_message:
|
||||||
|
raise HTTPException(status_code=400, detail="Empty message")
|
||||||
|
|
||||||
|
db = get_db()
|
||||||
|
now = datetime.now(timezone.utc).isoformat()
|
||||||
|
settings = {row["key"]: row["value"] for row in db.execute("SELECT key, value FROM settings").fetchall()}
|
||||||
|
search_enabled = settings.get("search_enabled", "true") == "true"
|
||||||
|
|
||||||
|
remember_response = process_remember_command(user_message)
|
||||||
|
|
||||||
|
if not conv_id:
|
||||||
|
conv_id = str(uuid.uuid4())
|
||||||
|
title = user_message[:80] + ("..." if len(user_message) > 80 else "")
|
||||||
|
db.execute("INSERT INTO conversations (id, title, model, created_at, updated_at) VALUES (?, ?, ?, ?, ?)",
|
||||||
|
(conv_id, title, model, now, now))
|
||||||
|
else:
|
||||||
|
db.execute("UPDATE conversations SET updated_at = ? WHERE id = ?", (now, conv_id))
|
||||||
|
|
||||||
|
db.execute("INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
|
||||||
|
(conv_id, "user", user_message, now))
|
||||||
|
db.commit()
|
||||||
|
|
||||||
|
history_rows = db.execute(
|
||||||
|
"SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id ASC", (conv_id,)
|
||||||
|
).fetchall()
|
||||||
|
system_prompt = await build_system_prompt(db, preset_prompt, user_message)
|
||||||
|
db.close()
|
||||||
|
|
||||||
|
messages = []
|
||||||
|
if system_prompt:
|
||||||
|
messages.append({"role": "system", "content": system_prompt})
|
||||||
|
for row in history_rows:
|
||||||
|
messages.append({"role": row["role"], "content": row["content"]})
|
||||||
|
|
||||||
|
ollama_payload = {"model": model, "messages": messages, "stream": True}
|
||||||
|
|
||||||
|
async def stream_response():
|
||||||
|
full_response = []
|
||||||
|
all_logprobs = []
|
||||||
|
tokens_per_sec = 0.0
|
||||||
|
|
||||||
|
if remember_response:
|
||||||
|
yield f"data: {json.dumps({'token': remember_response + chr(10) + chr(10), 'conversation_id': conv_id})}\n\n"
|
||||||
|
|
||||||
|
async with httpx.AsyncClient() as client:
|
||||||
|
try:
|
||||||
|
async with client.stream(
|
||||||
|
"POST", f"{LLAMA_SERVER_BASE}/v1/chat/completions",
|
||||||
|
json=ollama_payload,
|
||||||
|
timeout=httpx.Timeout(300.0, connect=10.0),
|
||||||
|
) as resp:
|
||||||
|
async for line in resp.aiter_lines():
|
||||||
|
if line.strip():
|
||||||
|
token, done, stats = parse_llama_stream_chunk(line)
|
||||||
|
if token:
|
||||||
|
full_response.append(token)
|
||||||
|
yield f"data: {json.dumps({'token': token, 'conversation_id': conv_id})}\n\n"
|
||||||
|
if done:
|
||||||
|
tokens_per_sec = stats.get("tokens_per_sec", 0.0)
|
||||||
|
|
||||||
|
assistant_msg = "".join(full_response)
|
||||||
|
perplexity = calculate_perplexity(all_logprobs) if all_logprobs else 0.0
|
||||||
|
should_search = is_uncertain(all_logprobs) or is_refusal(assistant_msg)
|
||||||
|
|
||||||
|
if search_enabled and should_search:
|
||||||
|
yield f"data: {json.dumps({'searching': True, 'conversation_id': conv_id})}\n\n"
|
||||||
|
search_query = extract_search_query(user_message)
|
||||||
|
search_results = await query_searxng(search_query)
|
||||||
|
|
||||||
|
if search_results:
|
||||||
|
search_context = format_search_results(search_results)
|
||||||
|
augmented_messages = []
|
||||||
|
if system_prompt:
|
||||||
|
augmented_messages.append({"role": "system", "content": system_prompt + "\n\n" + search_context})
|
||||||
|
else:
|
||||||
|
augmented_messages.append({"role": "system", "content": search_context})
|
||||||
|
for row in history_rows[:-1]:
|
||||||
|
augmented_messages.append({"role": row["role"], "content": row["content"]})
|
||||||
|
augmented_messages.append({"role": "user", "content": user_message})
|
||||||
|
|
||||||
|
yield f"data: {json.dumps({'search_results': len(search_results), 'conversation_id': conv_id})}\n\n"
|
||||||
|
|
||||||
|
augmented_response = []
|
||||||
|
async with client.stream(
|
||||||
|
"POST", f"{LLAMA_SERVER_BASE}/v1/chat/completions",
|
||||||
|
json={"model": model, "messages": augmented_messages, "stream": True},
|
||||||
|
timeout=httpx.Timeout(300.0, connect=10.0),
|
||||||
|
) as resp2:
|
||||||
|
async for line in resp2.aiter_lines():
|
||||||
|
if line.strip():
|
||||||
|
token2, done2, _ = parse_llama_stream_chunk(line)
|
||||||
|
if token2:
|
||||||
|
augmented_response.append(token2)
|
||||||
|
if done2:
|
||||||
|
break
|
||||||
|
|
||||||
|
raw_response = "".join(augmented_response) or assistant_msg
|
||||||
|
cleaned_response = clean_hedging(raw_response)
|
||||||
|
if is_refusal(cleaned_response) or len(cleaned_response) < 20:
|
||||||
|
cleaned_response = format_direct_answer(user_message, search_results)
|
||||||
|
|
||||||
|
yield f"data: {json.dumps({'token': cleaned_response, 'conversation_id': conv_id, 'augmented': True})}\n\n"
|
||||||
|
|
||||||
|
saved_msg = cleaned_response + "\n\n---\n*🔍 Enhanced with web search results*"
|
||||||
|
if remember_response:
|
||||||
|
saved_msg = remember_response + "\n\n" + saved_msg
|
||||||
|
|
||||||
|
db2 = get_db()
|
||||||
|
db2.execute("INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
|
||||||
|
(conv_id, "assistant", saved_msg, datetime.now(timezone.utc).isoformat()))
|
||||||
|
db2.commit()
|
||||||
|
db2.close()
|
||||||
|
|
||||||
|
yield f"data: {json.dumps({'done': True, 'conversation_id': conv_id, 'searched': True, 'perplexity': round(perplexity, 2), 'tokens_per_sec': round(tokens_per_sec, 1)})}\n\n"
|
||||||
|
return
|
||||||
|
|
||||||
|
saved_msg = assistant_msg
|
||||||
|
if remember_response:
|
||||||
|
saved_msg = remember_response + "\n\n" + saved_msg
|
||||||
|
|
||||||
|
db2 = get_db()
|
||||||
|
db2.execute("INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
|
||||||
|
(conv_id, "assistant", saved_msg, datetime.now(timezone.utc).isoformat()))
|
||||||
|
db2.commit()
|
||||||
|
db2.close()
|
||||||
|
|
||||||
|
yield f"data: {json.dumps({'done': True, 'conversation_id': conv_id, 'perplexity': round(perplexity, 2), 'tokens_per_sec': round(tokens_per_sec, 1)})}\n\n"
|
||||||
|
|
||||||
|
except httpx.RemoteProtocolError:
|
||||||
|
pass
|
||||||
|
except httpx.ConnectError:
|
||||||
|
yield f"data: {json.dumps({'error': 'Cannot connect to Ollama. Is it running?'})}\n\n"
|
||||||
|
except Exception as e:
|
||||||
|
incident_key = log_incident("chat_stream", message="Ollama stream failure during chat response",
|
||||||
|
request=request, exc=e)
|
||||||
|
yield f"data: {json.dumps({'error': 'Chat response generation failed before completion. Use the incident key for support lookup.', 'error_key': incident_key})}\n\n"
|
||||||
|
|
||||||
|
return StreamingResponse(stream_response(), media_type="text/event-stream")
|
||||||
267
routers/completions.py
Normal file
@@ -0,0 +1,267 @@
|
|||||||
|
"""
|
||||||
|
JarvisChat - /v1/chat/completions router.
|
||||||
|
OpenAI-compatible endpoint for IDE integration (Continue.dev, etc.).
|
||||||
|
Runs all requests through the full jC pipeline: profile + RAG + memory injection.
|
||||||
|
FIM (fill-in-the-middle) requests are proxied directly — not persisted.
|
||||||
|
Chat-style requests are persisted to conversation history.
|
||||||
|
Auth: static Bearer token via COMPLETIONS_API_KEY in config.
|
||||||
|
"""
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import uuid
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
from fastapi import APIRouter, HTTPException, Request
|
||||||
|
from fastapi.responses import StreamingResponse, JSONResponse
|
||||||
|
|
||||||
|
from config import DEFAULT_MODEL, LLAMA_SERVER_BASE, COMPLETIONS_API_KEY
|
||||||
|
from db import get_db
|
||||||
|
from rag import build_system_prompt
|
||||||
|
from routers.chat import parse_llama_stream_chunk
|
||||||
|
|
||||||
|
log = logging.getLogger("jarvischat")
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
def _check_api_key(request: Request):
    auth = request.headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing Bearer token")
    token = auth[7:].strip()
    if token != COMPLETIONS_API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")
|
||||||
|
|
||||||
|
|
||||||
|
def _is_fim_request(body: dict) -> bool:
    """
    FIM (fill-in-the-middle) requests use a 'prompt' + optional 'suffix' structure
    rather than a 'messages' array. Continue.dev sends these for inline autocomplete.
    We proxy them directly without pipeline injection or persistence.
    """
    return "prompt" in body and "messages" not in body
|
||||||
|
|
||||||
|
|
||||||
|
def _build_openai_chunk(token: str, model: str, conv_id: str) -> str:
|
||||||
|
chunk = {
|
||||||
|
"id": f"chatcmpl-{conv_id}",
|
||||||
|
"object": "chat.completion.chunk",
|
||||||
|
"model": model,
|
||||||
|
"choices": [{
|
||||||
|
"index": 0,
|
||||||
|
"delta": {"content": token},
|
||||||
|
"finish_reason": None,
|
||||||
|
}],
|
||||||
|
}
|
||||||
|
return f"data: {json.dumps(chunk)}\n\n"
|
||||||
|
|
||||||
|
|
||||||
|
def _build_openai_stop_chunk(model: str, conv_id: str) -> str:
|
||||||
|
chunk = {
|
||||||
|
"id": f"chatcmpl-{conv_id}",
|
||||||
|
"object": "chat.completion.chunk",
|
||||||
|
"model": model,
|
||||||
|
"choices": [{
|
||||||
|
"index": 0,
|
||||||
|
"delta": {},
|
||||||
|
"finish_reason": "stop",
|
||||||
|
}],
|
||||||
|
}
|
||||||
|
return f"data: {json.dumps(chunk)}\n\n"
|
||||||
|
|
||||||
|
|
||||||
|
def _build_openai_response(content: str, model: str, conv_id: str) -> dict:
|
||||||
|
"""Non-streaming response envelope."""
|
||||||
|
return {
|
||||||
|
"id": f"chatcmpl-{conv_id}",
|
||||||
|
"object": "chat.completion",
|
||||||
|
"model": model,
|
||||||
|
"choices": [{
|
||||||
|
"index": 0,
|
||||||
|
"message": {"role": "assistant", "content": content},
|
||||||
|
"finish_reason": "stop",
|
||||||
|
}],
|
||||||
|
"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/v1/chat/completions")
|
||||||
|
async def chat_completions(request: Request):
|
||||||
|
_check_api_key(request)
|
||||||
|
|
||||||
|
try:
|
||||||
|
body = await request.json()
|
||||||
|
except Exception:
|
||||||
|
raise HTTPException(status_code=400, detail="Invalid JSON body")
|
||||||
|
|
||||||
|
# --- FIM passthrough ---
|
||||||
|
if _is_fim_request(body):
|
||||||
|
return await _fim_passthrough(body)
|
||||||
|
|
||||||
|
# --- Chat completion ---
|
||||||
|
messages = body.get("messages", [])
|
||||||
|
if not messages:
|
||||||
|
raise HTTPException(status_code=400, detail="No messages provided")
|
||||||
|
|
||||||
|
model = body.get("model", DEFAULT_MODEL)
|
||||||
|
stream = body.get("stream", True)
|
||||||
|
|
||||||
|
# Extract the latest user message for RAG + conversation title
|
||||||
|
user_message = ""
|
||||||
|
for msg in reversed(messages):
|
||||||
|
if msg.get("role") == "user":
|
||||||
|
user_message = msg.get("content", "").strip()
|
||||||
|
break
|
||||||
|
|
||||||
|
if not user_message:
|
||||||
|
raise HTTPException(status_code=400, detail="No user message found")
|
||||||
|
|
||||||
|
# --- Persist conversation ---
|
||||||
|
db = get_db()
|
||||||
|
now = datetime.now(timezone.utc).isoformat()
|
||||||
|
conv_id = str(uuid.uuid4())
|
||||||
|
title = f"[IDE] {user_message[:72]}{'...' if len(user_message) > 72 else ''}"
|
||||||
|
db.execute(
|
||||||
|
"INSERT INTO conversations (id, title, model, created_at, updated_at) VALUES (?, ?, ?, ?, ?)",
|
||||||
|
(conv_id, title, model, now, now),
|
||||||
|
)
|
||||||
|
for msg in messages:
|
||||||
|
role = msg.get("role")
|
||||||
|
content = msg.get("content", "")
|
||||||
|
if role in ("user", "assistant"):
|
||||||
|
db.execute(
|
||||||
|
"INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
|
||||||
|
(conv_id, role, content, now),
|
||||||
|
)
|
||||||
|
db.commit()
|
||||||
|
|
||||||
|
# --- Build system prompt through full jC pipeline ---
|
||||||
|
system_prompt = await build_system_prompt(db, "", user_message)
|
||||||
|
db.close()
|
||||||
|
|
||||||
|
# Assemble messages for upstream: inject jC system prompt, preserve history
|
||||||
|
upstream_messages = []
|
||||||
|
if system_prompt:
|
||||||
|
upstream_messages.append({"role": "system", "content": system_prompt})
|
||||||
|
|
||||||
|
# Strip any system messages from the incoming payload — jC owns the system prompt
|
||||||
|
for msg in messages:
|
||||||
|
if msg.get("role") != "system":
|
||||||
|
upstream_messages.append(msg)
|
||||||
|
|
||||||
|
upstream_payload = {
|
||||||
|
"model": model,
|
||||||
|
"messages": upstream_messages,
|
||||||
|
"stream": True, # always stream from upstream; we buffer if client wants non-stream
|
||||||
|
}
|
||||||
|
|
||||||
|
if stream:
|
||||||
|
return StreamingResponse(
|
||||||
|
_stream_chat(upstream_payload, model, conv_id, request),
|
||||||
|
media_type="text/event-stream",
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
return await _blocking_chat(upstream_payload, model, conv_id, request)
|
||||||
|
|
||||||
|
|
||||||
|
async def _stream_chat(payload: dict, model: str, conv_id: str, request: Request):
|
||||||
|
"""Stream tokens to client in OpenAI SSE format, persist assistant response."""
|
||||||
|
full_response = []
|
||||||
|
|
||||||
|
async with httpx.AsyncClient() as client:
|
||||||
|
try:
|
||||||
|
async with client.stream(
|
||||||
|
"POST", f"{LLAMA_SERVER_BASE}/v1/chat/completions",
|
||||||
|
json=payload,
|
||||||
|
timeout=httpx.Timeout(300.0, connect=10.0),
|
||||||
|
) as resp:
|
||||||
|
async for line in resp.aiter_lines():
|
||||||
|
if not line.strip():
|
||||||
|
continue
|
||||||
|
token, done, _ = parse_llama_stream_chunk(line)
|
||||||
|
if token:
|
||||||
|
full_response.append(token)
|
||||||
|
yield _build_openai_chunk(token, model, conv_id)
|
||||||
|
if done:
|
||||||
|
break
|
||||||
|
|
||||||
|
yield _build_openai_stop_chunk(model, conv_id)
|
||||||
|
yield "data: [DONE]\n\n"
|
||||||
|
|
||||||
|
# Persist assistant response
|
||||||
|
assistant_msg = "".join(full_response)
|
||||||
|
if assistant_msg:
|
||||||
|
db = get_db()
|
||||||
|
db.execute(
|
||||||
|
"INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
|
||||||
|
(conv_id, "assistant", assistant_msg, datetime.now(timezone.utc).isoformat()),
|
||||||
|
)
|
||||||
|
db.commit()
|
||||||
|
db.close()
|
||||||
|
|
||||||
|
except httpx.ConnectError:
|
||||||
|
err = {"error": {"message": "Cannot connect to inference server", "type": "connection_error"}}
|
||||||
|
yield f"data: {json.dumps(err)}\n\n"
|
||||||
|
except Exception as e:
|
||||||
|
log.error(f"completions stream error: {e}")
|
||||||
|
err = {"error": {"message": "Stream failed", "type": "server_error"}}
|
||||||
|
yield f"data: {json.dumps(err)}\n\n"
|
||||||
|
|
||||||
|
|
||||||
|
async def _blocking_chat(payload: dict, model: str, conv_id: str, request: Request) -> JSONResponse:
|
||||||
|
"""Accumulate full response, return as standard OpenAI JSON object."""
|
||||||
|
full_response = []
|
||||||
|
|
||||||
|
async with httpx.AsyncClient() as client:
|
||||||
|
try:
|
||||||
|
async with client.stream(
|
||||||
|
"POST", f"{LLAMA_SERVER_BASE}/v1/chat/completions",
|
||||||
|
json=payload,
|
||||||
|
timeout=httpx.Timeout(300.0, connect=10.0),
|
||||||
|
) as resp:
|
||||||
|
async for line in resp.aiter_lines():
|
||||||
|
if not line.strip():
|
||||||
|
continue
|
||||||
|
token, done, _ = parse_llama_stream_chunk(line)
|
||||||
|
if token:
|
||||||
|
full_response.append(token)
|
||||||
|
if done:
|
||||||
|
break
|
||||||
|
except httpx.ConnectError:
|
||||||
|
raise HTTPException(status_code=503, detail="Cannot connect to inference server")
|
||||||
|
except Exception as e:
|
||||||
|
log.error(f"completions blocking error: {e}")
|
||||||
|
raise HTTPException(status_code=500, detail="Inference request failed")
|
||||||
|
|
||||||
|
assistant_msg = "".join(full_response)
|
||||||
|
|
||||||
|
if assistant_msg:
|
||||||
|
db = get_db()
|
||||||
|
db.execute(
|
||||||
|
"INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
|
||||||
|
(conv_id, "assistant", assistant_msg, datetime.now(timezone.utc).isoformat()),
|
||||||
|
)
|
||||||
|
db.commit()
|
||||||
|
db.close()
|
||||||
|
|
||||||
|
return JSONResponse(content=_build_openai_response(assistant_msg, model, conv_id))
|
||||||
|
|
||||||
|
|
||||||
|
async def _fim_passthrough(body: dict) -> JSONResponse:
|
||||||
|
"""
|
||||||
|
Proxy FIM requests directly to llama-server without pipeline injection.
|
||||||
|
Not persisted — autocomplete noise has no RAG value.
|
||||||
|
"""
|
||||||
|
async with httpx.AsyncClient() as client:
|
||||||
|
try:
|
||||||
|
resp = await client.post(
|
||||||
|
f"{LLAMA_SERVER_BASE}/v1/completions",
|
||||||
|
json=body,
|
||||||
|
timeout=httpx.Timeout(30.0, connect=5.0),
|
||||||
|
)
|
||||||
|
return JSONResponse(content=resp.json(), status_code=resp.status_code)
|
||||||
|
except httpx.ConnectError:
|
||||||
|
raise HTTPException(status_code=503, detail="Cannot connect to inference server")
|
||||||
|
except Exception as e:
|
||||||
|
log.error(f"FIM passthrough error: {e}")
|
||||||
|
raise HTTPException(status_code=500, detail="FIM request failed")
|
||||||
83
routers/conversations.py
Normal file
@@ -0,0 +1,83 @@
"""JarvisChat routers - Conversation CRUD."""
import logging
import uuid
from datetime import datetime, timezone
from fastapi import APIRouter, HTTPException, Request
from db import get_db
from security import read_json_body, BODY_LIMIT_DEFAULT_BYTES
from config import DEFAULT_MODEL, MAX_CONVERSATION_TITLE_CHARS

log = logging.getLogger("jarvischat")
router = APIRouter()


@router.get("/api/conversations")
async def list_conversations():
    db = get_db()
    rows = db.execute("SELECT * FROM conversations ORDER BY updated_at DESC").fetchall()
    db.close()
    return [dict(r) for r in rows]


@router.post("/api/conversations")
async def create_conversation(request: Request):
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    conv_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    model = body.get("model", DEFAULT_MODEL)
    title = str(body.get("title", "New Chat"))[:MAX_CONVERSATION_TITLE_CHARS]
    db = get_db()
    db.execute("INSERT INTO conversations (id, title, model, created_at, updated_at) VALUES (?, ?, ?, ?, ?)",
               (conv_id, title, model, now, now))
    db.commit()
    db.close()
    return {"id": conv_id, "title": title, "model": model, "created_at": now, "updated_at": now}


@router.get("/api/conversations/{conv_id}")
async def get_conversation(conv_id: str):
    db = get_db()
    conv = db.execute("SELECT * FROM conversations WHERE id = ?", (conv_id,)).fetchone()
    if not conv:
        db.close()
        raise HTTPException(status_code=404, detail="Conversation not found")
    messages = db.execute("SELECT * FROM messages WHERE conversation_id = ? ORDER BY id ASC", (conv_id,)).fetchall()
    db.close()
    return {"conversation": dict(conv), "messages": [dict(m) for m in messages]}


@router.put("/api/conversations/{conv_id}")
async def update_conversation(conv_id: str, request: Request):
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    db = get_db()
    now = datetime.now(timezone.utc).isoformat()
    if "title" in body:
        db.execute("UPDATE conversations SET title = ?, updated_at = ? WHERE id = ?",
                   (str(body["title"])[:MAX_CONVERSATION_TITLE_CHARS], now, conv_id))
    if "model" in body:
        db.execute("UPDATE conversations SET model = ?, updated_at = ? WHERE id = ?",
                   (body["model"], now, conv_id))
    db.commit()
    db.close()
    return {"status": "ok"}


@router.delete("/api/conversations/{conv_id}")
async def delete_conversation(conv_id: str):
    db = get_db()
    db.execute("DELETE FROM messages WHERE conversation_id = ?", (conv_id,))
    db.execute("DELETE FROM conversations WHERE id = ?", (conv_id,))
    db.commit()
    db.close()
    return {"status": "ok"}


@router.delete("/api/conversations")
async def delete_all_conversations():
    db = get_db()
    db.execute("DELETE FROM messages")
    db.execute("DELETE FROM conversations")
    db.commit()
    db.close()
    log.info("Deleted all conversations")
    return {"status": "ok"}
63
routers/memories.py
Normal file
@@ -0,0 +1,63 @@
"""JarvisChat routers - Memory CRUD API."""
from fastapi import APIRouter, HTTPException, Request
from typing import Optional

from db import get_db
from memory import add_memory, delete_memory, update_memory, get_all_memories, search_memories
from security import read_json_body, BODY_LIMIT_DEFAULT_BYTES
from config import MAX_MEMORY_FACT_CHARS

router = APIRouter()


@router.get("/api/memories")
async def list_memories(topic: Optional[str] = None):
    memories = get_all_memories(topic)
    return {"memories": memories, "count": len(memories)}


@router.post("/api/memories")
async def create_memory(request: Request):
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    fact = str(body.get("fact", "")).strip()
    if not fact:
        raise HTTPException(status_code=400, detail="Memory fact is required")
    if len(fact) > MAX_MEMORY_FACT_CHARS:
        raise HTTPException(status_code=413, detail="Memory fact is too long")
    rowid = add_memory(fact=fact, topic=body.get("topic", "general"), source=body.get("source", "manual"))
    return {"rowid": rowid, "status": "ok"}


@router.delete("/api/memories/{rowid}")
async def remove_memory(rowid: int):
    if not delete_memory(rowid):
        raise HTTPException(status_code=404, detail="Memory not found")
    return {"status": "ok"}


@router.put("/api/memories/{rowid}")
async def edit_memory(rowid: int, request: Request):
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    fact = str(body.get("fact", "")).strip()
    if not fact:
        raise HTTPException(status_code=400, detail="Memory fact is required")
    if len(fact) > MAX_MEMORY_FACT_CHARS:
        raise HTTPException(status_code=413, detail="Memory fact is too long")
    if not update_memory(rowid, fact):
        raise HTTPException(status_code=404, detail="Memory not found")
    return {"status": "ok"}


@router.get("/api/memories/search")
async def search_memories_api(q: str, limit: int = 10):
    results = search_memories(q, limit=limit)
    return {"results": results, "count": len(results)}


@router.get("/api/memories/stats")
async def memory_stats():
    db = get_db()
    total = db.execute("SELECT COUNT(*) as c FROM memories").fetchone()["c"]
    topics = db.execute("SELECT topic, COUNT(*) as c FROM memories GROUP BY topic ORDER BY c DESC").fetchall()
    db.close()
    return {"total": total, "by_topic": {row["topic"]: row["c"] for row in topics}}
78
routers/models.py
Normal file
@@ -0,0 +1,78 @@
"""
JarvisChat routers - Model listing, system stats.
"""
import logging
from typing import Optional

import httpx
import psutil
from fastapi import APIRouter, HTTPException, Request

from config import OLLAMA_BASE
from gpu import get_gpu_stats
from security import read_json_body, BODY_LIMIT_DEFAULT_BYTES

log = logging.getLogger("jarvischat")
router = APIRouter()


@router.get("/api/models")
async def list_models():
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.get(f"{OLLAMA_BASE}/v1/models", timeout=10)
            data = resp.json()
            models = [{"name": m["id"], "model": m["id"]} for m in data.get("data", [])]
            return {"models": models}
        except httpx.ConnectError:
            raise HTTPException(status_code=502, detail="Cannot connect to llama-server.")


@router.get("/api/ps")
async def running_models():
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.get(f"{OLLAMA_BASE}/api/ps", timeout=10)
            return resp.json()
        except httpx.ConnectError:
            raise HTTPException(status_code=502, detail="Cannot connect to Ollama.")


@router.post("/api/show")
async def show_model(request: Request):
    from security import BODY_LIMIT_DEFAULT_BYTES
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.post(f"{OLLAMA_BASE}/api/show", json=body, timeout=10)
            return resp.json()
        except httpx.ConnectError:
            raise HTTPException(status_code=502, detail="Cannot connect to Ollama.")


@router.get("/api/stats")
async def system_stats():
    cpu_percent = psutil.cpu_percent(interval=0.1)
    memory = psutil.virtual_memory()
    gpu = get_gpu_stats()
    return {
        "cpu_percent": round(cpu_percent, 1),
        "memory_percent": round(memory.percent, 1),
        "memory_used_gb": round(memory.used / (1024**3), 1),
        "memory_total_gb": round(memory.total / (1024**3), 1),
        "gpu_percent": gpu["gpu_percent"],
        "vram_percent": gpu["vram_percent"],
        "gpu_available": gpu["available"],
    }


@router.get("/api/search/status")
async def search_status():
    from config import SEARXNG_BASE
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.get(f"{SEARXNG_BASE}/search",
                                    params={"q": "test", "format": "json"}, timeout=5)
            return {"available": resp.status_code == 200}
        except Exception:
            return {"available": False}
61
routers/presets.py
Normal file
@@ -0,0 +1,61 @@
"""JarvisChat routers - System prompt presets."""
import uuid
from datetime import datetime, timezone
from fastapi import APIRouter, HTTPException, Request
from db import get_db
from security import read_json_body, BODY_LIMIT_DEFAULT_BYTES
from config import MAX_PRESET_NAME_CHARS, MAX_PRESET_PROMPT_CHARS

router = APIRouter()


@router.get("/api/presets")
async def list_presets():
    db = get_db()
    rows = db.execute("SELECT * FROM system_presets ORDER BY is_default DESC, name ASC").fetchall()
    db.close()
    return [dict(r) for r in rows]


@router.post("/api/presets")
async def create_preset(request: Request):
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    name = str(body.get("name", "")).strip()
    prompt = str(body.get("prompt", "")).strip()
    if not name or not prompt:
        raise HTTPException(status_code=400, detail="Preset name and prompt are required")
    if len(name) > MAX_PRESET_NAME_CHARS or len(prompt) > MAX_PRESET_PROMPT_CHARS:
        raise HTTPException(status_code=413, detail="Preset fields are too long")
    preset_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    db = get_db()
    db.execute("INSERT INTO system_presets (id, name, prompt, is_default, created_at) VALUES (?, ?, ?, 0, ?)",
               (preset_id, name, prompt, now))
    db.commit()
    db.close()
    return {"id": preset_id, "name": name, "prompt": prompt}


@router.put("/api/presets/{preset_id}")
async def update_preset(preset_id: str, request: Request):
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    name = str(body.get("name", "")).strip()
    prompt = str(body.get("prompt", "")).strip()
    if not name or not prompt:
        raise HTTPException(status_code=400, detail="Preset name and prompt are required")
    if len(name) > MAX_PRESET_NAME_CHARS or len(prompt) > MAX_PRESET_PROMPT_CHARS:
        raise HTTPException(status_code=413, detail="Preset fields are too long")
    db = get_db()
    db.execute("UPDATE system_presets SET name = ?, prompt = ? WHERE id = ?", (name, prompt, preset_id))
    db.commit()
    db.close()
    return {"status": "ok"}


@router.delete("/api/presets/{preset_id}")
async def delete_preset(preset_id: str):
    db = get_db()
    db.execute("DELETE FROM system_presets WHERE id = ? AND is_default = 0", (preset_id,))
    db.commit()
    db.close()
    return {"status": "ok"}
36
routers/profile.py
Normal file
@@ -0,0 +1,36 @@
"""JarvisChat routers - Profile."""
from datetime import datetime, timezone
from fastapi import APIRouter, HTTPException, Request
from db import get_db
from security import read_json_body, BODY_LIMIT_PROFILE_BYTES
from config import MAX_PROFILE_CHARS, DEFAULT_PROFILE

router = APIRouter()


@router.get("/api/profile")
async def get_profile():
    db = get_db()
    row = db.execute("SELECT content, updated_at FROM profile WHERE id = 1").fetchone()
    db.close()
    return ({"content": row["content"], "updated_at": row["updated_at"]} if row
            else {"content": "", "updated_at": ""})


@router.put("/api/profile")
async def update_profile(request: Request):
    body = await read_json_body(request, BODY_LIMIT_PROFILE_BYTES)
    content = str(body.get("content", ""))
    if len(content) > MAX_PROFILE_CHARS:
        raise HTTPException(status_code=413, detail="Profile content is too long")
    now = datetime.now(timezone.utc).isoformat()
    db = get_db()
    db.execute("UPDATE profile SET content = ?, updated_at = ? WHERE id = 1", (content, now))
    db.commit()
    db.close()
    return {"status": "ok", "updated_at": now}


@router.get("/api/profile/default")
async def get_default_profile():
    return {"content": DEFAULT_PROFILE}
108
routers/search_route.py
Normal file
@@ -0,0 +1,108 @@
"""JarvisChat routers - /api/search explicit search endpoint."""
import json
import logging
import uuid
from datetime import datetime, timezone

import httpx
from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import StreamingResponse

from config import DEFAULT_MODEL, LLAMA_SERVER_BASE, MAX_SEARCH_QUERY_CHARS
from db import get_db
from search import query_searxng, format_search_results
from routers.chat import parse_llama_stream_chunk
from security import read_json_body, log_incident, BODY_LIMIT_CHAT_BYTES

log = logging.getLogger("jarvischat")
router = APIRouter()


@router.post("/api/search")
async def explicit_search(request: Request):
    body = await read_json_body(request, BODY_LIMIT_CHAT_BYTES)
    query = body.get("query", "").strip()
    if len(query) > MAX_SEARCH_QUERY_CHARS:
        raise HTTPException(status_code=413, detail="Search query is too long")
    conv_id = body.get("conversation_id")
    model = body.get("model", DEFAULT_MODEL)

    if not query:
        raise HTTPException(status_code=400, detail="Empty query")

    db = get_db()
    now = datetime.now(timezone.utc).isoformat()

    if not conv_id:
        conv_id = str(uuid.uuid4())
        title = f"🔍 {query[:70]}..." if len(query) > 70 else f"🔍 {query}"
        db.execute("INSERT INTO conversations (id, title, model, created_at, updated_at) VALUES (?, ?, ?, ?, ?)",
                   (conv_id, title, model, now, now))
    else:
        db.execute("UPDATE conversations SET updated_at = ? WHERE id = ?", (now, conv_id))

    db.execute("INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
               (conv_id, "user", f"🔍 {query}", now))
    db.commit()
    db.close()

    async def stream_search():
        yield f"data: {json.dumps({'conversation_id': conv_id, 'searching': True})}\n\n"

        results = await query_searxng(query, max_results=5)

        if not results:
            error_msg = "No search results found."
            yield f"data: {json.dumps({'token': error_msg, 'conversation_id': conv_id})}\n\n"
            db2 = get_db()
            db2.execute("INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
                        (conv_id, "assistant", error_msg, datetime.now(timezone.utc).isoformat()))
            db2.commit()
            db2.close()
            yield f"data: {json.dumps({'done': True, 'conversation_id': conv_id})}\n\n"
            return

        yield f"data: {json.dumps({'search_results': len(results), 'conversation_id': conv_id})}\n\n"

        search_context = format_search_results(results)
        messages = [
            {"role": "system", "content": f"You have access to current web data. Answer directly using ONLY the data below. Be concise. No apologies. No disclaimers.\n\n{search_context}"},
            {"role": "user", "content": query},
        ]

        full_response = []
        async with httpx.AsyncClient() as client:
            try:
                async with client.stream(
                    "POST", f"{LLAMA_SERVER_BASE}/v1/chat/completions",
                    json={"model": model, "messages": messages, "stream": True},
                    timeout=httpx.Timeout(300.0, connect=10.0),
                ) as resp:
                    async for line in resp.aiter_lines():
                        if line.strip():
                            token, done, _ = parse_llama_stream_chunk(line)
                            if token:
                                full_response.append(token)
                                yield f"data: {json.dumps({'token': token, 'conversation_id': conv_id})}\n\n"
                            if done:
                                break
            except Exception as e:
                incident_key = log_incident("search_summarization_stream",
                                            message="Stream failure during explicit search summarization",
                                            request=request, exc=e)
                yield f"data: {json.dumps({'error': 'Search summarization could not complete right now.', 'error_key': incident_key})}\n\n"
                return

        summary = "".join(full_response)
        saved_msg = f"{summary}\n\n---\n*🔍 Web search results*"

        db2 = get_db()
        db2.execute("INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
                    (conv_id, "assistant", saved_msg, datetime.now(timezone.utc).isoformat()))
        db2.commit()
        db2.close()

        yield f"data: {json.dumps({'raw_results': results, 'conversation_id': conv_id})}\n\n"
        yield f"data: {json.dumps({'done': True, 'conversation_id': conv_id, 'searched': True})}\n\n"

    return StreamingResponse(stream_search(), media_type="text/event-stream")
36
routers/settings.py
Normal file
@@ -0,0 +1,36 @@
"""JarvisChat routers - Settings."""
from fastapi import APIRouter, HTTPException, Request
from db import get_db
from security import read_json_body, BODY_LIMIT_DEFAULT_BYTES
from config import MAX_SETTINGS_KEYS, MAX_SETTINGS_VALUE_CHARS, ALLOWED_SETTINGS_KEYS

router = APIRouter()


@router.get("/api/settings")
async def get_settings():
    db = get_db()
    rows = db.execute("SELECT key, value FROM settings").fetchall()
    db.close()
    return {row["key"]: row["value"] for row in rows}


@router.put("/api/settings")
async def update_settings(request: Request):
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    if not isinstance(body, dict):
        raise HTTPException(status_code=400, detail="Settings payload must be an object")
    if len(body) > MAX_SETTINGS_KEYS:
        raise HTTPException(status_code=413, detail="Too many settings in one request")
    unknown_keys = sorted(key for key in body.keys() if str(key) not in ALLOWED_SETTINGS_KEYS)
    if unknown_keys:
        raise HTTPException(status_code=400, detail=f"Unknown setting key(s): {', '.join(unknown_keys)}")
    db = get_db()
    for key, value in body.items():
        if len(str(key)) > 80 or len(str(value)) > MAX_SETTINGS_VALUE_CHARS:
            db.close()
            raise HTTPException(status_code=413, detail="Setting key/value too long")
        db.execute("INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)", (key, str(value)))
    db.commit()
    db.close()
    return {"status": "ok"}
42
routers/skills.py
Normal file
@@ -0,0 +1,42 @@
"""JarvisChat routers - Skills."""
from fastapi import APIRouter, HTTPException, Request
from db import get_db, get_setting, list_skills_with_state, set_skill_enabled
from security import read_json_body, BODY_LIMIT_DEFAULT_BYTES
from config import MAX_SKILL_KEY_CHARS, SKILLS_BY_KEY

router = APIRouter()


@router.get("/api/skills")
async def list_skills():
    db = get_db()
    skills = list_skills_with_state(db)
    db.close()
    return {"skills": skills, "count": len(skills)}


@router.get("/api/skills/active")
async def list_active_skills():
    db = get_db()
    skills_enabled = get_setting(db, "skills_enabled", "true") == "true"
    skills = list_skills_with_state(db)
    db.close()
    active = [s for s in skills if s["enabled"]] if skills_enabled else []
    return {"skills": active, "count": len(active), "skills_enabled": skills_enabled}


@router.put("/api/skills/{skill_key}")
async def update_skill(skill_key: str, request: Request):
    skill_key = skill_key.strip()
    if len(skill_key) > MAX_SKILL_KEY_CHARS or skill_key not in SKILLS_BY_KEY:
        raise HTTPException(status_code=404, detail="Unknown skill")
    body = await read_json_body(request, BODY_LIMIT_DEFAULT_BYTES)
    if "enabled" not in body or not isinstance(body.get("enabled"), bool):
        raise HTTPException(status_code=400, detail="Field 'enabled' (boolean) is required")
    db = get_db()
    set_skill_enabled(db, skill_key, bool(body["enabled"]))
    db.commit()
    skills = list_skills_with_state(db)
    db.close()
    updated = next((s for s in skills if s["key"] == skill_key), None)
    return {"status": "ok", "skill": updated}
144
search.py
Normal file
@@ -0,0 +1,144 @@
"""
JarvisChat - SearXNG integration, perplexity scoring, refusal/hedge detection.
"""
import logging
import math
import re
from urllib.parse import urlparse

import httpx

from config import SEARXNG_BASE, PERPLEXITY_THRESHOLD, REFUSAL_PATTERNS, HEDGE_PATTERNS

log = logging.getLogger("jarvischat")


def sanitize_outbound_url(url: str) -> str:
    if not url:
        return ""
    candidate = url.strip()
    parsed = urlparse(candidate)
    if parsed.scheme.lower() in {"http", "https"} and parsed.netloc:
        return candidate
    return ""


def calculate_perplexity(logprobs: list) -> float:
    if not logprobs:
        return 0.0
    avg_logprob = sum(lp["logprob"] for lp in logprobs) / len(logprobs)
    return math.exp(-avg_logprob)


def is_uncertain(logprobs: list, threshold: float = PERPLEXITY_THRESHOLD) -> bool:
    if not logprobs:
        return False
    perplexity = calculate_perplexity(logprobs)
    log.info(f"Perplexity: {perplexity:.2f} (threshold: {threshold})")
    return perplexity > threshold


def is_refusal(text: str) -> bool:
    match = REFUSAL_PATTERNS.search(text)
    if match:
        log.info(f"Refusal detected: '{match.group()}'")
        return True
    return False


def clean_hedging(text: str) -> str:
    cleaned = text
    for pattern in HEDGE_PATTERNS:
        cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
    return cleaned.strip()


def format_search_results(results: list) -> str:
    if not results:
        return ""
    lines = ["[LIVE WEB DATA]\n"]
    for i, r in enumerate(results, 1):
        lines.append(f"{i}. {r['title']}")
        if r["content"]:
            lines.append(f" {r['content']}")
        lines.append("")
    lines.append("\nAnswer directly using the data above. No apologies. No disclaimers. Just answer.")
    return "\n".join(lines)


def format_direct_answer(question: str, results: list) -> str:
    if not results:
        return "No search results found."
    lines = ["Here's what I found:\n"]
    for r in results[:3]:
        lines.append(f"**{r['title']}**")
        if r["content"]:
            lines.append(f"{r['content']}")
        lines.append("")
    return "\n".join(lines).strip()


def extract_search_query(user_message: str) -> str:
    query = user_message.strip()
    if re.search(r"temperature|weather", query, re.IGNORECASE):
        query = re.sub(r"^what('?s| is) the ", "", query, flags=re.IGNORECASE) + " right now degrees"
    if re.search(r"price|spot price", query, re.IGNORECASE):
        query = re.sub(r"^(what('?s| is)|can you tell me) the ", "", query, flags=re.IGNORECASE) + " today USD"
    query = re.sub(
        r"^(what|who|where|when|why|how|is|are|can|could|would|should|do|does|did)\s+",
        "", query, flags=re.IGNORECASE,
    )
    query = re.sub(r"[?!.]+$", "", query)
    return query[:100].strip() or user_message[:100]


async def query_searxng(query: str, max_results: int = 5) -> list:
    log.info(f"Querying SearXNG: '{query}'")
    async with httpx.AsyncClient() as client:
        weather_match = re.search(
            r"(?:weather|temperature|forecast)\s+(?:in\s+)?(.+?)(?:\s+right now|\s+today|\s+degrees)?$",
            query, re.IGNORECASE,
        )
        if weather_match or "weather" in query.lower() or "temperature" in query.lower():
            location = (
                weather_match.group(1) if weather_match
                else re.sub(r"(weather|temperature|forecast|right now|today|degrees)", "", query, flags=re.IGNORECASE).strip()
            )
            if location:
                try:
                    resp = await client.get(f"https://wttr.in/{location}?format=3", timeout=10.0,
                                            headers={"User-Agent": "curl/7.68.0"})
                    if resp.status_code == 200:
                        return [{"title": "Current Weather",
                                 "url": sanitize_outbound_url(f"https://wttr.in/{location}"),
                                 "content": resp.text.strip()}]
                except Exception as e:
                    log.warning(f"wttr.in error: {e}")

        try:
            resp = await client.get(
                f"{SEARXNG_BASE}/search",
                params={"q": query, "format": "json", "categories": "general"},
                timeout=10.0,
            )
            if resp.status_code == 200:
                data = resp.json()
                results = []
                for answer in data.get("answers", []):
                    results.append({"title": "Direct Answer", "url": "", "content": answer})
                for box in data.get("infoboxes", []):
                    content = box.get("content", "")
                    if not content and box.get("attributes"):
                        content = " | ".join([f"{a.get('label','')}: {a.get('value','')}" for a in box["attributes"]])
                    results.append({
                        "title": box.get("infobox", "Info"),
                        "url": sanitize_outbound_url(box.get("urls", [{}])[0].get("url", "") if box.get("urls") else ""),
                        "content": content,
                    })
                for r in data.get("results", [])[:max_results]:
                    results.append({"title": r.get("title", ""), "url": sanitize_outbound_url(r.get("url", "")), "content": r.get("content", "")})
                log.info(f"SearXNG returned {len(results)} results")
                return results
        except Exception as e:
            log.error(f"SearXNG error: {e}")
    return []
175
security.py
Normal file
@@ -0,0 +1,175 @@
"""
JarvisChat - Security utilities.
PIN hashing, audit logging, incident tracking, CSRF/origin checks,
rate limiting, request helpers.
"""
import hashlib
import hmac
import json
import logging
import math
import os
import platform
import time
import uuid
from collections import defaultdict, deque
from datetime import datetime, timezone
from threading import Lock
from typing import Optional
from urllib.parse import urlparse

from fastapi import HTTPException, Request

from config import (
    ALLOWED_NETWORKS, TRUST_X_FORWARDED_FOR, TRUSTED_ORIGINS,
    BODY_LIMIT_DEFAULT_BYTES, BODY_LIMIT_CHAT_BYTES, BODY_LIMIT_PROFILE_BYTES,
    RATE_WINDOW_SECONDS, RL_LOGIN_PER_WINDOW, RL_CHAT_PER_WINDOW,
    RL_SEARCH_PER_WINDOW, RL_STATS_PER_WINDOW, RL_WRITE_PER_WINDOW,
    RL_DEFAULT_PER_WINDOW, VERSION,
)

import ipaddress

log = logging.getLogger("jarvischat")

SESSIONS: dict = {}
PIN_ATTEMPTS: dict = {}
RATE_EVENTS: dict = defaultdict(deque)
SESSION_LOCK = Lock()
RATE_LOCK = Lock()


def hash_pin(pin: str, salt_hex: Optional[str] = None) -> tuple:
    salt = bytes.fromhex(salt_hex) if salt_hex else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, 200_000)
    return salt.hex(), digest.hex()


def audit_event(event: str, outcome: str, *, ip: str = "unknown", role: str = "none",
                details: str = "", warning: bool = False) -> None:
    payload = {"event": event, "outcome": outcome, "ip": ip, "role": role, "details": details[:300]}
    msg = "AUDIT " + json.dumps(payload, separators=(",", ":"))
    if warning:
        log.warning(msg)
    else:
        log.info(msg)


def create_incident_key() -> str:
    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"INC-{ts}-{uuid.uuid4().hex[:8].upper()}"


def customer_error_envelope(message: str, incident_key: str) -> dict:
    return {
        "detail": message, "error_key": incident_key,
        "error": {"message": message, "incident_key": incident_key,
                  "support_hint": "Share this incident key for exact diagnostics."},
    }


def log_incident(event: str, *, message: str, request: Optional[Request] = None,
                 exc: Optional[Exception] = None) -> str:
    incident_key = create_incident_key()
    payload = {
        "event": event, "incident_key": incident_key, "message": message,
        "app_version": VERSION, "pid": os.getpid(), "python": platform.python_version(),
        "platform": platform.platform(),
        "method": request.method if request else "",
        "path": request.url.path if request else "",
        "client_ip": get_client_ip(request) if request else "",
    }
    if exc:
        log.exception("INCIDENT " + json.dumps(payload, separators=(",", ":")))
    else:
        log.error("INCIDENT " + json.dumps(payload, separators=(",", ":")))
    return incident_key


def get_client_ip(request: Request) -> str:
    forwarded = request.headers.get("x-forwarded-for", "").strip()
    if TRUST_X_FORWARDED_FOR and forwarded:
        return forwarded.split(",")[0].strip()
    if request.client and request.client.host:
        return request.client.host
    return "unknown"


def is_ip_allowed(ip: str) -> bool:
    normalized = ip.strip().lower()
    if normalized in {"localhost", "testclient"}:
        normalized = "127.0.0.1"
    try:
        ip_obj = ipaddress.ip_address(normalized)
    except ValueError:
        return False
    for network in ALLOWED_NETWORKS:
        if ip_obj in network:
            return True
    return False


def request_body_limit(path: str) -> int:
    if path in {"/api/chat", "/api/search"}:
        return BODY_LIMIT_CHAT_BYTES
    if path == "/api/profile":
|
return BODY_LIMIT_PROFILE_BYTES
|
||||||
|
return BODY_LIMIT_DEFAULT_BYTES
|
||||||
|
|
||||||
|
|
||||||
|
def rate_policy(path: str, method: str, ip: str, sid: str) -> tuple:
|
||||||
|
identity = sid or ip
|
||||||
|
if path == "/api/auth/login":
|
||||||
|
return f"login:{ip}", RL_LOGIN_PER_WINDOW
|
||||||
|
if path == "/api/chat":
|
||||||
|
return f"chat:{identity}", RL_CHAT_PER_WINDOW
|
||||||
|
if path == "/api/search":
|
||||||
|
return f"search:{identity}", RL_SEARCH_PER_WINDOW
|
||||||
|
if path == "/api/stats":
|
||||||
|
return f"stats:{ip}", RL_STATS_PER_WINDOW
|
||||||
|
if method in {"POST", "PUT", "DELETE", "PATCH"}:
|
||||||
|
return f"write:{identity}", RL_WRITE_PER_WINDOW
|
||||||
|
return f"api:{identity}", RL_DEFAULT_PER_WINDOW
|
||||||
|
|
||||||
|
|
||||||
|
def check_rate_limit(key: str, limit: int, window_seconds: int) -> tuple:
|
||||||
|
now_ts = time.time()
|
||||||
|
with RATE_LOCK:
|
||||||
|
bucket = RATE_EVENTS[key]
|
||||||
|
while bucket and bucket[0] <= (now_ts - window_seconds):
|
||||||
|
bucket.popleft()
|
||||||
|
if len(bucket) >= limit:
|
||||||
|
retry_after = max(1, int(math.ceil(window_seconds - (now_ts - bucket[0]))))
|
||||||
|
return False, retry_after
|
||||||
|
bucket.append(now_ts)
|
||||||
|
return True, 0
|
||||||
|
|
||||||
|
|
||||||
|
def origin_allowed(request: Request) -> bool:
|
||||||
|
host = request.headers.get("host", "").strip()
|
||||||
|
expected_origin = f"{request.url.scheme}://{host}".rstrip("/") if host else ""
|
||||||
|
origin = request.headers.get("origin", "").strip().rstrip("/")
|
||||||
|
referer = request.headers.get("referer", "").strip()
|
||||||
|
if origin:
|
||||||
|
return origin == expected_origin or origin in TRUSTED_ORIGINS
|
||||||
|
if referer:
|
||||||
|
parsed = urlparse(referer)
|
||||||
|
ref_origin = f"{parsed.scheme}://{parsed.netloc}".rstrip("/")
|
||||||
|
return ref_origin == expected_origin or ref_origin in TRUSTED_ORIGINS
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def is_state_changing(method: str) -> bool:
|
||||||
|
return method in {"POST", "PUT", "DELETE", "PATCH"}
|
||||||
|
|
||||||
|
|
||||||
|
async def read_json_body(request: Request, max_bytes: int) -> dict:
|
||||||
|
raw = await request.body()
|
||||||
|
if len(raw) > max_bytes:
|
||||||
|
raise HTTPException(status_code=413, detail="Request payload too large")
|
||||||
|
if not raw:
|
||||||
|
return {}
|
||||||
|
try:
|
||||||
|
return json.loads(raw.decode("utf-8"))
|
||||||
|
except Exception:
|
||||||
|
raise HTTPException(status_code=400, detail="Invalid JSON payload")
|
||||||
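The `check_rate_limit` helper in this file keeps a deque of timestamps per key and evicts entries older than the window before counting. The following is a minimal standalone sketch of that sliding-window idea. The `SlidingWindowLimiter` class and the explicit `now` argument are assumptions added here so the behaviour can be exercised without waiting on the wall clock; they are not part of the diff, and the sketch omits the lock the original takes.

```python
import math
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Illustrative limiter (hypothetical name); single-threaded, no lock."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps of accepted requests

    def check(self, key: str, now: float) -> tuple:
        bucket = self.events[key]
        # Evict timestamps that have aged out of the window.
        while bucket and bucket[0] <= now - self.window:
            bucket.popleft()
        if len(bucket) >= self.limit:
            # Seconds until the oldest accepted request leaves the window.
            retry_after = max(1, math.ceil(self.window - (now - bucket[0])))
            return False, retry_after
        bucket.append(now)
        return True, 0


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
# Two requests are accepted, the third is rejected, and a request after
# the window has passed is accepted again.
results = [limiter.check("stats:127.0.0.1", now=t) for t in (0.0, 1.0, 2.0, 61.5)]
```

Passing the clock in as an argument is only a testing convenience; the version in the diff reads `time.time()` directly and guards the shared dictionary with `RATE_LOCK`.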
BIN
static/jcscreenie.png
Normal file
Binary file not shown.
Size: 270 KiB
BIN
static/logo.jpg
Normal file
Binary file not shown.
Size: 206 KiB
1251
templates/index.html
Normal file
File diff suppressed because it is too large
78
tests/test_auth_capabilities.py
Normal file
@@ -0,0 +1,78 @@
import os
from pathlib import Path

from fastapi.testclient import TestClient

import app as app_module


def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
    app_module.DB_PATH = tmp_path / "jarvischat-test.db"
    app_module.SESSIONS.clear()
    app_module.PIN_ATTEMPTS.clear()
    app_module.init_db()
    return TestClient(app_module.app)


def test_guest_read_only_admin_write_blocked(tmp_path: Path):
    with make_client(tmp_path) as client:
        guest = client.post("/api/auth/guest", headers={"Origin": "http://testserver"})
        assert guest.status_code == 200
        sid = guest.json()["session_id"]
        headers = {"X-Session-ID": sid}

        read_resp = client.get("/api/memories", headers=headers)
        assert read_resp.status_code == 200

        write_resp = client.post(
            "/api/memories",
            json={"fact": "guest write should fail", "topic": "general"},
            headers={**headers, "Origin": "http://testserver"},
        )
        assert write_resp.status_code == 403


def test_admin_can_write_and_delete_memory(tmp_path: Path):
    with make_client(tmp_path) as client:
        login = client.post(
            "/api/auth/login",
            json={"pin": "1234"},
            headers={"Origin": "http://testserver"},
        )
        assert login.status_code == 200
        sid = login.json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        create_resp = client.post(
            "/api/memories",
            json={"fact": "admin write ok", "topic": "general"},
            headers=headers,
        )
        assert create_resp.status_code == 200
        rowid = create_resp.json()["rowid"]

        delete_resp = client.delete(f"/api/memories/{rowid}", headers=headers)
        assert delete_resp.status_code == 200


def test_origin_check_blocks_cross_site_writes(tmp_path: Path):
    with make_client(tmp_path) as client:
        denied = client.post("/api/auth/guest", headers={"Origin": "http://evil.example"})
        assert denied.status_code == 403

        allowed = client.post("/api/auth/guest", headers={"Origin": "http://testserver"})
        assert allowed.status_code == 200


def test_logout_revokes_session(tmp_path: Path):
    with make_client(tmp_path) as client:
        guest = client.post("/api/auth/guest", headers={"Origin": "http://testserver"})
        sid = guest.json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        logout = client.post("/api/auth/logout", headers=headers)
        assert logout.status_code == 200

        after = client.get("/api/memories", headers={"X-Session-ID": sid})
        assert after.status_code == 401
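The cross-site test above relies on the `origin_allowed` helper, which compares the `Origin` header (or, failing that, the origin portion of `Referer`) with the origin the server expects. A rough sketch of that comparison as a pure function follows. The name `origin_matches` and its keyword arguments are hypothetical, chosen so the logic can be tried without a `Request` object.

```python
from urllib.parse import urlparse


def origin_matches(expected_origin: str, origin: str = "", referer: str = "",
                   trusted: frozenset = frozenset()) -> bool:
    """Hypothetical pure-function version of the origin check."""
    origin = origin.strip().rstrip("/")
    if origin:
        # An explicit Origin header wins over Referer.
        return origin == expected_origin or origin in trusted
    if referer:
        parsed = urlparse(referer.strip())
        ref_origin = f"{parsed.scheme}://{parsed.netloc}"
        return ref_origin == expected_origin or ref_origin in trusted
    # Neither header present: allowed, mirroring the helper in the diff.
    return True


same_site = origin_matches("http://testserver", origin="http://testserver")
cross_site = origin_matches("http://testserver", origin="http://evil.example")
via_referer = origin_matches("http://testserver", referer="http://testserver/page?x=1")
no_headers = origin_matches("http://testserver")
```

Allowing requests that carry neither header keeps non-browser clients working, at the cost of relying on other checks (session, IP allowlist) for those callers.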
188
tests/test_chat_streaming_and_memory_paths.py
Normal file
@@ -0,0 +1,188 @@
import json
import os
from pathlib import Path

from fastapi.testclient import TestClient

import app as app_module


def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
    app_module.DB_PATH = tmp_path / "jarvischat-streaming.db"
    app_module.SESSIONS.clear()
    app_module.PIN_ATTEMPTS.clear()
    app_module.RATE_EVENTS.clear()
    app_module.init_db()
    return TestClient(app_module.app, raise_server_exceptions=False)


def parse_sse_payloads(body: str) -> list[dict]:
    payloads: list[dict] = []
    for chunk in body.split("\n\n"):
        chunk = chunk.strip()
        if not chunk.startswith("data: "):
            continue
        raw = chunk[len("data: ") :]
        payloads.append(json.loads(raw))
    return payloads


class _MockStreamResponse:
    def __init__(self, lines: list[str]):
        self._lines = lines

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False

    async def aiter_lines(self):
        for line in self._lines:
            yield line


def _stream_json_lines(events: list[dict]) -> list[str]:
    return [json.dumps(event) for event in events]


def test_chat_stream_emits_tokens_and_done(tmp_path: Path, monkeypatch):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        events = _stream_json_lines(
            [
                {"message": {"content": "Hel"}, "logprobs": [{"logprob": -0.01}]},
                {"message": {"content": "lo"}, "logprobs": [{"logprob": -0.01}]},
                {"done": True, "eval_count": 2, "eval_duration": 1000000000},
            ]
        )

        def stream_stub(self, method, url, json=None, timeout=None):
            return _MockStreamResponse(events)

        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", stream_stub)

        resp = client.post(
            "/api/chat",
            json={"message": "hello", "model": app_module.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 200
        payloads = parse_sse_payloads(resp.text)

        token_text = "".join(p.get("token", "") for p in payloads if "token" in p)
        assert token_text == "Hello"
        done_events = [p for p in payloads if p.get("done")]
        assert done_events
        assert "searched" not in done_events[-1]


def test_chat_auto_search_trigger_emits_search_events(tmp_path: Path, monkeypatch):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        first_stream = _stream_json_lines(
            [
                {
                    "message": {"content": "I am uncertain."},
                    "logprobs": [{"logprob": -5.0}],
                },
                {"done": True, "eval_count": 2, "eval_duration": 1000000000},
            ]
        )
        second_stream = _stream_json_lines(
            [
                {"message": {"content": "Based on current data: 42."}},
                {"done": True},
            ]
        )
        stream_batches = [first_stream, second_stream]

        def stream_stub(self, method, url, json=None, timeout=None):
            return _MockStreamResponse(stream_batches.pop(0))

        async def search_stub(query: str, max_results: int = 5):
            return [
                {
                    "title": "Answer",
                    "url": "https://example.com",
                    "content": "The value is 42.",
                }
            ]

        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", stream_stub)
        monkeypatch.setattr(app_module, "query_searxng", search_stub)

        resp = client.post(
            "/api/chat",
            json={"message": "what is the latest value", "model": app_module.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 200
        payloads = parse_sse_payloads(resp.text)

        assert any(p.get("searching") is True for p in payloads)
        assert any("search_results" in p for p in payloads)
        assert any(p.get("augmented") is True for p in payloads)
        done_events = [p for p in payloads if p.get("done")]
        assert done_events and done_events[-1].get("searched") is True


def test_memory_command_paths_remember_and_forget(tmp_path: Path, monkeypatch):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        base_stream = _stream_json_lines(
            [
                {"message": {"content": "ok"}, "logprobs": [{"logprob": -0.01}]},
                {"done": True, "eval_count": 1, "eval_duration": 1000000000},
            ]
        )

        def stream_stub(self, method, url, json=None, timeout=None):
            return _MockStreamResponse(base_stream)

        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", stream_stub)

        remember_resp = client.post(
            "/api/chat",
            json={
                "message": "remember that my favorite language is rust",
                "model": app_module.DEFAULT_MODEL,
            },
            headers=headers,
        )
        assert remember_resp.status_code == 200
        remember_events = parse_sse_payloads(remember_resp.text)
        assert any("Remembered" in p.get("token", "") for p in remember_events)

        memories_after_add = client.get("/api/memories", headers={"X-Session-ID": sid})
        assert memories_after_add.status_code == 200
        assert memories_after_add.json().get("count", 0) >= 1

        forget_resp = client.post(
            "/api/chat",
            json={
                "message": "forget about my favorite language",
                "model": app_module.DEFAULT_MODEL,
            },
            headers=headers,
        )
        assert forget_resp.status_code == 200
        forget_events = parse_sse_payloads(forget_resp.text)
        assert any("Forgot" in p.get("token", "") for p in forget_events)

        memories_after_forget = client.get("/api/memories", headers={"X-Session-ID": sid})
        assert memories_after_forget.status_code == 200
        assert memories_after_forget.json().get("count", 0) == 0
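The streaming tests above split the response body on blank lines and decode each `data: ` frame as JSON. A small round-trip sketch of that parsing follows. The server-side framing is not shown in this part of the diff, so `format_sse` is a hypothetical stand-in that assumes one JSON object per `data:` frame.

```python
import json


def format_sse(payload: dict) -> str:
    # Hypothetical encoder: one JSON object per frame, blank line as terminator.
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse_payloads(body: str) -> list:
    payloads = []
    for chunk in body.split("\n\n"):
        chunk = chunk.strip()
        if not chunk.startswith("data: "):
            continue  # skip empty trailing chunks, comments, other fields
        payloads.append(json.loads(chunk[len("data: "):]))
    return payloads


body = "".join(format_sse(p) for p in [{"token": "Hel"}, {"token": "lo"}, {"done": True}])
payloads = parse_sse_payloads(body)
text = "".join(p.get("token", "") for p in payloads)
```

This parser only handles single-line `data:` frames, which is all the tests need; a general SSE client would also have to join multi-line `data:` fields.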
72
tests/test_error_envelopes.py
Normal file
@@ -0,0 +1,72 @@
import os
from pathlib import Path

from fastapi.testclient import TestClient

import app as app_module


def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
    app_module.DB_PATH = tmp_path / "jarvischat-errors.db"
    app_module.SESSIONS.clear()
    app_module.PIN_ATTEMPTS.clear()
    app_module.RATE_EVENTS.clear()
    app_module.init_db()
    return TestClient(app_module.app, raise_server_exceptions=False)


def test_unhandled_api_exception_returns_friendly_error_with_incident_key(
    tmp_path: Path, monkeypatch
):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
        headers = {"X-Session-ID": sid}

        def boom(_topic=None):
            raise RuntimeError("super secret db internals")

        monkeypatch.setattr(app_module, "get_all_memories", boom)

        resp = client.get("/api/memories", headers=headers)
        assert resp.status_code == 500
        payload = resp.json()
        assert payload.get("error_key", "").startswith("INC-")
        assert "support lookup" in payload.get("detail", "").lower()
        assert "super secret db internals" not in resp.text


def test_chat_stream_error_hides_internal_exception_and_emits_incident_key(
    tmp_path: Path, monkeypatch
):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        class BrokenStreamContext:
            async def __aenter__(self):
                raise RuntimeError("ultra secret model transport failure")

            async def __aexit__(self, exc_type, exc, tb):
                return False

        def broken_stream(*args, **kwargs):
            return BrokenStreamContext()

        monkeypatch.setattr(app_module.httpx.AsyncClient, "stream", broken_stream)

        resp = client.post(
            "/api/chat",
            json={"message": "hello", "model": app_module.DEFAULT_MODEL},
            headers=headers,
        )

        assert resp.status_code == 200
        body = resp.text
        assert "ultra secret model transport failure" not in body
        assert "error_key" in body
        assert "support lookup" in body.lower()
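Both tests above check that the client sees an `INC-` key rather than the exception text. The key format comes from `create_incident_key` in this changeset: a UTC timestamp followed by eight uppercase hex characters. The sketch below reproduces that function and adds a pattern for validating keys; `INCIDENT_KEY_RE` is an illustrative assumption, not something the diff defines.

```python
import re
import uuid
from datetime import datetime, timezone


def create_incident_key() -> str:
    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"INC-{ts}-{uuid.uuid4().hex[:8].upper()}"


# Hypothetical validator for the key shape: INC-YYYYMMDD-HHMMSS-XXXXXXXX.
INCIDENT_KEY_RE = re.compile(r"^INC-\d{8}-\d{6}-[0-9A-F]{8}$")

key = create_incident_key()
looks_valid = bool(INCIDENT_KEY_RE.match(key))
```

A support tool could use such a pattern to reject malformed keys before searching the logs for the matching `INCIDENT` line.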
50
tests/test_ip_allowlist.py
Normal file
@@ -0,0 +1,50 @@
import os
from pathlib import Path

from fastapi.testclient import TestClient

import app as app_module


def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
    app_module.DB_PATH = tmp_path / "jarvischat-ip.db"
    app_module.SESSIONS.clear()
    app_module.PIN_ATTEMPTS.clear()
    app_module.RATE_EVENTS.clear()
    app_module.init_db()
    return TestClient(app_module.app)


def test_ip_helper_allows_local_defaults():
    assert app_module.is_ip_allowed("127.0.0.1")
    assert app_module.is_ip_allowed("192.168.1.10")
    assert app_module.is_ip_allowed("10.0.0.42")
    assert app_module.is_ip_allowed("172.16.1.2")
    assert app_module.is_ip_allowed("testclient")


def test_ip_helper_blocks_public_ip():
    assert not app_module.is_ip_allowed("8.8.8.8")


def test_middleware_blocks_disallowed_ip(tmp_path: Path):
    with make_client(tmp_path) as client:
        original_get_client_ip = app_module.get_client_ip
        try:
            app_module.get_client_ip = lambda _req: "8.8.8.8"
            resp = client.post("/api/auth/guest")
            assert resp.status_code == 403
        finally:
            app_module.get_client_ip = original_get_client_ip


def test_middleware_allows_local_ip(tmp_path: Path):
    with make_client(tmp_path) as client:
        original_get_client_ip = app_module.get_client_ip
        try:
            app_module.get_client_ip = lambda _req: "192.168.50.109"
            resp = client.post("/api/auth/guest")
            assert resp.status_code == 200
        finally:
            app_module.get_client_ip = original_get_client_ip
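The allowlist tests above exercise `is_ip_allowed`, which parses the address with the standard `ipaddress` module and tests membership in each configured network. A minimal sketch of that check follows. The network list here is an assumption inferred from the addresses the tests expect to pass; the real `ALLOWED_NETWORKS` lives in `config` and is not shown in this part of the diff.

```python
import ipaddress

# Assumed defaults for illustration only (loopback plus the RFC 1918 ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_ip_allowed(ip: str) -> bool:
    try:
        addr = ipaddress.ip_address(ip.strip())
    except ValueError:
        # Hostnames and malformed strings are rejected outright.
        return False
    return any(addr in network for network in ALLOWED_NETWORKS)


checks = {ip: is_ip_allowed(ip) for ip in ("192.168.1.10", "172.16.1.2", "8.8.8.8", "not-an-ip")}
```

Unlike the helper in the diff, this sketch does not map `localhost` or `testclient` to `127.0.0.1`; that normalisation is what lets Starlette's test client through the middleware.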
76
tests/test_rate_and_payload_guardrails.py
Normal file
@@ -0,0 +1,76 @@
import json
import os
from pathlib import Path

from fastapi.testclient import TestClient

import app as app_module


def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
    app_module.DB_PATH = tmp_path / "jarvischat-rate.db"
    app_module.SESSIONS.clear()
    app_module.PIN_ATTEMPTS.clear()
    app_module.RATE_EVENTS.clear()
    app_module.init_db()
    return TestClient(app_module.app)


def test_stats_rate_limit_hits_429(tmp_path: Path):
    old_limit = app_module.RL_STATS_PER_WINDOW
    old_window = app_module.RATE_WINDOW_SECONDS
    app_module.RL_STATS_PER_WINDOW = 2
    app_module.RATE_WINDOW_SECONDS = 60
    try:
        with make_client(tmp_path) as client:
            sid = client.post("/api/auth/guest").json()["session_id"]
            headers = {"X-Session-ID": sid}

            r1 = client.get("/api/stats", headers=headers)
            r2 = client.get("/api/stats", headers=headers)
            r3 = client.get("/api/stats", headers=headers)

            assert r1.status_code == 200
            assert r2.status_code == 200
            assert r3.status_code == 429
    finally:
        app_module.RL_STATS_PER_WINDOW = old_limit
        app_module.RATE_WINDOW_SECONDS = old_window


def test_large_login_payload_rejected_413(tmp_path: Path):
    with make_client(tmp_path) as client:
        huge_pin = "1" * (app_module.BODY_LIMIT_DEFAULT_BYTES + 100)
        resp = client.post(
            "/api/auth/login",
            data=json.dumps({"pin": huge_pin}),
            headers={"Content-Type": "application/json"},
        )
        assert resp.status_code == 413


def test_chat_message_length_rejected_413(tmp_path: Path):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest").json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}
        message = "x" * (app_module.MAX_CHAT_MESSAGE_CHARS + 1)
        resp = client.post(
            "/api/chat",
            json={"message": message, "model": app_module.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 413


def test_search_query_length_rejected_413(tmp_path: Path):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest").json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}
        query = "q" * (app_module.MAX_SEARCH_QUERY_CHARS + 1)
        resp = client.post(
            "/api/search",
            json={"query": query, "model": app_module.DEFAULT_MODEL},
            headers=headers,
        )
        assert resp.status_code == 413
17
tests/test_search_url_sanitization.py
Normal file
@@ -0,0 +1,17 @@
import app as app_module


def test_sanitize_outbound_url_allows_http_https():
    assert app_module.sanitize_outbound_url("https://example.com/path") == "https://example.com/path"
    assert app_module.sanitize_outbound_url("http://example.com") == "http://example.com"


def test_sanitize_outbound_url_blocks_unsafe_schemes():
    assert app_module.sanitize_outbound_url("javascript:alert(1)") == ""
    assert app_module.sanitize_outbound_url("data:text/html,evil") == ""
    assert app_module.sanitize_outbound_url("file:///etc/passwd") == ""


def test_sanitize_outbound_url_blocks_relative_and_empty():
    assert app_module.sanitize_outbound_url("/relative/path") == ""
    assert app_module.sanitize_outbound_url("") == ""
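The tests above pin down the contract of `sanitize_outbound_url`: absolute `http` and `https` URLs pass through unchanged, and everything else becomes an empty string. Its implementation is not visible in this part of the diff, so the following is only one possible sketch that satisfies those assertions. The name `sanitize_url_sketch` is hypothetical and the real function may differ.

```python
from urllib.parse import urlparse


def sanitize_url_sketch(url: str) -> str:
    """One possible implementation of the tested contract (an assumption)."""
    candidate = (url or "").strip()
    parsed = urlparse(candidate)
    # Require an http(s) scheme and a host; this rejects javascript:, data:,
    # file: and relative paths, none of which satisfy both conditions.
    if parsed.scheme.lower() in {"http", "https"} and parsed.netloc:
        return candidate
    return ""


cases = {
    "https://example.com/path": sanitize_url_sketch("https://example.com/path"),
    "javascript:alert(1)": sanitize_url_sketch("javascript:alert(1)"),
    "file:///etc/passwd": sanitize_url_sketch("file:///etc/passwd"),
    "/relative/path": sanitize_url_sketch("/relative/path"),
}
```

An allowlist of schemes is generally safer than a blocklist here, because any scheme not anticipated in advance is rejected by default.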
57
tests/test_settings_allowlist.py
Normal file
@@ -0,0 +1,57 @@
import os
from pathlib import Path

from fastapi.testclient import TestClient

import app as app_module


def make_admin_client(tmp_path: Path) -> tuple[TestClient, dict[str, str]]:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
    app_module.DB_PATH = tmp_path / "jarvischat-settings.db"
    app_module.SESSIONS.clear()
    app_module.PIN_ATTEMPTS.clear()
    app_module.init_db()

    client = TestClient(app_module.app)
    login = client.post(
        "/api/auth/login",
        json={"pin": "1234"},
        headers={"Origin": "http://testserver"},
    )
    assert login.status_code == 200
    sid = login.json()["session_id"]
    headers = {"X-Session-ID": sid, "Origin": "http://testserver"}
    return client, headers


def test_settings_allow_known_keys(tmp_path: Path):
    client, headers = make_admin_client(tmp_path)
    try:
        resp = client.put(
            "/api/settings",
            json={
                "profile_enabled": "false",
                "search_enabled": "true",
                "memory_enabled": "false",
                "default_model": "llama3.1:latest",
            },
            headers=headers,
        )
        assert resp.status_code == 200
    finally:
        client.close()


def test_settings_reject_unknown_keys(tmp_path: Path):
    client, headers = make_admin_client(tmp_path)
    try:
        resp = client.put(
            "/api/settings",
            json={"admin_pin_hash": "oops"},
            headers=headers,
        )
        assert resp.status_code == 400
        assert "Unknown setting key" in resp.json().get("detail", "")
    finally:
        client.close()
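The two settings tests above describe an allowlist: known keys are accepted and anything else is rejected with a message containing "Unknown setting key". The endpoint itself is not shown in this part of the diff, so the sketch below only illustrates the validation step. `ALLOWED_SETTING_KEYS` and `validate_settings` are hypothetical names, and the key set is taken from the keys the first test sends.

```python
# Assumed allowlist, inferred from the keys used in the test above.
ALLOWED_SETTING_KEYS = frozenset(
    {"profile_enabled", "search_enabled", "memory_enabled", "default_model"}
)


def validate_settings(updates: dict) -> dict:
    """Return a copy of the updates, or raise if any key is not allowlisted."""
    unknown = sorted(set(updates) - ALLOWED_SETTING_KEYS)
    if unknown:
        raise ValueError(f"Unknown setting key: {unknown[0]}")
    return dict(updates)


accepted = validate_settings({"search_enabled": "true", "default_model": "llama3.1:latest"})
try:
    validate_settings({"admin_pin_hash": "oops"})
    rejected_message = ""
except ValueError as err:
    rejected_message = str(err)
```

Rejecting the whole request when any key is unknown, rather than silently dropping it, makes attempts to overwrite sensitive rows such as a PIN hash visible to the caller and to the audit log.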
93
tests/test_skills_framework.py
Normal file
@@ -0,0 +1,93 @@
import os
from pathlib import Path

from fastapi.testclient import TestClient

import app as app_module


def make_client(tmp_path: Path) -> TestClient:
    os.environ["JARVISCHAT_ADMIN_PIN"] = "1234"
    app_module.DB_PATH = tmp_path / "jarvischat-skills.db"
    app_module.SESSIONS.clear()
    app_module.PIN_ATTEMPTS.clear()
    app_module.RATE_EVENTS.clear()
    app_module.init_db()
    return TestClient(app_module.app, raise_server_exceptions=False)


def test_guest_can_list_skills(tmp_path: Path):
    with make_client(tmp_path) as client:
        sid = client.post("/api/auth/guest", headers={"Origin": "http://testserver"}).json()[
            "session_id"
        ]
        resp = client.get("/api/skills", headers={"X-Session-ID": sid})
        assert resp.status_code == 200
        payload = resp.json()
        assert payload["count"] >= 1
        assert any(skill["key"] == "memory.search" for skill in payload["skills"])


def test_admin_can_toggle_skill_enabled_state(tmp_path: Path):
    with make_client(tmp_path) as client:
        login = client.post(
            "/api/auth/login",
            json={"pin": "1234"},
            headers={"Origin": "http://testserver"},
        )
        sid = login.json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        disable = client.put(
            "/api/skills/search.web",
            json={"enabled": False},
            headers=headers,
        )
        assert disable.status_code == 200
        assert disable.json()["skill"]["enabled"] is False

        active = client.get("/api/skills/active", headers={"X-Session-ID": sid})
        assert active.status_code == 200
        assert all(skill["key"] != "search.web" for skill in active.json()["skills"])


def test_unknown_skill_update_is_rejected(tmp_path: Path):
    with make_client(tmp_path) as client:
        login = client.post(
            "/api/auth/login",
            json={"pin": "1234"},
            headers={"Origin": "http://testserver"},
        )
        sid = login.json()["session_id"]
        headers = {"X-Session-ID": sid, "Origin": "http://testserver"}

        resp = client.put(
            "/api/skills/nope.unknown",
            json={"enabled": True},
            headers=headers,
        )
        assert resp.status_code == 404


def test_prompt_injection_respects_skills_enabled_setting(tmp_path: Path):
    with make_client(tmp_path):
        db = app_module.get_db()
        try:
            db.execute(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                ("skills_enabled", "false"),
            )
            db.commit()
            without_skills = app_module.build_system_prompt(db, "", "hello")
            assert "## Active Skills" not in without_skills

            db.execute(
                "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                ("skills_enabled", "true"),
            )
            db.commit()
            with_skills = app_module.build_system_prompt(db, "", "hello")
            assert "## Active Skills" in with_skills
            assert "memory.search" in with_skills
        finally:
            db.close()