- Add COMPLETIONS_API_KEY to config.py (env var + auto-generated fallback) - Fix perplexity auto-search: upstream sends logprobs=true, parse_llama_stream_chunk extracts per-token logprobs, all_logprobs populated during streaming - Fix all /api/models endpoints to target LLAMA_SERVER_BASE (port 8081) not OLLAMA_BASE - Fix RAG embedding endpoint URL from port 11434 (Ollama) to 8081 (llama-server) - Correct misleading error messages: 'inference server' not 'Ollama' - Remove raw_results leak from SSE event stream in /api/search - Fix weather query extractor: pattern-match instead of unconditional suffix append - Escape FTS5 operator keywords (AND/OR/NOT/NEAR) in memory search - Move auth.py BODY_LIMIT_DEFAULT_BYTES imports to module level - Change RAG injection log level from warning to info - Fix all 8 test files after modular refactor (rewire imports from correct modules) - Update AGENTS.md and README.md to reflect v1.8.0 changes
261 lines
9.4 KiB
Markdown
261 lines
9.4 KiB
Markdown
# JarvisChat v1.8.0
|
|
|
|
**A lightweight local inference coding companion with persistent memory, web search, and real-time system monitoring.**
|
|
|
|
Built with FastAPI + SQLite + Jinja2. Runs on Python 3.13. No Docker required.
|
|
|
|
Developer wiki: [docs/wiki/Home.md](docs/wiki/Home.md)
|
|
|
|
## What's New in v1.8.0
|
|
|
|
- **Modular refactor completed** — single-file `app.py` split into `config.py`, `db.py`, `auth.py`, `security.py`, `memory.py`, `search.py`, `rag.py`, `gpu.py`, and `routers/` package
|
|
- **`COMPLETIONS_API_KEY`** — auto-generated secret key for the OpenAI-compatible endpoint, overridable via `JARVISCHAT_COMPLETIONS_API_KEY` env var
|
|
- **Perplexity auto-search fixed** — upstream request now sends `"logprobs": true`, `parse_llama_stream_chunk()` extracts per-token logprobs, so `calculate_perplexity()` and `is_uncertain()` work correctly (was dead code)
|
|
- **All `/api/models` endpoints** — now correctly target `LLAMA_SERVER_BASE` (llama-server on port 8081) instead of the old Ollama port; `/api/ps` uses `/v1/models` endpoint
|
|
- **RAG embedding endpoint fixed** — `EMBED_URL` changed from port `:11434` (Ollama) to `:8081` (llama-server)
|
|
- **Error messages corrected** — all user-facing errors say "inference server" instead of "Ollama" or "llama-server"
|
|
- **Secure SSE protocol** — raw search results are no longer leaked in the SSE event stream
|
|
- **FTS5 query safety** — operator keywords (`AND`, `OR`, `NOT`, `NEAR`) are double-quoted to prevent parse errors
|
|
- **All 8 test files fixed** — rewired imports after the modular refactor; all 26 tests pass
|
|
|
|
## Features
|
|
|
|
- **Persistent Memory** — SQLite FTS5 full-text search for fast, relevant memory retrieval
|
|
- **Web Search** — SearXNG integration for automatic web lookups when the model is uncertain
|
|
- **Explicit Search** — Search button to force web search without waiting for model uncertainty
|
|
- **Profile Injection** — Custom system prompt injected into every conversation
|
|
- **System Presets** — Save and switch between different system prompts
|
|
- **Real-time Stats** — CPU, RAM, GPU, VRAM monitoring in sidebar
|
|
- **Token Thermometer** — Visual context window usage indicator
|
|
- **Streaming Responses** — Server-sent events for real-time token display
|
|
- **Conversation History** — SQLite-backed chat persistence with mass-delete option
|
|
- **Model Switching** — Change inference models on the fly
|
|
- **Skills Framework** — Built-in skill registry with per-skill enable/disable controls
|
|
|
|
## File Structure
|
|
|
|
```
|
|
/opt/jarvischat/
|
|
├── app.py # FastAPI app entry point
|
|
├── config.py # Constants, env vars, limits, skill registry
|
|
├── db.py # SQLite schema, connection factory
|
|
├── auth.py # PIN-based guest/admin sessions, auth routes
|
|
├── security.py # Rate limiting, origin checks, IP allowlist, audit
|
|
├── memory.py # FTS5 memory CRUD, remember/forget commands
|
|
├── search.py # SearXNG integration, perplexity, refusal detection
|
|
├── rag.py # Qdrant vector search + system prompt assembly
|
|
├── gpu.py # AMD GPU stats via rocm-smi
|
|
├── routers/
|
|
│ ├── chat.py # /api/chat streaming endpoint
|
|
│ ├── search_route.py # /api/search explicit search endpoint
|
|
│ ├── completions.py # /v1/chat/completions OpenAI-compat endpoint
|
|
│ ├── conversations.py# Conversation CRUD
|
|
│ ├── memories.py # Memory CRUD API
|
|
│ ├── models.py # Model listing, system stats
|
|
│ ├── presets.py # System prompt presets
|
|
│ ├── profile.py # User profile
|
|
│ ├── settings.py # Runtime settings
|
|
│ └── skills.py # Skills management
|
|
├── static/
|
|
│ └── logo.png # Logo image (optional)
|
|
├── templates/
|
|
│ └── index.html # Frontend
|
|
└── tests/ # 26 pytest tests
|
|
```
|
|
|
|
## Requirements
|
|
|
|
- Python 3.11+ (tested on 3.13)
|
|
- llama-server running locally or on network (OpenAI-compatible API on port 8081)
|
|
- SearXNG (optional, for web search)
|
|
|
|
## Installation
|
|
|
|
### Fresh Install
|
|
|
|
```bash
|
|
# Create directory and venv
|
|
sudo mkdir -p /opt/jarvischat
|
|
sudo chown $USER:$USER /opt/jarvischat
|
|
cd /opt/jarvischat
|
|
python3 -m venv venv
|
|
|
|
# Install dependencies
|
|
./venv/bin/pip install fastapi uvicorn httpx psutil jinja2 python-multipart
|
|
|
|
# Set admin PIN before first startup (4 digits)
|
|
export JARVISCHAT_ADMIN_PIN=4827
|
|
|
|
# Create subdirectories
|
|
mkdir -p templates static
|
|
|
|
# Copy files
|
|
# (copy all .py files to /opt/jarvischat/)
|
|
# (copy routers/ directory to /opt/jarvischat/)
|
|
# (copy templates/index.html to /opt/jarvischat/templates/)
|
|
```
|
|
|
|
WARNING: Do not use `1234` as your admin PIN unless you accept weak local security.
|
|
|
|
NOTE: First boot requires `JARVISCHAT_ADMIN_PIN` unless you explicitly opt into insecure fallback with `JARVISCHAT_ALLOW_DEFAULT_PIN=true`.
|
|
|
|
## Systemd Service
|
|
|
|
Create `/etc/systemd/system/jarvischat.service`:
|
|
|
|
```ini
|
|
[Unit]
|
|
Description=JarvisChat - Local Inference Web Interface
|
|
After=network.target
|
|
|
|
[Service]
|
|
Type=simple
|
|
User=jarvischat
|
|
Group=jarvischat
|
|
WorkingDirectory=/opt/jarvischat
|
|
ExecStart=/opt/jarvischat/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080
|
|
Restart=always
|
|
RestartSec=5
|
|
|
|
[Install]
|
|
WantedBy=multi-user.target
|
|
```
|
|
|
|
```bash
|
|
sudo systemctl daemon-reload
|
|
sudo systemctl enable jarvischat
|
|
sudo systemctl start jarvischat
|
|
```
|
|
|
|
## Memory Commands
|
|
|
|
In chat, natural language triggers memory operations:
|
|
|
|
| You say | What happens |
|
|
|---------|--------------|
|
|
| "remember that I prefer Rust over Go" | Stores as `preference` |
|
|
| "remember that JarvisChat runs on port 8080" | Stores as `infrastructure` |
|
|
| "note that the deadline is Friday" | Stores as `general` |
|
|
| "forget about the deadline" | Removes matching memories |
|
|
|
|
Memories are automatically searched based on your message content and injected into the system prompt when relevant.
|
|
|
|
### Memory Topics
|
|
|
|
Memories are auto-categorized:
|
|
- `preference` — likes, dislikes, choices
|
|
- `project` — active work, repos, tasks
|
|
- `infrastructure` — servers, services, configs
|
|
- `personal` — name, location, background
|
|
- `general` — everything else
|
|
|
|
## API Endpoints
|
|
|
|
### Completions (OpenAI-compatible)
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| POST | `/v1/chat/completions` | OpenAI-compatible chat (requires Bearer API key) |
|
|
|
|
### Chat & Search
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| POST | `/api/chat` | Send message (streaming SSE) |
|
|
| POST | `/api/search` | Explicit web search (streaming SSE) |
|
|
|
|
### Memory
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| GET | `/api/memories` | List all memories |
|
|
| POST | `/api/memories` | Add memory |
|
|
| PUT | `/api/memories/{rowid}` | Update memory |
|
|
| DELETE | `/api/memories/{rowid}` | Delete memory |
|
|
| GET | `/api/memories/search?q=term` | Search memories |
|
|
| GET | `/api/memories/stats` | Get counts by topic |
|
|
|
|
### Models & System
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| GET | `/api/models` | List available models |
|
|
| GET | `/api/ps` | List loaded models |
|
|
| POST | `/api/show` | Get model info |
|
|
| GET | `/api/stats` | CPU, RAM, GPU, VRAM stats |
|
|
| GET | `/api/search/status` | SearXNG availability |
|
|
|
|
### Settings & Profile
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| GET | `/api/profile` | Get profile content |
|
|
| PUT | `/api/profile` | Update profile (admin) |
|
|
| GET | `/api/profile/default` | Get default profile |
|
|
| GET | `/api/settings` | Get settings |
|
|
| PUT | `/api/settings` | Update settings (admin) |
|
|
|
|
### Conversations
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| GET | `/api/conversations` | List conversations |
|
|
| POST | `/api/conversations` | Create conversation |
|
|
| GET | `/api/conversations/{id}` | Get conversation with messages |
|
|
| PUT | `/api/conversations/{id}` | Update conversation title/model |
|
|
| DELETE | `/api/conversations/{id}` | Delete conversation |
|
|
| DELETE | `/api/conversations` | Delete ALL conversations |
|
|
|
|
### Presets
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| GET | `/api/presets` | List presets |
|
|
| POST | `/api/presets` | Create preset |
|
|
| PUT | `/api/presets/{id}` | Update preset |
|
|
| DELETE | `/api/presets/{id}` | Delete preset |
|
|
|
|
### Skills
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| GET | `/api/skills` | List all skills with state |
|
|
| GET | `/api/skills/active` | List active skills |
|
|
| PUT | `/api/skills/{key}` | Toggle skill enabled (admin) |
|
|
|
|
### Auth
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| POST | `/api/auth/guest` | Create guest session |
|
|
| POST | `/api/auth/login` | Admin PIN login |
|
|
| POST | `/api/auth/logout` | Revoke session |
|
|
| GET | `/api/auth/session` | Check session validity |
|
|
| POST | `/api/auth/heartbeat` | Extend session TTL |
|
|
|
|
## Configuration
|
|
|
|
Settings are stored in the `settings` table and include:
|
|
|
|
- `profile_enabled` — Inject profile into chats (true/false)
|
|
- `search_enabled` — Auto web search (true/false)
|
|
- `memory_enabled` — Memory injection (true/false)
|
|
- `skills_enabled` — Skills framework (true/false)
|
|
- `default_model` — Default inference model
|
|
|
|
## Testing
|
|
|
|
```bash
|
|
./venv/bin/python -m pytest tests/ -v
|
|
```
|
|
|
|
All 26 tests use `tmp_path` fixtures + monkeypatched `httpx.AsyncClient.stream`. No external services needed.
|
|
|
|
## License
|
|
|
|
MIT
|
|
|
|
## Repository
|
|
|
|
Gitea: `ssh://gitea@llgit.llamachile.tube:1319/gramps/jarvisChat.git`
|