Files
jarvisChat/README.md
gramps 193829b7ff fix: resolve all critical runtime errors and bugs from audit
- Add COMPLETIONS_API_KEY to config.py (env var + auto-generated fallback)
- Fix perplexity auto-search: upstream sends logprobs=true, parse_llama_stream_chunk
  extracts per-token logprobs, all_logprobs populated during streaming
- Fix all /api/models endpoints to target LLAMA_SERVER_BASE (port 8081) not OLLAMA_BASE
- Fix RAG embedding endpoint URL from port 11434 (Ollama) to 8081 (llama-server)
- Correct misleading error messages: 'inference server' not 'Ollama'
- Remove raw_results leak from SSE event stream in /api/search
- Fix weather query extractor: pattern-match instead of unconditional suffix append
- Escape FTS5 operator keywords (AND/OR/NOT/NEAR) in memory search
- Move auth.py BODY_LIMIT_DEFAULT_BYTES imports to module level
- Change RAG injection log level from warning to info
- Fix all 8 test files after modular refactor (rewire imports from correct modules)
- Update AGENTS.md and README.md to reflect v1.8.0 changes
2026-06-27 15:10:32 -07:00

9.4 KiB

JarvisChat v1.8.0

A lightweight local inference coding companion with persistent memory, web search, and real-time system monitoring.

Built with FastAPI + SQLite + Jinja2. Runs on Python 3.13. No Docker required.

Developer wiki: docs/wiki/Home.md

What's New in v1.8.0

  • Modular refactor completed — single-file app.py split into config.py, db.py, auth.py, security.py, memory.py, search.py, rag.py, gpu.py, and routers/ package
  • COMPLETIONS_API_KEY — auto-generated secret key for the OpenAI-compatible endpoint, overridable via JARVISCHAT_COMPLETIONS_API_KEY env var
  • Perplexity auto-search fixed — upstream request now sends "logprobs": true, parse_llama_stream_chunk() extracts per-token logprobs, so calculate_perplexity() and is_uncertain() work correctly (was dead code)
  • All /api/models endpoints — now correctly target LLAMA_SERVER_BASE (llama-server on port 8081) instead of the old Ollama port; /api/ps uses /v1/models endpoint
  • RAG embedding endpoint fixedEMBED_URL changed from port :11434 (Ollama) to :8081 (llama-server)
  • Error messages corrected — all user-facing errors say "inference server" instead of "Ollama" or "llama-server"
  • Secure SSE protocol — raw search results are no longer leaked in the SSE event stream
  • FTS5 query safety — operator keywords (AND, OR, NOT, NEAR) are double-quoted to prevent parse errors
  • All 8 test files fixed — rewired imports after the modular refactor; all 26 tests pass

Features

  • Persistent Memory — SQLite FTS5 full-text search for fast, relevant memory retrieval
  • Web Search — SearXNG integration for automatic web lookups when the model is uncertain
  • Explicit Search — Search button to force web search without waiting for model uncertainty
  • Profile Injection — Custom system prompt injected into every conversation
  • System Presets — Save and switch between different system prompts
  • Real-time Stats — CPU, RAM, GPU, VRAM monitoring in sidebar
  • Token Thermometer — Visual context window usage indicator
  • Streaming Responses — Server-sent events for real-time token display
  • Conversation History — SQLite-backed chat persistence with mass-delete option
  • Model Switching — Change inference models on the fly
  • Skills Framework — Built-in skill registry with per-skill enable/disable controls

File Structure

/opt/jarvischat/
├── app.py              # FastAPI app entry point
├── config.py           # Constants, env vars, limits, skill registry
├── db.py               # SQLite schema, connection factory
├── auth.py             # PIN-based guest/admin sessions, auth routes
├── security.py         # Rate limiting, origin checks, IP allowlist, audit
├── memory.py           # FTS5 memory CRUD, remember/forget commands
├── search.py           # SearXNG integration, perplexity, refusal detection
├── rag.py              # Qdrant vector search + system prompt assembly
├── gpu.py              # AMD GPU stats via rocm-smi
├── routers/
│   ├── chat.py         # /api/chat streaming endpoint
│   ├── search_route.py # /api/search explicit search endpoint
│   ├── completions.py  # /v1/chat/completions OpenAI-compat endpoint
│   ├── conversations.py# Conversation CRUD
│   ├── memories.py     # Memory CRUD API
│   ├── models.py       # Model listing, system stats
│   ├── presets.py      # System prompt presets
│   ├── profile.py      # User profile
│   ├── settings.py     # Runtime settings
│   └── skills.py       # Skills management
├── static/
│   └── logo.png        # Logo image (optional)
├── templates/
│   └── index.html      # Frontend
└── tests/              # 26 pytest tests

Requirements

  • Python 3.11+ (tested on 3.13)
  • llama-server running locally or on network (OpenAI-compatible API on port 8081)
  • SearXNG (optional, for web search)

Installation

Fresh Install

# Create directory and venv
sudo mkdir -p /opt/jarvischat
sudo chown $USER:$USER /opt/jarvischat
cd /opt/jarvischat
python3 -m venv venv

# Install dependencies
./venv/bin/pip install fastapi uvicorn httpx psutil jinja2 python-multipart

# Set admin PIN before first startup (4 digits)
export JARVISCHAT_ADMIN_PIN=4827

# Create subdirectories
mkdir -p templates static

# Copy files
# (copy all .py files to /opt/jarvischat/)
# (copy routers/ directory to /opt/jarvischat/)
# (copy templates/index.html to /opt/jarvischat/templates/)

WARNING: Do not use 1234 as your admin PIN unless you accept weak local security.

NOTE: First boot requires JARVISCHAT_ADMIN_PIN unless you explicitly opt into insecure fallback with JARVISCHAT_ALLOW_DEFAULT_PIN=true.

Systemd Service

Create /etc/systemd/system/jarvischat.service:

[Unit]
Description=JarvisChat - Local Inference Web Interface
After=network.target

[Service]
Type=simple
User=jarvischat
Group=jarvischat
WorkingDirectory=/opt/jarvischat
ExecStart=/opt/jarvischat/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable jarvischat
sudo systemctl start jarvischat

Memory Commands

In chat, natural language triggers memory operations:

You say What happens
"remember that I prefer Rust over Go" Stores as preference
"remember that JarvisChat runs on port 8080" Stores as infrastructure
"note that the deadline is Friday" Stores as general
"forget about the deadline" Removes matching memories

Memories are automatically searched based on your message content and injected into the system prompt when relevant.

Memory Topics

Memories are auto-categorized:

  • preference — likes, dislikes, choices
  • project — active work, repos, tasks
  • infrastructure — servers, services, configs
  • personal — name, location, background
  • general — everything else

API Endpoints

Completions (OpenAI-compatible)

Method Endpoint Description
POST /v1/chat/completions OpenAI-compatible chat (requires Bearer API key)
Method Endpoint Description
POST /api/chat Send message (streaming SSE)
POST /api/search Explicit web search (streaming SSE)

Memory

Method Endpoint Description
GET /api/memories List all memories
POST /api/memories Add memory
PUT /api/memories/{rowid} Update memory
DELETE /api/memories/{rowid} Delete memory
GET /api/memories/search?q=term Search memories
GET /api/memories/stats Get counts by topic

Models & System

Method Endpoint Description
GET /api/models List available models
GET /api/ps List loaded models
POST /api/show Get model info
GET /api/stats CPU, RAM, GPU, VRAM stats
GET /api/search/status SearXNG availability

Settings & Profile

Method Endpoint Description
GET /api/profile Get profile content
PUT /api/profile Update profile (admin)
GET /api/profile/default Get default profile
GET /api/settings Get settings
PUT /api/settings Update settings (admin)

Conversations

Method Endpoint Description
GET /api/conversations List conversations
POST /api/conversations Create conversation
GET /api/conversations/{id} Get conversation with messages
PUT /api/conversations/{id} Update conversation title/model
DELETE /api/conversations/{id} Delete conversation
DELETE /api/conversations Delete ALL conversations

Presets

Method Endpoint Description
GET /api/presets List presets
POST /api/presets Create preset
PUT /api/presets/{id} Update preset
DELETE /api/presets/{id} Delete preset

Skills

Method Endpoint Description
GET /api/skills List all skills with state
GET /api/skills/active List active skills
PUT /api/skills/{key} Toggle skill enabled (admin)

Auth

Method Endpoint Description
POST /api/auth/guest Create guest session
POST /api/auth/login Admin PIN login
POST /api/auth/logout Revoke session
GET /api/auth/session Check session validity
POST /api/auth/heartbeat Extend session TTL

Configuration

Settings are stored in the settings table and include:

  • profile_enabled — Inject profile into chats (true/false)
  • search_enabled — Auto web search (true/false)
  • memory_enabled — Memory injection (true/false)
  • skills_enabled — Skills framework (true/false)
  • default_model — Default inference model

Testing

./venv/bin/python -m pytest tests/ -v

All 26 tests use tmp_path fixtures + monkeypatched httpx.AsyncClient.stream. No external services needed.

License

MIT

Repository

Gitea: ssh://gitea@llgit.llamachile.tube:1319/gramps/jarvisChat.git