JarvisChat — Agents Guide

Run

./venv/bin/uvicorn app:app --host 0.0.0.0 --port 8080 --reload

Tests

./venv/bin/python -m pytest tests/ -v

All tests use tmp_path fixtures + monkeypatched httpx.AsyncClient.stream. No external services needed. Test factories reset SESSIONS, PIN_ATTEMPTS, RATE_EVENTS globals — be careful not to let test state leak. After the modular refactor, tests import directly from the correct modules (db, security, config, search, rag, memory, routers.*) — not from the old monolithic app namespace.

Every router has a dedicated test file:

File	Covers
`test_auth_capabilities.py`	`auth.py` — guest/admin sessions, origin blocking, logout
`test_chat_streaming_and_memory_paths.py`	`routers/chat.py` — streaming, auto-search, remember/forget
`test_completions.py`	`routers/completions.py` — API key auth, FIM, streaming, blocking, errors
`test_conversations.py`	`routers/conversations.py` — full CRUD, guest admin enforcement
`test_memories.py`	`routers/memories.py` — edit, search, stats endpoints
`test_models_router.py`	`routers/models.py` — models list, ps, show, stats, search/status
`test_presets.py`	`routers/presets.py` — full CRUD, default preset protection
`test_profile.py`	`routers/profile.py` — get, update, default, length validation
`test_search_route.py`	`routers/search_route.py` — explicit search flow, no results, errors
`test_search_url_sanitization.py`	`search.py` URL sanitizer
`test_settings_allowlist.py`	`routers/settings.py` — allowlisted key enforcement
`test_skills_framework.py`	`routers/skills.py` — list, toggle, unknown skill, prompt injection
`test_ip_allowlist.py`	IP allowlist helper + middleware
`test_rate_and_payload_guardrails.py`	Rate limits + payload size enforcement
`test_error_envelopes.py`	Global exception handler + stream error incidents

Modules that call httpx.AsyncClient (chat, completions, models, search_route) are mocked via monkeypatch.setattr on AsyncClient.stream, .get, or .post. CPU stats in models.py (api/stats) use real psutil; GPU stats are monkeypatched via routers.models.get_gpu_stats.

Architecture

Refactored from single-file (app.py) into modules under project root:

File	Role
`app.py`	FastAPI app, middleware, router registration
`config.py`	Constants, env vars, rate/payload limits, built-in skills registry
`db.py`	SQLite schema, connection factory, settings helpers
`auth.py`	PIN-based guest/admin sessions, auth routes
`security.py`	Rate limiting, origin checks, IP allowlist, audit/incident logging
`memory.py`	FTS5 memory CRUD, remember/forget command parsing
`search.py`	SearXNG integration, perplexity scoring, refusal detection
`rag.py`	Qdrant vector search + system prompt assembly
`gpu.py`	AMD GPU stats via `rocm-smi`
`routers/`	One module per endpoint group (chat, search, skills, completions, etc.)

Entrypoint / API keys

app.py line 148: uvicorn.run(app, ...) when called directly
config.py line 14: LLAMA_SERVER_BASE defaults to http://192.168.50.108:8081 — llama-server, not standard Ollama port, used by all model endpoints
config.py line 17: COMPLETIONS_API_KEY read from JARVISCHAT_COMPLETIONS_API_KEY env var or auto-generates a random key — no longer a missing import
config.py line 13: OLLAMA_BASE is legacy/unused — all endpoints now use LLAMA_SERVER_BASE

Key flows

/api/chat → process_remember_command() intercepts "remember that..." / "forget about..." first → else build_system_prompt() (profile + FTS5 memory + Qdrant RAG + preset + skills) → stream from llama-server with logprobs: true → if perplexity > 15.0 OR REFUSAL_PATTERNS match, re-query with SearXNG results
/api/search → bypasses perplexity/refusal, queries SearXNG directly → summarizes via llama-server (no raw results leaked in SSE)
/v1/chat/completions → OpenAI-compatible for Continue.dev/IDE integration; FIM requests proxied without persistence

Perplexity / auto-search

The upstream request includes "logprobs": true. parse_llama_stream_chunk() extracts per-token logprobs from each chunk's choices[0].logprobs.content[].logprob. The all_logprobs list is populated during streaming, so calculate_perplexity() and is_uncertain() work correctly — auto-search on high perplexity is no longer dead code.

Auth / lockdown

Guest session by default (POST /api/auth/guest), admin unlock via 4-digit PIN (POST /api/auth/login)
Admin required for PUT/DELETE/PATCH + all POST except allowlist (/api/chat, /api/search, /api/auth/*)
IP allowlist, rate limiting, origin checking, payload size limits — all enforced in app.py middleware
Origin check applies to all /api/ requests (not just state-changing methods); origin_allowed() returns False when both Origin and Referer headers are absent, closing CSRF read gap
JARVISCHAT_ADMIN_PIN env var required on first boot (or JARVISCHAT_ALLOW_DEFAULT_PIN=true)

Database

SQLite at jarvischat.db, auto-created by init_db() on startup via FastAPI lifespan
get_db() opens new connection per request (no pool). Close after use.
FTS5 virtual table memories for full-text search with BM25 ranking. FTS5 operator keywords (AND, OR, NOT, NEAR) are double-quoted to prevent parse errors.

External services

Service	Required	Port
llama-server (OpenAI-compat API)	Yes	8081 (ultron) or env `LLAMA_SERVER_BASE`
SearXNG	No	8888
wttr.in	No	weather shortcut bypasses SearXNG; curl UA for plain-text output
rocm-smi	No	AMD GPU stats
Qdrant	No	6333 (ultron) — RAG vector search

Config quirks

Rate limits and payload caps in config.py — tweak for testing by monkeypatching module attributes (note: patch security.RL_* not config.RL_* since security imports bindings separately)
ALLOWED_SETTINGS_KEYS in config.py controls which keys the UI can write via /api/settings
Settings table seeded with defaults (profile_enabled, search_enabled, memory_enabled, skills_enabled, default_model) — never overwritten by init_db()
Profile table uses singleton row id=1
RAG embedding requests go to EMBED_URL at /api/embeddings (separate Ollama instance)

SSE Protocol

All streaming endpoints yield data: {json}\n\n. Key shapes:

{token, conversation_id} — streaming token
{searching: true} — web search triggered
{search_results: N} — N results (no raw_results payload)
{done: true, perplexity, tokens_per_sec, searched?} — terminal
{error: "...", error_key: "..."} — error with incident key

6.8 KiB Raw Blame History