# ⚡ JarvisChat

A lightweight Ollama coding companion that runs on Python 3.13
JarvisChat is a single-file FastAPI application that provides a clean, responsive web interface for Ollama. It features persistent memory, automatic web search when the model is uncertain, and real-time token tracking.
## Features
- Persistent Profile/Memory — Your context is injected into every conversation automatically
- System Prompt Presets — Switch between coding assistant, sysadmin, general, or custom modes
- Streaming Chat — Real-time token streaming with conversation history
- Model Switching — Hot-swap between all installed Ollama models
- Web Search Integration — SearXNG kicks in automatically when the model is uncertain (perplexity-based)
- Weather Queries — Direct wttr.in integration for weather questions
- Token Thermometer — Visual context usage bar with live updates as you type
- Perplexity & Speed Badges — See model confidence (PPL) and tokens/sec on each response
- Copy-to-Clipboard — One-click copy on all code blocks
- Dark Theme — Easy on the eyes for long coding sessions
## Architecture

```
Browser ◄──► app.py (FastAPI) ◄──► Ollama (LLM)
                  │
                  ▼ (when uncertain)
            SearXNG (web search)
```
JarvisChat acts as middleware between your browser and Ollama. When the model's perplexity exceeds a threshold (default 15.0) or it refuses to answer, JarvisChat automatically queries SearXNG, injects the results, and re-prompts the model.
This is NOT training — SearXNG is only used at runtime as a fallback for uncertain responses.
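The trigger metric is standard token-level perplexity. The exact calculation inside app.py isn't reproduced here, but the usual formula (the exponential of the mean negative log-probability per token) can be sketched as:

```python
import math

def perplexity(logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-probability per token).
    Low values mean the model was confident; high values mean uncertainty."""
    if not logprobs:
        return 0.0
    return math.exp(-sum(logprobs) / len(logprobs))

# Confident tokens (logprobs near 0) stay well under the 15.0 threshold;
# uncertain tokens (very negative logprobs) blow past it.
confident = perplexity([-0.1, -0.2, -0.05])   # ~1.12
uncertain = perplexity([-3.0, -4.2, -2.8])    # ~28.0
```

With the default threshold of 15.0, the first response would stream through untouched while the second would trigger a SearXNG lookup.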
## Requirements

- Python 3.11+ (tested on 3.13)
- Ollama running locally (default: `localhost:11434`)
- SearXNG (optional, for web search — default: `localhost:8888`)
## Installation

```bash
# Clone or download app.py
git clone https://llgit.llamachile.shop/gramps/jarvischat.git
cd jarvischat

# Install dependencies
pip install fastapi httpx uvicorn

# Run
python app.py
# or
uvicorn app:app --host 0.0.0.0 --port 8080
```

Open http://localhost:8080 in your browser.
## Running as a Service

**Important:** Although JarvisChat is a single-file Python application, it's designed to run as a persistent service alongside Ollama — not as a one-off script. Both services should start on boot.
### systemd Service (recommended)

Create `/etc/systemd/system/jarvischat.service`:
```ini
[Unit]
Description=JarvisChat - Ollama Web UI
After=network.target ollama.service
Wants=ollama.service

[Service]
Type=simple
User=jarvischat
WorkingDirectory=/opt/jarvischat
ExecStart=/usr/bin/python3 app.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Then enable and start:

```bash
sudo systemctl daemon-reload
sudo systemctl enable jarvischat
sudo systemctl start jarvischat
```
### Verify Both Services

```bash
# Check Ollama
systemctl status ollama

# Check JarvisChat
systemctl status jarvischat

# View JarvisChat logs
journalctl -t jarvischat -f
```
## Configuration

Edit these constants at the top of `app.py`:

```python
VERSION = "1.3.0"
OLLAMA_BASE = "http://localhost:11434"
SEARXNG_BASE = "http://localhost:8888"
DEFAULT_MODEL = "deepseek-coder:6.7b"
PERPLEXITY_THRESHOLD = 15.0  # Higher = less likely to trigger search
```
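app.py hard-codes these values, so changing them means editing the file. If you run under systemd, one pattern (a sketch, not a built-in feature of JarvisChat) is to fall back to environment variables so the unit file can override them without edits:

```python
import os

# Hypothetical environment-variable fallbacks; app.py itself hard-codes
# these constants, so this is only one way you might make them overridable.
OLLAMA_BASE = os.environ.get("OLLAMA_BASE", "http://localhost:11434")
SEARXNG_BASE = os.environ.get("SEARXNG_BASE", "http://localhost:8888")
DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL", "deepseek-coder:6.7b")
PERPLEXITY_THRESHOLD = float(os.environ.get("PERPLEXITY_THRESHOLD", "15.0"))
```

This pairs naturally with an `Environment=PERPLEXITY_THRESHOLD=20.0` line in the unit's `[Service]` section.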
## Database

JarvisChat uses SQLite (`jarvischat.db` in the same directory as `app.py`):
| Table | Purpose |
|---|---|
| `conversations` | Chat sessions with model and timestamps |
| `messages` | Individual messages with role and content |
| `system_presets` | Saved system prompt presets |
| `profile` | Your persistent memory/context |
| `settings` | App settings (search/profile toggles, default model) |
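The exact columns live in app.py; a minimal sketch of the two core tables (column names here are illustrative assumptions, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # app.py opens jarvischat.db on disk
conn.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    model      TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS messages (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    role            TEXT NOT NULL,   -- 'user' | 'assistant' | 'system'
    content         TEXT NOT NULL
);
""")
conn.execute("INSERT INTO conversations (model) VALUES (?)",
             ("deepseek-coder:6.7b",))
conn.execute("INSERT INTO messages (conversation_id, role, content) "
             "VALUES (1, 'user', 'hi')")
rows = conn.execute("SELECT role, content FROM messages").fetchall()
```

Because everything is plain SQLite, backing up your history is just copying `jarvischat.db`.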
## Logging

JarvisChat logs to syslog via journald:

```bash
# Follow live logs
journalctl -t jarvischat -f

# View last 100 entries
journalctl -t jarvischat -n 100
```
## Token Thermometer
The vertical bar next to the input shows your context usage in real-time:
- Green — Plenty of room
- Yellow — 70%+ used
- Red — 90%+ used (approaching limit)
The count includes: profile + preset + conversation history + current input. Context size is fetched from Ollama when you switch models.
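Exact token counts depend on the model's tokenizer, so a client-side thermometer necessarily relies on an estimate. A sketch of the kind of calculation involved (the real logic in app.py may differ):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text and code.
    return max(1, len(text) // 4)

def context_usage(profile: str, preset: str, history: list[str],
                  current: str, context_size: int) -> float:
    """Fraction of the model's context window in use (0.0 to 1.0)."""
    used = sum(estimate_tokens(t) for t in (profile, preset, *history, current))
    return min(1.0, used / context_size)

# A value >= 0.7 would render the bar yellow, >= 0.9 red.
usage = context_usage("a" * 4000, "", [], "x" * 400, 2048)
```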
## Search Flow

1. User sends message → Ollama streams response with logprobs
2. JarvisChat calculates perplexity from the logprobs
3. If perplexity > 15.0 OR refusal patterns are detected:
   - Yield `{searching: True}` to show the spinner
   - Query SearXNG (or wttr.in for weather)
   - Inject results into context
   - Re-prompt Ollama and yield the new response
4. If the model still refuses, format the raw search results directly
5. Clean hedging phrases from the response
6. Yield the final response with PPL and t/s badges
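The refusal patterns app.py actually matches aren't listed here, so the ones below are illustrative; the trigger condition itself follows step 3 of the flow above:

```python
import re

# Illustrative refusal patterns; the actual list in app.py may differ.
REFUSAL_PATTERNS = [
    r"as an ai(?: language)? model",
    r"based on my training data",
    r"i (?:don't|do not) have the capability",
]

def looks_like_refusal(text: str) -> bool:
    lowered = text.lower()
    return any(re.search(p, lowered) for p in REFUSAL_PATTERNS)

def should_search(ppl: float, text: str, threshold: float = 15.0) -> bool:
    # High perplexity OR a refusal phrase triggers the SearXNG fallback.
    return ppl > threshold or looks_like_refusal(text)
```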
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/api/models` | GET | List Ollama models |
| `/api/ps` | GET | Running models |
| `/api/show` | POST | Model info (context size) |
| `/api/chat` | POST | Stream chat (SSE) |
| `/api/conversations` | GET | List conversations |
| `/api/conversations/{id}` | GET/DELETE | Get/delete conversation |
| `/api/profile` | GET/PUT | Get/update profile |
| `/api/presets` | GET/POST | List/create presets |
| `/api/presets/{id}` | PUT/DELETE | Update/delete preset |
| `/api/settings` | GET/PUT | App settings |
| `/api/search/status` | GET | SearXNG availability |
## Screenshots
(Add your own screenshot here)
## TODO

### Active

- [x] Mass-delete conversation history ✓ (v1.3.0)
- [ ] Verify SearXNG and Docker services persist across reboots
- [ ] Expand refusal patterns: "As an AI model", "based on my training data", "I don't have the capability"
- [ ] Input trigger: `search+` prefix. Strip the prefix, query SearXNG directly, Ollama summarizes
  - Raw results in an expandable div (not a tooltip)
- [ ] Add `profile.example.md`: recommended default profile with anti-bullshit rules (no "As an AI", no OpenAI mentions)
### Backlog

- [ ] Conversation search/filter by keyword
- [ ] Export conversation to markdown/text
- [ ] Keyboard shortcuts (Ctrl+N new chat, Ctrl+Enter send)
- [x] Token count estimate before sending ✓ (v1.2.9)
- [ ] Model info display: context length, VRAM usage from Ollama `/api/ps`
- [ ] Retry button on assistant messages
- [ ] Source links: clickable links when search is used
- [ ] Allow conversation renaming
- [ ] Multiple profiles: coding/sysadmin/general
- [ ] Auto-generate conversation tags (client-side KWIC, top 5, filterable badges)
- [ ] Image input support
  - Pull a vision model (llava, llama3.2-vision, etc.)
  - Frontend: file input / drag-drop, base64 encode
  - Backend: pass `images` array to Ollama `/api/chat`
## Version History
| Version | Changes |
|---|---|
| 1.3.0 | Delete all conversations button |
| 1.2.9 | Token thermometer with live context tracking |
| 1.2.8 | Logo in sidebar, llama emoji tagline |
| 1.2.7 | Tokens per second (t/s) badge on responses |
| 1.2.6 | wttr.in weather integration, improved search extraction |
| 1.2.5 | SearXNG infoboxes/answers, smarter query building |
| 1.2.4 | Perplexity badges, hedging cleanup |
| 1.2.3 | SearXNG integration with perplexity-based triggering |
| 1.2.0 | System prompt presets, settings persistence |
| 1.1.0 | Profile memory, model switching |
| 1.0.0 | Initial release |
## License
MIT
## A Note from Gramps
I named my AI machine "jarvis" after the AI assistant in Iron Man (2008) — because it's an awesome name. When I started building a local coding companion to talk to it, "JarvisChat" just made sense.
This project is in active development. Eventually it'll get packaged up as a Docker thing, but for now while I'm iterating fast, a single-file Python service does the job.
Built with 🦙 by Gramps at the Llama Chile Shop