EMBED_URL in rag.py hardcoded the IP and port instead of using
LLAMA_SERVER_BASE, so the env var JARVISCHAT_LLAMA_SERVER_BASE
was ignored for embedding requests.
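
A minimal sketch of the fix described above, assuming the embedding URL is built from the base at call time. The `http://` scheme, the `/v1/embeddings` path, and the helper names `llama_server_base` and `embed_url` are illustrative assumptions, not taken from rag.py:

```python
import os

# Assumed default: the log only says inference calls point at ultron:8081;
# the http:// scheme is a guess.
DEFAULT_LLAMA_SERVER_BASE = "http://ultron:8081"


def llama_server_base(env=None):
    """Return the server base URL, honoring JARVISCHAT_LLAMA_SERVER_BASE."""
    env = os.environ if env is None else env
    base = env.get("JARVISCHAT_LLAMA_SERVER_BASE", DEFAULT_LLAMA_SERVER_BASE)
    return base.rstrip("/")  # avoid a double slash when joining paths


def embed_url(env=None):
    """Derive the embedding endpoint from the base instead of hardcoding it."""
    # "/v1/embeddings" is an assumed path; use whatever rag.py actually calls.
    return llama_server_base(env) + "/v1/embeddings"
```

With this shape, overriding the env var changes the embedding endpoint as well as the other inference calls.
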
- Extend origin check to all /api/ requests (not just state-changing methods),
closing the GET/HEAD/OPTIONS bypass that allowed cross-origin reads
- origin_allowed() now returns False when both Origin and Referer headers
are absent, preventing script-initiated requests from bypassing the check
- Update AGENTS.md and README.md to document the changes
- Bump version to 1.8.0
- Add LLAMA_SERVER_BASE constant, point all inference calls to ultron:8081
- Update startup log to include llama-server endpoint
- Rewrite README: four pillars, cluster architecture diagram, AMD+NVIDIA RPC setup,
layer tuning progression (7→17→30-35 tokens/s), full API reference, complete roadmap A-L
- Reframe project identity: knowledge accumulation platform, not chat wrapper
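
The origin-check bullets above (check all /api/ requests; reject when both Origin and Referer are absent) can be sketched roughly as follows. This is not the project's actual code: the allow-list contents, the `should_check_origin` helper, and the plain-dict header lookup are assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; the real set of trusted origins is not in the log.
ALLOWED_ORIGINS = {"http://localhost:8080"}


def _origin_of(url):
    """Reduce a URL (or Origin value) to scheme://host[:port], or None."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return None  # covers values such as "null" or garbage
    return f"{parts.scheme}://{parts.netloc}".lower()


def origin_allowed(headers, allowed=ALLOWED_ORIGINS):
    """Return True only if Origin (or, failing that, Referer) is trusted."""
    # Simplification: headers is a plain dict with canonical-case keys.
    origin = headers.get("Origin")
    referer = headers.get("Referer")
    if not origin and not referer:
        # Both headers absent: reject instead of letting the request through.
        return False
    return _origin_of(origin or referer) in allowed


def should_check_origin(path):
    """Apply the check to every /api/ request, regardless of HTTP method."""
    return path.startswith("/api/")
```

Because the method is no longer consulted, a cross-origin GET to an /api/ path is rejected the same way a POST would be.
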