- Bump version to 1.8.0
- Add LLAMA_SERVER_BASE constant, point all inference calls to ultron:8081
- Update startup log to include llama-server endpoint
- Rewrite README: four pillars, cluster architecture diagram, AMD+NVIDIA RPC setup,
layer tuning progression (7→17→30-35 t/s), full API reference, complete roadmap A-L
- Reframe project identity: knowledge accumulation platform, not chat wrapper