- Bump version to 1.8.0 - Add LLAMA_SERVER_BASE constant, point all inference calls to ultron:8081 - Update startup log to include llama-server endpoint - Rewrite README: four pillars, cluster architecture diagram, AMD+NVIDIA RPC setup, layer tuning progression (7→17→30-35 t/s), full API reference, complete roadmap A-L - Reframe project identity: knowledge accumulation platform, not chat wrapper
79 KiB
79 KiB