fix: replace hardcoded EMBED_URL with LLAMA_SERVER_BASE from config
EMBED_URL in rag.py hardcoded the IP and port instead of using LLAMA_SERVER_BASE, so the env var JARVISCHAT_LLAMA_SERVER_BASE was ignored for embedding requests.
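For context, a minimal sketch of how the override is expected to flow once `rag.py` reads the base URL from config. The helper names `resolve_llama_server_base` and `embeddings_url` are hypothetical, and the default value is an assumption taken from the old hardcoded `EMBED_URL`; the actual contents of `config.py` are not part of this commit.

```python
import os

# Assumed default: mirrors the old hardcoded EMBED_URL from rag.py.
# The real default in config.py is not shown in this commit.
DEFAULT_LLAMA_SERVER_BASE = "http://192.168.50.108:8081"


def resolve_llama_server_base(environ=None) -> str:
    """Return the llama-server base URL, preferring the env override (hypothetical helper)."""
    env = os.environ if environ is None else environ
    base = env.get("JARVISCHAT_LLAMA_SERVER_BASE", DEFAULT_LLAMA_SERVER_BASE)
    # Drop any trailing slash so the joined URL contains exactly one separator.
    return base.rstrip("/")


def embeddings_url(environ=None) -> str:
    """Build the embeddings endpoint the same way rag.py does after this change."""
    return f"{resolve_llama_server_base(environ)}/api/embeddings"
```

With `JARVISCHAT_LLAMA_SERVER_BASE` set, embedding requests follow the override; without it, they fall back to the default.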
@@ -102,7 +102,7 @@ The upstream request includes `"logprobs": true`. `parse_llama_stream_chunk()` e
 - `ALLOWED_SETTINGS_KEYS` in `config.py` controls which keys the UI can write via `/api/settings`
 - Settings table seeded with defaults (`profile_enabled`, `search_enabled`, `memory_enabled`, `skills_enabled`, `default_model`) — never overwritten by `init_db()`
 - Profile table uses singleton row `id=1`
-- RAG embedding requests go to `LLAMA_SERVER_BASE` at `/api/embeddings` (port 8081, not 11434)
+- RAG embedding requests go to `LLAMA_SERVER_BASE` at `/api/embeddings`
 
 ### SSE Protocol
 
@@ -12,7 +12,7 @@ Developer wiki: [docs/wiki/Home.md](docs/wiki/Home.md)
 - **`COMPLETIONS_API_KEY`** — auto-generated secret key for the OpenAI-compatible endpoint, overridable via `JARVISCHAT_COMPLETIONS_API_KEY` env var
 - **Perplexity auto-search fixed** — upstream request now sends `"logprobs": true`, `parse_llama_stream_chunk()` extracts per-token logprobs, so `calculate_perplexity()` and `is_uncertain()` work correctly (was dead code)
 - **All `/api/models` endpoints** — now correctly target `LLAMA_SERVER_BASE` (llama-server on port 8081) instead of the old Ollama port; `/api/ps` uses `/v1/models` endpoint
-- **RAG embedding endpoint fixed** — `EMBED_URL` changed from port `:11434` (Ollama) to `:8081` (llama-server)
+- **RAG embedding endpoint fixed** — hardcoded `EMBED_URL` replaced with `LLAMA_SERVER_BASE` from config, respecting the `JARVISCHAT_LLAMA_SERVER_BASE` env var
 - **Error messages corrected** — all user-facing errors say "inference server" instead of "Ollama" or "llama-server"
 - **Secure SSE protocol** — raw search results are no longer leaked in the SSE event stream
 - **FTS5 query safety** — operator keywords (`AND`, `OR`, `NOT`, `NEAR`) are double-quoted to prevent parse errors
5
rag.py
@@ -7,12 +7,11 @@ import httpx
 
 from db import get_db, get_setting, list_skills_with_state, format_active_skills_prompt
 from memory import search_memories
-from config import MAX_SKILL_PROMPT_CHARS
+from config import LLAMA_SERVER_BASE, MAX_SKILL_PROMPT_CHARS
 
 log = logging.getLogger("jarvischat")
 
 QDRANT_URL = "http://192.168.50.108:6333"
-EMBED_URL = "http://192.168.50.108:8081"
 EMBED_MODEL = "mxbai-embed-large"
 RAG_COLLECTION = "jarvis_rag"
 RAG_SCORE_THRESHOLD = 0.25
@@ -22,7 +21,7 @@ async def query_rag(query: str, limit: int = 3) -> list:
     try:
         async with httpx.AsyncClient() as client:
             embed_resp = await client.post(
-                f"{EMBED_URL}/api/embeddings",
+                f"{LLAMA_SERVER_BASE}/api/embeddings",
                 json={"model": EMBED_MODEL, "prompt": query},
                 timeout=10.0,
             )