From 3c546359242289916931ed5c9a6b2616d92f5d26 Mon Sep 17 00:00:00 2001
From: gramps
Date: Fri, 10 Apr 2026 17:12:01 -0700
Subject: [PATCH] docs: comprehensive architecture delta record for hardening phase

Catalogs all architectural changes from resident runtime implementation:
- Runtime model: daemon-like process with coordinated shutdown
- Broker dispatch: shutdown operation integration
- Logger persistence: explicit IPL logging to MongoDB with root GUID lineage
- Developer diagnostics: chain tracing and web-based observability
- Config system: trace_on and logger_admin controls
- Observability utility: modern log_dumper web UI (replaces legacy PHP dumper)
- Operational safety: dev-only purge-on-IPL controls

Files modified: 13 (src/main.rs, brokers/*, config/*, bin/log_dumper.rs, Cargo.*, wiki/*)
Dependencies added: axum, chrono, uuid

See wiki/12-architecture-deltas.md for full details.
---
 wiki/12-architecture-deltas.md | 161 +++++++++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 wiki/12-architecture-deltas.md

diff --git a/wiki/12-architecture-deltas.md b/wiki/12-architecture-deltas.md
new file mode 100644
index 0000000..af7bd03
--- /dev/null
+++ b/wiki/12-architecture-deltas.md
@@ -0,0 +1,161 @@
+# Architecture Deltas — Recent Hardening Phase
+
+This document catalogs all architectural and design changes made in the recent hardening phase.
+
+## Runtime Model: Daemon-Like Resident Process
+
+**Status**: Completed in `src/main.rs`
+
+**Change**: Converted from startup-IPL-then-exit to a resident, coordinated-shutdown runtime.
+
+**Details**:
+- IPL loads config, validates services, initializes broker pools, then enters an event loop.
+- Loop waits for either:
+  - Global shutdown signal (broadcast from dispatcher when AMQP `shutdown` command received).
+  - User interrupt (Ctrl+C).
+- On signal, loop cleanly shuts down Tokio tasks and exits with status code 0.
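As a rough illustration of the wait-then-drain flow in the list above, here is a minimal std-only sketch. It swaps Tokio tasks and the broadcast channel for OS threads, an `mpsc` channel, and an `AtomicBool`, so it is not the code in `src/main.rs`. The names `ShutdownReason`, `run_resident`, and `wire_sources` are hypothetical, and treating a closed channel as an interrupt is an assumption made only for this sketch.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Why the resident loop is exiting (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShutdownReason {
    /// A dispatcher consumed an AMQP `shutdown` operation.
    BrokerCommand,
    /// The operator pressed Ctrl+C.
    UserInterrupt,
}

/// Spawns `worker_count` stand-in workers, blocks until the first shutdown
/// request arrives, stops the workers, and returns the reason together with
/// the process exit code (always 0 on this clean path).
///
/// `wire_sources` receives the sending half of the shutdown channel so the
/// caller can hand clones to whatever produces shutdown requests.
pub fn run_resident<F>(worker_count: usize, wire_sources: F) -> (ShutdownReason, i32)
where
    F: FnOnce(Sender<ShutdownReason>),
{
    let (shutdown_tx, shutdown_rx) = mpsc::channel();
    let stop = Arc::new(AtomicBool::new(false));

    // Stand-ins for broker consumers: each polls the stop flag until it is set.
    let workers: Vec<_> = (0..worker_count)
        .map(|_| {
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                while !stop.load(Ordering::Acquire) {
                    thread::sleep(Duration::from_millis(5));
                }
            })
        })
        .collect();

    wire_sources(shutdown_tx);

    // Wait for the first request. If every sender was dropped without
    // sending, this sketch falls back to treating it as an interrupt.
    let reason = shutdown_rx
        .recv()
        .unwrap_or(ShutdownReason::UserInterrupt);

    // Signal all workers and wait for each one to finish before returning.
    stop.store(true, Ordering::Release);
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }

    (reason, 0)
}
```

A caller would pass clones of the sender to the dispatcher pool and to a Ctrl+C handler; whichever fires first ends the wait, and the returned code can be forwarded to `std::process::exit`.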
+
+**Why**: Aligns with operational daemon expectations (systemd, orchestrators). Ensures graceful lifecycle rather than abrupt termination. Supports hot-reload/redeployment workflows.
+
+---
+
+## Broker Dispatch: Unified Consumer with Shutdown Semantics
+
+**Status**: Completed in `src/brokers/dispatcher.rs` and `src/brokers/mod.rs`
+
+**Change**: Integrated shutdown command handling into the unified dispatcher consumer.
+
+**Details**:
+- Dispatcher pool now receives a global `shutdown_tx` channel at spawn time.
+- Each dispatcher consumer listens for AMQP `shutdown` operation.
+- On `shutdown`: acknowledge the message, broadcast shutdown signal to all peers, and exit cleanly.
+- All dispatchers also listen on the global shutdown channel and exit if signaled externally.
+
+**Why**: Enables coordinated, multi-node shutdown without forceful process kill. Aligns with AMQP message semantics (shutdown is a standard operation, not a runtime hack).
+
+---
+
+## Logger: Explicit IPL Persistence to MongoDB
+
+**Status**: Completed in `src/main.rs` and `src/brokers/logger_store.rs`
+
+**Change**: IPL startup/failure events now explicitly persisted to `msLogs` collection with structured context.
+
+**Details**:
+- Root GUID generated at IPL start; all startup events tagged with this root ID.
+- Structured log entries include:
+  - `root_event_id`: chains all startup events to a single root.
+  - `timestamp`: human-readable ISO 8601 format.
+  - `node_id`: configured node name/role.
+  - `event_type`: IPL phase (e.g., "ipl_start", "service_validated", "broker_pool_spawned", "ipl_complete").
+  - `message`: human-readable summary.
+  - `metadata`: optional structured context (validation results, latency, etc.).
+- If IPL fails, best-effort logging of failure event to Mongo before process exit.
+- After IPL success, showcase log-level examples (INFO, WARN, ERROR) for visibility.
+
+**Why**: Startup is traditionally hardest to debug (logs often lost). Persistent, queryable startup context enables post-mortem analysis of deployment/initialization issues. Root GUID enables chain-crawl diagnostics across distributed startup events.
+
+---
+
+## Developer Diagnostics: Root GUID Lineage and Chain Tracing
+
+**Status**: Completed in `src/brokers/logger_store.rs` and `src/bin/log_dumper.rs`
+
+**Change**: Added root GUID-based event chain tracing and query layer.
+
+**Details**:
+- `logger_store::fetch_chain(root_event_id, limit)`: retrieve all events tagged with a root ID, sorted by timestamp.
+- `logger_store::fetch_root_record(root_event_id)`: retrieve the initiating root event.
+- `log_dumper` web UI exposes:
+  - Root GUID input field to query and visualize entire event chain.
+  - Single-record view at `/record?root_event_id=...` to inspect individual startup context.
+  - Arrow-trigger UX for expanding compact row summaries without constant page reload.
+
+**Why**: Enables developers to rapidly correlate events across a single startup sequence or transaction. Reduces manual log sifting. Scales from single node to multi-node deployments.
+
+---
+
+## Configuration: Trace-On and Logger Admin Controls
+
+**Status**: Completed in `src/config/structs.rs` and `config/env_dev.toml`
+
+**Change**: Added two new config namespaces for developer and administrative control.
+
+**Details**:
+
+### `[runtime.trace_on]`
+- Boolean flag (default: false in production, true in `env_dev.toml`).
+- When true, logs method entry/exit at TRACE level for all broker consumers and core trait implementations.
+- Enables dev to narrow causality in complex message flows without instrumenting code.
+
+### `[logger_admin]`
+- `purge_on_ipl` (boolean, default: false): on successful IPL, automatically purge named collections before startup logging begins.
+- `purge_collections` (array of strings): list of collection names to purge (e.g., `["msLogs", "msErrors"]`).
+- Enables clean dev iteration: each `cargo run` in dev automatically resets logger state.
+
+**Why**: Reduces friction in dev loops. Trace-on avoids printf debugging. Purge-on-IPL ensures each test iteration starts fresh without manual `mongo` CLI cleanup.
+
+---
+
+## Observability Utility: Modern Logger Reader (log_dumper)
+
+**Status**: Completed in `src/bin/log_dumper.rs`
+
+**Change**: Built a modern Rust equivalent to legacy PHP `utilities/dumper.php` for browsing `msLogs`.
+
+**Details**:
+- **Web UI** (Axum):
+  - Dashboard route `/` with seed-write action, quick filter by level/node, root GUID chain input.
+  - Compact row layout: timestamp | level | node | message snippet | arrow (expand).
+  - Single-record view `/record?root_event_id=...` showing full event context.
+  - Arrow-trigger expansion shows full message without full-page refresh.
+
+- **Features**:
+  - Human-readable timestamps (ISO 8601 formatted).
+  - Seed-write to create test events and validate logger pipeline.
+  - Root chain traversal via GUID input.
+  - Dev-centric UX: minimal clicks, maximum information density.
+
+**Why**: Centralizes all observability into a single web interface. Replaces CLI-based manual querying. Makes startup diagnostics visible to entire team without MongoDB knowledge.
+
+---
+
+## Operational Safety: Dev-Only Purge Controls
+
+**Status**: Completed in `src/main.rs` and config system
+
+**Change**: Added dev-only purge logic to reset logger collections on IPL in non-production environments.
+
+**Details**:
+- IPL checks `config.logger_admin.purge_on_ipl` flag.
+- If true and node is not production, purges collections listed in `config.logger_admin.purge_collections` before logging startup events.
+- Prevents accidental production data loss (flag only honored in non-prod node roles).
+- `env_dev.toml` enables this by default for frictionless dev iteration.
+
+**Why**: Closes dev/prod gap. Enables safe, repeatable testing without manual intervention. Prevents stale logger state from polluting diagnostics.
+
+---
+
+## Commit Summary
+
+This hardening phase encompasses:
+
+1. **Runtime lifecycle**: Daemon model, coordinated shutdown, graceful exit.
+2. **Broker semantics**: Shutdown operation integration, channel-based signaling.
+3. **Logging infrastructure**: Persistent IPL events, root GUID lineage, structured context.
+4. **Developer experience**: Trace control, purge controls, web-based observability.
+5. **Configuration**: New `trace_on` and `logger_admin` namespaces.
+6. **Tooling**: Modern Rust observability utility replacing legacy PHP dumper.
+
+**Files Changed**:
+- `src/main.rs`: resident runtime loop, IPL logging, shutdown coordination, trace control.
+- `src/brokers/dispatcher.rs`: shutdown operation handling, global shutdown listening.
+- `src/brokers/mod.rs`: dispatcher pool accepts shutdown channels.
+- `src/brokers/logger_store.rs`: root GUID chain fetch operations, structured logging helpers.
+- `src/config/structs.rs`: `trace_on`, `logger_admin` config types.
+- `src/bin/log_dumper.rs`: new modern observability utility (Axum web UI).
+- `config/env_dev.toml`: dev overrides enabling trace/purge controls.
+- `Cargo.toml` / `Cargo.lock`: added `axum`, `chrono`, `uuid` dependencies.
+- Wiki updates: `Home.md`, `04-ipl.md`, `06-queue-topology.md`, `10-modernization-roadmap.md`, new `11-beds-architecture-visual-brief.md`.
+
+**Next Phase**: Autoscaling heuristics, metric collection, and cross-node coordinator election (deferred).
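For reviewers who want the purge-on-IPL guard from the Operational Safety section in concrete form, one possible reading is sketched below. Only the field names `purge_on_ipl` and `purge_collections` come from this document; the `LoggerAdmin` struct shape, the `NodeRole` enum, and the `collections_to_purge` helper are hypothetical and may not match `src/config/structs.rs`.

```rust
/// Hypothetical mirror of the `[logger_admin]` config namespace.
#[derive(Debug, Clone, Default)]
pub struct LoggerAdmin {
    /// Defaults to false, matching the documented default.
    pub purge_on_ipl: bool,
    /// Collection names to purge, e.g. ["msLogs", "msErrors"].
    pub purge_collections: Vec<String>,
}

/// Hypothetical node role; the real config may model this differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeRole {
    Dev,
    Production,
}

/// Returns the collections that IPL should purge before logging startup
/// events. The flag is honored only outside production, so a production
/// node always gets an empty slice even if the flag is set by mistake.
pub fn collections_to_purge(admin: &LoggerAdmin, role: NodeRole) -> &[String] {
    if admin.purge_on_ipl && role != NodeRole::Production {
        admin.purge_collections.as_slice()
    } else {
        &[]
    }
}
```

Keeping the decision in a pure function like this would let the production guard be unit-tested without a MongoDB connection; the actual purge call would then iterate over the returned slice.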