docs: comprehensive architecture delta record for hardening phase

Catalogs all architectural changes from resident runtime implementation:
- Runtime model: daemon-like process with coordinated shutdown
- Broker dispatch: shutdown operation integration
- Logger persistence: explicit IPL logging to MongoDB with root GUID lineage
- Developer diagnostics: chain tracing and web-based observability
- Config system: trace_on and logger_admin controls
- Observability utility: modern log_dumper web UI (replaces legacy PHP dumper)
- Operational safety: dev-only purge-on-IPL controls

Files modified: 13 (src/main.rs, brokers/*, config/*, bin/log_dumper.rs, Cargo.*, wiki/*)
Dependencies added: axum, chrono, uuid

See wiki/12-architecture-deltas.md for full details.
161
wiki/12-architecture-deltas.md
Normal file
@@ -0,0 +1,161 @@
# Architecture Deltas — Recent Hardening Phase

This document catalogs all architectural and design changes made in the recent hardening phase.

## Runtime Model: Daemon-Like Resident Process

**Status**: Completed in `src/main.rs`

**Change**: Converted from startup-IPL-then-exit to a resident, coordinated-shutdown runtime.

**Details**:
- IPL loads config, validates services, initializes broker pools, then enters an event loop.
- The loop waits for either:
  - Global shutdown signal (broadcast from the dispatcher when an AMQP `shutdown` command is received).
  - User interrupt (Ctrl+C).
- On either signal, the loop cleanly shuts down Tokio tasks and exits with status code 0.
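
The wait-for-either-signal loop described above can be sketched with the standard library alone. This is not the code in `src/main.rs`: the real runtime coordinates Tokio tasks, whereas this sketch substitutes `std::sync::mpsc` and a plain thread so it runs without dependencies. `StopReason`, `run_resident`, and `spawn_fake_dispatcher` are hypothetical names introduced for illustration.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::thread;

/// Why the resident loop stopped (hypothetical type, not from src/main.rs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason {
    /// A dispatcher saw the AMQP `shutdown` operation.
    BrokerShutdown,
    /// The operator pressed Ctrl+C.
    UserInterrupt,
}

/// Blocks until any source reports a stop reason, then returns that reason
/// together with the process exit code. A closed channel (all senders gone)
/// is treated as a clean stop with no reason.
pub fn run_resident(stop_rx: Receiver<StopReason>) -> (Option<StopReason>, i32) {
    let reason = stop_rx.recv().ok();
    // Real code would stop its worker tasks here before exiting.
    (reason, 0)
}

/// Simulates a dispatcher thread that broadcasts shutdown (assumption:
/// one sender clone per signal source).
pub fn spawn_fake_dispatcher(stop_tx: Sender<StopReason>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Ignore the error: the main loop may already have stopped.
        let _ = stop_tx.send(StopReason::BrokerShutdown);
    })
}
```
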

**Why**: Aligns with operational daemon expectations (systemd, orchestrators). Ensures graceful lifecycle rather than abrupt termination. Supports hot-reload/redeployment workflows.

---

## Broker Dispatch: Unified Consumer with Shutdown Semantics

**Status**: Completed in `src/brokers/dispatcher.rs` and `src/brokers/mod.rs`

**Change**: Integrated shutdown command handling into the unified dispatcher consumer.

**Details**:
- The dispatcher pool now receives a global `shutdown_tx` channel at spawn time.
- Each dispatcher consumer listens for the AMQP `shutdown` operation.
- On `shutdown`: acknowledge the message, broadcast the shutdown signal to all peers, and exit cleanly.
- All dispatchers also listen on the global shutdown channel and exit if signaled externally.
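
The acknowledge, broadcast, exit sequence above can be illustrated with a small std-only model. It is a sketch under assumptions: a shared atomic flag stands in for the global `shutdown_tx` channel, operations are plain strings rather than AMQP deliveries, and `ShutdownFlag`, `Outcome`, `handle_operation`, and `consume` are hypothetical names that do not come from `src/brokers/dispatcher.rs`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// What a consumer does after handling one message (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Message acknowledged; keep consuming.
    Continue,
    /// Message acknowledged, shutdown broadcast, consumer exits.
    Exit,
}

/// Shared flag standing in for the global shutdown channel.
#[derive(Clone, Default)]
pub struct ShutdownFlag(Arc<AtomicBool>);

impl ShutdownFlag {
    pub fn broadcast(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_set(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Handles one operation name; `acked` records acknowledgements.
pub fn handle_operation(op: &str, flag: &ShutdownFlag, acked: &mut Vec<String>) -> Outcome {
    acked.push(op.to_string()); // acknowledge first, then act
    if op == "shutdown" {
        flag.broadcast();
        Outcome::Exit
    } else {
        Outcome::Continue
    }
}

/// Consumer loop: stops on `shutdown` or when a peer has set the flag.
pub fn consume(ops: &[&str], flag: &ShutdownFlag) -> Vec<String> {
    let mut acked = Vec::new();
    for op in ops {
        if flag.is_set() {
            break; // signaled externally by another dispatcher
        }
        if handle_operation(op, flag, &mut acked) == Outcome::Exit {
            break;
        }
    }
    acked
}
```
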

**Why**: Enables coordinated, multi-node shutdown without a forced process kill. Fits the AMQP messaging model: shutdown is treated as a regular message operation, not a runtime hack.

---

## Logger: Explicit IPL Persistence to MongoDB

**Status**: Completed in `src/main.rs` and `src/brokers/logger_store.rs`

**Change**: IPL startup/failure events are now explicitly persisted to the `msLogs` collection with structured context.

**Details**:
- A root GUID is generated at IPL start; all startup events are tagged with this root ID.
- Structured log entries include:
  - `root_event_id`: chains all startup events to a single root.
  - `timestamp`: human-readable ISO 8601 format.
  - `node_id`: configured node name/role.
  - `event_type`: IPL phase (e.g., "ipl_start", "service_validated", "broker_pool_spawned", "ipl_complete").
  - `message`: human-readable summary.
  - `metadata`: optional structured context (validation results, latency, etc.).
- If IPL fails, the failure event is logged to Mongo on a best-effort basis before the process exits.
- After a successful IPL, sample entries at each log level (INFO, WARN, ERROR) are emitted for visibility.
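
To make the entry shape concrete, here is a minimal sketch of a record carrying the fields listed above, plus a helper that stamps every event of one startup with the same root ID. The field names come from the list; the Rust types, the `IplEvent` and `IplChain` names, and passing the root ID and timestamp in as strings are all assumptions made to keep the sketch dependency-free. It does not show how `src/brokers/logger_store.rs` generates GUIDs or writes to MongoDB.

```rust
/// One persisted IPL event. Field names follow the list above;
/// the Rust types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct IplEvent {
    pub root_event_id: String,
    pub timestamp: String, // ISO 8601 text, e.g. "2024-01-01T00:00:00Z"
    pub node_id: String,
    pub event_type: String,
    pub message: String,
    pub metadata: Option<Vec<(String, String)>>,
}

/// Tags every event of one startup with the same root ID
/// (hypothetical helper).
pub struct IplChain {
    root_event_id: String,
    node_id: String,
}

impl IplChain {
    pub fn new(root_event_id: &str, node_id: &str) -> Self {
        IplChain {
            root_event_id: root_event_id.to_string(),
            node_id: node_id.to_string(),
        }
    }

    /// Builds an event without metadata for the given IPL phase.
    pub fn event(&self, timestamp: &str, event_type: &str, message: &str) -> IplEvent {
        IplEvent {
            root_event_id: self.root_event_id.clone(),
            timestamp: timestamp.to_string(),
            node_id: self.node_id.clone(),
            event_type: event_type.to_string(),
            message: message.to_string(),
            metadata: None,
        }
    }
}
```
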

**Why**: Startup is traditionally the hardest phase to debug because its logs are often lost. Persistent, queryable startup context enables post-mortem analysis of deployment/initialization issues. The root GUID enables chain-crawl diagnostics across distributed startup events.

---

## Developer Diagnostics: Root GUID Lineage and Chain Tracing

**Status**: Completed in `src/brokers/logger_store.rs` and `src/bin/log_dumper.rs`

**Change**: Added root GUID-based event chain tracing and a query layer.

**Details**:
- `logger_store::fetch_chain(root_event_id, limit)`: retrieves all events tagged with a root ID, sorted by timestamp.
- `logger_store::fetch_root_record(root_event_id)`: retrieves the initiating root event.
- The `log_dumper` web UI exposes:
  - A root GUID input field to query and visualize an entire event chain.
  - A single-record view at `/record?root_event_id=...` to inspect individual startup context.
  - Arrow-trigger UX for expanding compact row summaries without reloading the page.
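
The chain query semantics (filter by root ID, sort by timestamp, cap at a limit) can be shown against an in-memory slice. This is only an illustration: the real `logger_store::fetch_chain` reads from MongoDB and its return type is not shown here, so `fetch_chain_in_memory` and `LogRecord` are hypothetical stand-ins. The sketch also assumes timestamps are ISO 8601 strings in one uniform UTC format, which sort chronologically as plain text.

```rust
/// Minimal stand-in for a stored log record (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct LogRecord {
    pub root_event_id: String,
    pub timestamp: String,
    pub message: String,
}

/// Returns up to `limit` records tagged with `root_event_id`,
/// oldest first.
pub fn fetch_chain_in_memory(
    records: &[LogRecord],
    root_event_id: &str,
    limit: usize,
) -> Vec<LogRecord> {
    let mut chain: Vec<LogRecord> = records
        .iter()
        .filter(|r| r.root_event_id == root_event_id)
        .cloned()
        .collect();
    // Uniform ISO 8601 UTC strings compare correctly as text.
    chain.sort_by(|a, b| a.timestamp.cmp(&b.timestamp));
    chain.truncate(limit);
    chain
}
```
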

**Why**: Enables developers to rapidly correlate events across a single startup sequence or transaction. Reduces manual log sifting. Scales from single node to multi-node deployments.

---

## Configuration: Trace-On and Logger Admin Controls

**Status**: Completed in `src/config/structs.rs` and `config/env_dev.toml`

**Change**: Added two new config namespaces for developer and administrative control.

**Details**:

### `[runtime.trace_on]`
- Boolean flag (default: false in production, true in `env_dev.toml`).
- When true, logs method entry/exit at TRACE level for all broker consumers and core trait implementations.
- Lets developers narrow down causality in complex message flows without instrumenting code by hand.
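
One common way to get entry/exit logging from a single flag is a scope guard that logs on creation and again on drop. The sketch below shows that pattern only; it is not taken from the codebase, and `TraceGuard`, `TraceSink`, and `handle_message` are hypothetical names. Log lines are collected in a vector instead of going to a real logger so the behavior is easy to check.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// In-memory stand-in for a TRACE-level logger (assumption).
pub type TraceSink = Rc<RefCell<Vec<String>>>;

/// Logs "enter" when created and "exit" when dropped, but only
/// if tracing was enabled at creation time.
pub struct TraceGuard {
    name: &'static str,
    sink: Option<TraceSink>,
}

impl TraceGuard {
    pub fn enter(trace_on: bool, name: &'static str, sink: &TraceSink) -> Self {
        let sink = if trace_on { Some(Rc::clone(sink)) } else { None };
        if let Some(s) = &sink {
            s.borrow_mut().push(format!("TRACE enter {}", name));
        }
        TraceGuard { name, sink }
    }
}

impl Drop for TraceGuard {
    fn drop(&mut self) {
        if let Some(s) = &self.sink {
            s.borrow_mut().push(format!("TRACE exit {}", self.name));
        }
    }
}

/// Stand-in for a consumer method; the guard covers the whole body.
pub fn handle_message(trace_on: bool, sink: &TraceSink) -> u32 {
    let _guard = TraceGuard::enter(trace_on, "handle_message", sink);
    42
}
```
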

### `[logger_admin]`
- `purge_on_ipl` (boolean, default: false): on successful IPL, automatically purge named collections before startup logging begins.
- `purge_collections` (array of strings): list of collection names to purge (e.g., `["msLogs", "msErrors"]`).
- Enables clean dev iteration: each `cargo run` in dev automatically resets logger state.

**Why**: Reduces friction in dev loops. Trace-on avoids printf debugging. Purge-on-IPL ensures each test iteration starts fresh without manual `mongo` CLI cleanup.

---

## Observability Utility: Modern Logger Reader (log_dumper)

**Status**: Completed in `src/bin/log_dumper.rs`

**Change**: Built a modern Rust equivalent of the legacy PHP `utilities/dumper.php` for browsing `msLogs`.

**Details**:
- **Web UI** (Axum):
  - Dashboard route `/` with seed-write action, quick filter by level/node, root GUID chain input.
  - Compact row layout: timestamp | level | node | message snippet | arrow (expand).
  - Single-record view `/record?root_event_id=...` showing full event context.
  - Arrow-trigger expansion shows the full message without a full-page refresh.

- **Features**:
  - Human-readable timestamps (ISO 8601 formatted).
  - Seed-write to create test events and validate the logger pipeline.
  - Root chain traversal via GUID input.
  - Dev-centric UX: minimal clicks, maximum information density.
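
As a rough illustration of the compact row layout listed above, the following sketch renders one row as plain text and truncates the message to a snippet. The actual utility renders HTML through Axum, which is not reproduced here; `compact_row`, the `max_chars` parameter, the `...` truncation marker, and the `>` arrow glyph are all assumptions.

```rust
/// Renders "timestamp | level | node | message snippet | arrow".
/// The message is cut to `max_chars` characters, with "..." appended
/// when anything was dropped.
pub fn compact_row(
    timestamp: &str,
    level: &str,
    node: &str,
    message: &str,
    max_chars: usize,
) -> String {
    // Count characters, not bytes, so multi-byte text is never split.
    let mut snippet: String = message.chars().take(max_chars).collect();
    if message.chars().count() > max_chars {
        snippet.push_str("...");
    }
    format!("{} | {} | {} | {} | >", timestamp, level, node, snippet)
}
```
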

**Why**: Centralizes all observability into a single web interface. Replaces CLI-based manual querying. Makes startup diagnostics visible to the entire team without requiring MongoDB knowledge.

---

## Operational Safety: Dev-Only Purge Controls

**Status**: Completed in `src/main.rs` and config system

**Change**: Added dev-only purge logic to reset logger collections on IPL in non-production environments.

**Details**:
- IPL checks the `config.logger_admin.purge_on_ipl` flag.
- If the flag is true and the node is not production, IPL purges the collections listed in `config.logger_admin.purge_collections` before logging startup events.
- Prevents accidental production data loss (the flag is only honored in non-prod node roles).
- `env_dev.toml` enables this by default for frictionless dev iteration.
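
The guard described above reduces to a small decision function: purge only when the flag is set and the node is not production. The sketch below assumes a `LoggerAdmin` struct mirroring the two documented keys and takes `is_production` as a plain boolean, since how the node role is determined is not shown here. `collections_to_purge` is a hypothetical name, and the actual MongoDB purge call is left out.

```rust
/// Mirrors the documented `[logger_admin]` keys. The struct name and
/// the derived defaults (false, empty list) are assumptions.
#[derive(Debug, Clone, Default)]
pub struct LoggerAdmin {
    pub purge_on_ipl: bool,
    pub purge_collections: Vec<String>,
}

/// Returns the collections to purge during IPL, or an empty list
/// when purging is disabled or the node is production.
pub fn collections_to_purge(cfg: &LoggerAdmin, is_production: bool) -> Vec<String> {
    if cfg.purge_on_ipl && !is_production {
        cfg.purge_collections.clone()
    } else {
        Vec::new()
    }
}
```
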

**Why**: Closes dev/prod gap. Enables safe, repeatable testing without manual intervention. Prevents stale logger state from polluting diagnostics.

---

## Commit Summary

This hardening phase encompasses:

1. **Runtime lifecycle**: Daemon model, coordinated shutdown, graceful exit.
2. **Broker semantics**: Shutdown operation integration, channel-based signaling.
3. **Logging infrastructure**: Persistent IPL events, root GUID lineage, structured context.
4. **Developer experience**: Trace control, purge controls, web-based observability.
5. **Configuration**: New `trace_on` and `logger_admin` namespaces.
6. **Tooling**: Modern Rust observability utility replacing legacy PHP dumper.

**Files Changed**:
- `src/main.rs`: resident runtime loop, IPL logging, shutdown coordination, trace control.
- `src/brokers/dispatcher.rs`: shutdown operation handling, global shutdown listening.
- `src/brokers/mod.rs`: dispatcher pool accepts shutdown channels.
- `src/brokers/logger_store.rs`: root GUID chain fetch operations, structured logging helpers.
- `src/config/structs.rs`: `trace_on`, `logger_admin` config types.
- `src/bin/log_dumper.rs`: new modern observability utility (Axum web UI).
- `config/env_dev.toml`: dev overrides enabling trace/purge controls.
- `Cargo.toml` / `Cargo.lock`: added `axum`, `chrono`, `uuid` dependencies.
- Wiki updates: `Home.md`, `04-ipl.md`, `06-queue-topology.md`, `10-modernization-roadmap.md`, new `11-beds-architecture-visual-brief.md`.

**Next Phase**: Autoscaling heuristics, metric collection, and cross-node coordinator election (deferred).