- Add MariaDB (REL) IPL validation — master required, secondary non-fatal
- Add RelNodeConfig / RelInstanceConfig structs with master/secondary pattern
- Add rel_services section to beds.toml and test fixture
- Add detailed topology commentary to beds.toml covering standalone,
  master/replica, Galera cluster, and multi-DB-per-node configurations
- Add developer wiki (wiki/) covering:
  - Origin story: PHP Namaste history, production record, why Rust
  - Architecture overview: full system diagram, all layers explained
  - The four nodes: appServer, admin, segundo, tercero with real-world context
  - IPL sequence: every step documented with rationale for ordering
  - Configuration system: layering, env selection, adding new sections
  - Queue topology: exchanges, routing keys, broker bindings, vhost isolation
  - Template system: REC/REL, TLA convention, cache map, warehousing
  - Event lineage: compound event IDs, parent/child tracking, msLogs schema
  - Glossary
- Update README with wiki index and MariaDB status
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
145 lines
5.8 KiB
Markdown
# Event Lineage

## The Problem

A single client request to a BEDS application does not result in a single database operation. It fans out. A request to update a user record might trigger:

- The primary record update (REL write)
- An audit record insert (REC write)
- A journal entry (REC write)
- Three log events (published to admin)
- A cache invalidation event (ADM event)

That is seven operations from one client request. In production at Giving Assistant, a single request often fanned out into dozens of concurrent operations.

When something goes wrong, the question is: **what did request X actually cause?** Without event lineage, the answer requires correlating timestamps across multiple collections and hoping nothing else happened at the same moment.

## The Solution: Compound Event IDs

Every BEDS event carries three lineage fields:

```
event_id  = "{node}.{env}.{guid}"
parent_id = ""    # empty string if root event
depth     = 0     # integer, levels from root
```

### `event_id`

A compound identifier unique across the entire cluster:

```
ms.production.a1b2c3d4-e5f6-7890-abcd-ef1234567890
│  │          │
│  │          └── UUID v4, unique to this event
│  └── environment name from config
└── wbid, identifies the cluster
```

The compound format means two events with the same UUID from different clusters or environments never collide. This matters when you are aggregating logs from multiple environments.

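As an illustration of the compound format, the sketch below builds and splits an ID in Rust. The names `build_event_id`, `parse_event_id`, and `EventIdParts` are hypothetical and not taken from the BEDS codebase, and the parser assumes that neither the cluster prefix nor the environment name contains a `.`.

```rust
/// The three parts of a compound event ID, borrowed from the original string.
#[derive(Debug, PartialEq)]
pub struct EventIdParts<'a> {
    pub node: &'a str,
    pub env: &'a str,
    pub guid: &'a str,
}

/// Join the three parts into the "{node}.{env}.{guid}" form.
/// The GUID is passed in so the sketch needs no UUID crate.
pub fn build_event_id(node: &str, env: &str, guid: &str) -> String {
    format!("{node}.{env}.{guid}")
}

/// Split a compound ID back into its parts.
/// Returns None unless there are exactly three non-empty, dot-separated parts.
/// Assumption: node and env never contain '.', and the GUID uses hyphens only.
pub fn parse_event_id(id: &str) -> Option<EventIdParts<'_>> {
    let mut parts = id.split('.');
    let (node, env, guid) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || node.is_empty() || env.is_empty() || guid.is_empty() {
        return None;
    }
    Some(EventIdParts { node, env, guid })
}
```

Rejecting IDs with a missing or extra part keeps a malformed value from being silently attributed to the wrong cluster or environment when logs are aggregated.
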
### `parent_id`

The `event_id` of the event that spawned this one. Empty string for root events (direct client requests). All derived events (audit records, log entries, journal entries, cache events) carry the root event's `event_id` as their `parent_id`.

### `depth`

How many levels from the root event:

```
depth=0    root event (client request)
depth=1    direct children (first-generation derived events)
depth=2    grandchildren (events spawned by depth=1 events)
```

Depth is capped in practice: a correctly designed BEDS application should not need depth beyond 3 or 4. Deep recursion is a design smell.

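The relationship between `parent_id` and `depth` can be sketched as a small Rust type. `Lineage`, `root`, and `spawn` are illustrative names, not BEDS APIs. The sketch assumes every derived event carries the root's `event_id` as its `parent_id` regardless of depth, which is what a single `parent_id` match across all depths relies on.

```rust
/// The three lineage fields carried by every event.
#[derive(Debug, Clone, PartialEq)]
pub struct Lineage {
    pub event_id: String,
    pub parent_id: String, // empty string for a root event
    pub depth: u32,        // levels from root
}

impl Lineage {
    /// Lineage for a root event (a direct client request).
    pub fn root(event_id: impl Into<String>) -> Self {
        Lineage { event_id: event_id.into(), parent_id: String::new(), depth: 0 }
    }

    /// Lineage for an event spawned by `self`.
    /// Assumption: descendants at every depth point back at the root event.
    pub fn spawn(&self, child_event_id: impl Into<String>) -> Self {
        // A root has an empty parent_id, so its own event_id is the root ID.
        // Any other event already carries the root ID in parent_id.
        let root_id = if self.parent_id.is_empty() {
            &self.event_id
        } else {
            &self.parent_id
        };
        Lineage {
            event_id: child_event_id.into(),
            parent_id: root_id.clone(),
            depth: self.depth + 1,
        }
    }
}
```

Under this assumption a grandchild spawned from a depth-1 event still has the root's ID as `parent_id`, and only `depth` records how far removed it is.
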
## Querying Event Trees

With these three fields, you can reconstruct the full tree of operations triggered by any event:

**Find the root event:**
```
event_id = "ms.production.a1b2c3d4..."
depth    = 0
```

**Find all direct children:**
```
parent_id = "ms.production.a1b2c3d4..."
depth     = 1
```

**Find the full subtree:**
```
parent_id = "ms.production.a1b2c3d4..."    (all depths)
```

**Reconstruct the full tree:**
```
event_id    = "ms.production.a1b2c3d4..."    (root)
+ parent_id = "ms.production.a1b2c3d4..."    (all children at any depth)
```

Both `event_id` and `parent_id` are indexed on the `msLogs` collection. The compound index `cIdx1Log = [event_id ASC, depth ASC]` is specifically designed for full tree traversal.

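The same selection logic can be expressed over records already loaded into memory. This is a sketch only: `LogRecord` mirrors four of the `msLogs` fields, and `event_tree` is a hypothetical helper that does not reflect how BEDS actually queries MongoDB.

```rust
/// A cut-down, illustrative view of an msLogs document.
#[derive(Debug, Clone, PartialEq)]
pub struct LogRecord {
    pub event_id: String,
    pub parent_id: String,
    pub depth: u32,
    pub message_log: String,
}

impl LogRecord {
    pub fn new(event_id: &str, parent_id: &str, depth: u32, message_log: &str) -> Self {
        LogRecord {
            event_id: event_id.to_string(),
            parent_id: parent_id.to_string(),
            depth,
            message_log: message_log.to_string(),
        }
    }
}

/// Return the root event plus every record whose parent_id matches it,
/// ordered from the root outward.
pub fn event_tree<'a>(records: &'a [LogRecord], root_id: &str) -> Vec<&'a LogRecord> {
    let mut tree: Vec<&LogRecord> = records
        .iter()
        .filter(|r| r.event_id == root_id || r.parent_id == root_id)
        .collect();
    // sort_by_key is stable, so records at the same depth keep their input order.
    tree.sort_by_key(|r| r.depth);
    tree
}
```

The filter is the "root + `parent_id` match" query from above, and the sort by `depth` gives the root-first ordering needed to print or walk the tree.
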
## Why Not Distributed Tracing?

Systems like Jaeger, Zipkin, and OpenTelemetry solve the same problem. BEDS does not use them, and the choice is deliberate:

1. **BEDS already has a structured event store.** MongoDB `msLogs` is queryable, indexed, and retains data as long as the TTL allows. A separate tracing system would duplicate this data.

2. **Simplicity.** Adding a distributed tracing system adds operational complexity: another service to run, monitor, and maintain. BEDS event lineage is built into the data model and requires no additional infrastructure.

3. **Self-sufficiency.** BEDS is designed to run in environments that may not have cloud infrastructure available. A homelab running BEDS should be able to answer "what happened?" without an external observability platform.

The tradeoff is that BEDS event lineage is specific to BEDS events. It does not cover external HTTP calls or third-party service interactions. If those are important to observe, a lightweight OpenTelemetry integration could be added to the adapter layer without changing the lineage model.

## The `msLogs` Collection

Log events written by the admin node carry full lineage. The logger template (`mst_logger_rec.toml`) defines the schema:

| Field | Type | Purpose |
|---|---|---|
| `event_id` | string | compound event ID of this log event |
| `parent_id` | string | parent event ID; empty for root events |
| `depth` | integer | levels from root |
| `level_log` | string | debug \| data \| info \| error \| warning \| fatal \| timer \| event |
| `level_val` | integer | -1 through 7; enables range queries by severity |
| `resource` | string | 4-char component tag (e.g. LOGR, AMQP, CNFG) |
| `service_log` | string | node role that issued the event |
| `env_log` | string | environment |
| `node_log` | string | node name from config |
| `file_log` | string | source file |
| `method_log` | string | calling function name |
| `line_log` | integer | source line number |
| `message_log` | string | the log message |
| `trace_log` | array | stack trace; empty unless trace=true |
| `created` | integer | epoch timestamp |

The `level_val` integer enables range queries that string level names cannot express:

```
level_val >= 4    # warning and above
level_val == 6    # fatal only
level_val <= 1    # debug and data
```

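A minimal in-memory sketch of the same threshold filter follows. Only the values 4 (warning) and 6 (fatal) come from the examples above; the constant names, the tuple layout, and the `at_or_above` helper are assumptions made for illustration.

```rust
/// Severity thresholds quoted in the range-query examples.
pub const WARNING: i32 = 4;
pub const FATAL: i32 = 6;

/// Keep the messages whose level_val is at least `min_level`.
/// Each entry is a hypothetical (level_val, message_log) pair.
pub fn at_or_above<'a>(entries: &[(i32, &'a str)], min_level: i32) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|entry| entry.0 >= min_level)
        .map(|entry| entry.1)
        .collect()
}
```

Level names cannot stand in for the integer here because strings compare alphabetically: `"fatal"` sorts before `"warning"`, so a lexical `>= "warning"` comparison would drop fatal events.
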
## Console Output Format

For local console output (before AMQP is up, or when `syslog=false`), BEDS follows the format established in the PHP `consoleLog` function:

```
[dd/mm/yy@HH:MM:SS] [LVL]RESRC: message
```

Example:
```
[04/04/26@14:23:01] [ I]BEDS: BEDS IPL starting, node=ms env=production
[04/04/26@14:23:01] [ I]BEDS: Configuration loaded
[04/04/26@14:23:01] [ I]AMQP: RabbitMQ reachable
[04/04/26@14:23:01] [ W]MNGO: MongoDB unreachable (non-fatal in development): connection refused
```

The level tag is left-padded to 2 characters inside the brackets, so `I` prints as `[ I]`. The resource tag is 4 characters. This format was chosen because it is immediately scannable: level and source are visible without reading the message text.

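A minimal Rust sketch of a formatter that produces this layout follows. `Stamp` and `console_line` are hypothetical names; the real logger may obtain its timestamp and tags differently.

```rust
/// A broken-down local timestamp (hypothetical; avoids a date/time crate).
pub struct Stamp {
    pub day: u8,
    pub month: u8,
    pub year: u16,
    pub hour: u8,
    pub minute: u8,
    pub second: u8,
}

/// Render one console line as "[dd/mm/yy@HH:MM:SS] [LVL]RESRC: message".
pub fn console_line(ts: &Stamp, level: char, resource: &str, message: &str) -> String {
    format!(
        // {:>2} right-aligns the level tag in a 2-character field.
        // {:<4} pads a short resource tag to 4 characters; it does not truncate.
        "[{:02}/{:02}/{:02}@{:02}:{:02}:{:02}] [{:>2}]{:<4}: {}",
        ts.day,
        ts.month,
        ts.year % 100, // two-digit year
        ts.hour,
        ts.minute,
        ts.second,
        level,
        resource,
        message
    )
}
```

Fixed-width fields keep the message text starting at the same column on every line, which is what makes the output scannable.
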