gramps edited this page 2026-04-04 20:48:55 -07:00

Event Lineage

The Problem

A single client request to a BEDS application does not result in a single database operation. It fans out. A request to update a user record might trigger:

  • The primary record update (REL write)
  • An audit record insert (REC write)
  • A journal entry (REC write)
  • Three log events (published to admin)
  • A cache invalidation event (ADM event)

With each log event persisted separately, that is six database writes plus a cache invalidation event, all from one client request. In production at Giving Assistant, a single request often fanned out into dozens of concurrent operations.

When something goes wrong, the question is: what did request X actually cause? Without event lineage, the answer requires correlating timestamps across multiple collections and hoping nothing else happened at the same moment.

The Solution: Compound Event IDs

Every BEDS event carries three lineage fields:

event_id  = "{node}.{env}.{guid}"
parent_id = ""                      # empty string if root event
depth     = 0                       # integer — levels from root

event_id

A compound identifier unique across the entire cluster:

ms.production.a1b2c3d4-e5f6-7890-abcd-ef1234567890
│   │          │
│   │          └── UUID v4: unique to this event
│   └── environment name from config
└── wbid: identifies the cluster

The compound format means two events with the same UUID from different clusters or environments never collide. This matters when you are aggregating logs from multiple environments.
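A minimal sketch of building and splitting a compound ID in the "{node}.{env}.{guid}" layout shown above. The function names make_event_id and split_event_id are hypothetical, not part of BEDS, and the split assumes that neither the node nor the environment name contains a dot.

```python
import uuid


def make_event_id(node: str, env: str) -> str:
    """Build a compound event ID of the form '{node}.{env}.{guid}'."""
    # uuid4() yields a random (version 4) UUID, matching the format above.
    return f"{node}.{env}.{uuid.uuid4()}"


def split_event_id(event_id: str) -> tuple[str, str, str]:
    """Split a compound event ID into (node, env, guid).

    Assumption: node and env contain no dots. A UUID never does,
    so splitting on the first two dots is enough.
    """
    node, env, guid = event_id.split(".", 2)
    return node, env, guid
```

Because the node and environment are part of the ID itself, two events that happen to share a GUID still compare as different strings once logs from several environments are merged.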

parent_id

The event_id of the root event that this event descends from. Root events (direct client requests) carry an empty string. All derived events (audit records, log entries, journal entries, cache events) carry the root event's event_id as their parent_id, regardless of depth, which is what lets a single parent_id match return an entire subtree.

depth

How many levels from the root event:

depth=0   root event (client request)
depth=1   direct children (first-generation derived events)
depth=2   grandchildren (events spawned by depth=1 events)

Depth stays shallow in practice: a correctly designed BEDS application should not need a depth beyond 3 or 4. Deeper recursion is a design smell.
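The way the three fields propagate from an event to the events it spawns can be sketched as follows. This is an illustration, not BEDS code: the Lineage class, its child method, and the MAX_DEPTH guard are all hypothetical. It follows the rule stated above that every derived event carries the root event's event_id as its parent_id.

```python
from dataclasses import dataclass

MAX_DEPTH = 4  # hypothetical guard, taken from the "3 or 4" guideline above


@dataclass(frozen=True)
class Lineage:
    event_id: str
    parent_id: str = ""  # empty string marks a root event
    depth: int = 0       # levels from root

    def child(self, child_event_id: str) -> "Lineage":
        """Return the lineage for an event spawned by this one."""
        # A root event has no parent_id, so its own event_id is the root ID.
        # Any derived event already carries the root ID in parent_id.
        root_id = self.parent_id or self.event_id
        if self.depth + 1 > MAX_DEPTH:
            raise ValueError(f"lineage depth would exceed {MAX_DEPTH}")
        return Lineage(child_event_id, root_id, self.depth + 1)
```

Raising on excessive depth is one possible design choice; an application could equally log a warning and continue.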

Querying Event Trees

With these three fields, you can reconstruct the full tree of operations triggered by any event:

Find the root event:

event_id = "ms.production.a1b2c3d4..."
depth = 0

Find all direct children:

parent_id = "ms.production.a1b2c3d4..."
depth = 1

Find the full subtree:

parent_id = "ms.production.a1b2c3d4..."   (all depths)

Reconstruct the full tree:

event_id = "ms.production.a1b2c3d4..."   (root)
  + parent_id = "ms.production.a1b2c3d4..."   (all children at any depth)

Both event_id and parent_id are indexed on the msLogs collection. The compound index cIdx1Log = [event_id ASC, depth ASC] is specifically designed for full tree traversal.
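The four lookups above can be written as MongoDB-style filter documents. The sketch below only builds the dictionaries, so it runs without a database. The function names are hypothetical, and passing the result to a driver call such as pymongo's Collection.find on msLogs is an assumption about how an application would use them.

```python
from collections import defaultdict


def root_filter(root_id: str) -> dict:
    """Match the root event itself."""
    return {"event_id": root_id, "depth": 0}


def children_filter(root_id: str) -> dict:
    """Match first-generation derived events only."""
    return {"parent_id": root_id, "depth": 1}


def subtree_filter(root_id: str) -> dict:
    """Match every derived event, at any depth."""
    return {"parent_id": root_id}


def full_tree_filter(root_id: str) -> dict:
    """Match the root plus every derived event."""
    return {"$or": [{"event_id": root_id}, {"parent_id": root_id}]}


def group_by_depth(events: list[dict]) -> dict[int, list[str]]:
    """Arrange fetched events into levels keyed by depth, shallowest first."""
    levels = defaultdict(list)
    for event in events:
        levels[event["depth"]].append(event["event_id"])
    return dict(sorted(levels.items()))
```

Grouping by depth after the fetch gives a level-by-level view of what a request caused, with no timestamp correlation involved.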

Why Not Distributed Tracing?

Systems like Jaeger, Zipkin, and OpenTelemetry solve the same problem. BEDS does not use them. The reasons are deliberate:

  1. BEDS already has a structured event store. The MongoDB msLogs collection is queryable, indexed, and retains data for as long as its TTL allows. A separate tracing system would duplicate this data.

  2. Simplicity. Adding a distributed tracing system adds operational complexity — another service to run, monitor, and maintain. BEDS event lineage is built into the data model and requires no additional infrastructure.

  3. Self-sufficiency. BEDS is designed to run in environments that may not have cloud infrastructure available. A homelab running BEDS should be able to answer "what happened?" without an external observability platform.

The tradeoff is that BEDS event lineage is specific to BEDS events. It does not cover external HTTP calls or third-party service interactions. If those are important to observe, a lightweight OpenTelemetry integration could be added to the adapter layer without changing the lineage model.

The msLogs Collection

Log events written by the admin node carry full lineage. The logger template (mst_logger_rec.toml) defines the schema:

Field        Type     Purpose
event_id     string   compound event ID of this log event
parent_id    string   parent event ID; empty for root events
depth        integer  levels from root
level_log    string   debug | data | info | error | warning | fatal | timer | event
level_val    integer  -1 through 7; enables range queries by severity
resource     string   4-char component tag (e.g. LOGR, AMQP, CNFG)
service_log  string   node role that issued the event
env_log      string   environment
node_log     string   node name from config
file_log     string   source file
method_log   string   calling function name
line_log     integer  source line number
message_log  string   the log message
trace_log    array    stack trace; empty unless trace=true
created      integer  epoch timestamp

The level_val integer enables range queries that are impossible with string level names:

level_val >= 4   # warning and above
level_val == 6   # fatal only
level_val <= 1   # debug and data
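As a sketch, the range conditions above translate to MongoDB comparison operators and can be combined with a lineage filter. The helper severity_filter is hypothetical and only builds the filter document. The numeric thresholds are the ones used in the examples above; the full name-to-number mapping is not assumed here.

```python
from typing import Optional


def severity_filter(min_val: Optional[int] = None,
                    max_val: Optional[int] = None,
                    parent_id: Optional[str] = None) -> dict:
    """Build a filter on level_val, optionally limited to one event tree."""
    bounds = {}
    if min_val is not None:
        bounds["$gte"] = min_val  # level_val >= min_val
    if max_val is not None:
        bounds["$lte"] = max_val  # level_val <= max_val
    query = {}
    if bounds:
        query["level_val"] = bounds
    if parent_id is not None:
        query["parent_id"] = parent_id  # only events derived from this root
    return query


# Everything at level_val 4 or higher caused by one request:
example = severity_filter(
    min_val=4,
    parent_id="ms.production.a1b2c3d4-e5f6-7890-abcd-ef1234567890",
)
```

Combining severity with parent_id narrows the question from "what did request X cause?" to "what did request X cause that went wrong?".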

Console Output Format

For local console output (before AMQP is up, or when syslog=false), BEDS follows the format established in the PHP consoleLog function:

[dd/mm/yy@HH:MM:SS] [LVL]RESRC: message

Example:

[04/04/26@14:23:01] [ I]BEDS: BEDS IPL starting, node=ms env=production
[04/04/26@14:23:01] [ I]BEDS: Configuration loaded
[04/04/26@14:23:01] [ I]AMQP: RabbitMQ reachable
[04/04/26@14:23:01] [ W]MNGO: MongoDB unreachable (non-fatal in development): connection refused

The level tag is left-padded (right-aligned) to 2 characters inside the brackets. The resource tag is 4 characters. This format was chosen because it is immediately scannable: level and source are visible without reading the message text.
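A minimal formatter that reproduces the sample lines above. The function name console_line is hypothetical, and the single-letter level codes are inferred from the samples ("I", "W"), not taken from the BEDS source. Padding or truncating the resource tag to exactly 4 characters is likewise an assumption.

```python
from datetime import datetime


def console_line(when: datetime, level: str, resource: str, message: str) -> str:
    """Format one console line as '[dd/mm/yy@HH:MM:SS] [LV]RSRC: message'."""
    stamp = when.strftime("%d/%m/%y@%H:%M:%S")
    # Level is right-aligned in a 2-character field, as in the samples.
    # Resource is padded or truncated to exactly 4 characters.
    return f"[{stamp}] [{level:>2}]{resource:<4.4}: {message}"


line = console_line(datetime(2026, 4, 4, 14, 23, 1), "I", "BEDS",
                    "Configuration loaded")
```

Fixed-width fields keep the message text starting in the same column on every line, which is what makes the output scannable.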