Promote service modules to services/ directory; add AmqpConnection + async IPL

- Flat src/amqp.rs, src/mongo.rs, src/mariadb.rs promoted to src/services/{amqp,mongo,mariadb}/
- services/amqp/connection.rs: AmqpConnection struct with connect() and declare_exchange()
- services/amqp/error.rs: AmqpError type (thiserror, wraps lapin::Error)
- ipl() made async; #[tokio::main] added to main()
- IPL step 3b: authenticate to RabbitMQ + declare beds.events topic exchange (durable)
- Added lapin = "2" and tokio = { version = "1", features = ["full"] } to Cargo.toml
- 12 unit tests pass
- Docs: README, CLAUDE.md, wiki/04-ipl.md, wiki/06-queue-topology.md updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:52:18 -07:00
parent 2a9afe7d77
commit e8fdb39ea2
14 changed files with 1419 additions and 126 deletions

wiki/04-ipl.md

@@ -42,13 +42,13 @@ Initializes the `tracing` subscriber with journald and/or console output based o
**Note on log routing:** At this point, log output goes to the local console and/or journald. Log events are not yet routed to the admin node's MongoDB `msLogs` collection — that requires RabbitMQ to be up (Step 3). Local logging is the fallback that covers the gap between process start and AMQP connectivity.
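As a rough sketch of that setup (assuming the `tracing-subscriber` and `tracing-journald` crates, which this commit does not touch), a console layer plus an optional journald layer composes like this:
```rust
use tracing_subscriber::{fmt, prelude::*, registry};

/// Sketch only: console output always, journald output when its socket exists.
fn init_tracing() {
    // An `Option<Layer>` is itself a no-op layer when `None`, so a missing
    // journald socket degrades cleanly to console-only logging.
    let journald = tracing_journald::layer().ok();
    registry().with(fmt::layer()).with(journald).init();
}
```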
### Step 3: Validate RabbitMQ Reachability (TCP)
```rust
match services::amqp::validate(&cfg.broker_services) { ... }
```
Opens a TCP connection to the configured RabbitMQ broker host and port. Does not authenticate or open an AMQP channel — reachability only. The connection is immediately closed. This is a fast pre-flight check before the more expensive authentication step.
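A probe of this shape might look like the following sketch. The signature is illustrative (the actual `validate` takes the broker service config), but the connect-and-drop behavior matches the description above:
```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Illustrative sketch of a reachability-only probe. No AMQP handshake,
/// no credentials; the stream is dropped (and the connection closed) on return.
pub fn validate(host: &str, port: u16) -> Result<(), String> {
    let addr = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("cannot resolve {host}:{port}: {e}"))?
        .next()
        .ok_or_else(|| format!("no address resolved for {host}:{port}"))?;
    TcpStream::connect_timeout(&addr, Duration::from_secs(3))
        .map(|_stream| ()) // _stream dropped here; the TCP connection closes
        .map_err(|e| format!("broker unreachable at {addr}: {e}"))
}
```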
**Why RabbitMQ first among services:** RabbitMQ is the transport for all inter-node communication, including log event routing. If RabbitMQ is unreachable, the node cannot communicate with the rest of the cluster at all. It cannot send logs to admin, receive work events, or return results. Validating it before other services establishes that the backbone is up.
@@ -56,6 +56,24 @@ Opens a TCP connection to the configured RabbitMQ broker host and port. Does not
- `production`: unreachable broker is fatal — the node cannot function
- all other environments: unreachable broker is a warning — IPL continues so developers can work on other components without a running broker
### Step 3b: Authenticate to RabbitMQ + Declare Exchange
```rust
let amqp_conn = match services::amqp::AmqpConnection::connect(&cfg.broker_services).await { ... }
```
Opens a full AMQP session — credentials, vhost, and channel. Then asserts the `beds.events` topic exchange as durable. The exchange declaration is idempotent: if the exchange already exists with matching parameters, RabbitMQ returns success; if it exists with conflicting parameters, it returns an error.
`AmqpConnection` holds the live `lapin::Connection` and `lapin::Channel` for the session. The connection is kept as a field to prevent early drop — if the connection is dropped while the channel is live, the channel closes.
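A minimal sketch of that wrapper, assuming the broker config resolves to a standard AMQP URI. Only the type name and the two method names come from this commit; the field layout is illustrative, and the real module wraps errors in `AmqpError` rather than returning `lapin::Error` directly:
```rust
use lapin::{
    options::ExchangeDeclareOptions, types::FieldTable, Channel, Connection,
    ConnectionProperties, ExchangeKind,
};

pub struct AmqpConnection {
    // Kept alive for the lifetime of the session: dropping the connection
    // while the channel is live would close the channel.
    _connection: Connection,
    pub channel: Channel,
}

impl AmqpConnection {
    /// Authenticate and open a channel. `uri` is a full AMQP URI,
    /// e.g. "amqp://user:pass@host:5672/vhost".
    pub async fn connect(uri: &str) -> Result<Self, lapin::Error> {
        let connection = Connection::connect(uri, ConnectionProperties::default()).await?;
        let channel = connection.create_channel().await?;
        Ok(Self { _connection: connection, channel })
    }

    /// Idempotently assert a durable topic exchange such as `beds.events`.
    pub async fn declare_exchange(&self, name: &str) -> Result<(), lapin::Error> {
        self.channel
            .exchange_declare(
                name,
                ExchangeKind::Topic,
                ExchangeDeclareOptions { durable: true, ..Default::default() },
                FieldTable::default(),
            )
            .await
    }
}
```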
**Why declare the exchange at IPL?** The exchange is the single shared routing infrastructure for the entire cluster. Every node that publishes events depends on it. Declaring it idempotently at startup ensures it always exists before any broker task tries to publish. The first node to start creates it; every subsequent node confirms it.
**Queue declaration is not IPL's job.** Queues are declared by broker tasks when they start — not here. A queue's presence signals that the broker handling it is alive and ready to consume. IPL only asserts the exchange.
**Environment-aware failure handling:**
- `production`: authentication failure is fatal
- all other environments: failure is a warning — `amqp_conn` is `None`, IPL continues
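Both policies reduce to the same match shape. The sketch below is illustrative: `Config` and its `environment` field are stand-ins for the project's actual configuration type.
```rust
/// Sketch of the environment-gated failure policy. `Config` and its fields
/// are hypothetical stand-ins; `AmqpConnection` is sketched above.
async fn connect_amqp(cfg: &Config) -> Result<Option<AmqpConnection>, String> {
    match AmqpConnection::connect(&cfg.broker_services).await {
        Ok(conn) => Ok(Some(conn)),
        // production: a node without its messaging backbone must not come up
        Err(e) if cfg.environment == "production" => {
            Err(format!("AMQP authentication failed: {e}"))
        }
        // all other environments: warn and continue with no live session
        Err(e) => {
            eprintln!("[BEDS] [WARN] [IPL] AMQP authentication failed: {e}; continuing");
            Ok(None)
        }
    }
}
```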
### Step 4: Validate MongoDB
@@ -80,11 +98,7 @@ Opens a TCP connection to the master instance of each configured MariaDB node. T
**Environment-aware failure handling:** Master failure is fatal in production, warning in development. Secondary failure is a warning in all environments.
### Step N (not yet implemented): Node Self-Identification
The node writes its identity record — role, capabilities, env, timestamp — to the `msNodes` collection. This enables topology visibility for operations tooling. It is not a dependency for the core data path.
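For illustration only (the step is unimplemented), the record might be shaped like the following `bson` document; field names beyond the four listed above are guesses, and the commented insert uses the mongodb 2.x driver signature:
```rust
use mongodb::bson::{doc, DateTime};

// Hypothetical shape of an msNodes identity record.
let identity = doc! {
    "role": "appServer",                    // this node's role
    "capabilities": ["rBroker", "wBroker"], // broker tasks it can run
    "env": "development",
    "ts": DateTime::now(),
};
// ms_nodes.insert_one(identity, None).await?;  // hypothetical collection handle
```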
@@ -120,29 +134,31 @@ This allows a developer to work on, say, the MariaDB adapter without needing a r
## The `ipl()` Function
`ipl()` lives in `src/main.rs`. It is `async` — required because the AMQP authentication step (`lapin::Connection::connect`) is an async operation. The Tokio runtime is started by the `#[tokio::main]` attribute on `main()`.
`ipl()` returns `Result<(), String>`. Errors are plain strings — the IPL failure message is written directly to stderr with `eprintln!` before `process::exit(1)`, because at the point of a fatal IPL failure, the logging system may not be fully operational.
`main()` is intentionally minimal:
```rust
#[tokio::main]
async fn main() {
    if let Err(e) = ipl().await {
        eprintln!("[BEDS] [FATAL] [IPL] {}", e);
        std::process::exit(1);
    }
}
```
All logic is in `ipl()`. `main()` exists only to start the runtime and handle the fatal exit path.
## Future IPL Steps
As BEDS matures, the IPL sequence will grow. Expected additions in order:
1. Node role determination (which services are `is_local`)
2. Broker pool startup (spawn Tokio tasks per broker type)
3. Queue declaration (each broker task declares its own queue on start)
4. Node self-identification (write identity record to MongoDB)
5. Signal handler registration (SIGTERM, SIGINT for graceful shutdown)
6. Node green — begin processing events

wiki/06-queue-topology.md

@@ -120,6 +120,16 @@ rec.* matches rec.read, rec.write, rec.obj
In the current implementation, brokers bind to specific queues. As the framework grows, the topic exchange flexibility will be used for cross-cutting concerns (audit, metrics) that need visibility across multiple event types without duplicating event payloads.
## Queue Declaration Lifecycle
The `beds.events` exchange is declared during IPL (Step 3b), before any broker task starts. This ensures the routing infrastructure exists before anyone tries to publish to it.
**Queues are not declared during IPL.** Each broker task declares its own queue when it starts. This is a deliberate design choice:
- **Queue presence = service ready.** A queue's existence on the broker signals that the task consuming it is alive and ready to process messages. A queue declared at IPL before the consumer starts would be misleading — messages could arrive before the consumer is ready, or worse, with no guarantee that the consumer will start at all.
- **No reserved global topology.** There is no fixed set of queues that must exist for the cluster to function. The topology emerges from the services that are actually running. An appServer with only rBroker and wBroker running has exactly those two queues — not the full topology diagram.
- **Clean restarts.** When a broker task restarts, queue declaration is idempotent — RabbitMQ returns success if the queue already exists with matching parameters. Messages queued during the restart interval are waiting for the consumer when it comes back up.
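Under those rules, a broker task's startup might assert its queue with the standard lapin declare-and-bind sequence. The queue name and routing key below are illustrative, not the project's actual topology:
```rust
use lapin::{
    options::{QueueBindOptions, QueueDeclareOptions},
    types::FieldTable,
    Channel,
};

/// Sketch: a broker task declaring and binding its own queue at startup.
async fn declare_broker_queue(
    channel: &Channel,
    queue: &str,       // e.g. "beds.rBroker" (hypothetical)
    routing_key: &str, // e.g. "rec.read"
) -> Result<(), lapin::Error> {
    // Idempotent: succeeds if the queue already exists with matching parameters.
    channel
        .queue_declare(
            queue,
            QueueDeclareOptions { durable: true, ..Default::default() },
            FieldTable::default(),
        )
        .await?;
    // Bind to the shared topic exchange asserted during IPL Step 3b.
    channel
        .queue_bind(
            queue,
            "beds.events",
            routing_key,
            QueueBindOptions::default(),
            FieldTable::default(),
        )
        .await
}
```
The same bind call with a wildcard key such as `rec.*` is how a future audit or metrics consumer would gain cross-cutting visibility without any new exchange.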
## Queue Durability and Persistence
All BEDS queues are: