CQRS and Event Sourcing in Banking
When to Use Them and How to Implement Correctly
Banking systems carry a structural tension: the write side of a transaction demands strong consistency — a debit and credit must be atomic, ledger balances must never diverge — while the read side, used for statements, dashboards, fraud monitoring, and regulatory reporting, needs to scale independently and serve queries that have nothing to do with how the data was written. CQRS and event sourcing exist to resolve exactly this tension. Used well, they produce systems that are both transactionally sound and operationally scalable. Used without discipline, they produce distributed complexity that the team didn't need to take on.
CQRS separates the write model from the read model. The write side accepts commands — debit account, settle transaction — and enforces invariants synchronously: balance checks, fraud holds, regulatory limits. The read side serves queries from one or more denormalized projections, built independently of the write model's schema and scaled to match query load rather than transaction load.
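A minimal in-memory sketch of that split, assuming Java 17 or later. The names `DebitAccount`, `AccountDebited`, `LedgerWriteModel`, and `StatementReadModel` are hypothetical and chosen only for illustration. In a real deployment the event would travel from the write side to the read side through a broker rather than a direct method call.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Command: an intent to change state, which the write side may reject.
record DebitAccount(String accountId, BigDecimal amount) {}

// Event: a fact emitted only after the command has passed every invariant.
record AccountDebited(String accountId, BigDecimal amount, BigDecimal balanceAfter) {}

// Write side: owns the authoritative balance and checks invariants synchronously.
final class LedgerWriteModel {
    private final Map<String, BigDecimal> balances = new HashMap<>();

    synchronized void openAccount(String accountId, BigDecimal openingBalance) {
        balances.put(accountId, openingBalance);
    }

    // All checks run before any state changes, so a rejected command changes nothing.
    synchronized AccountDebited handle(DebitAccount cmd) {
        BigDecimal current = balances.get(cmd.accountId());
        if (current == null) {
            throw new IllegalArgumentException("Unknown account " + cmd.accountId());
        }
        if (cmd.amount().signum() <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
        if (current.compareTo(cmd.amount()) < 0) {
            throw new IllegalStateException("Insufficient funds");
        }
        BigDecimal updated = current.subtract(cmd.amount());
        balances.put(cmd.accountId(), updated);
        return new AccountDebited(cmd.accountId(), cmd.amount(), updated);
    }
}

// Read side: a denormalized view shaped for one query (statement lines per account).
final class StatementReadModel {
    private final Map<String, List<String>> linesByAccount = new HashMap<>();

    // Applies an event produced by the write side; no invariants are checked here.
    void on(AccountDebited event) {
        linesByAccount
            .computeIfAbsent(event.accountId(), id -> new ArrayList<>())
            .add("DEBIT " + event.amount() + " balance " + event.balanceAfter());
    }

    List<String> statementFor(String accountId) {
        return List.copyOf(linesByAccount.getOrDefault(accountId, List.of()));
    }
}
```

The read model never validates anything and the write model never answers statement queries, which is what lets each side be stored and scaled differently.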
Event sourcing complements this by making the write model an append-only log of events rather than a mutable current-state table. Instead of storing "account balance: €4,200," the system stores every event that produced that balance — FundsDeposited, FundsWithdrawn, HoldPlaced — and derives current state by replaying them. The audit trail isn't a separate logging concern bolted on afterward; it's the data model itself, which is precisely the property that makes this pattern attractive under DORA and PSD2 reporting obligations, where institutions must reconstruct the full history of a transaction's state, not just its final value.
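To make the replay idea concrete, here is a small sketch, assuming Java 17 or later. The event names come from the paragraph above; their fields, the `AccountState` record, and the rule that a hold reduces the available balance but not the ledger balance are assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical event types, named after the examples in the text.
sealed interface AccountEvent permits FundsDeposited, FundsWithdrawn, HoldPlaced {}
record FundsDeposited(BigDecimal amount) implements AccountEvent {}
record FundsWithdrawn(BigDecimal amount) implements AccountEvent {}
record HoldPlaced(BigDecimal amount) implements AccountEvent {}

// Derived state: never stored as the source of truth, always recomputable from events.
record AccountState(BigDecimal ledgerBalance, BigDecimal heldAmount) {
    static final AccountState EMPTY = new AccountState(BigDecimal.ZERO, BigDecimal.ZERO);

    BigDecimal availableBalance() {
        return ledgerBalance.subtract(heldAmount);
    }

    // Pure transition: current state plus one event gives the next state.
    AccountState apply(AccountEvent event) {
        if (event instanceof FundsDeposited d) {
            return new AccountState(ledgerBalance.add(d.amount()), heldAmount);
        } else if (event instanceof FundsWithdrawn w) {
            return new AccountState(ledgerBalance.subtract(w.amount()), heldAmount);
        } else if (event instanceof HoldPlaced h) {
            return new AccountState(ledgerBalance, heldAmount.add(h.amount()));
        }
        throw new IllegalArgumentException("Unknown event: " + event);
    }

    // Replays the history in order, starting from the empty state.
    static AccountState replay(List<AccountEvent> history) {
        AccountState state = EMPTY;
        for (AccountEvent event : history) {
            state = state.apply(event);
        }
        return state;
    }
}
```

Replaying a deposit of 5000, a withdrawal of 800, and a hold of 200 yields a ledger balance of 4200 and an available balance of 4000. Replaying only a prefix of the list reconstructs the state as it stood at that earlier point, which is the property the audit argument relies on.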
In practice, the write side publishes domain events to a Kafka topic immediately after they're committed to the event store. Flink jobs consume that stream and build one or more read-optimized projections — an account-balance view, a daily-statement view, a fraud-scoring view — each shaped for its specific query pattern rather than forced into a single shared schema.
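    // Flink job sketch: building an account balance projection.
    // Assumes accountId is a String; open(Configuration) is the Flink 1.x lifecycle hook.
    events
        .keyBy(event -> event.accountId)
        .process(new KeyedProcessFunction<String, AccountEvent, BalanceView>() {
            private transient ValueState<BigDecimal> balance;

            @Override
            public void open(Configuration parameters) {
                // Keyed state: one running balance per accountId
                balance = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("balance", BigDecimal.class));
            }

            @Override
            public void processElement(AccountEvent event, Context ctx,
                                       Collector<BalanceView> out) throws Exception {
                // State is null the first time an account is seen
                BigDecimal current = balance.value() != null ? balance.value() : BigDecimal.ZERO;
                BigDecimal updated = applyEvent(current, event);
                balance.update(updated);
                out.collect(new BalanceView(event.accountId, updated, event.timestamp));
            }
        })
        .sinkTo(balanceProjectionStore);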
Each projection is disposable. If the projection logic is wrong, or a new read pattern emerges, the projection is dropped and rebuilt from the event log rather than migrated in place — a meaningful operational advantage over schema migrations on a live transactional table.
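Projection rebuilds are routine in a mature CQRS implementation, not an exceptional event. A common approach: stand up the new projection version alongside the old one, replay the full event history from the Kafka event topic (or from a cold-storage event archive for history older than the topic's retention window), and cut traffic over once the new projection has caught up to the live stream. The event topic must not be log-compacted: compaction keeps only the latest record per key, which would discard the history the replay depends on.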
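    // Rebuild trigger: submit the new projection job without a savepoint, so it starts with empty state
    KafkaSource<AccountEvent> replaySource = KafkaSource.<AccountEvent>builder()
        .setBootstrapServers(bootstrapServers)                // placeholder broker list
        .setTopics(eventTopic)
        .setStartingOffsets(OffsetsInitializer.earliest())    // replay from the oldest retained event
        .setValueOnlyDeserializer(accountEventSchema)         // placeholder DeserializationSchema<AccountEvent>
        .build();
    env.fromSource(replaySource, WatermarkStrategy.noWatermarks(), "event-replay")
        .sinkTo(newProjectionStore);                          // projection operators omitted
    // Cut reads over to newProjectionStore once consumer lag is close to zero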
This blue-green replay pattern avoids downtime and gives teams a clean rollback path: if the new projection has a defect, traffic simply routes back to the prior version while the event log — the actual source of truth — remains untouched.
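The cutover and rollback steps can be sketched as a small router in front of the two projection versions. This is an in-memory illustration, not a production design: `BalanceProjection`, `ProjectionRouter`, the use of log offsets to measure lag, and balances held in cents are all assumptions introduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for one version of a balance projection.
final class BalanceProjection {
    final String version;
    private final Map<String, Long> balancesInCents = new ConcurrentHashMap<>();
    private volatile long lastAppliedOffset = -1;

    BalanceProjection(String version) {
        this.version = version;
    }

    // Applies one event from the log and records how far this projection has read.
    void apply(long offset, String accountId, long deltaInCents) {
        balancesInCents.merge(accountId, deltaInCents, Long::sum);
        lastAppliedOffset = offset;
    }

    long lastAppliedOffset() {
        return lastAppliedOffset;
    }

    long balanceOf(String accountId) {
        return balancesInCents.getOrDefault(accountId, 0L);
    }
}

// Routes reads to the active projection and keeps the prior one for rollback.
final class ProjectionRouter {
    private final AtomicReference<BalanceProjection> active;
    private volatile BalanceProjection previous;

    ProjectionRouter(BalanceProjection initial) {
        this.active = new AtomicReference<>(initial);
    }

    long balanceOf(String accountId) {
        return active.get().balanceOf(accountId);
    }

    String activeVersion() {
        return active.get().version;
    }

    // Cuts over only when the candidate is within maxLag events of the log head.
    boolean tryCutOver(BalanceProjection candidate, long latestOffset, long maxLag) {
        if (latestOffset - candidate.lastAppliedOffset() > maxLag) {
            return false;
        }
        previous = active.getAndSet(candidate);
        return true;
    }

    // Rollback: reads go back to the prior version; the event log is not touched.
    boolean rollBack() {
        BalanceProjection prior = previous;
        if (prior == null) {
            return false;
        }
        active.set(prior);
        previous = null;
        return true;
    }
}
```

Note that the old projection has to keep consuming the live stream after cutover for rollback to remain safe; otherwise traffic would route back to a stale view.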
The hard part is rarely the architecture; it's the user experience implication. A customer who submits a transfer and immediately checks their balance may hit a projection that hasn't caught up yet. For most reporting and analytics use cases, seconds of lag is irrelevant. For balance display immediately after a write, it isn't.
Teams handle this with a few established patterns: read-your-own-writes by serving the post-command response from the command-side state rather than waiting on the projection, optimistic UI updates that reflect the expected post-transaction state immediately, or a short synchronous wait with a fallback to "processing" status if the projection lag exceeds a threshold. The choice depends on which flows are consistency-sensitive — balance checks and transaction confirmations usually are; historical statements and dashboards usually aren't.
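The third option, a bounded wait with a "processing" fallback, might look like the sketch below. It assumes the command side returns the version (or log offset) of the event it just wrote and that the projection exposes the highest version it has applied; `ReadAfterWrite`, `awaitProjection`, and the 5 ms poll interval are hypothetical choices for illustration.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

final class ReadAfterWrite {
    enum Status { CONFIRMED, PROCESSING }

    // Polls until the projection has applied at least requiredVersion,
    // or gives up after maxWait and reports PROCESSING instead of stale data.
    static Status awaitProjection(LongSupplier projectionVersion,
                                  long requiredVersion,
                                  Duration maxWait) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (true) {
            if (projectionVersion.getAsLong() >= requiredVersion) {
                return Status.CONFIRMED;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Status.PROCESSING;
            }
            try {
                Thread.sleep(5); // short poll interval; tune per flow
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and fall back rather than block further
                Thread.currentThread().interrupt();
                return Status.PROCESSING;
            }
        }
    }
}
```

A caller would pass the version returned by the command handler as `requiredVersion`, serve the projected balance on `CONFIRMED`, and show a pending state on `PROCESSING`. The wait budget should stay small on consistency-sensitive flows so that projection lag degrades the response rather than stalling it.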
CQRS and event sourcing earn their complexity where audit requirements are non-negotiable, where read and write loads genuinely diverge in scale, and where multiple, differently-shaped views of the same data are a permanent business requirement — core ledgers, payment processing, transaction monitoring. These are domains where the operational cost of running an event log, projection infrastructure, and rebuild tooling is paid back in regulatory defensibility and scalability.
The pattern over-complicates domains that don't have this profile: internal admin tools, low-volume configuration data, services where a single well-indexed relational table already serves every read pattern the business needs. It is frequently adopted because it's architecturally interesting, not because the system's actual consistency and scale requirements call for it, and the resulting maintenance burden falls on whichever team inherits the system next.
The right call depends less on the pattern's theoretical merits and more on a clear-eyed read of where a specific institution's read and write paths actually diverge — and where they don't.