Decision matrix for matching event-driven patterns to coordination, delivery, and scaling needs.
This appendix is a decision aid for pattern choice. It does not replace the chapter lessons, because pattern selection always depends on trade-offs. What it does provide is a fast way to move from a system problem to the small set of patterns most likely to fit, along with the warning signs that should slow the decision down.
The strongest way to use this matrix is to start from the actual problem shape, not from the pattern you already want to use. Many event-driven mistakes happen because teams begin with “we should use events here” rather than with “what dependency, failure, or scaling problem are we actually solving?”
```mermaid
flowchart TD
    A["Start with the problem"] --> B{"Is the problem fan-out, work distribution, reliability, workflow, read modeling, analytics, or governance?"}
    B --> C["Select a candidate pattern family"]
    C --> D["Check trade-offs and weak-fit signals"]
    D --> E["Choose the narrowest pattern that solves the real problem"]
```
The matrix below maps each problem shape to its strongest candidate pattern, why it fits, what to watch while operating it, and the signal that it is a weak fit.
| Problem Shape | Strong Candidate Pattern | Why It Fits | What to Watch | Weak-Fit Signal |
|---|---|---|---|---|
| One business fact should trigger many independent downstream reactions | Publish/subscribe | It lets several consumers react independently to the same fact | Schema stability and fan-out governance | Consumers are not independent and actually need work distribution or a reply |
| Background work needs to be spread across workers | Work queue or competing consumers | It distributes units of work across worker instances efficiently | Idempotency, retries, and hot partitions | Several consumers all need the same fact rather than one of them doing the work |
| State change and event publication can drift apart | Transactional outbox | It aligns local commit with later safe publication | Relay monitoring and duplicate publish handling | The team publishes “after commit” and hopes the broker call never fails |
| Consumers keep making repeated callbacks to the source service for the same details | Event-carried state transfer | It improves downstream autonomy by carrying useful state once | Payload growth and contract pressure | The event grows into a producer-internal dump rather than a useful domain contract |
| One system still needs a response, but messaging transport is already central | Correlated request/reply | It preserves a request-response shape over asynchronous transport | Timeouts, late replies, and correlation state | It is being used to hide normal low-latency RPC for almost every interaction |
| A long-running workflow spans several local transactions | Saga with choreography or orchestration | It models progress and recovery across distributed business steps | Compensation design and visibility | The team only describes the happy path and has no failure or compensation model |
| Read concerns need different shapes from write-side correctness concerns | Projections or CQRS-style read models | They let the read side optimize for query usefulness | Replay safety and eventual consistency | The read model is quietly being treated as the source of truth |
| Rolling metrics, joins, or near-real-time insight are needed | Stream processing with windows | It supports continuous computation over live event flow | Event-time modeling, state, and backpressure | The problem is simple reporting that a batch job or projection could solve more safely |
| A few bad events should not block normal processing | Dead-letter queue or quarantine path | It isolates repeated failures from the main path | Ownership, diagnosis, and replay discipline | The DLQ is treated as a trash bin with no review model |
| Contract changes risk breaking unknown or lagging consumers | Schema registry and compatibility checks | They make ownership and safe evolution explicit | Overly casual renames and semantic drift | The team still treats event changes like internal refactors |
| Tenant or data-sensitivity boundaries must be enforced | Scoped ACLs, stream governance, and tenant-isolation design | They keep shared event platforms from becoming leakage surfaces | Over-broad consume rights and shared-tooling risk | Tenant ID exists in payloads but no real isolation controls exist |
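Of the rows above, the transactional outbox is the one teams most often implement incorrectly, so a minimal sketch helps. This is an illustrative sketch only, assuming a SQLite store; the table and function names (`orders`, `outbox`, `place_order`, `publish_pending`) are hypothetical, and a real relay also needs monitoring and duplicate handling, as the "What to Watch" column notes.

```python
# Illustrative transactional-outbox sketch (hypothetical schema and names).
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)

def place_order(order_id: int) -> None:
    """Commit the state change and the event record in ONE local transaction."""
    with conn:  # both rows commit together, or neither does
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')",
                     (order_id,))
        conn.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                     ("OrderPlaced", json.dumps({"order_id": order_id})))

def publish_pending(send) -> int:
    """Relay: publish unsent events, then mark them. 'send' may be retried,
    so delivery is at-least-once and consumers must tolerate duplicates."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        send(event_type, json.loads(payload))  # may raise; row stays unpublished
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)

place_order(42)
sent = []
publish_pending(lambda t, p: sent.append((t, p)))
```

The point of the sketch is the single local transaction: the event row cannot drift from the state change, which is exactly the failure mode of "publish after commit and hope."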
Some pattern choices repeat so often that short decision rules help.
- Use publish/subscribe when one fact should reach many independent consumers.
- Use a queue or competing-consumer model when one work item should be handled by one worker from a pool.
- Use event-carried state transfer when callback-heavy notification is preserving runtime coupling.
- Use correlated request/reply only when a response relationship is truly required and broker-mediated interaction is still useful.
- Use projections when read shape and read latency matter more than reusing the write model directly.
- Use a saga only when one business process really spans several local commits and failure recovery must be explicit.
- Use stream processing when the problem is continuous analytics or rolling computation, not just asynchronous integration.
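The first two rules, fan-out versus work distribution, are the ones most often confused, and a toy in-memory model makes the difference concrete. The class names below (`PubSubTopic`, `WorkQueue`) are hypothetical, and a real broker adds acknowledgement, persistence, and partitioning on top of this shape.

```python
# Toy contrast: fan-out (every subscriber sees every fact) versus
# work distribution (each item handled by exactly one worker).
from collections import defaultdict
from itertools import cycle

class PubSubTopic:
    """Publish/subscribe: many independent reactions to one fact."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, handler):
        self.subscribers.append(handler)
    def publish(self, event):
        for handler in self.subscribers:   # fan-out: all handlers get the fact
            handler(event)

class WorkQueue:
    """Competing consumers: each item goes to one worker from the pool
    (round-robin here purely for simplicity)."""
    def __init__(self, workers):
        self._next = cycle(workers)
    def dispatch(self, item):
        next(self._next)(item)             # exactly one worker handles the item

seen = defaultdict(list)
topic = PubSubTopic()
topic.subscribe(lambda e: seen["billing"].append(e))
topic.subscribe(lambda e: seen["email"].append(e))
topic.publish("OrderPlaced:42")            # both subscribers react

queue = WorkQueue([lambda i: seen["worker_a"].append(i),
                   lambda i: seen["worker_b"].append(i)])
for job in ["job-1", "job-2", "job-3"]:
    queue.dispatch(job)                    # each job handled once, by one worker
```

If consumers that were modeled as subscribers turn out to need exactly this one-worker-per-item behavior, that is the weak-fit signal from the matrix: the problem was work distribution all along.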
Some of the strongest event-driven designs combine patterns instead of treating them as mutually exclusive.
The key is not to combine patterns for sophistication. Combine them only when each one closes a different real risk.
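As one concrete combination, a competing-consumer loop often pairs with a dead-letter path: the queue closes the throughput risk, and the quarantine closes the poison-message risk, so each pattern earns its place. The sketch below is illustrative; `consume`, `MAX_ATTEMPTS`, and `dead_letters` are hypothetical names, not any broker's API.

```python
# Sketch: retries plus a dead-letter path, so a few bad events do not
# block normal processing. Quarantined events still need ownership,
# diagnosis, and a replay model -- a DLQ is not a trash bin.
MAX_ATTEMPTS = 3
dead_letters = []

def consume(events, process):
    """Process each event; after MAX_ATTEMPTS failures, quarantine it and move on."""
    processed = []
    for event in events:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                process(event)
                processed.append(event)
                break
            except Exception as exc:
                if attempt == MAX_ATTEMPTS:
                    # isolate the poison event instead of blocking the stream
                    dead_letters.append({"event": event, "error": str(exc)})

    return processed

def flaky(event):
    """Stand-in handler that fails deterministically on one payload."""
    if event == "bad":
        raise ValueError("unparseable payload")

ok = consume(["a", "bad", "b"], flaky)
```

Here `"a"` and `"b"` flow through normally while `"bad"` lands in quarantine with its error, which is the behavior the matrix row describes.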
Before locking in a pattern, ask: What dependency, failure, or scaling problem is actually being solved? Do consumers genuinely act independently, or does one worker need to own each item? Is the failure and compensation path modeled, or only the happy path? These questions often eliminate weak-fit patterns quickly.
A team says they want one standard solution for all integration, so they plan to use correlated request/reply over the broker for user notifications, batch work dispatch, search aggregation, and normal service reads. What is the strongest challenge?
The strongest challenge is that transport standardization is being mistaken for architectural fit. Those workloads have different dependency shapes. Some need publish/subscribe, some need work distribution, some may need request/reply, and some may still be ordinary RPC. One pattern for everything usually means the actual problem shapes are being ignored.