Why invalidation becomes difficult once cached answers depend on multiple keys, events, representations, and partial failures.
Invalidation is hard because cached answers rarely depend on just one thing. A page may depend on several entities. A query result may depend on many rows. A summary cache may depend on multiple event streams. A user-visible response may depend on permissions, locale, feature flags, and representation version. Once those dependencies multiply, knowing exactly what must stop being trusted becomes a graph problem rather than a single-key problem.
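The graph view can be made concrete with a minimal sketch. It assumes a hypothetical `DependencyIndex` that records, at the moment a cached answer is computed, every input that answer read. Invalidating one changed entity then means walking the reverse edges, including edges through derived entries such as a category summary that itself feeds a category page. All key names such as `entity:product/42` are illustrative only.

```python
from collections import defaultdict, deque


class DependencyIndex:
    """Hypothetical reverse index: dependency -> cache keys that were built from it."""

    def __init__(self) -> None:
        self._dependents: dict[str, set[str]] = defaultdict(set)

    def record(self, cache_key: str, depends_on: list[str]) -> None:
        # Called when a cached answer is computed, listing every input it read.
        for dep in depends_on:
            self._dependents[dep].add(cache_key)

    def affected_by(self, changed: str) -> set[str]:
        # Breadth-first walk: a derived cache entry can itself be an input
        # to another cached answer, so invalidation must follow the graph.
        seen: set[str] = set()
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            for dependent in self._dependents.get(node, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen


index = DependencyIndex()
index.record("page:product/42", ["entity:product/42", "entity:price/42", "flag:new-layout"])
index.record("summary:category/7", ["entity:product/42", "entity:product/43"])
index.record("page:category/7", ["summary:category/7", "locale:de"])

print(sorted(index.affected_by("entity:product/42")))
# ['page:category/7', 'page:product/42', 'summary:category/7']
```

Recording dependencies at fill time, instead of maintaining a hand-written list of keys to delete on update, keeps the graph aligned with what the code actually read. One trade-off is that the index itself now has to be stored and cleaned up.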
The other reason it is hard is timing. Updates, events, and cache changes do not happen atomically across the whole system. A source write may succeed while an invalidation event is delayed. One cache layer may refresh while another still serves older content. The “hard part of caching” is therefore not mysterious. It is the combination of dependency fan-out and non-instant propagation.
```mermaid
flowchart TD
A["Source change"] --> B["Event emitted"]
B --> C["Cache layer 1 updated"]
B --> D["Cache layer 2 updated"]
B --> E["Derived result refreshed"]
F["Delay or failure at any step"] -.-> C
F -.-> D
F -.-> E
```
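The timing gap can be illustrated with a toy model, assuming a single read-through cache and a hypothetical in-memory `pending_events` list that stands in for a message broker. The source write commits immediately, but the cache is only purged when the event is delivered, so reads in between see the old value.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Toy model: a source of truth, one cache, and a queue of pending invalidations."""
    source: dict[str, str] = field(default_factory=dict)
    cache: dict[str, str] = field(default_factory=dict)
    pending_events: list[str] = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        # The source write commits immediately...
        self.source[key] = value
        # ...but the invalidation is only enqueued, not yet applied.
        self.pending_events.append(key)

    def read(self, key: str) -> str:
        # Read-through: serve from cache if present, else load from source.
        if key not in self.cache:
            self.cache[key] = self.source[key]
        return self.cache[key]

    def deliver_events(self) -> None:
        # The broker eventually delivers; until then the cache may be stale.
        while self.pending_events:
            self.cache.pop(self.pending_events.pop(0), None)


s = System(source={"price:42": "10.00"})
print(s.read("price:42"))   # 10.00 (cache now populated)
s.write("price:42", "12.00")
print(s.read("price:42"))   # 10.00 (stale: event not yet delivered)
s.deliver_events()
print(s.read("price:42"))   # 12.00
```

Nothing in this model is broken in isolation: the write succeeded and the event will arrive. The stale read comes purely from the window between the two.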
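This matters because teams often underestimate invalidation by looking only at the happy path. They think: “on update, delete these keys.” Production reality is messier: the invalidation event can be delayed or lost after the source write has already committed, one cache layer can refresh while another keeps serving older content, and some cached answers depend on inputs the delete list never mentioned.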
That is why invalidation problems often appear as rare, confusing correctness bugs rather than obvious functional failures.
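One way to see why these bugs are rare is to enumerate every interleaving of a reader and a writer. This sketch assumes a deliberately tiny model with hypothetical step names: the reader has already missed the cache, so it loads from the source and then fills the cache, while the writer updates the source and then deletes the cache key.

```python
from itertools import combinations


def run(schedule: list[str]) -> bool:
    """Execute one interleaving; return True if the cache ends up stale."""
    db = {"k": "old"}
    cache: dict[str, str] = {}
    loaded = None
    for step in schedule:
        if step == "reader_load":      # reader missed the cache and reads the source
            loaded = db["k"]
        elif step == "reader_fill":    # reader stores what it loaded
            cache["k"] = loaded
        elif step == "writer_update":  # writer commits the new value
            db["k"] = "new"
        elif step == "writer_delete":  # writer invalidates the cache key
            cache.pop("k", None)
    return "k" in cache and cache["k"] != db["k"]


def interleavings(a: list[str], b: list[str]) -> list[list[str]]:
    """All merges of a and b that preserve each sequence's own order."""
    n = len(a) + len(b)
    result = []
    for a_slots in combinations(range(n), len(a)):
        ai, bi = iter(a), iter(b)
        result.append([next(ai) if i in a_slots else next(bi) for i in range(n)])
    return result


reader = ["reader_load", "reader_fill"]
writer = ["writer_update", "writer_delete"]
schedules = interleavings(reader, writer)
stale = [s for s in schedules if run(s)]
print(len(schedules), len(stale))  # 6 1
print(stale[0])  # ['reader_load', 'writer_update', 'writer_delete', 'reader_fill']
```

Only one of six orderings leaves the old value cached: the reader loads before the write, and fills after the delete. In a real system that window is usually short, which is exactly why the resulting bug looks intermittent.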
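If the system does not know what a cached answer depends on, it cannot invalidate correctly. Hidden dependencies often arise from inputs that shape the answer but never appear in its cache key or dependency list, such as permissions, locale, feature flags, and representation version.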
The cache may seem fine until one of those hidden inputs changes.
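A hidden dependency can be shown with a hypothetical `make_cached` helper that builds the cache key only from the inputs it is told about. The rendered greeting depends on `locale`, but the buggy variant leaves it out of the key, so it keeps returning the first answer after the user's locale changes.

```python
from typing import Callable

GREETINGS = {"en": "Hello", "de": "Hallo"}


def make_cached(render: Callable[..., str], key_inputs: list[str]):
    """Hypothetical helper: cache `render` using only the named inputs as the key."""
    cache: dict[tuple, str] = {}

    def cached(**inputs: str) -> str:
        # Only the declared inputs form the key; undeclared ones are hidden dependencies.
        key = tuple(inputs[name] for name in key_inputs)
        if key not in cache:
            cache[key] = render(**inputs)
        return cache[key]

    return cached


def render_greeting(user: str, locale: str) -> str:
    return f"{GREETINGS[locale]}, {user}"


buggy = make_cached(render_greeting, key_inputs=["user"])            # locale forgotten
fixed = make_cached(render_greeting, key_inputs=["user", "locale"])  # every input declared

print(buggy(user="ana", locale="en"))  # Hello, ana
print(buggy(user="ana", locale="de"))  # Hello, ana   <- wrong: locale changed
print(fixed(user="ana", locale="en"))  # Hello, ana
print(fixed(user="ana", locale="de"))  # Hallo, ana
```

The buggy cache passes every test that never varies `locale`, which matches the pattern above: it looks fine until the hidden input changes.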
Even if the dependency graph is correct, invalidation still has to succeed operationally. A message broker can lag. A purge request can fail. A background refresh job can crash. The system may then enter a mixed state where some layers are fresh and others are not.
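The mixed state can be reproduced with two toy layers, assuming a hypothetical `Layer` class whose `fail_purges` flag simulates a purge request that errors out. The best-effort loop purges what it can and reports what failed; the names `edge` and `app` are illustrative.

```python
class Layer:
    """Toy cache layer; `fail_purges` simulates a purge request that errors out."""

    def __init__(self, name: str, fail_purges: bool = False) -> None:
        self.name = name
        self.fail_purges = fail_purges
        self.store: dict[str, str] = {}

    def purge(self, key: str) -> None:
        if self.fail_purges:
            raise ConnectionError(f"{self.name}: purge failed")
        self.store.pop(key, None)


def invalidate_everywhere(layers: list[Layer], key: str) -> list[str]:
    """Best-effort purge of `key` from every layer; returns names of layers that failed."""
    failed = []
    for layer in layers:
        try:
            layer.purge(key)
        except ConnectionError:
            failed.append(layer.name)  # keep going: other layers can still be purged
    return failed


edge = Layer("edge", fail_purges=True)
app = Layer("app")
for layer in (edge, app):
    layer.store["page:42"] = "v1"

print(invalidate_everywhere([edge, app], "page:42"))  # ['edge']
print(edge.store, app.store)  # {'page:42': 'v1'} {}
```

After this runs, the app layer is fresh while the edge layer still serves `v1`. Whether a given user sees stale content now depends on which layer answers the request.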
This is why invalidation should be thought of as a distributed workflow, not as one local code branch.
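Treating invalidation as a workflow can be sketched as tracking every (key, layer) pair as pending until that layer acknowledges the purge, and re-driving the pending pairs. This is a minimal in-memory sketch: `FlakyLayer` and `InvalidationWorkflow` are hypothetical names, and a real system would also need the pending state to survive crashes, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class FlakyLayer:
    """Toy cache layer whose first `failures_left` purge attempts fail."""
    name: str
    failures_left: int = 0
    store: dict[str, str] = field(default_factory=dict)

    def purge(self, key: str) -> bool:
        if self.failures_left > 0:
            self.failures_left -= 1
            return False  # simulated timeout or broker error
        self.store.pop(key, None)
        return True


@dataclass
class InvalidationWorkflow:
    """Hypothetical workflow: every (key, layer) pair stays pending until acknowledged."""
    layers: list[FlakyLayer]
    pending: set[tuple[str, str]] = field(default_factory=set)

    def submit(self, key: str) -> None:
        for layer in self.layers:
            self.pending.add((key, layer.name))

    def run_once(self) -> None:
        by_name = {layer.name: layer for layer in self.layers}
        for key, name in sorted(self.pending):  # iterate over a copy, safe to discard
            if by_name[name].purge(key):
                self.pending.discard((key, name))  # acknowledged: stop retrying


edge = FlakyLayer("edge", failures_left=2, store={"page:42": "v1"})
app = FlakyLayer("app", store={"page:42": "v1"})
wf = InvalidationWorkflow([edge, app])
wf.submit("page:42")

attempts = 0
while wf.pending and attempts < 5:  # bounded retries; real systems add backoff and alerting
    wf.run_once()
    attempts += 1
print(attempts, edge.store, app.store)  # 3 {} {}
```

The design choice here is that a failed purge is not an exception that ends the request but a pending item that keeps being retried, so the system converges once every layer has acknowledged.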
Why do invalidation bugs often feel intermittent and hard to reproduce?
A strong answer is that they are usually dependency-and-timing bugs. Whether one shows up depends on which layer held the stale answer, which invalidation step lagged or failed, and whether the relevant dependency was modeled in the first place.