Synchronized expiry, broad purges, and miss waves that overload databases and downstream services.
Thundering herds are broader than a single-key stampede. They happen when many clients, instances, or keys become cold together and hit the same backend systems at once. A synchronized TTL boundary, a broad purge, or a failed cache node can create a wave of misses that ripples into databases, search clusters, third-party APIs, and internal services.
This is where caching becomes a systems problem instead of a key problem. The issue is no longer only whether one object is fresh. It is whether the overall miss pattern can destabilize the rest of the architecture.
```mermaid
flowchart TD
  A["Shared cache node fails"] --> B["Application fleet misses"]
  B --> C["Origin API spike"]
  C --> D["Database spike"]
  C --> E["Search spike"]
  D --> F["Higher latency"]
  E --> F
  F --> G["More requests overlap and retry"]
```
A system can survive some stale data more easily than it can survive cascading overload. Once a herd starts, latency grows, retries overlap, and each extra second of origin delay keeps more concurrent readers in flight. The cache miss pattern becomes the cause of a wider incident.
Typical triggers include:

- A synchronized TTL boundary that lets many related keys expire in the same instant.
- A broad purge or deploy-time flush that empties large key ranges at once.
- A shared cache node failing, which turns an entire fleet's reads into misses.
Herd prevention usually requires more than one tactic:

- TTL jitter, so related keys do not go cold on the same boundary.
- Per-key rebuild limits (single flight), so one cold key triggers one origin read.
- A global cap on concurrent origin reads, so the backend sees bounded load.
- Serving stale data when the origin is slow, so readers are not held in flight.
- Prewarming known-hot keys before traffic arrives.
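One of these tactics, single-flight rebuilds, can be sketched in a few lines of Python. This is a minimal illustration, not a production cache: the class and method names are invented for this example, and a real implementation would also handle TTLs and eviction.

```python
import threading
from typing import Any, Callable, Dict

class SingleFlightCache:
    """Illustrative sketch: when several threads miss the same key at once,
    only one calls the origin; the rest wait and reuse its result."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str, rebuild: Callable[[], Any]) -> Any:
        if key in self._data:           # fast path: cache hit
            return self._data[key]
        with self._lock_for(key):       # one rebuild in flight per key
            if key in self._data:       # another thread rebuilt while we waited
                return self._data[key]
            value = rebuild()           # exactly one origin read per cold key
            self._data[key] = value
            return value
```

The double check inside the lock matters: a waiting thread must re-test the cache after acquiring the lock, or it would repeat the origin read the first thread just finished.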
The important idea is that backend protection belongs inside the cache strategy, not only inside the database or API tier.
This policy example shows a layered protection model for hot keys.
```yaml
herd_control:
  ttl_jitter_percent: 15
  max_parallel_rebuilds_per_key: 1
  max_parallel_origin_reads_global: 50
  stale_if_origin_slow_ms: 250
  prewarm:
    - homepage:top-products
    - category:laptops
    - pricing:public-plans
```
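To see how a setting like `ttl_jitter_percent` spreads expiry, here is a minimal Python sketch. The helper name is an assumption for illustration, not part of any real policy engine.

```python
import random

def jittered_ttl(base_ttl_s: float, jitter_percent: float = 15.0) -> float:
    """Return a TTL randomized within +/- jitter_percent of the base,
    so keys written together do not all expire on the same boundary."""
    spread = base_ttl_s * jitter_percent / 100.0
    return base_ttl_s + random.uniform(-spread, spread)

# Example: a 300 s TTL with 15% jitter lands somewhere in [255, 345],
# turning one synchronized expiry instant into a 90-second window.
```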
What to notice:

- `ttl_jitter_percent` staggers expiry so related keys go cold at different moments.
- `max_parallel_rebuilds_per_key: 1` enforces single-flight rebuilds for each hot key.
- `max_parallel_origin_reads_global` bounds the total load the origin can ever receive.
- `stale_if_origin_slow_ms` lets readers take a stale answer instead of piling up behind a slow origin.
- The `prewarm` list warms known-hot keys before traffic hits them.
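The `stale_if_origin_slow_ms` behavior can be sketched as follows, assuming a stale copy is kept alongside the fresh one. The function and parameter names here are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable

_pool = ThreadPoolExecutor(max_workers=8)

def read_with_stale_fallback(fetch_fresh: Callable[[], Any],
                             stale_value: Any,
                             budget_ms: float = 250.0) -> Any:
    """Try the origin, but if it exceeds the time budget, serve the
    stale copy instead of keeping the reader in flight."""
    future = _pool.submit(fetch_fresh)
    try:
        return future.result(timeout=budget_ms / 1000.0)
    except FutureTimeout:
        # Origin is slow: answer with stale data now. The fetch keeps
        # running in the background and can refresh the cache later.
        return stale_value
```

This is the trade the section argues for: a bounded window of staleness in exchange for readers that never stack up behind a struggling origin.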
Protecting the backend changes the failure shape: instead of a cascading overload, the worst case becomes a brief window of stale answers for a few keys.
Good herd control is not just one lock or one TTL tweak. It is a layered resilience posture.
Why is a broad purge often more dangerous than a single stale key?
The stronger answer is that a stale key usually harms one answer, while a broad purge changes system-wide traffic shape. If many readers refill at once, the incident can spread from the cache layer into every downstream dependency and become an availability problem rather than a local freshness problem.