Combining browser, edge, service, and data-layer caches without turning freshness into guesswork.
Multi-layer cache stacks appear when several caching layers sit on the same request path: browser cache, CDN cache, reverse proxy, application cache, shared distributed cache, and sometimes database or query caches underneath. This can produce excellent latency and origin offload, but it also means each answer may have passed through several trust boundaries before reaching the user.
The main design challenge is not simply “use more caches.” It is assigning each layer a job. Some layers should absorb public traffic. Some should protect the application tier. Some should memoize expensive internal work. When every layer tries to do everything, invalidation becomes opaque and debugging gets much harder.
```mermaid
flowchart LR
A["Browser cache"] --> B["CDN / edge cache"]
B --> C["Reverse proxy cache"]
C --> D["Application instance cache"]
D --> E["Shared distributed cache"]
E --> F["Database or origin service"]
```
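The fallthrough behavior this stack implies can be sketched in a few lines. This is an illustrative model, not a specific library: each `Layer` is a toy in-process dict standing in for a real cache tier, and `layered_get` is a hypothetical helper name.

```python
import time

class Layer:
    """A toy cache layer: a dict of key -> (value, expires_at)."""
    def __init__(self, name, ttl_seconds):
        self.name = name
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        hit = self.store.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if time.monotonic() >= expires_at:
            del self.store[key]  # an expired entry behaves like a miss
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

def layered_get(layers, key, load_from_origin):
    """Walk the stack top-down; on a hit, backfill the layers that missed."""
    missed = []
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            for upper in missed:
                upper.put(key, value)  # repopulate the layers above the hit
            return value
        missed.append(layer)
    value = load_from_origin(key)  # total miss: fall through to the origin
    for layer in missed:
        layer.put(key, value)
    return value

# Example: a per-instance cache (short TTL) above a shared cache (longer TTL).
stack = [Layer("app_local", ttl_seconds=10), Layer("shared", ttl_seconds=120)]
print(layered_get(stack, "user:42", lambda k: f"origin({k})"))
```

Note how the backfill step is where stale data can outlive intent: a value repopulated into an upper layer restarts that layer's TTL clock.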
Layering changes both performance and failure shape. A miss at one layer is not a total miss if another layer below can still serve quickly. But each extra layer adds another place where stale data can survive longer than the team intended.
Multi-layer stacks are strongest when each layer has a clearly assigned job and its own freshness rules. They become fragile when layers are duplicated accidentally or when no one can explain which layer is actually responsible for freshness.
A healthy layered design usually distinguishes between public-facing and internal layers, and those layers should not all share the same TTL or invalidation strategy.
```yaml
cache_stack:
  browser:
    scope: per-user agent
    ttl_seconds: 60
  cdn:
    scope: public edge
    ttl_seconds: 300
  app_local:
    scope: per-instance hot data
    ttl_seconds: 10
  shared_cache:
    scope: fleet-wide reuse
    ttl_seconds: 120
```
This response strategy separates public edge caching from shorter-lived application-layer reuse.
```http
Cache-Control: public, max-age=60, stale-while-revalidate=30
Surrogate-Control: max-age=300
Vary: Accept-Encoding, Authorization
```
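A small helper, sketched in Python, that assembles these headers. The function name and parameters are illustrative rather than from any particular framework; the split works because CDNs that honor `Surrogate-Control` (e.g. Fastly) typically consume and strip it at the edge, so the longer TTL never reaches browsers.

```python
def cache_headers(browser_ttl, surrogate_ttl, swr=None, vary=("Accept-Encoding",)):
    """Build response headers giving browsers a short TTL and the edge a longer one.

    browser_ttl   -> Cache-Control max-age, seen by every downstream cache
    surrogate_ttl -> Surrogate-Control max-age, consumed by the CDN
    swr           -> optional stale-while-revalidate window for browsers
    """
    cc = f"public, max-age={browser_ttl}"
    if swr is not None:
        cc += f", stale-while-revalidate={swr}"
    return {
        "Cache-Control": cc,
        "Surrogate-Control": f"max-age={surrogate_ttl}",
        "Vary": ", ".join(vary),
    }

# Reproduces the strategy above: browsers revalidate after 60s (with a 30s
# stale-while-revalidate window), while the edge keeps its copy for 300s.
headers = cache_headers(60, 300, swr=30, vary=("Accept-Encoding", "Authorization"))
```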
What to notice:

- Vary rules become part of correctness, not just optimization
- Vary and authorization boundaries must match all public caches

Layering usually improves latency and resilience, but it complicates reasoning. The right multi-layer stack is usually asymmetric, not uniform. Different layers should intentionally trade off freshness, reach, and cost in different ways.

How do you know whether an extra cache layer is helping rather than just obscuring freshness behavior?
The stronger answer is that each layer should have a distinct purpose, measurable hit behavior, and explicit freshness rules. If a layer exists only because “more caching sounds faster” and no one can explain its invalidation path, it is probably adding ambiguity rather than value.
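One way to make "measurable hit behavior" concrete is a thin counting wrapper per layer. This is an illustrative sketch, not a specific metrics library; in production the same counters would feed whatever metrics system is already in place.

```python
class LayerStats:
    """Counts hits and misses for one cache layer."""
    def __init__(self, name):
        self.name = name
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# A layer whose hit rate stays near zero is absorbing no traffic:
# it adds an invalidation path without adding value.
stats = LayerStats("app_local")
for hit in (True, True, False, True):
    stats.record(hit)
print(f"{stats.name}: {stats.hit_rate():.0%}")  # → app_local: 75%
```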