How hit rate, miss penalty, and diminishing returns determine whether a cache is actually earning its keep.
A cache hit is cheap. A cache miss is where the real economics show up. The value of a cache depends on how often it avoids expensive work and how painful misses are when they do happen. This is why hit rate is important but incomplete. A 90 percent hit rate can be excellent or disappointing depending on miss penalty, request volume, and how the remaining misses cluster in time.
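One way to see why hit rate alone is incomplete: the average cost of a request is a weighted blend of hit cost and miss penalty. A minimal sketch, using illustrative latency numbers rather than measurements:

```typescript
// Average request latency as a blend of hit cost and miss penalty.
// All latency numbers here are illustrative assumptions.
function expectedLatencyMs(hitRate: number, hitMs: number, missMs: number): number {
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// Same 90 percent hit rate, very different economics:
console.log(expectedLatencyMs(0.9, 2, 20));   // cheap misses: about 3.8 ms average
console.log(expectedLatencyMs(0.9, 2, 2000)); // slow third-party call: about 201.8 ms average
```

With cheap misses, 90 percent is comfortably in diminishing-returns territory; with a 2-second miss, the remaining 10 percent still dominates the average.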
The other important concept is diminishing returns. Moving hit rate from 0 to 60 percent is usually transformative. Moving from 96 to 97 percent may still matter, but only if the remaining misses are expensive enough, bursty enough, or operationally dangerous enough to justify more complexity.
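To make the diminishing-returns point concrete, a quick back-of-the-envelope sketch; the request volume and per-miss cost are assumed numbers chosen for illustration:

```typescript
// Marginal origin load removed by a hit-rate improvement.
// 10,000 QPS and a 200 ms miss are assumed numbers for illustration.
const totalQps = 10_000;
const missWorkMs = 200;

function originQpsAt(hitRate: number): number {
  return totalQps * (1 - hitRate);
}

// 0 to 60 percent removes 6,000 QPS of backend work.
console.log(originQpsAt(0.0) - originQpsAt(0.6)); // roughly 6000

// 96 to 97 percent removes only about 100 QPS...
const saved = originQpsAt(0.96) - originQpsAt(0.97);
console.log(saved); // roughly 100

// ...but if each miss triggers 200 ms of database work, that is still
// about 20 seconds of database time saved per wall-clock second.
console.log((saved * missWorkMs) / 1000); // roughly 20
```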
```mermaid
flowchart LR
A["Requests"] --> B{"Hit or miss?"}
B -->|Hit| C["Cheap latency\nlow origin load"]
B -->|Miss| D["Origin read or recompute"]
D --> E["Miss penalty"]
E --> F["Extra latency\nextra backend cost\nextra failure exposure"]
```
Teams often celebrate hit rate in isolation, which is a mistake. A high hit rate can still leave a system fragile if:

- the remaining misses cluster in time, for example after a deploy, a cache flush, or a wave of key expirations
- each miss carries a large penalty, such as heavy database work or a slow third-party call
- the origin is sized for the cached steady state and cannot absorb a burst of misses
The right economic view is therefore a curve, not a badge. You want to know how much each additional hit-rate improvement buys and what it costs to achieve.
Hit rate answers a useful question: what fraction of requests avoided the slow path? But it does not answer several other important questions:

- How expensive was each miss?
- How much load still reaches the origin?
- Do the misses concentrate in bursts or in the latency tail?
- Are any of the hits serving stale data?
This is why strong cache dashboards usually pair hit rate with miss rate, miss penalty, origin request count, tail latency, and stale-read indicators.
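As a sketch of how those paired signals might be derived from raw counters; the counter names here are hypothetical, not taken from any particular metrics library:

```typescript
// Hypothetical raw counters sampled over a reporting window.
interface CacheWindow {
  hits: number;
  misses: number;
  totalMissLatencyMs: number; // summed latency of the miss path
}

// Derive the signals a cache dashboard should show together,
// rather than hit rate alone.
function cacheEconomics(w: CacheWindow) {
  const total = w.hits + w.misses;
  return {
    hitRate: w.hits / total,
    missRate: w.misses / total,
    originRequests: w.misses, // every miss reaches the origin
    avgMissPenaltyMs: w.totalMissLatencyMs / w.misses,
  };
}

console.log(cacheEconomics({ hits: 9000, misses: 1000, totalMissLatencyMs: 200_000 }));
// hitRate 0.9, missRate 0.1, originRequests 1000, avgMissPenaltyMs 200
```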
A miss that adds 5 ms is different from a miss that triggers 200 ms of database work or a 2-second third-party API call. The more expensive the miss, the more each additional avoided miss is worth. This is also why caches in front of remote APIs or heavyweight report generation can pay off even with hit rates that would look mediocre in another context.
This TypeScript snippet models backend load at different hit rates. The numbers are simple, but they show why the curve is nonlinear from the origin’s point of view.
```typescript
function remainingOriginQps(totalQps: number, hitRate: number): number {
  return totalQps * (1 - hitRate);
}

for (const hitRate of [0.0, 0.5, 0.8, 0.9, 0.96]) {
  console.log({
    hitRate,
    originQps: remainingOriginQps(10000, hitRate)
  });
}
```
Representative output:
```text
{ hitRate: 0, originQps: 10000 }
{ hitRate: 0.5, originQps: 5000 }
{ hitRate: 0.8, originQps: 2000 }
{ hitRate: 0.9, originQps: 1000 }
{ hitRate: 0.96, originQps: 400 }
```
What to notice:

- Each equal step in hit rate removes fewer absolute requests: 0 to 50 percent removes 5,000 QPS, while 90 to 96 percent removes only 600.
- Viewed from the origin, halving the miss rate always halves the remaining load, so high-end improvements can still be dramatic in relative terms (1,000 QPS down to 400).
- Whether that remaining load matters depends on the miss penalty, which this simple model deliberately leaves out.
Improving hit rate is not free. It may require:

- longer TTLs, which increase the risk of serving stale data
- more memory, larger keys' coverage, or additional cache tiers
- more elaborate invalidation or warming logic, which adds operational complexity
That means every incremental improvement should be compared against what else that effort could buy. Sometimes moving from 92 to 95 percent is worth it. Sometimes fixing one expensive query, reducing payload size, or collapsing a chatty call chain is a better investment.
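That trade-off can be framed as simple arithmetic: compare the backend time removed by a hit-rate improvement against the time removed by shrinking the miss penalty itself. The numbers below are assumptions chosen for illustration:

```typescript
// Backend work (ms per wall-clock second) that each option removes.
// 10,000 QPS and a 200 ms miss penalty are assumed numbers.
const qps = 10_000;
const missMs = 200;

// Option A: raise hit rate from 92 to 95 percent.
const optionA = qps * (0.95 - 0.92) * missMs;

// Option B: keep 92 percent, but cut the expensive query from 200 ms to 120 ms.
const optionB = qps * (1 - 0.92) * (missMs - 120);

console.log({ optionA, optionB }); // roughly 60000 vs 64000 ms saved per second
```

In this toy scenario the query fix slightly wins, which is the point: the comparison is worth doing before reaching for more cache complexity.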
If one cache has a 70 percent hit rate in front of a 2-second third-party API and another has a 95 percent hit rate in front of a fast internal lookup, which one is likely more valuable economically?
The stronger answer is often the first one. Lower hit rate does not automatically mean lower value. If each avoided miss saves a very expensive remote call, the cache may still be delivering more benefit than a higher-hit cache with a tiny miss penalty.
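Putting numbers on that comparison, with latencies assumed to match the scenario above:

```typescript
// Average backend time avoided per request = hitRate * missPenalty.
// The latencies are assumed for illustration.
function savedMsPerRequest(hitRate: number, missPenaltyMs: number): number {
  return hitRate * missPenaltyMs;
}

// 70 percent hit rate in front of a 2-second third-party API:
console.log(savedMsPerRequest(0.7, 2000)); // about 1400 ms avoided per request

// 95 percent hit rate in front of a fast internal lookup (say 20 ms):
console.log(savedMsPerRequest(0.95, 20)); // about 19 ms avoided per request
```

By this measure the "worse" cache is saving roughly seventy times more backend time per request, which is why hit rate should never be compared across caches without the miss penalty attached.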