How hit rate, miss penalty, and diminishing returns determine whether a cache is actually earning its keep.
A cache hit is cheap. A cache miss is where the real economics show up. The value of a cache depends on how often it avoids expensive work and how painful misses are when they do happen. This is why hit rate is important but incomplete. A 90 percent hit rate can be excellent or disappointing depending on miss penalty, request volume, and how the remaining misses cluster in time.
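One way to see why hit rate alone is incomplete: the average cost of a request is a weighted blend of hit cost and miss penalty. A minimal sketch, using illustrative latency numbers rather than measurements:

```typescript
// Average request latency as a blend of hit cost and miss penalty.
// All latency numbers here are illustrative assumptions.
function expectedLatencyMs(hitRate: number, hitMs: number, missMs: number): number {
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// Same 90 percent hit rate, very different economics:
console.log(expectedLatencyMs(0.9, 2, 20));   // cheap misses: about 3.8 ms average
console.log(expectedLatencyMs(0.9, 2, 2000)); // slow third-party call: about 201.8 ms average
```

With cheap misses, 90 percent is comfortably in diminishing-returns territory; with a 2-second miss, the remaining 10 percent still dominates the average.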
The other important concept is diminishing returns. Moving hit rate from 0 to 60 percent is usually transformative. Moving from 96 to 97 percent may still matter, but only if the remaining misses are expensive enough, bursty enough, or operationally dangerous enough to justify more complexity.
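To make the diminishing-returns point concrete, a quick back-of-the-envelope sketch; the request volume and per-miss cost are assumed numbers chosen for illustration:

```typescript
// Marginal origin load removed by a hit-rate improvement.
// 10,000 QPS and a 200 ms miss are assumed numbers for illustration.
const totalQps = 10_000;
const missWorkMs = 200;

function originQpsAt(hitRate: number): number {
  return totalQps * (1 - hitRate);
}

// 0 to 60 percent removes 6,000 QPS of backend work.
console.log(originQpsAt(0.0) - originQpsAt(0.6)); // roughly 6000

// 96 to 97 percent removes only about 100 QPS...
const saved = originQpsAt(0.96) - originQpsAt(0.97);
console.log(saved); // roughly 100

// ...but if each miss triggers 200 ms of database work, that is still
// about 20 seconds of database time saved per wall-clock second.
console.log((saved * missWorkMs) / 1000); // roughly 20
```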
```mermaid
flowchart LR
A["Requests"] --> B{"Hit or miss?"}
B -->|Hit| C["Cheap latency\nlow origin load"]
B -->|Miss| D["Origin read or recompute"]
D --> E["Miss penalty"]
E --> F["Extra latency\nextra backend cost\nextra failure exposure"]
```
Teams often celebrate hit rate in isolation, which is a mistake. A high hit rate can still leave a system fragile if:

- the remaining misses cluster in time, for example after a deploy, a cache flush, or a wave of key expirations
- each miss carries a large penalty, such as heavy database work or a slow third-party call
- the origin is sized for the cached steady state and cannot absorb a burst of misses
The right economic view is therefore a curve, not a badge. You want to know how much each additional hit-rate improvement buys and what it costs to achieve.
Hit rate answers a useful question: what fraction of requests avoided the slow path? But it does not answer several other important questions:

- How expensive was each miss?
- How much load still reaches the origin?
- Do the misses concentrate in bursts or in the latency tail?
- Are any of the hits serving stale data?
This is why strong cache dashboards usually pair hit rate with miss rate, miss penalty, origin request count, tail latency, and stale-read indicators.
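As a sketch of how those paired signals might be derived from raw counters; the counter names here are hypothetical, not taken from any particular metrics library:

```typescript
// Hypothetical raw counters sampled over a reporting window.
interface CacheWindow {
  hits: number;
  misses: number;
  totalMissLatencyMs: number; // summed latency of the miss path
}

// Derive the signals a cache dashboard should show together,
// rather than hit rate alone.
function cacheEconomics(w: CacheWindow) {
  const total = w.hits + w.misses;
  return {
    hitRate: w.hits / total,
    missRate: w.misses / total,
    originRequests: w.misses, // every miss reaches the origin
    avgMissPenaltyMs: w.totalMissLatencyMs / w.misses,
  };
}

console.log(cacheEconomics({ hits: 9000, misses: 1000, totalMissLatencyMs: 200_000 }));
// hitRate 0.9, missRate 0.1, originRequests 1000, avgMissPenaltyMs 200
```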
A miss that adds 5 ms is different from a miss that triggers 200 ms of database work or a 2-second third-party API call. The more expensive the miss, the more each additional avoided miss is worth. This is also why caches in front of remote APIs or heavyweight report generation can pay off even with hit rates that would look mediocre in another context.
This TypeScript snippet models backend load at different hit rates. The numbers are simple, but they show why the curve is nonlinear from the origin’s point of view.
```typescript
function remainingOriginQps(totalQps: number, hitRate: number): number {
  return totalQps * (1 - hitRate);
}

for (const hitRate of [0.0, 0.5, 0.8, 0.9, 0.96]) {
  console.log({
    hitRate,
    originQps: remainingOriginQps(10000, hitRate)
  });
}
```
Representative output:
```text
{ hitRate: 0, originQps: 10000 }
{ hitRate: 0.5, originQps: 5000 }
{ hitRate: 0.8, originQps: 2000 }
{ hitRate: 0.9, originQps: 1000 }
{ hitRate: 0.96, originQps: 400 }
```
What to notice:

- Each equal step in hit rate removes fewer absolute requests: 0 to 50 percent removes 5,000 QPS, while 90 to 96 percent removes only 600.
- Viewed from the origin, halving the miss rate always halves the remaining load, so high-end improvements can still be dramatic in relative terms (1,000 QPS down to 400).
- Whether that remaining load matters depends on the miss penalty, which this simple model deliberately leaves out.
Improving hit rate is not free. It may require:

- longer TTLs, which increase the risk of serving stale data
- more memory, larger keys' coverage, or additional cache tiers
- more elaborate invalidation or warming logic, which adds operational complexity
That means every incremental improvement should be compared against what else that effort could buy. Sometimes moving from 92 to 95 percent is worth it. Sometimes fixing one expensive query, reducing payload size, or collapsing a chatty call chain is a better investment.
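That trade-off can be framed as simple arithmetic: compare the backend time removed by a hit-rate improvement against the time removed by shrinking the miss penalty itself. The numbers below are assumptions chosen for illustration:

```typescript
// Backend work (ms per wall-clock second) that each option removes.
// 10,000 QPS and a 200 ms miss penalty are assumed numbers.
const qps = 10_000;
const missMs = 200;

// Option A: raise hit rate from 92 to 95 percent.
const optionA = qps * (0.95 - 0.92) * missMs;

// Option B: keep 92 percent, but cut the expensive query from 200 ms to 120 ms.
const optionB = qps * (1 - 0.92) * (missMs - 120);

console.log({ optionA, optionB }); // roughly 60000 vs 64000 ms saved per second
```

In this toy scenario the query fix slightly wins, which is the point: the comparison is worth doing before reaching for more cache complexity.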
If one cache has a 70 percent hit rate in front of a 2-second third-party API and another has a 95 percent hit rate in front of a fast internal lookup, which one is likely more valuable economically?
The stronger answer is often the first one. Lower hit rate does not automatically mean lower value. If each avoided miss saves a very expensive remote call, the cache may still be delivering more benefit than a higher-hit cache with a tiny miss penalty.
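Putting numbers on that comparison, with latencies assumed to match the scenario above:

```typescript
// Average backend time avoided per request = hitRate * missPenalty.
// The latencies are assumed for illustration.
function savedMsPerRequest(hitRate: number, missPenaltyMs: number): number {
  return hitRate * missPenaltyMs;
}

// 70 percent hit rate in front of a 2-second third-party API:
console.log(savedMsPerRequest(0.7, 2000)); // about 1400 ms avoided per request

// 95 percent hit rate in front of a fast internal lookup (say 20 ms):
console.log(savedMsPerRequest(0.95, 20)); // about 19 ms avoided per request
```

By this measure the "worse" cache is saving roughly seventy times more backend time per request, which is why hit rate should never be compared across caches without the miss penalty attached.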