Latency, Throughput, and Distance

Why distance, queueing, and repeated backend work make caches valuable on real request paths.

Caching helps because distance is expensive. A value already in local memory, a nearby process cache, or a close distributed cache is usually much cheaper to reach than a remote database, another region, or a chain of downstream services. Every extra hop adds delay, but it also adds contention. One slow call is just latency. Thousands of identical slow calls become a throughput and capacity problem.

That is why caching changes more than response time. It changes how often the slow path is exercised at all. A high-hit cache removes repeated trips to the expensive layer, which lowers average latency, improves tail behavior, and gives the origin more room to handle writes or genuinely uncached work.

    sequenceDiagram
        participant U as User Request
        participant A as App
        participant C as Cache
        participant O as Origin Store

        U->>A: GET /product/42
        A->>C: lookup key
        alt cache hit
            C-->>A: value
            A-->>U: fast response
        else cache miss
            C-->>A: miss
            A->>O: read authoritative value
            O-->>A: value
            A->>C: store reusable result
            A-->>U: slower response
        end
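The flow in the diagram is the familiar cache-aside read: check the cache, fall back to the origin on a miss, and store the result for later callers. A minimal sketch in TypeScript, assuming an illustrative `CacheLike` interface and a caller-supplied origin loader (neither is a real library API):

```typescript
// Illustrative cache interface; any Map-like store satisfies it.
interface CacheLike<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// Cache-aside read mirroring the sequence diagram.
async function readThrough<V>(
  cache: CacheLike<V>,
  key: string,
  loadFromOrigin: (key: string) => Promise<V>
): Promise<V> {
  const cached = cache.get(key);
  if (cached !== undefined) {
    return cached; // cache hit: fast response, origin untouched
  }
  const value = await loadFromOrigin(key); // cache miss: read authoritative value
  cache.set(key, value); // store reusable result for the next caller
  return value;
}
```

A plain `Map` is enough to back `CacheLike` and exercise both branches; real deployments swap in a process cache or a distributed store behind the same shape.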

Why It Matters

A lot of performance tuning fails because teams optimize the wrong layer. They stare at database query time while ignoring network distance, cross-service fan-out, connection pool saturation, or queueing delay under load. Caching is most effective when the expensive part of the request path is repeated and avoidable.

The useful mental model is:

  • latency is the cost one request sees
  • throughput is how many requests the system can keep serving
  • distance increases both because every remote operation consumes time and scarce backend capacity

Latency Is Not Just One Number

A request path often contains several kinds of delay:

  • serialization and deserialization time
  • network round trips
  • queueing while waiting for a worker, connection, or thread
  • execution time inside the backend itself
  • additional retries or fan-out calls after the first remote hop

Caching does not remove every component, but it can bypass the most repeated expensive segment. That is often enough to change the user experience substantially.
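Those delay components add up per request, which is easy to see in a toy model. The breakdown and every number below are made-up illustrations, not measurements:

```typescript
// Toy breakdown of one request's delay components, in milliseconds.
interface PathDelays {
  serializationMs: number;     // encode/decode payloads
  networkRoundTripMs: number;  // one remote round trip
  roundTrips: number;          // retries and fan-out multiply the network cost
  queueingMs: number;          // waiting for a worker, connection, or thread
  backendExecutionMs: number;  // work inside the backend itself
}

function totalLatencyMs(d: PathDelays): number {
  return (
    d.serializationMs +
    d.networkRoundTripMs * d.roundTrips +
    d.queueingMs +
    d.backendExecutionMs
  );
}

// A miss pays the full remote path; a hit keeps only the cheap local segment.
const missPath: PathDelays = {
  serializationMs: 1, networkRoundTripMs: 10, roundTrips: 3,
  queueingMs: 15, backendExecutionMs: 20,
};
const hitPath: PathDelays = {
  serializationMs: 1, networkRoundTripMs: 1, roundTrips: 1,
  queueingMs: 0, backendExecutionMs: 0,
};

console.log(totalLatencyMs(missPath)); // 66
console.log(totalLatencyMs(hitPath)); // 2
```

Notice that the bypassed segment (round trips, queueing, backend execution) dwarfs what remains; that is the repeated expensive segment the cache removes.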

Throughput Improves When Repeated Work Disappears

Throughput is not only about faster CPUs. It is about how much avoidable work the backend must still perform. If 5,000 requests per second all ask for the same product metadata and 95 percent are served from cache, then the origin sees 250 requests per second instead of 5,000. That changes scaling behavior, connection pressure, and failure probability.
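The arithmetic above reduces to one line. The function name here is ours, purely for illustration:

```typescript
// Requests per second that still reach the origin after the cache absorbs hits.
function originRps(incomingRps: number, hitRate: number): number {
  return incomingRps * (1 - hitRate);
}

console.log(originRps(5000, 0.95)); // ~250 requests/s reach the origin
console.log(originRps(5000, 0.5)); // ~2500 requests/s reach the origin
```

The same formula also shows the fragility: dropping from a 95 percent to a 90 percent hit rate doubles origin load, which is why a small dip in hit rate can feel like a traffic spike to the backend.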

This is one reason cache misses matter so much. A single miss is slow for one caller. A synchronized wave of misses can destabilize the system.

Example

This small TypeScript function estimates expected request latency when a cache sits in front of an origin. It is intentionally simple, but it makes the economic point clearly.

    function expectedLatencyMs(
      hitRate: number,
      cacheLatencyMs: number,
      originLatencyMs: number
    ): number {
      const missRate = 1 - hitRate;
      return hitRate * cacheLatencyMs + missRate * (cacheLatencyMs + originLatencyMs);
    }

    console.log(expectedLatencyMs(0.95, 2, 80)); // 6 ms
    console.log(expectedLatencyMs(0.50, 2, 80)); // 42 ms

What to notice:

  • even modest miss penalties dominate average latency quickly
  • a cache with a mediocre hit rate may still leave a path too expensive
  • the remote origin cost matters more than the cache cost once misses are frequent
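The same formula can be turned around to ask what hit rate a latency target demands. Since expected latency is cacheLatency + (1 - hitRate) * originLatency, solving for the hit rate gives the sketch below (parameter names mirror the example above; this is a back-of-envelope tool, not a sizing method):

```typescript
// Hit rate needed so that expected latency meets a target, given
// expected = cacheLatencyMs + (1 - hitRate) * originLatencyMs.
function requiredHitRate(
  targetMs: number,
  cacheLatencyMs: number,
  originLatencyMs: number
): number {
  if (targetMs < cacheLatencyMs) {
    throw new Error("target below cache latency is unreachable");
  }
  const rate = 1 - (targetMs - cacheLatencyMs) / originLatencyMs;
  return Math.max(0, rate); // 0 means the origin alone already meets the target
}

console.log(requiredHitRate(6, 2, 80)); // 0.95
console.log(requiredHitRate(10, 2, 80)); // 0.9
```

Running the numbers this way before choosing a cache makes the conversation concrete: a 6 ms target against an 80 ms origin requires a 95 percent hit rate, which is a workload question, not a product question.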

Common Mistakes

  • focusing only on average latency and ignoring miss storms
  • assuming network distance is cheap because individual calls “look fast enough”
  • adding a cache in front of a path whose real problem is write contention or poor ownership
  • measuring origin query time without measuring queueing and fan-out overhead

Design Review Question

Your team wants to add a cache because the endpoint is slow. What should they identify before choosing a cache product?

The stronger answer is the exact expensive segment being avoided: database reads, cross-region latency, repeated third-party API calls, rendered page assembly, or something else. Without naming the slow repeated work, the cache decision is still too vague.
