External API and Service Caching

March 26, 2026

Caching third-party and cross-service responses to reduce latency, cost, and dependency pressure without violating data contracts.

Caching external API and service calls is often economically compelling because the avoided work is not just compute. It is network distance, rate-limit pressure, third-party billing, and dependency fragility. A short-lived cache in front of an upstream call can significantly improve resilience and cost posture.

The challenge is that the service contract may be owned by someone else. Freshness, error semantics, and legal or product expectations may not tolerate casual reuse. A cached exchange rate, shipping quote, fraud signal, or entitlement check can be valuable, but only if the staleness budget is stated explicitly and is acceptable to whoever owns that contract.

    sequenceDiagram
        participant App
        participant Cache
        participant Upstream

        App->>Cache: lookup request signature
        alt hit
            Cache-->>App: cached upstream response
        else miss
            App->>Upstream: call external API
            Upstream-->>App: response
            App->>Cache: store response for bounded reuse
        end
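
The flow in the diagram can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the `TTLCache` class, the `get_or_call` method, and the injectable `clock` parameter are hypothetical names introduced here, and a real deployment would more likely use a shared cache service.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical cache-aside helper with one fixed time-to-live for all entries."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_call(self, signature: Hashable,
                    call_upstream: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(signature)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # hit: reuse the cached upstream response
        response = call_upstream()  # miss: call the external API
        self._entries[signature] = (now, response)  # store for bounded reuse
        return response
```

The application owns both the lookup and the store, which matches the diagram: the cache never calls the upstream on its own. If `call_upstream` raises, nothing is stored, so a failed call is not reused.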

Why It Matters

This pattern matters because upstream dependencies often dominate both cost and risk. Even a modest hit rate can:

cut paid request volume
reduce rate-limit breaches
shield the application from transient upstream slowdowns
improve tail latency materially

But caching the wrong upstream result can also freeze transient mistakes or violate freshness promises that the business assumed were live.

Where It Fits Best

This pattern is strongest when:

upstream responses are reused by many callers
the upstream cost or latency is high
the allowed freshness window is explicit
the response contract is stable enough to key safely

It is much weaker when the upstream answer is highly per-user, highly volatile, or contractually expected to be current on every call.

Example

This cache policy for a shipping-quote service shows the kind of boundaries that should be explicit before reuse begins.

    upstream_cache:
      target: shipping_quote
      key_dimensions:
        - origin_postal_code
        - destination_postal_code
        - package_weight_grams
        - service_level
      ttl_seconds: 30
      stale_if_error_seconds: 120

What to notice:

the cache key is based on the real request-defining inputs
the TTL is short because the quote may change quickly
stale_if_error_seconds turns the cache into a resilience layer during temporary upstream failure
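
One way an application might enforce this policy is sketched below. The `QuoteRequest` and `QuoteCache` names and the `fetch_quote` callable are hypothetical. The sketch also assumes that `stale_if_error_seconds` is counted from the moment the TTL expires, as the HTTP `stale-if-error` extension does; the policy above does not say which interpretation is intended.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class QuoteRequest:
    """The four key_dimensions from the policy; frozen makes instances hashable."""
    origin_postal_code: str
    destination_postal_code: str
    package_weight_grams: int
    service_level: str


class QuoteCache:
    def __init__(self, ttl_seconds: float = 30,
                 stale_if_error_seconds: float = 120,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.stale_if_error_seconds = stale_if_error_seconds
        self.clock = clock
        self._entries: Dict[QuoteRequest, Tuple[float, Any]] = {}

    def get(self, request: QuoteRequest,
            fetch_quote: Callable[[QuoteRequest], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(request)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh hit
        try:
            quote = fetch_quote(request)
        except Exception:
            # Assumption: the stale window starts when the TTL ends.
            limit = self.ttl_seconds + self.stale_if_error_seconds
            if entry is not None and now - entry[0] < limit:
                return entry[1]  # serve stale only because upstream failed
            raise  # nothing usable: surface the failure, do not cache it
        self._entries[request] = (now, quote)
        return quote
```

Note that the stale path is reachable only after a failed upstream call. A healthy upstream always replaces an expired entry, so the 120-second window never extends normal reuse.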

Common Mistakes

caching upstream errors as if they were stable facts
ignoring auth or quota-related request dimensions in the cache key
using TTLs that violate the upstream freshness expectation
treating third-party data as authoritative long after the supplier would consider it stale
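
The first two mistakes can be guarded against with two small checks, sketched here under stated assumptions. The helper names `build_cache_key` and `is_cacheable` and the `auth_scope` parameter are hypothetical, and treating only 2xx status codes as cacheable is a deliberate simplification rather than a rule from any particular upstream.

```python
import hashlib
import json
from typing import Mapping


def build_cache_key(target: str, params: Mapping[str, object],
                    auth_scope: str) -> str:
    """Derive a key from every input that can change the upstream answer.

    auth_scope stands for whatever tenant, account, or quota identity the
    upstream uses to decide what to return (an assumption for this sketch).
    """
    payload = json.dumps(
        {"target": target, "params": dict(params), "auth_scope": auth_scope},
        sort_keys=True,            # parameter order must not change the key
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_cacheable(status_code: int) -> bool:
    """Simplification: store only successful responses, never errors."""
    return 200 <= status_code < 300
```

Leaving `auth_scope` out of the key would let one caller's response be served to another, which is the failure the second mistake describes.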

Design Review Question

Why can a short-lived cache in front of a third-party API be valuable even when the hit rate is not extremely high?

The stronger answer is that each avoided call may save meaningful latency, paid usage, and rate-limit pressure. External calls often have such high cost per miss that moderate reuse still pays off.
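
The arithmetic behind that answer is simple enough to sketch. The function name and every number used with it are illustrative assumptions, not measurements, and the result is a mean per request rather than a tail-latency figure.

```python
from typing import Tuple


def expected_per_request(hit_rate: float, upstream_latency_ms: float,
                         cache_latency_ms: float,
                         upstream_cost: float) -> Tuple[float, float]:
    """Return (mean latency in ms, mean paid cost) per application request.

    Every request pays the cache lookup; only misses also pay the upstream
    latency and the upstream's per-call price.
    """
    miss_rate = 1.0 - hit_rate
    latency_ms = cache_latency_ms + miss_rate * upstream_latency_ms
    cost = miss_rate * upstream_cost
    return latency_ms, cost
```

With an assumed 30% hit rate, a 400 ms upstream call, a 2 ms cache lookup, and a price of 0.002 per call, the mean latency drops from 400 ms to about 282 ms and the paid cost per request drops by 30%. The hit rate is modest, but the per-miss cost is high enough that the saving is still substantial.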


Revised on Wednesday, June 3, 2026