Capacity Planning and Eviction

Sizing caches, reading pressure signals, and matching eviction policy to workload shape instead of defaults.

Capacity planning and eviction tuning determine whether a cache remains useful under real traffic. A cache that is too small churns constantly. A cache that is large but filled with the wrong data may still deliver poor results. Eviction policy is not only an algorithm choice such as LRU or LFU. It is a statement about what kind of reuse the system expects.

Teams often start with defaults and only tune when the cache becomes obviously unstable. That is late. Good cache tuning begins by understanding workload shape: temporal locality, hot-key persistence, object size distribution, and whether the access pattern is bursty, cyclical, or steadily skewed.

    flowchart LR
        A["Workload shape"] --> B["Capacity target"]
        A --> C["Eviction policy choice"]
        B --> D["Observed eviction pressure"]
        C --> D
        D --> E["Origin fallback behavior"]
        D --> F["Hit quality on hot data"]

Why It Matters

Eviction is where the cache makes trade-offs under pressure. When the cache is full, it must decide which entries deserve to stay resident for reuse and which must be discarded. If the policy does not match the workload, the system can evict precisely the items that were most useful.

Important planning questions include:

  • Are accesses driven by a stable hot set or by recency bursts?
  • Are entries similarly sized or highly uneven?
  • Is the cache protecting expensive recomputation or only cheap reads?
  • How costly is one additional miss to the origin?
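Those questions feed directly into a capacity target. The sketch below is a back-of-envelope estimate only; the hot-set size, entry size, and the overhead and headroom factors are illustrative assumptions, not measurements from any real system.

```python
# Back-of-envelope capacity estimate. All numbers are illustrative
# assumptions, not measurements.

def capacity_target_mb(hot_keys: int, avg_entry_bytes: int,
                       overhead_factor: float = 1.3,
                       headroom: float = 1.2) -> float:
    """Memory needed to keep the hot set resident.

    overhead_factor covers per-entry metadata (pointers, TTLs, hash
    buckets); headroom absorbs bursts so the hot set survives spikes.
    """
    raw = hot_keys * avg_entry_bytes
    return raw * overhead_factor * headroom / (1024 * 1024)

# ~500k hot keys averaging 2 KB each -> roughly 1.5 GB target
print(round(capacity_target_mb(500_000, 2_048)), "MB")
```

The point is not the specific factors but that the target is derived from the hot set the cache must protect, not from whatever memory happens to be available.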

Choosing a Policy

There is no universal winner.

  • LRU tends to fit workloads dominated by recent reuse.
  • LFU tends to fit workloads with a stable long-lived hot set.
  • TTL-heavy strategies fit cases where freshness boundaries dominate reuse.
  • Size-aware or cost-aware strategies matter when some entries are much larger or more expensive than others.

    eviction_tuning:
      workload: stable_hot_set
      preferred_policy: lfu
      memory_budget_mb: 2048
      alert_when:
        eviction_rate_per_minute: "> 5000"
        origin_fallback_qps: "> baseline * 1.5"
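The difference between recency-based and frequency-based retention shows up clearly in a toy simulation. The sketch below is illustrative only: it uses a simplified LFU that keeps full frequency history, and the workload mix (50 hot keys plus a stream of one-off scan keys) is an invented example, not a benchmark.

```python
from collections import Counter, OrderedDict
import random

# Toy comparison: how LRU and a simplified LFU retain a stable hot set
# while one-off scan keys stream through. Illustrative numbers only.

def run(policy: str, capacity: int, accesses) -> float:
    cache = OrderedDict()   # key -> None; insertion order tracks recency
    freq = Counter()        # full access history (simplified LFU)
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)              # refresh recency
        else:
            if len(cache) >= capacity:
                if policy == "lru":
                    cache.popitem(last=False)   # evict least recent
                else:
                    victim = min(cache, key=lambda k: freq[k])
                    del cache[victim]           # evict least frequent
            cache[key] = None
        freq[key] += 1
    return hits / len(accesses)

random.seed(0)
hot = [f"hot:{i}" for i in range(50)]
accesses = []
for step in range(10_000):
    if random.random() < 0.7:
        accesses.append(random.choice(hot))     # stable hot set
    else:
        accesses.append(f"scan:{step}")         # one-off keys

# With capacity 60, LFU pins the 50 hot keys; under LRU, long gaps
# between hot-key accesses let scan traffic push them out.
print("lru:", run("lru", 60, accesses), "lfu:", run("lfu", 60, accesses))
```

On this kind of stable-hot-set workload LFU retains the hot keys almost perfectly, while LRU pays for every long gap between hot-key accesses. Reverse the workload toward recency bursts and the ranking flips, which is the whole point of matching policy to workload shape.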

What To Watch

Useful tuning signals include:

  • eviction rate and burstiness
  • hit rate on hot keys, not only global hit rate
  • memory fragmentation or overhead where relevant
  • refill pressure on the origin after eviction churn
  • item size distribution and whether a few large values crowd out many useful small ones

The goal is not “few evictions at any cost.” Some eviction is normal. The goal is to avoid destructive churn where the cache repeatedly discards data it will immediately need again.
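Separating hit rate by key segment is straightforward once cache events are tagged. A minimal sketch, assuming a hypothetical event stream of (segment, hit) pairs; the segment names and the 82%/11% split mirror the review example below and are illustrative.

```python
from collections import defaultdict

# Report hit rate per key segment instead of one global number.
# The (segment, hit) event format is an assumption for this sketch.

def segmented_hit_rates(events):
    """events: iterable of (segment, hit: bool) tuples."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for segment, hit in events:
        total[segment] += 1
        hits[segment] += hit
    return {s: hits[s] / total[s] for s in total}

events = ([("hot", True)] * 82 + [("hot", False)] * 18
          + [("cold", True)] * 11 + [("cold", False)] * 89)
rates = segmented_hit_rates(events)
print(rates)   # hot 0.82, cold 0.11; global would average to 0.465
```

A single global number here (46.5%) would hide both the healthy hot-set behavior and the near-useless cold segment; the segmented view makes the next action obvious.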

Example

This example shows how operators often frame a capacity review.

    capacity_review:
      current_memory_gb: 8
      p95_memory_used_gb: 7.6
      eviction_bursts_after_deploy: true
      hot_key_hit_rate: 0.82
      cold_key_hit_rate: 0.11
      recommendation:
        - increase_capacity_gb: 2
        - keep_ttl_for_cold_sets_short: true
        - evaluate_lfu_for_catalog_family: true

What to notice:

  • hot and cold key behavior should be separated
  • capacity planning is linked to operational events like deploys and warmups
  • the right fix may be a mix of more memory, a different policy, and tighter scope on low-value entries

Common Mistakes

  • treating eviction policy as a one-time checkbox
  • optimizing for global hit rate while hot keys churn
  • ignoring object size and letting large entries crowd out the useful working set
  • scaling capacity without checking whether invalidation or scope design is the real issue

Design Review Question

How do you know whether poor cache performance is caused by insufficient memory, the wrong eviction policy, or the wrong cache scope?

The stronger answer is that the team should compare eviction churn, hot-key retention, object size distribution, and origin fallback patterns. If larger memory does not improve hot-key retention, the issue may be scope or invalidation design rather than capacity alone.
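That comparison can be framed as a simple decision heuristic. The sketch below is one possible framing, with hypothetical metric names and thresholds; real reviews weigh more signals than two numbers.

```python
# Heuristic for the review question above: if adding memory does not
# move hot-key retention, suspect scope or invalidation rather than
# capacity. Metric names and thresholds are assumptions.

def likely_bottleneck(before: dict, after: dict,
                      min_gain: float = 0.05) -> str:
    """before/after: metrics measured at the old and new capacity."""
    gain = after["hot_key_hit_rate"] - before["hot_key_hit_rate"]
    if gain >= min_gain:
        return "capacity"
    if after["eviction_rate"] < before["eviction_rate"]:
        # Evictions dropped but hot keys still miss: entries are being
        # invalidated or scoped away, not squeezed out by pressure.
        return "scope_or_invalidation"
    return "policy"

before = {"hot_key_hit_rate": 0.82, "eviction_rate": 6000}
after = {"hot_key_hit_rate": 0.83, "eviction_rate": 2000}
print(likely_bottleneck(before, after))   # scope_or_invalidation
```

The value of writing the heuristic down is that it forces the team to measure hot-key retention before and after a capacity change instead of assuming more memory fixed the problem.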

Revised on Thursday, April 23, 2026