Learn when circuit breakers actually help in Clojure microservices, how they interact with timeouts and retries, and how to use Resilience4j through JVM interop without hiding the real failure model.
Circuit breaker: A protection mechanism that stops sending calls to a failing dependency once failure patterns cross a defined threshold.
Circuit breakers are useful, but they are often taught too optimistically. They do not “fix resilience” by themselves. They help the caller fail fast when a dependency is already unhealthy. That reduces wasted work, protects thread pools, and creates space for fallback or recovery logic. They do not remove the need for good timeout design, idempotency, or operational visibility.
In a distributed system, a dependency can fail by becoming slow, flaky, or unavailable. If callers keep sending work blindly, the failure spreads outward through queue growth, thread exhaustion, and retries.
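Circuit breakers help by rejecting calls as soon as the dependency is known to be unhealthy, so callers stop queueing work, holding threads, and retrying against something that cannot answer.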
The real benefit is not elegance. It is limiting blast radius.
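To make the fail-fast behavior concrete, here is a toy breaker in plain Clojure. It is a sketch under simplifying assumptions, not how Resilience4j works internally: it counts consecutive failures instead of tracking a failure rate over a sliding window, and the names `make-toy-breaker` and `call-through` are hypothetical.

```clojure
(ns myapp.toy-breaker)

(defn make-toy-breaker
  "Hypothetical helper: the breaker opens after `max-failures` consecutive
  failures and stays open for `open-ms` milliseconds."
  [max-failures open-ms]
  {:max-failures max-failures
   :open-ms      open-ms
   :state        (atom {:failures 0 :opened-at nil})})

(defn- open?
  "True while the breaker was opened less than `open-ms` ago."
  [{:keys [open-ms state]} now-ms]
  (let [{:keys [opened-at]} @state]
    (boolean (and opened-at (< (- now-ms opened-at) open-ms)))))

(defn call-through
  "Runs `f` unless the breaker is open. An open breaker rejects the call
  immediately, without touching the dependency."
  [{:keys [max-failures state] :as breaker} f]
  (let [now (System/currentTimeMillis)]
    (if (open? breaker now)
      (throw (ex-info "circuit open, call rejected" {:type ::circuit-open}))
      (try
        (let [result (f)]
          ;; Any success closes the breaker and clears the failure count.
          (reset! state {:failures 0 :opened-at nil})
          result)
        (catch Exception e
          ;; Count the failure; (re)open once the limit is reached.
          (swap! state (fn [{:keys [failures]}]
                         (let [n (inc failures)]
                           {:failures  n
                            :opened-at (when (>= n max-failures) now)})))
          (throw e))))))
```

Once the breaker is open, the failing function is no longer invoked at all. That is the blast-radius limit in miniature: the caller pays for an exception, not for another slow call.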
A circuit breaker without sensible timeouts still waits too long. Retries without a breaker can amplify a dependency outage. The stronger pattern is to combine explicit timeouts, bounded retries, and a circuit breaker.
Those pieces need to agree with each other. Otherwise the system becomes harder to reason about, not easier.
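One way to check that the pieces agree is to write the timeout and retry policy down as code and compute the worst case. The following is a minimal sketch in plain Clojure, assuming a blocking call that can run on a `future`. The helper names (`call-with-timeout`, `with-retries`, `worst-case-ms`) are hypothetical. A breaker would wrap the same call and should count the timeout as a failure.

```clojure
(ns myapp.budget)

(defn call-with-timeout
  "Runs `f` on another thread and waits at most `timeout-ms` for it.
  Throws ex-info with :type ::timeout when the deadline passes.
  If `f` itself throws, `deref` rethrows it wrapped in an ExecutionException."
  [timeout-ms f]
  (let [fut    (future (f))
        result (deref fut timeout-ms ::timed-out)]
    (if (= result ::timed-out)
      (do (future-cancel fut) ; best effort: interrupts the worker thread
          (throw (ex-info "call timed out"
                          {:type ::timeout :timeout-ms timeout-ms})))
      result)))

(defn with-retries
  "Calls `f` up to `max-attempts` times, sleeping `backoff-ms` between
  attempts. Only appropriate for idempotent operations."
  [max-attempts backoff-ms f]
  (loop [attempt 1]
    (let [outcome (try
                    {:ok (f)}
                    (catch Exception e
                      (if (< attempt max-attempts)
                        ::retry
                        (throw e))))]
      (if (= outcome ::retry)
        (do (Thread/sleep (long backoff-ms))
            (recur (inc attempt)))
        (:ok outcome)))))

(defn worst-case-ms
  "Upper bound on how long the retry loop can block the caller."
  [max-attempts timeout-ms backoff-ms]
  (+ (* max-attempts timeout-ms)
     (* (dec max-attempts) backoff-ms)))
```

With three attempts, a 300 ms timeout, and a 100 ms backoff, `(worst-case-ms 3 300 100)` is 1100 ms. If the caller's own deadline is shorter than that, the policy contradicts itself and one of the numbers has to change.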
In Clojure, Resilience4j is often the straightforward option because it is a mature JVM library and Java interop is usually the simplest path.
```clojure
(ns myapp.resilience
  (:import [io.github.resilience4j.circuitbreaker
            CircuitBreaker CircuitBreakerConfig CircuitBreakerRegistry]
           [java.time Duration]))

(defn make-breaker []
  (let [config   (-> (CircuitBreakerConfig/custom)
                     ;; Open when at least 50% of recorded calls failed.
                     (.failureRateThreshold 50.0)
                     ;; Stay open for 5 seconds before allowing probe calls.
                     (.waitDurationInOpenState (Duration/ofSeconds 5))
                     ;; Compute the failure rate over the last 20 calls.
                     (.slidingWindowSize 20)
                     (.build))
        registry (CircuitBreakerRegistry/of config)]
    ;; The registry returns the same breaker for the same name.
    (.circuitBreaker registry "inventory-service")))
```
The more important decision is not the constructor. It is what failures count, what the timeout budget is, and what “recovered” should mean for a half-open probe.
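Those decisions can be made explicit in the configuration. The sketch below uses Resilience4j builder options for exception classification, slow calls, and half-open probes. Every number and every choice of exception class here is an illustrative assumption, not a recommendation, and the function name `inventory-breaker-config` is hypothetical.

```clojure
(ns myapp.resilience.policy
  (:import [io.github.resilience4j.circuitbreaker CircuitBreakerConfig]
           [java.io IOException]
           [java.time Duration]
           [java.util.concurrent TimeoutException]))

(defn inventory-breaker-config
  "Sketch of a config that states the failure model explicitly."
  []
  (-> (CircuitBreakerConfig/custom)
      ;; What counts: only these exception types are recorded as failures.
      (.recordExceptions (into-array Class [IOException TimeoutException]))
      ;; What does not count: caller mistakes should not trip the breaker.
      (.ignoreExceptions (into-array Class [IllegalArgumentException]))
      ;; Timeout budget: calls slower than 2 seconds are recorded as slow,
      ;; and the breaker opens when 50% of recorded calls are slow.
      (.slowCallDurationThreshold (Duration/ofSeconds 2))
      (.slowCallRateThreshold 50.0)
      ;; Do not evaluate any rate before 10 calls have been recorded.
      (.minimumNumberOfCalls 10)
      ;; "Recovered": 3 probe calls are allowed in the half-open state
      ;; before the breaker decides to close or reopen.
      (.permittedNumberOfCallsInHalfOpenState 3)
      (.build)))
```

The varargs builder methods take a Java array, which is why the exception classes are passed through `into-array`.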
Fallbacks sound attractive, but many services do not actually have a safe fallback. Returning stale or partial data may be better than total failure in some cases. In others, it silently corrupts user expectations.
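Ask whether the caller can tell a fallback value from a real one, and whether the business meaning of that value is honest and acceptable.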
If the answer is unclear, failing fast with a precise error can be stronger than pretending the system is fine.
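A precise error can be as simple as translating the breaker's rejection into data the caller can act on. This is a minimal sketch, assuming a Resilience4j `CircuitBreaker` instance is already available. The function name `call-guarded` and the `:dependency/circuit-open` keyword are hypothetical.

```clojure
(ns myapp.resilience.calls
  (:import [io.github.resilience4j.circuitbreaker
            CallNotPermittedException CircuitBreaker]
           [java.util.function Supplier]))

(defn call-guarded
  "Runs the zero-argument function `f` through `breaker`. When the breaker
  rejects the call, throws ex-info that names the dependency instead of
  returning a made-up value."
  [^CircuitBreaker breaker f]
  (try
    ;; reify keeps this working on Clojure versions where a fn is not
    ;; accepted directly as a java.util.function.Supplier.
    (.executeSupplier breaker (reify Supplier (get [_] (f))))
    (catch CallNotPermittedException e
      (throw (ex-info "dependency unavailable, call rejected by circuit breaker"
                      {:type       :dependency/circuit-open
                       :dependency (.getName breaker)}
                      e)))))
```

Exceptions thrown by `f` itself pass through unchanged, so the caller can still distinguish "the dependency answered with an error" from "the call was never attempted".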
If calls are still allowed to hang too long, the breaker engages too late to help much.
Unbounded or poorly scoped retries can turn one failing dependency into a platform-wide incident.
Returning a default value is only safe when the business meaning of that default is honest and acceptable.
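One way to keep a fallback honest is to label it. The sketch below is plain Clojure with hypothetical names: it returns a cached value only when it is fresh enough, marks it as stale so the caller cannot mistake it for live data, and rethrows otherwise. The cache shape `{:value ... :fetched-at ...}` is an assumption made for the example.

```clojure
(ns myapp.fallback)

(defn fetch-with-fallback
  "Tries `fetch-fn`. On failure, returns the cached value only if it is at
  most `max-age-ms` old, and labels it as stale. Otherwise rethrows."
  [fetch-fn cache now-ms max-age-ms]
  (try
    {:source :live :stale? false :value (fetch-fn)}
    (catch Exception e
      (let [{:keys [value fetched-at]} cache
            age-ms (when fetched-at (- now-ms fetched-at))]
        (if (and age-ms (<= age-ms max-age-ms))
          ;; The caller sees that this is a fallback and how old it is.
          {:source :cache :stale? true :age-ms age-ms :value value}
          ;; No acceptable fallback: fail instead of inventing a value.
          (throw e))))))
```

Whether a five-second-old or five-hour-old value is acceptable is not something this function can decide. `max-age-ms` is where the product decision becomes visible in code.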
Use circuit breakers on remote calls whose failure can cascade. Keep timeout budgets short and explicit. Retry only idempotent operations and only where a retry is truly likely to help. Treat fallback as a product decision, not just a technical trick.