Distributed Systems and Service Discovery

March 28, 2026

How Clojure services participate in modern distributed systems with platform DNS, registries such as Consul, and selective coordination services.

Distributed systems and service discovery are easiest to reason about when you separate three different concerns:

how a service finds another service
how the platform reports health and availability
how the system coordinates shared decisions such as leader election or distributed locks

Older material often treats ZooKeeper as the default answer to all three. That is no longer a good default. In modern systems, service discovery is often provided by the runtime itself, especially in Kubernetes, or by a registry/catalog platform such as Consul. Coordination systems still matter, but they solve a narrower class of problems than “how do services find each other?”

Treat Discovery as a Platform Concern First

In most current deployments, the application should not own the full discovery mechanism. It should consume discovery through a stable platform contract.

Common patterns include:

DNS-based discovery in Kubernetes or other platform-managed environments
registry/catalog discovery through systems such as Consul
server-side discovery through gateways, proxies, or load balancers
configuration-based routing for a small number of stable dependencies

The best Clojure code often knows very little about the discovery internals. It usually needs:

a hostname or base URL
timeout and retry policy
health-aware request handling
observability around failed calls

That keeps discovery logic from leaking into every namespace.
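As a rough illustration of how small that surface can be, the sketch below describes one dependency as plain data. The namespace, the `dependency` function, the map keys, and the default values are assumptions made for this example, not part of any library.

```clojure
(ns acme.checkout.dependencies)

(defn dependency
  "Describes one upstream dependency as plain data.
   `lookup` is a function from variable name to value, so it can read the
   process environment in production and a plain map in tests."
  [lookup dep-name env-var default-url]
  {:name               dep-name
   :base-url           (or (lookup env-var) default-url)
   :connect-timeout-ms 2000   ; illustrative default
   :request-timeout-ms 3000   ; illustrative default
   :max-attempts       2})    ; illustrative default

;; Hypothetical dependency entry; the URL mirrors a Kubernetes Service name.
(def pricing
  (dependency (fn [^String k] (System/getenv k))
              :pricing
              "PRICING_URL"
              "http://pricing.default.svc.cluster.local:8080"))
```

Because the discovery result is reduced to a `:base-url` string, callers do not need to know whether it came from DNS, a registry, or static configuration.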

Modern Discovery Options

Platform DNS

Kubernetes Services create stable service identities, and cluster DNS resolves those names for clients. For many systems, that is enough. The application only needs to call a service name that the runtime resolves.

    (ns acme.checkout.pricing
      (:import [java.net URI]
               ;; Nested classes such as HttpResponse$BodyHandlers must be imported by name.
               [java.net.http HttpClient HttpRequest HttpResponse HttpResponse$BodyHandlers]
               [java.time Duration]))

    ;; One shared client; the connect timeout bounds connection setup only.
    (def ^HttpClient client
      (-> (HttpClient/newBuilder)
          (.connectTimeout (Duration/ofSeconds 2))
          (.build)))

    ;; Prefer an injected URL; otherwise fall back to the cluster DNS name.
    (defn pricing-base-url []
      (or (System/getenv "PRICING_URL")
          "http://pricing.default.svc.cluster.local:8080"))

    (defn fetch-price [sku]
      (let [request (-> (HttpRequest/newBuilder
                         (URI/create (str (pricing-base-url) "/prices/" sku)))
                        ;; The request timeout bounds the whole exchange.
                        (.timeout (Duration/ofSeconds 3))
                        (.GET)
                        (.build))
            response (.send client request (HttpResponse$BodyHandlers/ofString))]
        {:status (.statusCode response)
         :body (.body response)}))

This works well when the platform already handles:

service naming
instance membership
basic load balancing
endpoint churn
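To see what the platform is actually handing back, it can help to inspect name resolution from inside the JVM. The following is a minimal sketch using only `java.net.InetAddress`; the namespace and the `resolve-addresses` function name are made up for this example.

```clojure
(ns acme.discovery.dns
  (:import [java.net InetAddress UnknownHostException]))

(defn resolve-addresses
  "Returns the IP addresses the JVM currently resolves for `host` as a
   vector of strings, or an empty vector when the name does not resolve."
  [^String host]
  (try
    (mapv (fn [^InetAddress addr] (.getHostAddress addr))
          (InetAddress/getAllByName host))
    (catch UnknownHostException _
      [])))
```

For an ordinary Kubernetes Service this would typically return the single virtual service IP, while a headless Service would return the individual pod addresses. The JVM also caches successful lookups, and the `networkaddress.cache.ttl` security property controls for how long, which is worth checking when endpoints churn frequently.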

Registry or Catalog Discovery

In VM-heavy, hybrid, or multi-runtime systems, a registry such as Consul can act as a central catalog of services and their locations. In that model, clients or sidecars resolve healthy instances through the registry’s DNS or HTTP APIs.

The application still should not scatter registry calls everywhere. Wrap them in one namespace or adapter:

    (ns acme.discovery.consul
      (:require [clojure.data.json :as json])
      (:import [java.net URI]
               ;; Nested classes such as HttpResponse$BodyHandlers must be imported by name.
               [java.net.http HttpClient HttpRequest HttpResponse HttpResponse$BodyHandlers]
               [java.time Duration]))

    (def ^HttpClient client
      (-> (HttpClient/newBuilder)
          (.connectTimeout (Duration/ofSeconds 2))
          (.build)))

    (defn service-url
      "Returns a base URL for the first catalog entry of service-name, or nil."
      [consul-base service-name]
      (let [request (-> (HttpRequest/newBuilder
                         (URI/create (str consul-base "/v1/catalog/service/" service-name)))
                        (.timeout (Duration/ofSeconds 2))
                        (.GET)
                        (.build))
            response (.send client request (HttpResponse$BodyHandlers/ofString))
            ;; The catalog endpoint lists registered instances without filtering by health.
            service (first (json/read-str (.body response) :key-fn keyword))]
        (when service
          ;; ServiceAddress can be empty, in which case the node Address applies.
          (str "http://"
               (or (not-empty (:ServiceAddress service)) (:Address service))
               ":" (:ServicePort service)))))

The important design move is not the specific registry call. It is the decision to keep registry dependence localized and replaceable.
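One way to keep that dependence replaceable is to put a small protocol in front of whatever mechanism is in use. This is a sketch under assumed names (`Discovery`, `StaticDiscovery`, `FnDiscovery`, and `first-resolved` are all hypothetical), not a prescribed design.

```clojure
(ns acme.discovery.core)

(defprotocol Discovery
  (resolve-url [this service-name]
    "Returns a base URL string for service-name, or nil if unknown."))

;; Fixed routing table, e.g. for tests or a few stable dependencies.
(defrecord StaticDiscovery [routes]
  Discovery
  (resolve-url [_ service-name]
    (get routes service-name)))

;; Wraps any lookup function, such as a registry-backed or DNS-backed one.
(defrecord FnDiscovery [lookup]
  Discovery
  (resolve-url [_ service-name]
    (lookup service-name)))

(defn first-resolved
  "Tries each discovery source in order and returns the first non-nil URL."
  [sources service-name]
  (some #(resolve-url % service-name) sources))
```

A registry-backed source could then be built by passing a one-argument function that calls the registry adapter into `->FnDiscovery`, while tests use a `StaticDiscovery` with a literal map. Callers depend only on `resolve-url`.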

Coordination Is Not the Same as Discovery

Service discovery answers “where is the service?” Coordination answers harder shared-state questions such as:

who is the current leader?
who owns this shard?
who holds the lock?
which configuration version is current?

Tools such as ZooKeeper, etcd, or Consul can help with coordination. ZooKeeper remains historically important and still exists in inherited platforms, especially around older messaging or coordination-heavy systems. But it should not be taught as the default answer for new service discovery in platform-managed applications.

That distinction matters because discovery and coordination fail differently:

discovery problems usually degrade routing and dependency reachability
coordination problems can corrupt ownership, ordering, or shared-state assumptions

Design Clojure Services for Failure, Not for Perfect Topology

Whatever discovery mechanism you choose, your Clojure code still has to survive:

stale endpoints
slow upstreams
partial partitions
split-brain assumptions in poorly coordinated systems
mismatches between registry state and real service health

That means the service client needs:

connection and request timeouts
retries only where operations are safe to retry
circuit breaking or load shedding
logging and metrics tied to dependency identity
fallback behavior when discovery succeeds but the dependency is still unhealthy
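The "retries only where safe" rule can be sketched as a small wrapper. Everything here is an assumption for illustration: the `with-retries` name, the option keys, the choice of 502/503/504 as retryable statuses, and the convention that the wrapped function returns a map with `:status`. Jitter is left out to keep the example short.

```clojure
(ns acme.http.retry
  (:import [java.io IOException]))

;; HTTP methods that are idempotent by definition and therefore safe to repeat.
(def idempotent-methods #{:get :head :put :delete :options})

(defn retryable?
  "True only for idempotent methods whose attempt ended in an IOException
   or in a 502/503/504 response."
  [method result]
  (and (contains? idempotent-methods method)
       (or (instance? Throwable result)
           (contains? #{502 503 504} (:status result)))))

(defn with-retries
  "Calls (f) up to max-attempts times. f returns a map with :status.
   Sleeps base-delay-ms * 2^(attempt - 1) between attempts."
  [{:keys [method max-attempts base-delay-ms]} f]
  (loop [attempt 1]
    (let [result (try (f) (catch IOException e e))]
      (if (and (< attempt max-attempts) (retryable? method result))
        (do (Thread/sleep (long (* base-delay-ms (bit-shift-left 1 (dec attempt)))))
            (recur (inc attempt)))
        ;; Out of attempts or not retryable: surface the last outcome as-is.
        (if (instance? Throwable result)
          (throw result)
          result)))))
```

A call such as `(with-retries {:method :get :max-attempts 3 :base-delay-ms 100} #(fetch-price sku))` would retry a read, while the same wrapper around a `:post` makes exactly one attempt.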

Discovery tells you where to call. It does not guarantee the dependency is ready to help.
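A circuit breaker is one way to act on that gap between "resolvable" and "healthy". The sketch below is a deliberately simple consecutive-failure breaker; the function names, the state shape, and the rule that any status of 500 or above counts as a failure are assumptions for this example rather than an established library API.

```clojure
(ns acme.http.breaker)

(defn new-breaker
  "Opens after `threshold` consecutive failures and stays open for `cooldown-ms`."
  [threshold cooldown-ms]
  {:threshold   threshold
   :cooldown-ms cooldown-ms
   :state       (atom {:failures 0 :opened-at nil})})

(defn open?
  "True while the breaker has tripped and the cooldown has not yet elapsed."
  [{:keys [cooldown-ms state]} now-ms]
  (let [{:keys [opened-at]} @state]
    (boolean (and opened-at (< (- now-ms opened-at) cooldown-ms)))))

(defn record!
  "Resets the breaker on success; counts the failure and trips at the threshold."
  [{:keys [threshold state]} ok? now-ms]
  (swap! state
         (fn [{:keys [failures opened-at]}]
           (if ok?
             {:failures 0 :opened-at nil}
             (let [n (inc failures)]
               {:failures  n
                :opened-at (if (>= n threshold) now-ms opened-at)})))))

(defn call-with-breaker
  "Runs (f) unless the breaker is open. Returns `fallback` when the breaker
   is open or when f throws. f is expected to return a map with :status."
  [breaker fallback f]
  (if (open? breaker (System/currentTimeMillis))
    fallback
    (try
      (let [result (f)
            ok?    (< (:status result) 500)]
        (record! breaker ok? (System/currentTimeMillis))
        result)
      (catch Exception _
        (record! breaker false (System/currentTimeMillis))
        fallback))))
```

Once the cooldown elapses, the next call is allowed through as a probe: a success resets the breaker, and another failure reopens it immediately.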

A Better Architecture Model

    flowchart LR
        A["Clojure Service"] --> B["Platform Discovery Layer"]
        B --> C["Kubernetes Service DNS"]
        B --> D["Consul Catalog or DNS"]
        B --> E["Gateway / Load Balancer"]
        A --> F["Retry, Timeout, and Circuit Policies"]
        A --> G["Logs, Metrics, and Traces"]

The important thing to notice is that discovery is only one layer. Operational resilience sits beside it, not beneath it.

Consistency Still Matters

Distributed systems discussions often slide straight from discovery into consensus algorithms. That is too big a jump. The practical question for most application teams is simpler:

Does this workflow require strong coordination?
Can it tolerate lag or eventual convergence?
What happens when service membership changes mid-request?

If the workflow only needs to find a healthy stateless dependency, DNS or a service registry is often enough. If the workflow depends on exclusive ownership or ordered mutation, you may need a real coordination story.

Key Takeaways

Treat service discovery as a platform capability first, not as ad hoc application logic.
Prefer platform DNS, registries, or gateways for ordinary dependency lookup in modern deployments.
Keep registry-specific or discovery-specific code behind one adapter namespace.
Distinguish coordination problems from discovery problems; they are not the same design problem.
Build retries, timeouts, and observability alongside discovery because a discovered service can still be unhealthy.

Revised on Wednesday, June 3, 2026
