Distributed Systems and Service Discovery

How Clojure services participate in modern distributed systems with platform DNS, registries such as Consul, and selective coordination services.

Distributed systems and service discovery are easiest to reason about when you separate three different concerns:

  • how a service finds another service
  • how the platform reports health and availability
  • how the system coordinates shared decisions such as leader election or distributed locks

Older material often treats ZooKeeper as the default answer to all three. That is no longer a good default. In modern systems, service discovery is often provided by the runtime itself, especially in Kubernetes, or by a registry/catalog platform such as Consul. Coordination systems still matter, but they solve a narrower class of problems than “how do services find each other?”

Treat Discovery as a Platform Concern First

In most current deployments, the application should not own the full discovery mechanism. It should consume discovery through a stable platform contract.

Common patterns include:

  • DNS-based discovery in Kubernetes or other platform-managed environments
  • registry/catalog discovery through systems such as Consul
  • server-side discovery through gateways, proxies, or load balancers
  • configuration-based routing for a small number of stable dependencies

The best Clojure code often knows very little about the discovery internals. It usually needs:

  • a hostname or base URL
  • timeout and retry policy
  • health-aware request handling
  • observability around failed calls

That keeps discovery logic from leaking into every namespace.
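
As a minimal sketch of that idea, the four needs above can be captured as plain data that one client namespace consumes. The namespace `acme.clients.config`, the map keys, and the `build-get` helper are hypothetical names chosen for illustration, not an established convention:

```clojure
(ns acme.clients.config
  (:import [java.net URI]
           [java.net.http HttpRequest]
           [java.time Duration]))

;; Hypothetical descriptor for one dependency. The keys are illustrative,
;; not a standard schema.
(def pricing-dependency
  {:name               "pricing"
   :base-url           "http://pricing.default.svc.cluster.local:8080"
   :request-timeout-ms 3000
   :max-retries        2})

(defn build-get
  "Builds a GET request for path against the dependency's base URL,
  applying the dependency's request timeout."
  ^HttpRequest [{:keys [base-url request-timeout-ms]} path]
  (-> (HttpRequest/newBuilder (URI/create (str base-url path)))
      (.timeout (Duration/ofMillis request-timeout-ms))
      (.GET)
      (.build)))
```

Callers pass the descriptor around instead of hard-coding hosts, so changing how `pricing` is located only touches the map.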

Modern Discovery Options

Platform DNS

Kubernetes Services create stable service identities, and cluster DNS resolves those names for clients. For many systems, that is enough. The application only needs to call a service name that the runtime resolves.

    (ns acme.checkout.pricing
      (:import [java.net URI]
               [java.net.http HttpClient HttpRequest HttpResponse HttpResponse$BodyHandlers]
               [java.time Duration]))

    ;; One shared client; the connect timeout bounds connection setup.
    (def ^HttpClient client
      (-> (HttpClient/newBuilder)
          (.connectTimeout (Duration/ofSeconds 2))
          (.build)))

    ;; Prefer injected configuration; fall back to the cluster DNS name.
    (defn pricing-base-url []
      (or (System/getenv "PRICING_URL")
          "http://pricing.default.svc.cluster.local:8080"))

    (defn fetch-price [sku]
      (let [request (-> (HttpRequest/newBuilder
                         (URI/create (str (pricing-base-url) "/prices/" sku)))
                        ;; Request timeout: fail if no response arrives within 3 seconds.
                        (.timeout (Duration/ofSeconds 3))
                        (.GET)
                        (.build))
            ;; The nested class must be imported by its own name (see ns form).
            ^HttpResponse response (.send client request (HttpResponse$BodyHandlers/ofString))]
        {:status (.statusCode response)
         :body (.body response)}))

This works well when the platform already handles:

  • service naming
  • instance membership
  • basic load balancing
  • endpoint churn
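
To see what the platform resolver actually returns for a name, a small diagnostic helper can be useful. The sketch below uses only `java.net.InetAddress`; the namespace and function name are hypothetical. In Kubernetes, a regular Service name typically resolves to a single virtual IP, while a headless Service may return several pod addresses:

```clojure
(ns acme.discovery.dns
  (:import [java.net InetAddress UnknownHostException]))

(defn resolve-addresses
  "Returns the IP addresses the local resolver reports for host as a
  vector of strings, or an empty vector when the name does not resolve."
  [^String host]
  (try
    (mapv #(.getHostAddress ^InetAddress %) (InetAddress/getAllByName host))
    (catch UnknownHostException _
      [])))
```

Keep in mind that the JVM caches successful lookups (governed by the `networkaddress.cache.ttl` security property), so a long-lived process may see endpoint changes later than the platform publishes them.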

Registry or Catalog Discovery

In VM-heavy, hybrid, or multi-runtime systems, a registry such as Consul can act as a central catalog of services and their locations. In that model, clients or sidecars resolve healthy instances through the registry’s DNS or HTTP APIs.

The application still should not scatter registry calls everywhere. Wrap them in one namespace or adapter:

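    (ns acme.discovery.consul
      (:require [clojure.data.json :as json])
      (:import [java.net URI]
               [java.net.http HttpClient HttpRequest HttpResponse HttpResponse$BodyHandlers]
               [java.time Duration]))

    ;; One shared client; the connect timeout bounds how long an
    ;; unreachable Consul agent can stall a lookup.
    (def ^HttpClient client
      (-> (HttpClient/newBuilder)
          (.connectTimeout (Duration/ofSeconds 2))
          (.build)))

    (defn service-url
      "Returns a base URL for the first catalog entry of service-name, or nil.
      The catalog endpoint lists registered instances without filtering on
      health checks, so callers still need their own failure handling."
      [consul-base service-name]
      (let [request (-> (HttpRequest/newBuilder
                         (URI/create (str consul-base "/v1/catalog/service/" service-name)))
                        (.timeout (Duration/ofSeconds 2))
                        (.GET)
                        (.build))
            ^HttpResponse response (.send client request (HttpResponse$BodyHandlers/ofString))
            ;; Only parse the body when the lookup itself succeeded.
            service (when (= 200 (.statusCode response))
                      (first (json/read-str (.body response) :key-fn keyword)))]
        (when service
          ;; ServiceAddress may be empty; the node Address is the fallback.
          (str "http://"
               (or (not-empty (:ServiceAddress service)) (:Address service))
               ":" (:ServicePort service)))))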

The important design move is not the specific registry call. It is the decision to keep registry dependence localized and replaceable.
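
One way to keep that dependence replaceable is a small protocol with interchangeable implementations. This is a sketch under assumptions: the `Discovery` protocol, both record types, and `price-url` are hypothetical names, and a Consul-backed record would wrap a lookup function in the same way:

```clojure
(ns acme.discovery.core)

(defprotocol Discovery
  (resolve-url [this service-name]
    "Returns a base URL string for service-name, or nil if unknown."))

;; Fixed lookup table, useful for tests and for a few stable dependencies.
(defrecord StaticDiscovery [table]
  Discovery
  (resolve-url [_ service-name]
    (get table service-name)))

;; Platform DNS: the URL follows the cluster naming convention, and the
;; runtime resolves the name at call time.
(defrecord ClusterDnsDiscovery [k8s-namespace port]
  Discovery
  (resolve-url [_ service-name]
    (str "http://" service-name "." k8s-namespace ".svc.cluster.local:" port)))

;; Calling code depends on the protocol only, never on a specific registry.
(defn price-url [discovery sku]
  (when-let [base (resolve-url discovery "pricing")]
    (str base "/prices/" sku)))
```

Swapping `(->ClusterDnsDiscovery "default" 8080)` for `(->StaticDiscovery {"pricing" "http://10.0.0.5:9000"})` changes where requests go without touching `price-url`.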

Coordination Is Not the Same as Discovery

Service discovery answers “where is the service?” Coordination answers harder shared-state questions such as:

  • who is the current leader?
  • who owns this shard?
  • who holds the lock?
  • which configuration version is current?

Tools such as ZooKeeper, etcd, or Consul can help with coordination. ZooKeeper remains historically important and still exists in inherited platforms, especially around older messaging or coordination-heavy systems. But it should not be taught as the default answer for new service discovery in platform-managed applications.

That distinction matters because discovery and coordination fail differently:

  • discovery problems usually degrade routing and dependency reachability
  • coordination problems can corrupt ownership, ordering, or shared-state assumptions

Design Clojure Services for Failure, Not for Perfect Topology

Whatever discovery mechanism you choose, your Clojure code still has to survive:

  • stale endpoints
  • slow upstreams
  • partial partitions
  • split-brain assumptions in poorly coordinated systems
  • mismatches between registry state and real service health

That means the service client needs:

  • connection and request timeouts
  • retries only where operations are safe to retry
  • circuit breaking or load shedding
  • logging and metrics tied to dependency identity
  • fallback behavior when discovery succeeds but the dependency is still unhealthy

Discovery tells you where to call. It does not guarantee the dependency is ready to help.
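
The "retries only where operations are safe to retry" rule can be sketched as a small wrapper. The policy keys (`:max-attempts`, `:base-delay-ms`, `:idempotent?`) and the choice to treat only `IOException` as retryable are assumptions made for this example, not a prescribed policy:

```clojure
(ns acme.clients.retry
  (:import [java.io IOException]))

(defn with-retries
  "Calls f, a zero-argument function, and returns its result. When f
  throws an IOException and the policy marks the operation idempotent,
  waits with exponential backoff and tries again, up to max-attempts
  calls in total. Any other exception, or a non-idempotent operation,
  fails on the first error."
  [{:keys [max-attempts base-delay-ms idempotent?]} f]
  (loop [attempt 1]
    ;; recur is not allowed inside catch, so the catch clause returns a
    ;; marker map and the retry decision is acted on outside the try.
    (let [outcome (try
                    {:value (f)}
                    (catch IOException e
                      (if (and idempotent? (< attempt max-attempts))
                        {:retry-after-ms (* base-delay-ms
                                            (bit-shift-left 1 (dec attempt)))}
                        (throw e))))]
      (if-let [delay-ms (:retry-after-ms outcome)]
        (do (Thread/sleep (long delay-ms))
            (recur (inc attempt)))
        (:value outcome)))))
```

A read such as a price lookup would pass `:idempotent? true`; a call that charges a card would pass `false` unless the upstream supports idempotency keys.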

A Better Architecture Model

    flowchart LR
        A["Clojure Service"] --> B["Platform Discovery Layer"]
        B --> C["Kubernetes Service DNS"]
        B --> D["Consul Catalog or DNS"]
        B --> E["Gateway / Load Balancer"]
        A --> F["Retry, Timeout, and Circuit Policies"]
        A --> G["Logs, Metrics, and Traces"]

The important thing to notice is that discovery is only one layer. Operational resilience sits beside it, not beneath it.
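
A circuit policy that sits beside discovery can be as small as an atom and two functions. This is a deliberately simplified sketch, assuming a consecutive-failure threshold and a fixed open interval; the names `new-breaker` and `call-with-breaker` are hypothetical, and a production system would more likely use an established resilience library:

```clojure
(ns acme.clients.breaker)

(defn new-breaker
  "Creates breaker state: opens after failure-threshold consecutive
  failures and rejects calls for open-ms milliseconds."
  [{:keys [failure-threshold open-ms]}]
  (atom {:failures 0
         :opened-at nil
         :failure-threshold failure-threshold
         :open-ms open-ms}))

(defn- open? [{:keys [opened-at open-ms]} now-ms]
  (boolean (and opened-at (< (- now-ms opened-at) open-ms))))

(defn call-with-breaker
  "Calls f unless the breaker is open. A success resets the failure
  count; a failure increments it and opens the breaker at the threshold.
  After open-ms has elapsed the next call is let through as a trial."
  [breaker f]
  (let [now (System/currentTimeMillis)]
    (if (open? @breaker now)
      ;; Fail fast without touching the unhealthy dependency.
      (throw (ex-info "circuit open" {:type :circuit-open}))
      (try
        (let [result (f)]
          (swap! breaker assoc :failures 0 :opened-at nil)
          result)
        (catch Exception e
          (swap! breaker
                 (fn [{:keys [failures failure-threshold] :as state}]
                   (let [failures (inc failures)]
                     (cond-> (assoc state :failures failures)
                       (>= failures failure-threshold) (assoc :opened-at now)))))
          (throw e))))))
```

Nothing here knows how the dependency was discovered, which is the point: the breaker wraps the call, whichever discovery mechanism produced the address.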

Consistency Still Matters

Distributed systems discussions often slide straight from discovery into consensus algorithms. That is too big a jump. The practical questions for most application teams are simpler:

  • Does this workflow require strong coordination?
  • Can it tolerate lag or eventual convergence?
  • What happens when service membership changes mid-request?

If the workflow only needs to find a healthy stateless dependency, DNS or a service registry is often enough. If the workflow depends on exclusive ownership or ordered mutation, you may need a real coordination story.

Key Takeaways

  • Treat service discovery as a platform capability first, not as ad hoc application logic.
  • Prefer platform DNS, registries, or gateways for ordinary dependency lookup in modern deployments.
  • Keep registry-specific or discovery-specific code behind one adapter namespace.
  • Distinguish coordination problems from discovery problems; they are not the same design problem.
  • Build retries, timeouts, and observability alongside discovery because a discovered service can still be unhealthy.

Revised on Thursday, April 23, 2026