Pipeline Pattern for Data Processing in Clojure

Learn how to design channel-driven data pipelines in Clojure with clear stage ownership, bounded buffers, and the right choice among pipeline variants.

Pipeline pattern: A design where data moves through explicit stages, each stage doing one job before handing the result to the next stage.

Pipelines are strongest when each stage has:

  • one responsibility
  • one clear input contract
  • one clear output contract

In Clojure, channels make those boundaries visible. That is why core.async pipelines are useful: they turn “the flow of work” into data structures and queues you can reason about.

Stage Ownership Matters

Good stages are narrow:

  • parse
  • validate
  • enrich
  • persist
  • publish

Bad stages do several of those things at once. When a stage both transforms data and performs multiple side effects, it becomes hard to scale, retry, or test independently.

Choose the Right Pipeline Variant

The core.async variants matter because they encode workload type:

  • pipeline for non-blocking transformation
  • pipeline-blocking for blocking work such as network or filesystem calls
  • pipeline-async when another async system or callback produces results later

That distinction is not a technical footnote. pipeline runs its work in go blocks on the limited dispatch thread pool, so a blocking call there can stall every go block in the process. Matching the variant to the workload keeps each kind of work on the execution resource built for it.
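The pipeline-async contract is the least obvious of the three: its third argument is a function of an input value and a per-item result channel, and that channel must be closed once the result is delivered. A minimal sketch, with the external async system simulated by a go block (a real version would hand the value to a callback-based client):

```clojure
(require '[clojure.core.async :as async])

(def in  (async/chan 16))
(def out (async/chan 16))

(async/pipeline-async
  4                ; up to 4 items in flight at once
  out
  (fn [v result]
    ;; Simulated async system: a real version would pass a callback
    ;; to an HTTP client or similar. The callback must deliver the
    ;; result and then close the per-item channel.
    (async/go
      (async/>! result (* v 10))
      (async/close! result)))
  in)
```

Closing the result channel is what signals completion for that item; forgetting to close it stalls the pipeline.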

A Simple Computational Pipeline

 (require '[clojure.core.async :as async])

 (def in  (async/chan 64))
 (def out (async/chan 64))

 ;; pipeline: parallelism, destination, transducer, source
 (async/pipeline
   8                        ; up to 8 items transformed concurrently
   out
   (map (fn [x] (* x 2)))
   in)

This is appropriate when the transformation is:

  • independent per item
  • non-blocking
  • cheap enough to parallelize sensibly

If the stage blocks, use pipeline-blocking instead.
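As a contrast, here is the same doubling stage rewritten as a hypothetical blocking stage (slow-double stands in for a network or filesystem call), run through pipeline-blocking so the work lands on dedicated threads rather than the go-block dispatch pool:

```clojure
(require '[clojure.core.async :as async])

(def in  (async/chan 64))
(def out (async/chan 64))

;; Stand-in for blocking work such as a database or HTTP call.
(defn slow-double [x]
  (Thread/sleep 10)
  (* x 2))

;; Same shape as pipeline, but each step runs on its own thread,
;; so blocking here cannot starve go-block dispatch threads.
(async/pipeline-blocking
  4
  out
  (map slow-double)
  in)

;; Driving the pipeline: feed, close, drain. Closing `in` propagates
;; downstream; `out` closes after the last in-flight item completes.
(doseq [x (range 5)] (async/>!! in x))
(async/close! in)
(def results (async/<!! (async/into [] out)))
;; results → [0 2 4 6 8]  (pipelines preserve input order)
```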

Backpressure Is Part of the Design

Every pipeline needs an answer to overload:

  • bounded buffers
  • slowing producers
  • drop policy
  • retry path
  • dead-letter path

Channel capacities and worker counts are not just tuning parameters. They are the system’s visible backpressure policy.
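In core.async these policies are literal constructor choices rather than configuration buried elsewhere. A sketch of the three buffer policies (the capacities are illustrative):

```clojure
(require '[clojure.core.async :as async])

;; Bounded buffer: when 64 items are queued, puts park (in go blocks)
;; or block (>!!), pushing backpressure upstream onto the producer.
(def bounded (async/chan 64))

;; Drop policy: puts always succeed; when full, the NEW item is discarded.
(def dropping (async/chan (async/dropping-buffer 64)))

;; Sliding policy: puts always succeed; when full, the OLDEST queued
;; item is evicted to make room.
(def sliding (async/chan (async/sliding-buffer 64)))
```

Which channels get which buffers is exactly the "visible backpressure policy" the text describes: a reader can see where the system parks, drops, or evicts under load.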

Error Handling Belongs to Stages, Not to Hope

Pipelines usually need explicit answers to:

  • what happens when one item fails validation
  • what happens when a downstream side effect fails
  • whether bad items are retried, dropped, or routed elsewhere

A strong pipeline often includes a dedicated error or dead-letter path rather than pretending every stage is total and failure-free.
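One way to sketch that: the stage tags each item instead of throwing, and async/split routes failures to a dead-letter channel. The validation rule here is hypothetical; the shape is what matters.

```clojure
(require '[clojure.core.async :as async])

(def in     (async/chan 64))
(def tagged (async/chan 64))

;; Hypothetical rule: positive numbers are valid. The stage is total:
;; it tags failures instead of throwing inside the transducer.
(defn validate [x]
  (if (pos? x)
    {:status :ok :value x}
    {:status :error :value x :reason :non-positive}))

(async/pipeline 4 tagged (map validate) in)

;; split returns [matching-chan non-matching-chan]; give both sides
;; buffers so a slow dead-letter consumer cannot stall the router.
(def routed (async/split #(= :ok (:status %)) tagged 64 64))
(def ok-ch       (first routed))   ; continues down the pipeline
(def dead-letter (second routed))  ; retried, inspected, or alerted on
```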

Pipelines Are Not Just Faster Maps

The main value of a pipeline is not automatic speed. It is stage isolation:

  • stage-specific concurrency
  • stage-specific retry logic
  • stage-specific metrics
  • easier reasoning about where work stalls

That is much better than one large function that parses, validates, writes, logs, and retries in one opaque control flow.

Stage Boundaries Should Match Real Responsibilities

If one stage exists only because the tooling made it easy to add another channel, the design is drifting. Stage boundaries should map to meaningful work ownership, not arbitrary technical separation.

Stage Concurrency Should Differ by Work Type

One pipeline-wide worker count is rarely the best answer. Parse and validation stages may need little parallelism, enrichment may be mostly CPU-bound, and persistence may be limited by downstream database capacity. Treating every stage as if it wants the same width usually creates one hidden bottleneck and one overprovisioned stage.
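A sketch of per-stage widths, with hypothetical stage functions standing in for real work (the counts are illustrative, not prescriptive):

```clojure
(require '[clojure.core.async :as async])

(def raw      (async/chan 64))
(def parsed   (async/chan 64))
(def enriched (async/chan 64))
(def done     (async/chan 64))

;; Stand-ins for real parsing, CPU-bound enrichment, and blocking writes.
(defn parse-line [s] {:raw s})
(defn enrich     [m] (assoc m :score (count (:raw m))))
(defn persist!   [m] m)  ; imagine a blocking database write here

;; Parsing is cheap: one worker is enough.
(async/pipeline 1 parsed (map parse-line) raw)

;; Enrichment is CPU-bound: width near the core count.
(async/pipeline 4 enriched (map enrich) parsed)

;; Persistence blocks on a downstream database: use the blocking
;; variant, and cap the width to protect that dependency.
(async/pipeline-blocking 2 done (map persist!) enriched)
```

Each width is chosen for that stage's workload, and changing one does not disturb the others.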

That is why good pipeline review asks:

  • which stage is CPU-bound?
  • which stage blocks on remote systems?
  • where should ordering be preserved?
  • where should throughput be capped to protect a downstream dependency?

Practical Rule

Use a pipeline when work naturally flows through distinct stages. Choose the variant that matches blocking behavior, keep stages narrow, and make backpressure policy explicit. A pipeline should clarify the movement of work, not hide it behind plumbing.

Revised on Thursday, April 23, 2026