Pipeline Pattern for Data Processing in Clojure

Learn how to design channel-driven data pipelines in Clojure with clear stage ownership, bounded buffers, and the right choice among pipeline variants.

Pipeline pattern: A design where data moves through explicit stages, each stage doing one job before handing the result to the next stage.

Pipelines are strongest when each stage has:

  • one responsibility
  • one clear input contract
  • one clear output contract

In Clojure, channels make those boundaries visible. That is why core.async pipelines are useful: they turn “the flow of work” into data structures and queues you can reason about.

Stage Ownership Matters

Good stages are narrow:

  • parse
  • validate
  • enrich
  • persist
  • publish

Bad stages do several of those things at once. When a stage both transforms data and performs multiple side effects, it becomes hard to scale, retry, or test independently.

Choose the Right Pipeline Variant

The core.async variants matter because they encode workload type:

  • pipeline for non-blocking transformation
  • pipeline-blocking for blocking work such as network or filesystem calls
  • pipeline-async when another async system or callback produces results later

That distinction is not a technical footnote. pipeline runs its work in go blocks on the limited dispatch thread pool, so a blocking call there can stall every go block in the process. Matching the variant to the workload keeps each kind of work on the execution resource built for it.
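The pipeline-async contract is the least obvious of the three: its third argument is a function of an input value and a per-item result channel, and that channel must be closed once the result is delivered. A minimal sketch, with the external async system simulated by a go block (a real version would hand the value to a callback-based client):

```clojure
(require '[clojure.core.async :as async])

(def in  (async/chan 16))
(def out (async/chan 16))

(async/pipeline-async
  4                ; up to 4 items in flight at once
  out
  (fn [v result]
    ;; Simulated async system: a real version would pass a callback
    ;; to an HTTP client or similar. The callback must deliver the
    ;; result and then close the per-item channel.
    (async/go
      (async/>! result (* v 10))
      (async/close! result)))
  in)
```

Closing the result channel is what signals completion for that item; forgetting to close it stalls the pipeline.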

A Simple Computational Pipeline

 (require '[clojure.core.async :as async])

 (def in  (async/chan 64))
 (def out (async/chan 64))

 ;; pipeline: parallelism, destination, transducer, source
 (async/pipeline
   8                        ; up to 8 items transformed concurrently
   out
   (map (fn [x] (* x 2)))
   in)

This is appropriate when the transformation is:

  • independent per item
  • non-blocking
  • cheap enough to parallelize sensibly

If the stage blocks, use pipeline-blocking instead.
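As a contrast, here is the same doubling stage rewritten as a hypothetical blocking stage (slow-double stands in for a network or filesystem call), run through pipeline-blocking so the work lands on dedicated threads rather than the go-block dispatch pool:

```clojure
(require '[clojure.core.async :as async])

(def in  (async/chan 64))
(def out (async/chan 64))

;; Stand-in for blocking work such as a database or HTTP call.
(defn slow-double [x]
  (Thread/sleep 10)
  (* x 2))

;; Same shape as pipeline, but each step runs on its own thread,
;; so blocking here cannot starve go-block dispatch threads.
(async/pipeline-blocking
  4
  out
  (map slow-double)
  in)

;; Driving the pipeline: feed, close, drain. Closing `in` propagates
;; downstream; `out` closes after the last in-flight item completes.
(doseq [x (range 5)] (async/>!! in x))
(async/close! in)
(def results (async/<!! (async/into [] out)))
;; results → [0 2 4 6 8]  (pipelines preserve input order)
```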

Backpressure Is Part of the Design

Every pipeline needs an answer to overload:

  • bounded buffers
  • slowing producers
  • drop policy
  • retry path
  • dead-letter path

Channel capacities and worker counts are not just tuning parameters. They are the system’s visible backpressure policy.
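In core.async these policies are literal constructor choices rather than configuration buried elsewhere. A sketch of the three buffer policies (the capacities are illustrative):

```clojure
(require '[clojure.core.async :as async])

;; Bounded buffer: when 64 items are queued, puts park (in go blocks)
;; or block (>!!), pushing backpressure upstream onto the producer.
(def bounded (async/chan 64))

;; Drop policy: puts always succeed; when full, the NEW item is discarded.
(def dropping (async/chan (async/dropping-buffer 64)))

;; Sliding policy: puts always succeed; when full, the OLDEST queued
;; item is evicted to make room.
(def sliding (async/chan (async/sliding-buffer 64)))
```

Which channels get which buffers is exactly the "visible backpressure policy" the text describes: a reader can see where the system parks, drops, or evicts under load.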

Error Handling Belongs to Stages, Not to Hope

Pipelines usually need explicit answers to:

  • what happens when one item fails validation
  • what happens when a downstream side effect fails
  • whether bad items are retried, dropped, or routed elsewhere

A strong pipeline often includes a dedicated error or dead-letter path rather than pretending every stage is total and failure-free.
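One way to sketch that: the stage tags each item instead of throwing, and async/split routes failures to a dead-letter channel. The validation rule here is hypothetical; the shape is what matters.

```clojure
(require '[clojure.core.async :as async])

(def in     (async/chan 64))
(def tagged (async/chan 64))

;; Hypothetical rule: positive numbers are valid. The stage is total:
;; it tags failures instead of throwing inside the transducer.
(defn validate [x]
  (if (pos? x)
    {:status :ok :value x}
    {:status :error :value x :reason :non-positive}))

(async/pipeline 4 tagged (map validate) in)

;; split returns [matching-chan non-matching-chan]; give both sides
;; buffers so a slow dead-letter consumer cannot stall the router.
(def routed (async/split #(= :ok (:status %)) tagged 64 64))
(def ok-ch       (first routed))   ; continues down the pipeline
(def dead-letter (second routed))  ; retried, inspected, or alerted on
```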

Pipelines Are Not Just Faster Maps

The main value of a pipeline is not automatic speed. It is stage isolation:

  • stage-specific concurrency
  • stage-specific retry logic
  • stage-specific metrics
  • easier reasoning about where work stalls

That is much better than one large function that parses, validates, writes, logs, and retries in one opaque control flow.

Stage Boundaries Should Match Real Responsibilities

If one stage exists only because the tooling made it easy to add another channel, the design is drifting. Stage boundaries should map to meaningful work ownership, not arbitrary technical separation.

Stage Concurrency Should Differ by Work Type

One pipeline-wide worker count is rarely the best answer. Parse and validation stages may need little parallelism, enrichment may be mostly CPU-bound, and persistence may be limited by downstream database capacity. Treating every stage as if it wants the same width usually creates one hidden bottleneck and one overprovisioned stage.
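A sketch of per-stage widths, with hypothetical stage functions standing in for real work (the counts are illustrative, not prescriptive):

```clojure
(require '[clojure.core.async :as async])

(def raw      (async/chan 64))
(def parsed   (async/chan 64))
(def enriched (async/chan 64))
(def done     (async/chan 64))

;; Stand-ins for real parsing, CPU-bound enrichment, and blocking writes.
(defn parse-line [s] {:raw s})
(defn enrich     [m] (assoc m :score (count (:raw m))))
(defn persist!   [m] m)  ; imagine a blocking database write here

;; Parsing is cheap: one worker is enough.
(async/pipeline 1 parsed (map parse-line) raw)

;; Enrichment is CPU-bound: width near the core count.
(async/pipeline 4 enriched (map enrich) parsed)

;; Persistence blocks on a downstream database: use the blocking
;; variant, and cap the width to protect that dependency.
(async/pipeline-blocking 2 done (map persist!) enriched)
```

Each width is chosen for that stage's workload, and changing one does not disturb the others.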

That is why good pipeline review asks:

  • which stage is CPU-bound?
  • which stage blocks on remote systems?
  • where should ordering be preserved?
  • where should throughput be capped to protect a downstream dependency?

Practical Rule

Use a pipeline when work naturally flows through distinct stages. Choose the variant that matches blocking behavior, keep stages narrow, and make backpressure policy explicit. A pipeline should clarify the movement of work, not hide it behind plumbing.

Revised on Thursday, April 23, 2026