Fan-out/fan-in workflows parallelize work by splitting one job into many independent tasks, then collecting their results into one aggregated outcome. In serverless systems, this is a natural fit for image processing, record enrichment, document analysis, and bulk notification pipelines because short-lived functions can process slices of work concurrently without requiring a large permanently running worker fleet.
The design challenge is not parallelism itself. It is deciding how much parallelism the dependencies can tolerate, what counts as “done,” and how partial failure should be handled. Fan-out/fan-in is a performance pattern, but it is also a coordination pattern, and the aggregation rules matter as much as the worker code.
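The shape of the pattern can be sketched in a few lines. Here is a minimal TypeScript version, where plain in-process async tasks stand in for serverless invocations; `processItem` and `fanOutFanIn` are illustrative names, not a real API:

```typescript
// Hypothetical stand-in for one worker: in a serverless system this would
// be a separate function invocation rather than an in-process call.
async function processItem(item: number): Promise<string> {
  return `item-${item}:done`;
}

// Fan out all tasks concurrently, then fan the settled results back in
// to a single aggregated outcome.
async function fanOutFanIn(items: number[]): Promise<{ ok: string[]; errors: number }> {
  const settled = await Promise.allSettled(items.map(processItem));
  const ok = settled
    .filter((s): s is PromiseFulfilledResult<string> => s.status === "fulfilled")
    .map((s) => s.value);
  return { ok, errors: settled.length - ok.length };
}
```

`Promise.allSettled` (rather than `Promise.all`) matters here: a single failed child does not reject the whole batch, which is what makes partial-result aggregation possible at fan-in.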
```mermaid
flowchart LR
A["Start job"] --> B["Split into N tasks"]
B --> C["Worker 1"]
B --> D["Worker 2"]
B --> E["Worker N"]
C --> F["Result store"]
D --> F
E --> F
F --> G["Aggregator"]
G --> H["Final status"]
```
What to notice: every worker writes into a shared result store rather than replying directly, and the aggregator alone decides when the job is "done" and what status to report.
This pattern is strongest when the tasks are independent slices of one job and downstream systems can absorb bursts of parallel traffic.
It is weaker when every step depends tightly on the previous one or when downstream systems cannot absorb bursty parallelism.
Teams often spend too much time on the fan-out side and too little time on the fan-in side. The important questions are: what counts as "done," how long the aggregator should wait for stragglers, whether a partial result is acceptable to ship, and what the parallelism costs.
These are product and operational questions, not just programming questions.
```yaml
workflow:
  name: batch-image-analysis
  fan_out:
    chunk_size: 25
    max_parallel_tasks: 20
  fan_in:
    required_completion_ratio: 0.95
    timeout_seconds: 300
    on_timeout: mark-partial
```
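The `required_completion_ratio` setting implies a quorum rule at fan-in. A minimal sketch of how an aggregator might evaluate it, assuming it is handed completed and total counts; `quorumMet` is a hypothetical helper, not part of any workflow engine:

```typescript
// Hypothetical quorum check mirroring required_completion_ratio above:
// a batch counts as done once enough children have completed.
function quorumMet(completed: number, total: number, requiredRatio: number): boolean {
  if (total === 0) return true; // an empty batch has nothing left to wait for
  return completed / total >= requiredRatio;
}
```

With `required_completion_ratio: 0.95`, a 500-task batch can close once 475 tasks have completed instead of waiting on the last stragglers.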
```typescript
type ChildResult = {
  itemId: string;
  status: "complete" | "failed" | "timed_out";
  output?: string;
};

export function summarizeBatch(results: ChildResult[]) {
  // Anything that is not complete, whether failed or timed out,
  // counts against the batch.
  const complete = results.filter((r) => r.status === "complete").length;
  const failed = results.filter((r) => r.status !== "complete").length;

  return {
    complete,
    failed,
    // The aggregation rule: any shortfall downgrades the batch to "partial".
    overallStatus: failed === 0 ? "complete" : "partial",
  };
}
```
What this demonstrates: the aggregator reduces many child results to a handful of counts, and a single rule (zero failures) decides whether the batch reports "complete" or "partial." That rule is the completion contract, and it is a deliberate choice, not a given.
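Note that `ChildResult` distinguishes timed-out tasks, but the summary folds them into `failed`. A sketch of one alternative that counts stragglers separately so they can be retried or reported on their own; `summarizeWithStragglers` is a hypothetical extension, restated here so the snippet is self-contained:

```typescript
type ChildResult = {
  itemId: string;
  status: "complete" | "failed" | "timed_out";
  output?: string;
};

// Hypothetical variant: report timed-out stragglers separately from
// hard failures, so the aggregator can retry them rather than just
// downgrading the whole batch.
function summarizeWithStragglers(results: ChildResult[]) {
  const complete = results.filter((r) => r.status === "complete").length;
  const timedOut = results.filter((r) => r.status === "timed_out").length;
  const failed = results.length - complete - timedOut;
  return {
    complete,
    failed,
    timedOut,
    overallStatus: failed + timedOut === 0 ? "complete" : "partial",
  };
}
```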
Serverless makes fan-out easy enough that teams sometimes forget cost and downstream capacity. Launching 10,000 parallel function invocations may reduce completion time, but it can also overwhelm databases and third-party APIs downstream, exhaust account-level concurrency limits, and multiply per-invocation cost with little product benefit.
That is why bounded parallelism usually beats unlimited parallelism. The strongest designs separate “the total number of items” from “the maximum concurrency the system may use right now.”
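Bounded parallelism can be sketched as a small pool of runners that claim items until none remain. A minimal TypeScript version with illustrative names, assuming in-process async workers:

```typescript
// Hypothetical sketch of bounded parallelism: process every item, but
// never run more than maxConcurrency workers at once. The total item
// count and the allowed concurrency are deliberately separate inputs.
async function mapBounded<T, R>(
  items: T[],
  maxConcurrency: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner loops, claiming one item at a time until none remain.
  // Claiming is safe because JS is single-threaded between awaits.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  // Start at most maxConcurrency runners.
  const runners = Array.from(
    { length: Math.min(maxConcurrency, items.length) },
    () => runner()
  );
  await Promise.all(runners);
  return results;
}
```

The same shape appears in the earlier config as `max_parallel_tasks`: the batch may contain thousands of items, but only a fixed number are in flight at any moment.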
A report-generation workflow splits one customer request into 500 analysis tasks. The team assumes the job is complete only when all 500 finish successfully. In practice, a few slow tasks frequently block the entire result for too long. What should be revisited first?
The stronger answer is the completion contract. The team should decide whether partial completion, quorum, or separate handling for stragglers would satisfy the product better than a rigid all-or-nothing fan-in rule. The problem may not be worker performance. It may be an aggregation rule that is harsher than the business actually needs.