Sagas, Orchestration, and Choreography

A practical lesson on saga-based workflow coordination, the trade-offs between orchestration and choreography, and how to choose a coordination style that matches workflow complexity.

A saga is a way to coordinate a long-running business workflow across several services without pretending the system can do one global rollback. Instead of one giant transaction, the workflow becomes a sequence of local transactions with explicit follow-up steps and explicit recovery behavior. This makes sagas one of the main design tools for workflows that cross service boundaries: a workflow can span several owning teams while its state transitions stay understandable.

The important point is that a saga is not magic. It is a coordination style. If the workflow is badly decomposed, a saga will not fix the underlying boundary problem. It will only make the coordination logic more visible.

    flowchart LR
        O["Order created"] --> P["Authorize payment"]
        P -->|authorized| I["Reserve inventory"]
        I -->|reserved| C["Confirm order"]
        I -->|reservation failed| F["Compensate payment"]
        P -->|declined| R["Mark order failed"]

The diagram matters because it shows the workflow as a series of durable steps instead of one invisible cross-service transaction.
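
The flow in the diagram can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the `Step` type, the `run_saga` function, and the log messages are hypothetical names chosen for this example, and a real saga would persist its progress between steps so it can resume after a crash.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction; raises on failure
    compensate: Callable[[], None]  # undoes the step's business effect


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # No global rollback exists, so undo each finished step explicitly.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_inventory() -> None:
    # Simulated failure of the inventory service's local transaction.
    raise RuntimeError("out of stock")


steps = [
    Step("authorize_payment",
         lambda: log.append("payment authorized"),
         lambda: log.append("payment refunded")),
    Step("reserve_inventory",
         fail_inventory,
         lambda: log.append("inventory released")),
    Step("confirm_order",
         lambda: log.append("order confirmed"),
         lambda: log.append("order cancelled")),
]

succeeded = run_saga(steps)
```

In this run the inventory step fails, so only the payment step is compensated and the order is never confirmed. The failing step itself is not compensated because its local transaction never committed.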

What a Saga Is Actually Doing

In a saga, each service:

  • performs its own local transaction
  • publishes an event or replies with a result
  • allows the workflow to continue, fail, or compensate

This means something has to track where the workflow currently is. That state may live in:

  • a dedicated workflow coordinator
  • event streams interpreted by several services
  • a durable state record linked to the business process

The details vary, but the core idea stays the same: workflow state becomes explicit.
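
One way to make that state explicit is to name the statuses and the moves allowed between them. The sketch below is an assumption for illustration: apart from `WAITING`, the status names and the transition table are not taken from any particular framework, and a real workflow would likely need more states.

```python
from enum import Enum


class SagaStatus(Enum):
    RUNNING = "RUNNING"
    WAITING = "WAITING"            # waiting for a service to reply
    COMPENSATING = "COMPENSATING"  # undoing earlier steps
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Hypothetical transition table: which status may follow which.
ALLOWED = {
    SagaStatus.RUNNING: {SagaStatus.WAITING, SagaStatus.COMPLETED,
                         SagaStatus.COMPENSATING},
    SagaStatus.WAITING: {SagaStatus.RUNNING, SagaStatus.COMPENSATING},
    SagaStatus.COMPENSATING: {SagaStatus.FAILED},
    SagaStatus.COMPLETED: set(),
    SagaStatus.FAILED: set(),
}


def transition(current: SagaStatus, target: SagaStatus) -> SagaStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


status = transition(SagaStatus.RUNNING, SagaStatus.WAITING)
```

Rejecting illegal moves early, such as reopening a completed saga, is what keeps the recorded state trustworthy enough to debug from.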

Orchestration: One Place Coordinates the Steps

In orchestration, one component decides what happens next. It sends commands such as:

  • authorize payment
  • reserve inventory
  • mark order confirmed
  • trigger refund compensation

This has several benefits:

  • the workflow is easier to understand in one place
  • operational teams can inspect the current step directly
  • timeouts and retries can be managed centrally
  • compensation paths are usually easier to reason about

The trade-off is that the orchestrator becomes a first-class component with real responsibility. If teams are careless, it can turn into a logic-heavy control tower that knows too much about every domain.

    {
      "sagaId": "saga_901",
      "workflow": "place_order",
      "currentStep": "reserve_inventory",
      "status": "WAITING",
      "nextOnSuccess": "confirm_order",
      "nextOnFailure": "refund_payment"
    }

What this demonstrates:

  • the workflow state is durable and inspectable
  • the next action is explicit instead of implied by scattered callbacks
  • failure handling is part of the model, not an afterthought
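
A rough sketch of how an orchestrator might advance such a record when a service replies is shown below. The field names mirror the JSON record in snake_case; the `ROUTES` table, the `advance` function, and the `FINISHED` and `NEEDS_REVIEW` statuses are assumptions made for this example rather than part of any specific tool.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SagaState:
    saga_id: str
    workflow: str
    current_step: str
    status: str
    next_on_success: Optional[str]
    next_on_failure: Optional[str]


# Hypothetical routing table: step -> (next on success, next on failure).
ROUTES = {
    "authorize_payment": ("reserve_inventory", "mark_order_failed"),
    "reserve_inventory": ("confirm_order", "refund_payment"),
    "confirm_order": (None, None),
    "refund_payment": ("mark_order_failed", None),
    "mark_order_failed": (None, None),
}


def advance(state: SagaState, succeeded: bool) -> SagaState:
    """Return the state that follows the reply for the current step."""
    next_step = state.next_on_success if succeeded else state.next_on_failure
    if next_step is None:
        # Nothing left to route to: finish, or hand over to a human.
        return replace(state, status="FINISHED" if succeeded else "NEEDS_REVIEW")
    on_success, on_failure = ROUTES[next_step]
    return replace(state, current_step=next_step, status="WAITING",
                   next_on_success=on_success, next_on_failure=on_failure)


state = SagaState("saga_901", "place_order", "reserve_inventory",
                  "WAITING", "confirm_order", "refund_payment")
after_failure = advance(state, succeeded=False)
```

Here a failed reservation moves the saga to `refund_payment`. Because each transition returns a new record, persisting every version would give operators a step-by-step audit trail.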

Choreography: Services React to Events

In choreography, no single coordinator directs the whole workflow. Services react to business events:

  • order service publishes OrderPlaced
  • payment service reacts and publishes PaymentAuthorized
  • inventory service reacts and publishes InventoryReserved
  • order service reacts and marks the order confirmed

This can reduce central control logic and allow services to stay loosely coupled at the workflow level. It works well when:

  • the workflow is relatively simple
  • the event meanings are stable
  • each consumer’s reaction is clear
  • the organization can observe event flow well

The main risk is that the workflow logic becomes hard to see. What looks decoupled in code can become hard to debug in production if the path from cause to effect is scattered across many listeners.
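
The event chain listed above can be sketched with a tiny synchronous in-memory bus. The `EventBus` class and the handler names are hypothetical stand-ins for a real message broker and real services; only the event names come from the list above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Minimal synchronous in-memory bus; a stand-in for a real broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []  # simple trace of published event types

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        self.history.append(event_type)
        for handler in self._handlers[event_type]:
            handler(event)


bus = EventBus()
orders: Dict[str, str] = {}


def payment_on_order_placed(event: Event) -> None:
    # Payment service: authorize, then announce the result.
    bus.publish("PaymentAuthorized", event)


def inventory_on_payment_authorized(event: Event) -> None:
    # Inventory service: reserve stock, then announce the result.
    bus.publish("InventoryReserved", event)


def order_on_inventory_reserved(event: Event) -> None:
    # Order service: close the loop by confirming the order.
    orders[event["orderId"]] = "CONFIRMED"


bus.subscribe("OrderPlaced", payment_on_order_placed)
bus.subscribe("PaymentAuthorized", inventory_on_payment_authorized)
bus.subscribe("InventoryReserved", order_on_inventory_reserved)

orders["order_1"] = "PENDING"
bus.publish("OrderPlaced", {"orderId": "order_1"})
```

No single function in this sketch describes the whole path from `OrderPlaced` to a confirmed order, which is exactly the visibility risk described above. The `history` list hints at the kind of tracing a team needs to reconstruct that path in production.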

Choosing Between Them

The question is not which pattern is more fashionable. The question is which pattern makes the workflow easier to understand, test, and recover.

Orchestration is often stronger when:

  • the process has many branches
  • timeouts and manual review matter
  • compensation logic is significant
  • auditability is important

Choreography is often stronger when:

  • the process is mostly linear
  • several independent consumers react to the same business fact
  • no single team should own all downstream decisions
  • events already represent meaningful domain facts

This is why the workflow model and the decomposition model influence each other. If the business flow is too complicated for event-only reasoning, forcing choreography can hide coupling rather than reduce it.

Beware of Event-Only Wishful Thinking

Teams sometimes say, “We will just publish events and everything else will subscribe.” That sounds loose and scalable, but it often avoids the harder question: who actually owns the workflow outcome?

Warning signs include:

  • no durable record of current workflow state
  • no clear timeout owner
  • no clear compensation trigger
  • several services assuming another service will finish the process

That is not event-driven elegance. It is coordination ambiguity.
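
Giving timeouts a clear owner can be as simple as one component that periodically scans durable saga records for stalled work. The sketch below assumes a hypothetical `SagaRecord` shape and a `find_stalled` helper; how records are stored and what happens to a stalled saga (retry, compensate, escalate) are left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class SagaRecord:
    saga_id: str
    current_step: str
    status: str
    updated_at: datetime


def find_stalled(records: List[SagaRecord], now: datetime,
                 limit: timedelta) -> List[str]:
    """Return IDs of sagas that have waited longer than the limit."""
    return [
        r.saga_id
        for r in records
        if r.status == "WAITING" and now - r.updated_at > limit
    ]


# Fixed timestamp so the example is deterministic.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    SagaRecord("saga_901", "reserve_inventory", "WAITING",
               now - timedelta(minutes=45)),
    SagaRecord("saga_902", "authorize_payment", "WAITING",
               now - timedelta(minutes=5)),
    SagaRecord("saga_903", "confirm_order", "FINISHED",
               now - timedelta(hours=3)),
]

stalled = find_stalled(records, now, timedelta(minutes=30))
```

Only `saga_901` is reported here: it is still waiting and has exceeded the 30-minute limit. Without a durable record like this, there is nothing to scan and no one positioned to notice the stall.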

A Simple Comparison

    workflow: place-order
    orchestration:
      strength: explicit control flow
      main_risk: coordinator can become too central
    choreography:
      strength: local autonomy around domain events
      main_risk: workflow logic becomes hard to see
    review_question: Where should the complexity live so operators and developers can still understand the process?

What matters is not perfect theoretical purity. What matters is whether the workflow remains explainable.

Design Review Question

A team has a six-step order workflow with retries, branch logic, timeout handling, and manual review, but it insists on pure choreography because it wants “maximum decoupling.” What is the stronger architectural objection?

The stronger objection is that decoupling is being defined too narrowly. If the workflow has meaningful branching, recovery, and operational oversight needs, pure choreography may scatter the control logic until no one can explain or operate the process confidently. In that case, orchestration may create a better boundary for workflow complexity even if individual services remain independently owned.

Revised on Thursday, April 23, 2026