A practical lesson on saga-based workflow coordination, the trade-offs between orchestration and choreography, and how to choose a coordination style that matches workflow complexity.
A saga is a way to coordinate a long-running business workflow across several services without pretending the system can do one global rollback. Instead of one giant transaction, the workflow becomes a sequence of local transactions with explicit follow-up steps and explicit recovery behavior. This makes sagas one of the main design tools for service boundaries. They allow a workflow to cross several owners while still preserving understandable state transitions.
The important point is that a saga is not magic. It is a coordination style. If the workflow is badly decomposed, a saga will not fix the underlying boundary problem. It will only make the coordination logic more visible.
flowchart LR
O["Order created"] --> P["Authorize payment"]
P --> I["Reserve inventory"]
I --> C["Confirm order"]
I --> F["Compensate payment"]
P --> R["Mark order failed"]
The diagram matters because it shows the workflow as a series of durable steps instead of one invisible cross-service transaction.
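The flow in the diagram can be sketched in a few lines of Python. This is a minimal in-process illustration, not a production saga engine: the `run_saga` helper, the step names, and the simulated "out of stock" failure are all assumptions made for the example. Each step pairs a local transaction with the action that undoes it, and a failure triggers compensation of the completed steps in reverse order.

```python
from typing import Callable, List, Tuple

# Each step pairs a local transaction with the action that undoes it.
# (name, action, compensation) -- all names here are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            # There is no global rollback: undo each finished step explicitly.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append(step)
        log.append(f"{name}: done")
    return log


def noop() -> None:
    """Stand-in for a local transaction (or compensation) that succeeds."""


def reserve_inventory() -> None:
    raise RuntimeError("out of stock")  # simulated failure for the example


log = run_saga([
    ("authorize_payment", noop, noop),
    ("reserve_inventory", reserve_inventory, noop),
    ("confirm_order", noop, noop),
])
```

Here the payment step completes, the inventory step fails, and the payment is compensated. The order is never confirmed, which matches the failure branch of the diagram.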
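In a saga, each service commits its own local transaction, reports the result so the next step can be triggered, and provides a compensating action in case a later step fails.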
This means the workflow has to track where it is. That state may live in:
The details vary, but the core idea stays the same: workflow state becomes explicit.
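One consequence of explicit state is that a workflow can be resumed after a restart from nothing but the saved record. The sketch below assumes a hypothetical three-step workflow and uses a JSON string as a stand-in for durable storage. The field names `stepIndex` and `status` and the helper names are illustrative, not taken from any particular framework.

```python
import json
from typing import List, Optional

# Hypothetical step order for the example workflow.
STEPS: List[str] = ["authorize_payment", "reserve_inventory", "confirm_order"]


def new_saga(saga_id: str) -> str:
    """Serialize the initial state; a real system would write it to durable storage."""
    return json.dumps({"sagaId": saga_id, "stepIndex": 0, "status": "RUNNING"})


def advance(saved: str) -> str:
    """Load the saved state, record one completed step, and return the new saved state."""
    state = json.loads(saved)
    if state["status"] != "RUNNING":
        return saved  # finished sagas do not move
    state["stepIndex"] += 1
    if state["stepIndex"] == len(STEPS):
        state["status"] = "COMPLETED"
    return json.dumps(state)


def current_step(saved: str) -> Optional[str]:
    """Answer 'where is this workflow?' from the saved record alone."""
    state = json.loads(saved)
    return STEPS[state["stepIndex"]] if state["status"] == "RUNNING" else None


saved = new_saga("saga_901")
saved = advance(saved)               # payment authorized, then the process "restarts"
resumed_at = current_step(saved)     # only the saved string survived the restart
```

Because the position is stored rather than held in memory, any component that can read the record can tell which step is pending.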
In orchestration, one component decides what happens next. It sends commands such as:
This has several benefits:
The trade-off is that the orchestrator becomes a first-class component with real responsibility. If teams are careless, it can turn into a logic-heavy control tower that knows too much about every domain.
{
  "sagaId": "saga_901",
  "workflow": "place_order",
  "currentStep": "reserve_inventory",
  "status": "WAITING",
  "nextOnSuccess": "confirm_order",
  "nextOnFailure": "refund_payment"
}
What this demonstrates:
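As one way to read the record above, an orchestrator could use it to choose the next step when a reply arrives. The sketch below is an assumption about how such routing might look. The `TRANSITIONS` table and the `on_reply` function are hypothetical, and the `mark_order_failed` step name is borrowed from the failure branch of the earlier diagram.

```python
from typing import Dict, Optional, Tuple

# Hypothetical workflow definition: step -> (next on success, next on failure).
# Steps without an entry are treated as terminal.
TRANSITIONS: Dict[str, Tuple[str, str]] = {
    "authorize_payment": ("reserve_inventory", "mark_order_failed"),
    "reserve_inventory": ("confirm_order", "refund_payment"),
}


def on_reply(record: dict, succeeded: bool) -> dict:
    """Return an updated saga record after the current step reports its outcome."""
    next_step = record["nextOnSuccess"] if succeeded else record["nextOnFailure"]
    follow_ups: Tuple[Optional[str], Optional[str]] = TRANSITIONS.get(
        next_step, (None, None)
    )
    return {
        **record,
        "currentStep": next_step,
        "nextOnSuccess": follow_ups[0],
        "nextOnFailure": follow_ups[1],
    }


record = {
    "sagaId": "saga_901",
    "workflow": "place_order",
    "currentStep": "reserve_inventory",
    "status": "WAITING",
    "nextOnSuccess": "confirm_order",
    "nextOnFailure": "refund_payment",
}

after_success = on_reply(record, succeeded=True)
after_failure = on_reply(record, succeeded=False)
```

The routing decision lives in one place, which is the main appeal of orchestration. The cost is that the transition table has to know about every step in the workflow.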
In choreography, no single coordinator directs the whole workflow. Services react to business events:
OrderPlaced
PaymentAuthorized
InventoryReserved
This can reduce central control logic and allow services to stay loosely coupled at the workflow level. It works well when:
The main risk is that the workflow logic becomes hard to see. What looks decoupled in code can become hard to debug in production if the path from cause to effect is scattered across many listeners.
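A small in-process sketch makes both the appeal and the risk concrete. The `EventBus` class, the handler names, and the `order_1` payload are assumptions for illustration. A real system would use a message broker, and handlers would do actual work before publishing.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []  # order in which events were published

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.history.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()


def payment_service(event: dict) -> None:
    # Reacts to OrderPlaced; knows nothing about inventory.
    bus.publish("PaymentAuthorized", event)


def inventory_service(event: dict) -> None:
    # Reacts to PaymentAuthorized; knows nothing about who placed the order.
    bus.publish("InventoryReserved", event)


bus.subscribe("OrderPlaced", payment_service)
bus.subscribe("PaymentAuthorized", inventory_service)

bus.publish("OrderPlaced", {"orderId": "order_1"})
```

No function in this sketch describes the whole workflow. The sequence only becomes visible by reading `bus.history` after the fact or by tracing the subscriptions, which is the visibility problem in miniature.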
The question is not which pattern is more fashionable. The question is which pattern makes the workflow easier to understand, test, and recover.
Orchestration is often stronger when:
Choreography is often stronger when:
This is why the workflow model and the decomposition model influence each other. If the business flow is too complicated for event-only reasoning, forcing choreography can hide coupling rather than reduce it.
Teams sometimes say, “We will just publish events and everything else will subscribe.” That sounds loose and scalable, but it often avoids the harder question: who actually owns the workflow outcome?
Warning signs include:
That is not event-driven elegance. It is coordination ambiguity.
workflow: place-order
orchestration:
  strength: explicit control flow
  main_risk: coordinator can become too central
choreography:
  strength: local autonomy around domain events
  main_risk: workflow logic becomes hard to see
review_question: Where should the complexity live so operators and developers can still understand the process?
What matters is not perfect theoretical purity. What matters is whether the workflow remains explainable.
A team has a six-step order workflow with retries, branch logic, timeout handling, and manual review, but it insists on pure choreography because it wants “maximum decoupling.” What is the stronger architectural objection?
The stronger objection is that decoupling is being defined too narrowly. If the workflow has meaningful branching, recovery, and operational oversight needs, pure choreography may scatter the control logic until no one can explain or operate the process confidently. In that case, orchestration may create a better boundary for workflow complexity even if individual services remain independently owned.