A practical lesson on compensation as business correction logic, including reversible versus irreversible effects, timing choices, and common compensation mistakes.
Compensation is the logic that corrects a workflow after one or more earlier steps already committed successfully and a later step fails. In event-driven systems, compensation is often described casually as rollback, but that word is misleading. A database rollback restores an earlier technical state inside one transactional boundary. Compensation in business systems usually means a new business action that offsets or repairs the earlier outcome.
That distinction matters because not every effect is truly reversible. Releasing inventory may be straightforward. Refunding a payment is not the same as never authorizing it. Sending an email may be impossible to undo. Provisioning access may require explicit removal plus audit. Good compensation design therefore starts from business meaning, not from transaction vocabulary alone.
```mermaid
flowchart TD
A["Workflow step succeeds"] --> B["Later step fails"]
B --> C{"Can earlier effect be reversed exactly?"}
C -->|Yes| D["Run direct compensation"]
C -->|No| E["Run offset, correction, or manual recovery path"]
```
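The decision in the flowchart can be written as a small function. This is a minimal sketch. The `EffectKind` labels, the `StepEffect` shape, and the action names are hypothetical and not part of any particular workflow engine. The "No" branch is split into an offset path and a manual recovery path, as the diagram's last box suggests.

```typescript
// Hypothetical labels for the flowchart's question about the earlier effect.
type EffectKind = "exactly_reversible" | "offsettable" | "irreversible";

interface StepEffect {
  step: string;
  effect: EffectKind;
  compensatingAction?: string; // hypothetical business action that undoes or offsets the step
}

type CompensationPath =
  | { kind: "direct"; action: string }
  | { kind: "offset"; action: string }
  | { kind: "manual_recovery"; reason: string };

function chooseCompensationPath(e: StepEffect): CompensationPath {
  // "Yes" branch: the earlier effect can be reversed exactly.
  if (e.effect === "exactly_reversible" && e.compensatingAction !== undefined) {
    return { kind: "direct", action: e.compensatingAction };
  }
  // "No" branch with a defined corrective action: offset the earlier outcome.
  if (e.effect === "offsettable" && e.compensatingAction !== undefined) {
    return { kind: "offset", action: e.compensatingAction };
  }
  // "No" branch without a safe automatic action (or none declared): manual recovery.
  return { kind: "manual_recovery", reason: `no automatic compensation for ${e.step}` };
}
```

A step that claims to be reversible but declares no compensating action falls through to manual recovery. The function does not guess at an undo.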
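What to notice: the diagram branches on a business question, not a technical one. When the earlier effect cannot be reversed exactly, the workflow still needs a defined path. That path may be an offsetting action, a correction, or manual recovery.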
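One of the first compensation questions is which of three kinds the earlier effect belongs to. It may be directly reversible. It may be something that can only be offset by a new corrective business action. Or it may be impossible to undo, so it must go to reviewed exception handling.
For example, releasing an inventory hold is usually directly reversible. A payment that can no longer be voided is typically offset by a refund. A confirmation email that has already been sent cannot be recalled.
This is why compensation should be designed per step, not assumed as a generic platform capability.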
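The architecture also needs to decide when compensation should happen. It might run immediately after the later failure is detected, or only after a person has reviewed the case.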
The right answer depends on business risk. Immediate compensation may be safest for inventory holds. Human review may be required for payments, entitlements, or high-value partner actions. A good workflow model makes that timing explicit.
```yaml
compensationPolicy:
  workflow: order-fulfillment
  steps:
    reserve_inventory:
      onFailureLater: release_immediately
    authorize_payment:
      onFailureLater: void_if_possible_else_refund
    send_confirmation_email:
      onFailureLater: no_direct_undo
```
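One way to apply this policy at runtime is to walk the completed steps in reverse and map each one to a planned action. This is a sketch under several assumptions:

- The policy is hard-coded as a TypeScript object rather than parsed from YAML.
- The action names are hypothetical.
- Reverse-order undo is a common saga convention assumed here.
- The timing choices (void immediately, send refunds to human review) are illustrative.

```typescript
// The YAML policy restated as a typed object; loading it from YAML is out of scope.
type OnFailureLater =
  | "release_immediately"
  | "void_if_possible_else_refund"
  | "no_direct_undo";

const orderFulfillmentPolicy: Record<string, OnFailureLater> = {
  reserve_inventory: "release_immediately",
  authorize_payment: "void_if_possible_else_refund",
  send_confirmation_email: "no_direct_undo",
};

type PlannedAction = {
  step: string;
  action: string; // hypothetical action names
  timing: "immediate" | "human_review" | "none";
};

function planCompensation(
  completedSteps: string[],
  policy: Record<string, OnFailureLater>,
  paymentCaptured: boolean, // assumption: decides void versus refund
): PlannedAction[] {
  // Assumed convention: correct later effects before the earlier ones they depended on.
  return [...completedSteps].reverse().map((step): PlannedAction => {
    switch (policy[step]) {
      case "release_immediately":
        return { step, action: "release_inventory", timing: "immediate" };
      case "void_if_possible_else_refund":
        // An uncaptured authorization can typically be voided; once captured, a refund offsets it.
        return paymentCaptured
          ? { step, action: "refund_payment", timing: "human_review" }
          : { step, action: "void_authorization", timing: "immediate" };
      case "no_direct_undo":
        // Nothing to reverse; the step stays in the plan so the gap remains visible.
        return { step, action: "no_direct_undo", timing: "none" };
      default:
        // Steps without a declared policy are escalated rather than guessed at.
        return { step, action: "escalate_undeclared_step", timing: "human_review" };
    }
  });
}
```

Suppose a hypothetical later step failed after inventory was reserved and payment was authorized but not captured. Then `planCompensation(["reserve_inventory", "authorize_payment"], orderFulfillmentPolicy, false)` plans the authorization void first and the inventory release second, both immediate.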
Compensation is not exempt from the reliability rules of distributed systems. Compensation handlers can also be retried, duplicated, or replayed. That means compensating actions should themselves be designed to be safe under repetition.
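Suppose a workflow may retry releasing inventory, voiding a payment, or removing access. Those actions then need to be idempotent. Each request should carry a stable identifier, for example the workflow ID combined with the step and action. A repeated request is then recognized and returns the same outcome instead of producing a second effect.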
Otherwise a recovery path can create a second class of inconsistency while trying to fix the first.
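Here is a minimal sketch of one way to make a compensating action safe under repetition. It assumes an in-memory ledger keyed by workflow, step, and action. `CompensationLedger` and `compensationKey` are hypothetical names, and a real system would keep this record in durable storage.

```typescript
// Hypothetical in-memory record of compensations already applied.
class CompensationLedger {
  private applied = new Map<string, string>();

  // Runs `effect` until it succeeds once for a key; afterwards every repeat
  // returns the recorded result without running the effect again.
  runOnce(key: string, effect: () => string): string {
    const previous = this.applied.get(key);
    if (previous !== undefined) {
      return previous;
    }
    const result = effect(); // if this throws, nothing is recorded and a retry runs it again
    this.applied.set(key, result);
    return result;
  }
}

// Combines workflow, step, and action so a duplicate request maps to the same key.
function compensationKey(workflowId: string, step: string, action: string): string {
  return `${workflowId}:${step}:${action}`;
}
```

This version is only safe within one process. Across replicas, the check and the record would have to be atomic, for example via a unique constraint in a shared store. Otherwise two concurrent handlers could both see "not applied" and both run the effect.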
Architects sometimes treat manual intervention as a failure of the design. That is too simplistic. Some workflows involve payments, entitlements, or high-value partner actions where an automated reversal could itself do harm.
In those cases, “pause and escalate” can be the correct compensation strategy. Using manual recovery is not the mistake. The mistake is pretending every business error can be auto-reversed safely.
```typescript
type CompensationState = {
  workflowId: string;
  failedStep: string;
  compensationStatus: "pending" | "completed" | "manual_review";
  nextAction: string;
};
```
This kind of state model is valuable because it makes recovery an explicit part of the workflow rather than a hidden operator ritual.
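The state model can also drive the "pause and escalate" decision. This sketch repeats the `CompensationState` type so it compiles on its own. `HandlerOutcome`, `MAX_AUTOMATIC_ATTEMPTS`, and the budget of 3 are hypothetical choices, not recommendations.

```typescript
type CompensationState = {
  workflowId: string;
  failedStep: string;
  compensationStatus: "pending" | "completed" | "manual_review";
  nextAction: string;
};

// Hypothetical result reported by a compensation handler.
type HandlerOutcome =
  | { ok: true }
  | { ok: false; retryable: boolean; detail: string };

// Hypothetical retry budget before escalating to a person.
const MAX_AUTOMATIC_ATTEMPTS = 3;

function nextCompensationState(
  state: CompensationState,
  outcome: HandlerOutcome,
  attempt: number, // 1-based count of attempts made so far
): CompensationState {
  if (outcome.ok) {
    return { ...state, compensationStatus: "completed", nextAction: "none" };
  }
  if (outcome.retryable && attempt < MAX_AUTOMATIC_ATTEMPTS) {
    // Keep the same nextAction: the idempotent compensation is simply retried.
    return { ...state, compensationStatus: "pending" };
  }
  // Non-retryable failure or exhausted budget: pause and escalate.
  return {
    ...state,
    compensationStatus: "manual_review",
    nextAction: `review ${state.failedStep}: ${outcome.detail}`,
  };
}
```

With a budget of 3, a retryable failure on attempt 1 or 2 stays `pending`. Attempt 3, or any non-retryable failure, moves the case to `manual_review` with a readable `nextAction` for the operator.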
A team says their compensation policy is simple: “if any later step fails, just roll everything back.” Why is that a weak design statement?
It is weak because business actions are not all technically reversible. Some steps can be undone directly, some require offsetting correction, and some can only move into reviewed exception handling. The stronger design identifies compensation behavior per step and per failure mode rather than assuming one universal rollback model.