Compensation Design

March 23, 2026

A practical lesson on compensation as business correction logic, including reversible versus irreversible effects, timing choices, and common compensation mistakes.


Compensation is the logic that corrects a workflow when a later step fails after one or more earlier steps have already committed. In event-driven systems, compensation is often described casually as rollback, but that word is misleading. A database rollback restores an earlier technical state inside one transactional boundary. Compensation in business systems usually means a new business action that offsets or repairs the earlier outcome.

That distinction matters because not every effect is truly reversible. Releasing inventory may be straightforward. Refunding a payment is not the same as never authorizing it. Sending an email may be impossible to undo. Provisioning access may require explicit removal plus audit. Good compensation design therefore starts from business meaning, not from transaction vocabulary alone.

    flowchart TD
        A["Workflow step succeeds"] --> B["Later step fails"]
        B --> C{"Can earlier effect be reversed exactly?"}
        C -->|Yes| D["Run direct compensation"]
        C -->|No| E["Run offset, correction, or manual recovery path"]

What to notice:

compensation begins after partial success already exists
some actions can be reversed cleanly, others can only be corrected
business semantics matter more than technical symmetry

Reversible Versus Irreversible Effects

One of the first compensation questions is whether the earlier effect is:

directly reversible
only partially reversible
not reversible at all

Examples:

inventory reservation can often be released directly
a payment authorization can usually be voided before capture; once capture has happened, a refund is needed instead
a sent customer email cannot be unsent, though a follow-up correction may be possible
a partner webhook may require a second explicit correction message

This is why compensation should be designed per step, not assumed as a generic platform capability.
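As a rough illustration of per-step design, the sketch below attaches a compensation kind to each step and derives a correction plan from the steps that already committed. The step names, the `Compensation` type, and `planCompensation` are hypothetical, and running corrections in reverse order of completion is a common convention assumed here rather than something this lesson prescribes.

```typescript
// Hypothetical model: each step declares how its effect can be corrected.
type Compensation =
  | { kind: "reverse"; action: string } // an exact inverse exists
  | { kind: "offset"; action: string }  // a new business action offsets the effect
  | { kind: "manual"; reason: string }; // no safe automatic correction

// Illustrative entries based on the examples above; not a real catalog.
const compensations: Record<string, Compensation> = {
  reserve_inventory: { kind: "reverse", action: "release_inventory" },
  capture_payment: { kind: "offset", action: "refund_payment" },
  send_confirmation_email: { kind: "offset", action: "send_correction_email" },
  notify_partner: { kind: "manual", reason: "partner-side effect needs review" },
};

// Given the steps that already committed (in order), list the corrections
// to run, most recent step first (assumed ordering).
function planCompensation(completedSteps: string[]): string[] {
  return [...completedSteps].reverse().map((step) => {
    const c = compensations[step];
    if (!c) {
      // A step without a declared compensation is a design gap, not a default.
      throw new Error(`no compensation defined for step: ${step}`);
    }
    return c.kind === "manual" ? `escalate(${step}): ${c.reason}` : c.action;
  });
}
```

Failing loudly on an undeclared step is a deliberate choice in this sketch: it forces the team to decide each step's correction up front instead of inheriting a generic rollback.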

Compensation Needs Timing Rules

The architecture also needs to decide when compensation should happen:

immediately after a dependent failure
after bounded retries are exhausted
after human review
as a later asynchronous cleanup job

The right answer depends on business risk. Immediate compensation may be safest for inventory holds. Human review may be required for payments, entitlements, or high-value partner actions. A good workflow model makes that timing explicit.

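    compensationPolicy:
      workflow: order-fulfillment
      steps:
        # onFailureLater: what to do with this step's effect if a later step fails
        reserve_inventory:
          onFailureLater: release_immediately
        authorize_payment:
          onFailureLater: void_if_possible_else_refund
        send_confirmation_email:
          # an email cannot be unsent; any correction is a separate follow-up
          onFailureLater: no_direct_undo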
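The timing options listed above could be made explicit in code along the following lines. This is a minimal sketch: the `Timing` and `Decision` names, the `timingRules` table, and the retry limit of 3 are all assumptions chosen for illustration, not part of the policy format shown above.

```typescript
// The four timing choices from the list above, as a hypothetical union type.
type Timing = "immediate" | "after_retries" | "human_review" | "async_cleanup";

interface TimingRule {
  timing: Timing;
  maxRetries?: number; // only meaningful for "after_retries"
}

// Illustrative rules; real values depend on business risk.
const timingRules: Record<string, TimingRule> = {
  reserve_inventory: { timing: "immediate" },
  authorize_payment: { timing: "after_retries", maxRetries: 3 },
  grant_entitlement: { timing: "human_review" },
};

type Decision =
  | "compensate_now"
  | "retry_failed_step"
  | "escalate"
  | "schedule_cleanup";

// Decide what to do about an earlier step's effect once a later step has
// failed `retriesSoFar` times.
function decide(step: string, retriesSoFar: number): Decision {
  const rule = timingRules[step];
  if (!rule) return "escalate"; // no rule declared: do not guess
  switch (rule.timing) {
    case "immediate":
      return "compensate_now";
    case "after_retries":
      return retriesSoFar < (rule.maxRetries ?? 0)
        ? "retry_failed_step"
        : "compensate_now";
    case "human_review":
      return "escalate";
    case "async_cleanup":
      return "schedule_cleanup";
  }
}
```

Escalating when no rule exists keeps an undocumented step from being compensated automatically by accident.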

Idempotency Matters for Compensation Too

Compensation is not exempt from the reliability rules of distributed systems. Compensation handlers can also be retried, duplicated, or replayed. That means compensating actions should themselves be designed to be safe under repetition.

If a workflow may retry actions such as releasing inventory, voiding a payment, or removing access, those actions need:

stable correlation to the original workflow step
duplicate-safe effect boundaries
durable state showing whether compensation already completed

Otherwise a recovery path can create a second class of inconsistency while trying to fix the first.
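One way to picture these three requirements is a guard that keys each compensation on the workflow and step, and records completion before allowing a repeat. In this sketch the `Set` is an in-memory stand-in for durable storage, and `compensateOnce` and `releaseInventory` are hypothetical names.

```typescript
// In-memory stand-in for a durable store of completed compensations.
// A real system would persist this (assumption: some durable storage exists).
const completedCompensations = new Set<string>();

// Records which reservations were released, so repeats are observable.
const releasedReservations: string[] = [];

function releaseInventory(reservationId: string): void {
  releasedReservations.push(reservationId);
}

// Run a compensating effect at most once per (workflow, step) pair.
function compensateOnce(
  workflowId: string,
  step: string,
  effect: () => void,
): "applied" | "skipped" {
  const key = `${workflowId}:${step}`; // stable correlation to the original step
  if (completedCompensations.has(key)) {
    return "skipped"; // duplicate or replayed request
  }
  effect();
  completedCompensations.add(key); // record that compensation completed
  return "applied";
}

// Usage: a retried handler invokes the same compensation twice.
const first = compensateOnce("wf-1", "reserve_inventory", () => releaseInventory("res-42"));
const second = compensateOnce("wf-1", "reserve_inventory", () => releaseInventory("res-42"));
```

Note that the effect and the completion record are not written atomically here, so a crash between them could still repeat the effect. That is why the downstream action should also be duplicate-safe, for example by accepting the same key as an idempotency token.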

Manual Recovery Is Sometimes the Right Design

Architects sometimes treat manual intervention as a failure of the design. That is too simplistic. Some workflows involve:

irreversible partner-side actions
legal or financial controls
ambiguous business state
customer-visible consequences that need case review

In those cases, “pause and escalate” can be the correct compensation strategy. The mistake is not using manual recovery. The mistake is pretending every business error can be auto-reversed safely.

    type CompensationState = {
      workflowId: string;
      failedStep: string;
      // "manual_review" makes escalation a first-class status, not an error
      compensationStatus: "pending" | "completed" | "manual_review";
      nextAction: string;
    };

This kind of state model is valuable because it makes recovery an explicit part of the workflow rather than a hidden operator ritual.
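A small transition function shows how such a state record might move between statuses. This is a sketch under assumptions: the `Outcome` type, the `advance` function, and the `nextAction` values are invented for illustration, and the state type is repeated so the block stands alone.

```typescript
type CompensationState = {
  workflowId: string;
  failedStep: string;
  compensationStatus: "pending" | "completed" | "manual_review";
  nextAction: string;
};

// Hypothetical result of one compensation attempt.
type Outcome = { ok: true } | { ok: false; safeToRetry: boolean };

// Compute the next state after an attempt, without mutating the input.
function advance(state: CompensationState, outcome: Outcome): CompensationState {
  // Completed and manual_review are left alone: automation no longer owns them.
  if (state.compensationStatus !== "pending") return state;
  if (outcome.ok) {
    return { ...state, compensationStatus: "completed", nextAction: "none" };
  }
  if (outcome.safeToRetry) {
    return { ...state, nextAction: "retry_compensation" };
  }
  // Not safe to retry automatically: pause and escalate.
  return {
    ...state,
    compensationStatus: "manual_review",
    nextAction: "escalate_to_operator",
  };
}
```

Treating `manual_review` as a state that automation cannot leave on its own keeps a retry loop from silently overriding a case that an operator is handling.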

Common Mistakes

treating compensation as if it were technical rollback
assuming every action has a clean inverse
forgetting to make compensating actions idempotent
triggering automatic compensation where human review is actually required
failing to document which prior steps must be corrected after each failure point

Design Review Question

A team says their compensation policy is simple: “if any later step fails, just roll everything back.” Why is that a weak design statement?

It is weak because business actions are not all technically reversible. Some steps can be undone directly, some require offsetting correction, and some can only move into reviewed exception handling. The stronger design identifies compensation behavior per step and per failure mode rather than assuming one universal rollback model.


Revised on Thursday, April 23, 2026
