Workflow engines give serverless systems a durable control plane for multi-step processes. Instead of encoding retries, waits, branches, and human approval logic inside one function or across a chain of loosely coordinated event handlers, a workflow engine stores the current state of the process explicitly and decides what should run next.
The label “step functions” comes from AWS Step Functions and is often used loosely, because several platforms model workflows as a sequence of named states or steps. The underlying idea is broader and vendor-neutral: make workflow progress visible, durable, and inspectable. That is the difference between a process that survives retries cleanly and one that becomes impossible to reason about after the first operational incident.
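A minimal Python sketch of what "stores the current state of the process explicitly" can mean is shown below. The `Execution` record, its field names, and the JSON round-trip that stands in for durable storage are illustrative assumptions. They are not any platform's schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Execution:
    """Hypothetical execution record a workflow engine might persist."""
    execution_id: str
    workflow: str
    current_state: str
    status: str = "running"  # running | waiting | succeeded | failed
    history: list = field(default_factory=list)

    def transition(self, next_state: str, note: str = "") -> None:
        # Record the move before making it, so progress stays inspectable.
        self.history.append({"from": self.current_state, "to": next_state, "note": note})
        self.current_state = next_state


def save(execution: Execution) -> str:
    # Stand-in for a write to durable storage after every transition.
    return json.dumps(asdict(execution))


def load(blob: str) -> Execution:
    # Stand-in for reloading the record, e.g. after a crash or redeploy.
    return Execution(**json.loads(blob))


if __name__ == "__main__":
    ex = Execution("exec-1", "order-approval", "validate-order")
    ex.transition("needs-review", "validation passed")
    resumed = load(save(ex))
    print(resumed.current_state)  # needs-review
    print(len(resumed.history))   # 1
```

The execution resumes from the recorded state rather than from the beginning, and the history answers "how did it get here?" without reading logs.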
```mermaid
flowchart LR
    A["Start workflow"] --> B["Validate request"]
    B --> C{"Manual review needed?"}
    C -->|No| D["Charge payment"]
    C -->|Yes| E["Wait for approval"]
    E --> D
    D --> F{"Payment succeeded?"}
    F -->|Yes| G["Create shipment"]
    F -->|No| H["Record failure"]
    G --> I["Complete"]
```
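What to notice: the manual-review branch rejoins the main path at the payment step, and a failed payment leads to an explicit “Record failure” state instead of disappearing inside a function. Every branch and wait is a named state, so each execution can be located on this diagram at any point in its life.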
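To make the branching concrete, the flowchart can be transcribed into a Python transition table. State names follow the diagram. The `manual_review` and `payment_ok` flags are hypothetical stand-ins for data the real steps would produce. In a real run, `payment_ok` would come from the charge step; for simplicity, the sketch supplies both flags up front.

```python
# Hypothetical transcription of the flowchart: each state maps to a function that
# picks the next state from the execution's data. Decision nodes are explicit states.
TRANSITIONS = {
    "start_workflow": lambda d: "validate_request",
    "validate_request": lambda d: "manual_review_needed",
    "manual_review_needed": lambda d: "wait_for_approval" if d["manual_review"] else "charge_payment",
    "wait_for_approval": lambda d: "charge_payment",
    "charge_payment": lambda d: "payment_succeeded",
    "payment_succeeded": lambda d: "create_shipment" if d["payment_ok"] else "record_failure",
    "create_shipment": lambda d: "complete",
}
TERMINAL = {"complete", "record_failure"}


def trace(data: dict) -> list:
    """Return the ordered states a single execution would visit."""
    state, path = "start_workflow", ["start_workflow"]
    while state not in TERMINAL:
        state = TRANSITIONS[state](data)
        path.append(state)
    return path


if __name__ == "__main__":
    print(trace({"manual_review": True, "payment_ok": True}))
    print(trace({"manual_review": False, "payment_ok": False}))
```

Because every edge is data rather than nested `if` statements, checking that each branch ends in a terminal state becomes an inspection of the table, not a code review of a coordinator.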
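Workflow engines are strong when a process needs any combination of retries with backoff, timed or event-driven waits, conditional branching, human approval, and a durable, inspectable record of how far each execution has progressed.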
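Without an explicit workflow layer, teams often end up building a fragile coordinator function that invokes other functions directly, tracks progress in ad hoc status flags, and reimplements retries, waits, and timeouts by hand.
That is usually not simplicity. It is hidden orchestration.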
The point of a workflow engine is not to move all business logic into a giant declarative file. It is to keep coordination logic separate from task logic. Each step should still do one bounded thing. The workflow layer is responsible for sequencing, branching, retry policy, waits, and visibility.
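The split between task logic and retry policy can be sketched in Python as follows. The task below makes a single attempt, and a separate wrapper owns the retry policy. `RetryPolicy`, `run_task`, and their parameters are hypothetical names. Here `max_attempts` counts the first try; platforms differ on whether a configured retry count includes the initial attempt. In a managed engine the delay would be a durable timer rather than a blocking call, which is why `sleep` is injected.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical engine-side retry policy; field names are illustrative."""
    max_attempts: int = 3    # total tries, including the first
    base_delay: float = 1.0  # seconds before the first retry
    factor: float = 2.0      # exponential growth of the delay per retry

    def delay_before(self, retry_number: int) -> float:
        # retry_number is 1 for the first retry, 2 for the second, and so on.
        return self.base_delay * self.factor ** (retry_number - 1)


def run_task(task: Callable[[dict], dict], payload: dict,
             policy: RetryPolicy, sleep: Callable[[float], None]) -> dict:
    """Coordination layer: applies the retry policy around a single-attempt task."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return task(payload)
        except Exception:
            if attempt == policy.max_attempts:
                raise  # exhausted: let the workflow route to its failure path
            sleep(policy.delay_before(attempt))
    raise RuntimeError("not reached")


if __name__ == "__main__":
    calls = {"n": 0}

    def charge_payment(payload: dict) -> dict:
        # Task logic only: one attempt, no retry or sleep code.
        # Hypothetical behavior: fails twice with a transient error, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("payment gateway timeout")
        return {**payload, "charged": True}

    delays = []
    result = run_task(charge_payment, {"order": "A1"}, RetryPolicy(), sleep=delays.append)
    print(result, delays)  # {'order': 'A1', 'charged': True} [1.0, 2.0]
```

Changing the policy, for example to five attempts, touches only the workflow layer. The task stays the same and can be unit-tested as a plain function.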
```yaml
# Vendor-neutral sketch: key names are illustrative, not a specific engine's schema.
workflow:
  name: order-approval
  start_at: validate-order
  states:
    validate-order:
      type: task
      next: needs-review
    needs-review:
      type: choice                              # branching lives here, not in task code
      when:
        requiresManualReview: wait-for-review   # taken when the flag is true
      default: charge-payment
    wait-for-review:
      type: wait_for_event                      # durable pause until a review decision arrives
      next: charge-payment
    charge-payment:
      type: task
      retry:                                    # retry policy owned by the workflow, not the task
        attempts: 3
        backoff: exponential
      next: create-shipment
    create-shipment:
      type: task
      end: true
```
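What this demonstrates: the coordination concerns (the review branch, the wait for an external event, and the retry policy on `charge-payment`) live in the workflow definition. Each `task` state stays one bounded unit of work, and the definition reads like the process it describes.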
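The sketch below is a minimal Python interpreter for the order-approval definition above. It is hypothetical: the dict transcribes the example config, and `start`, `advance`, and `send_event` are invented names, not any engine's API. Retries run immediately (backoff is omitted for brevity). The plain `execution` dict stands in for state that a managed engine would persist between steps.

```python
from typing import Callable

# Hypothetical transcription of the order-approval config into a Python dict.
DEFINITION = {
    "start_at": "validate-order",
    "states": {
        "validate-order": {"type": "task", "next": "needs-review"},
        "needs-review": {
            "type": "choice",
            "when": {"requiresManualReview": "wait-for-review"},
            "default": "charge-payment",
        },
        "wait-for-review": {"type": "wait_for_event", "next": "charge-payment"},
        "charge-payment": {"type": "task", "retry": {"attempts": 3}, "next": "create-shipment"},
        "create-shipment": {"type": "task", "end": True},
    },
}


def start(defn: dict, data: dict) -> dict:
    """Create a new execution record positioned at the start state."""
    return {"state": defn["start_at"], "status": "running", "data": dict(data)}


def advance(defn: dict, execution: dict, tasks: dict[str, Callable[[dict], dict]]) -> dict:
    """Run states until the execution succeeds, fails, or must wait for an event."""
    states = defn["states"]
    while execution["status"] == "running":
        name = execution["state"]
        spec = states[name]
        kind = spec["type"]
        if kind == "choice":
            data = execution["data"]
            # First truthy flag wins; otherwise take the default branch.
            execution["state"] = next(
                (target for flag, target in spec["when"].items() if data.get(flag)),
                spec["default"],
            )
            continue
        if kind == "wait_for_event":
            execution["status"] = "waiting"  # nothing runs until send_event is called
            return execution
        if kind != "task":
            raise ValueError(f"unknown state type: {kind}")
        attempts = spec.get("retry", {}).get("attempts", 1)
        for attempt in range(1, attempts + 1):
            try:
                execution["data"] = tasks[name](execution["data"])
                break
            except Exception as exc:  # immediate retry; backoff omitted
                if attempt == attempts:
                    execution.update(status="failed", error=f"{name}: {exc}")
                    return execution
        if spec.get("end"):
            execution["status"] = "succeeded"
        else:
            execution["state"] = spec["next"]
    return execution


def send_event(defn: dict, execution: dict, event: dict,
               tasks: dict[str, Callable[[dict], dict]]) -> dict:
    """Deliver an external event (e.g. a review decision) and resume the execution."""
    spec = defn["states"][execution["state"]]
    if execution["status"] != "waiting" or spec["type"] != "wait_for_event":
        raise ValueError("execution is not waiting for an event")
    execution["data"] = {**execution["data"], **event}
    execution.update(status="running", state=spec["next"])
    return advance(defn, execution, tasks)


if __name__ == "__main__":
    tasks = {
        # Hypothetical rule: large orders need manual review.
        "validate-order": lambda d: {**d, "requiresManualReview": d["total"] > 1000},
        "charge-payment": lambda d: {**d, "charged": True},
        "create-shipment": lambda d: {**d, "shipped": True},
    }
    ex = advance(DEFINITION, start(DEFINITION, {"total": 5000}), tasks)
    print(ex["status"], ex["state"])  # waiting wait-for-review
    ex = send_event(DEFINITION, ex, {"approvedBy": "reviewer-1"}, tasks)
    print(ex["status"], ex["data"]["shipped"])  # succeeded True
```

The task functions are one-line transformations with no knowledge of review, waiting, or retries. All of that comes from the definition.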
The most common failure mode is to confuse a workflow engine with a place to put all business logic. A workflow should coordinate steps, not become a monolithic rules engine full of complicated data transformation. Another mistake is to avoid workflow tooling entirely and write orchestrator code inside one function because it feels faster at the start. That approach often works for the first demo and fails during the first real retry or timeout incident.
A team has one coordinating function that validates a purchase, waits for fraud review, charges a card, and triggers fulfillment by invoking other functions directly. It keeps progress in a few status flags in a database row. The team can no longer explain why some orders are stuck. What should change first?
The stronger answer is to introduce explicit workflow orchestration, not more logs inside the coordinator. The main problem is hidden control flow. A workflow engine would make state transitions, waits, retries, and failure points visible and durable, while the task functions could stay narrow and easier to test.
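With explicit state, "why are some orders stuck?" becomes a query over execution records instead of a log search. The following minimal Python sketch assumes a hypothetical `ExecutionSummary` shape. Managed engines typically expose comparable information through their own consoles and APIs, with their own field names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ExecutionSummary:
    """Hypothetical per-execution record; real engines use their own field names."""
    execution_id: str
    status: str  # "running", "waiting", "succeeded", or "failed"
    current_state: str
    entered_state_at: datetime


def stuck(executions: list, now: datetime, max_age: timedelta) -> list:
    """In-progress executions that have sat in their current state longer than
    max_age, oldest first."""
    in_progress = [
        e for e in executions
        if e.status in ("running", "waiting") and now - e.entered_state_at > max_age
    ]
    return sorted(in_progress, key=lambda e: e.entered_state_at)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    rows = [
        ExecutionSummary("order-1", "waiting", "wait-for-review", now - timedelta(hours=30)),
        ExecutionSummary("order-2", "running", "charge-payment", now - timedelta(minutes=5)),
        ExecutionSummary("order-3", "succeeded", "create-shipment", now - timedelta(days=3)),
    ]
    for e in stuck(rows, now, max_age=timedelta(hours=24)):
        print(e.execution_id, e.current_state)  # order-1 wait-for-review
```

The result names both the step and the duration: order-1 has waited 30 hours for a review event. A few status flags in a database row rarely record which step is pending or since when.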