Duplicates and Redelivery

A practical lesson on why duplicates appear in event systems, how redelivery happens under normal failure handling, and what engineers should treat as expected behavior rather than as a rare anomaly.

Duplicates are a normal consequence of building reliable event-driven systems under uncertainty. A broker may redeliver because an acknowledgement was lost. A consumer may crash after performing work but before recording completion. A producer may retry publication after a timeout and not know whether the first attempt succeeded. None of this automatically means the system is broken. It means the system is distributed.

Teams get into trouble when they treat duplicate delivery as a rare edge case instead of a first-class operating condition. Once retries, failover, replay, and consumer restarts enter the design, duplicates should be assumed. The real engineering question becomes how duplicates are recognized, tolerated, or collapsed before they create duplicate business effects.

    sequenceDiagram
        participant P as Producer
        participant B as Broker
        participant C as Consumer

        P->>B: Publish event
        B-->>C: Deliver event
        C->>C: Perform work
        Note over C,B: Ack lost or consumer crashes
        B-->>C: Redeliver same event

What to notice:

  • duplicate delivery often comes from uncertainty, not from obvious defects
  • the second delivery may be the safest thing the broker can do
  • the consumer needs a business-safe answer to redelivery

Where Duplicates Come From

Common sources include:

  • producer retries after uncertain publication outcomes
  • consumer failure before acknowledgement
  • acknowledgement loss or timeout
  • replay for projection rebuild or recovery
  • redelivery during rebalance or worker reassignment

These mechanisms are often signs of healthy reliability behavior, not bugs in themselves. The bug appears when the application assumes those mechanisms will never happen.
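The first source above, producer retries, can be sketched in a few lines. This is a hedged illustration with a stub broker, not a real client library: the point is that a timeout tells the producer nothing about whether the broker stored the event, so the retry must reuse the same stable eventId rather than minting a new one.

```python
import uuid

class FlakyBroker:
    """Illustrative stub: stores the event, then loses the ack on the
    first call, which is exactly the uncertainty a producer faces."""
    def __init__(self):
        self.received = []
        self.calls = 0

    def publish(self, event):
        self.calls += 1
        self.received.append(event["eventId"])   # broker stored it...
        if self.calls == 1:
            raise TimeoutError("ack lost")       # ...but the producer never learns

def publish_with_retry(broker, event, max_attempts=3):
    """Retry with the SAME event object: the stable eventId is what
    lets downstream code later recognize the duplicate."""
    for _ in range(max_attempts):
        try:
            broker.publish(event)
            return
        except TimeoutError:
            continue   # same eventId on the retry, never a fresh one
    raise RuntimeError("publish failed after retries")

event = {"eventId": str(uuid.uuid4()), "eventName": "invoice.created"}
broker = FlakyBroker()
publish_with_retry(broker, event)
# The broker now holds the event twice, under one stable eventId.
```

Note that the duplicate here is created by correct, healthy retry behavior on both sides; nothing in this sketch is broken.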

Delivery Duplicate Versus Business Duplicate

It helps to separate two related but different ideas:

  • a delivery duplicate means the same event is delivered more than once
  • a business duplicate means the same real-world effect happens more than once

The platform may not eliminate delivery duplicates. The application should still aim to prevent business duplicates where they are harmful. Sending two password-reset emails may be tolerable. Creating two invoices or charging a card twice usually is not.
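The distinction can be made concrete with a minimal sketch. The names are illustrative and the in-memory set stands in for a durable idempotency store: the consumer tolerates the delivery duplicate but refuses to repeat the business effect.

```python
processed_ids = set()   # in production: a durable idempotency store
invoices = []           # stands in for the real-world business effect

def handle_invoice_created(event):
    """Tolerate delivery duplicates; prevent business duplicates."""
    if event["eventId"] in processed_ids:
        return "duplicate-ignored"        # second delivery, no second invoice
    invoices.append(event["eventId"])     # the business effect happens once
    processed_ids.add(event["eventId"])   # in production, effect + mark
    return "created"                      # should be atomic (one transaction)

evt = {"eventId": "evt_441b", "eventName": "invoice.created"}
first = handle_invoice_created(evt)
second = handle_invoice_created(evt)
```

The comment about atomicity matters: if the process crashes after appending the invoice but before recording the id, the next redelivery would still create a second invoice, which is why real implementations put the effect and the marker in one transaction or use an idempotency key at the effect boundary.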

Redelivery and Replay Are Not the Same

Redelivery usually means the transport is trying to complete uncertain handling of a recent event. Replay is more deliberate. A team may replay a stream to rebuild a projection, fix a bug, or backfill downstream state. Both can surface duplicates at the consumer boundary, but replay has a different operational intent and often affects a larger time range.

This is why consumers should not assume “I have seen this once already, so I never need to see it again.” They need a clearer policy:

  • ignore known duplicates
  • apply repeated events safely
  • or treat them as suspicious and route them for investigation

    {
      "eventId": "evt_441b",
      "eventName": "invoice.created",
      "producerId": "billing-service",
      "occurredAt": "2026-03-23T15:32:00Z"
    }

Stable identifiers like eventId are not the whole solution, but without them consumers have nothing reliable to deduplicate on, and duplicate reasoning becomes guesswork over timestamps and payloads.

Why “No Duplicates” Is Usually the Wrong Promise

Architecturally, “no duplicates” is often the wrong promise because it encourages hidden assumptions. Stronger language is:

  • duplicates may occur
  • the system records stable identifiers
  • consumers define how duplicate effects are prevented or tolerated
  • replay paths are explicit

This changes the engineering mindset from optimism to control. The platform no longer pretends to remove uncertainty. It exposes it and gives the application tools to absorb it.

Common Mistakes

  • treating duplicate delivery as exceptional rather than expected
  • confusing transport duplicate with business duplicate
  • depending on timestamps alone instead of stable identifiers
  • replaying historical events without checking downstream duplicate tolerance
  • assuming ordered partitions automatically remove redelivery risk

Design Review Question

A consumer sends fulfillment instructions to an external warehouse API and assumes duplicates are impossible because the topic preserves partition order. What is the strongest correction?

The strongest correction is that ordering and duplicate safety are separate concerns. The topic may preserve local sequence and still redeliver after crash or acknowledgement uncertainty. The warehouse call therefore still needs idempotent or duplicate-safe handling even if ordered delivery is working correctly.
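One common way to make that warehouse call duplicate-safe can be sketched as follows, assuming the external API accepts a client-supplied idempotency key (many payment and fulfillment APIs do; the header name and stub class here are illustrative). The key is derived from the stable eventId, so a redelivered event produces an identical request the warehouse can collapse.

```python
class WarehouseAPI:
    """Illustrative stub: a real API supporting an Idempotency-Key
    header would deduplicate repeated keys on the server side."""
    def __init__(self):
        self.requests = []

    def post(self, path, headers, body):
        self.requests.append((path, headers["Idempotency-Key"], body["orderId"]))

def send_fulfillment(api, event):
    # Derive the key from the stable eventId: redelivery repeats the
    # same key, regardless of how well partition ordering is working.
    api.post(
        "/fulfillments",
        headers={"Idempotency-Key": event["eventId"]},  # header name is illustrative
        body={"orderId": event["orderId"]},
    )

api = WarehouseAPI()
evt = {"eventId": "evt_441b", "orderId": "ord_9"}
send_fulfillment(api, evt)   # first delivery
send_fulfillment(api, evt)   # redelivery: same key, same request
```

Ordering never enters this sketch at all, which is the point: the duplicate-safety of the outbound call comes from the key, not from the topic's delivery guarantees.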

Revised on Thursday, April 23, 2026