Reference Data, Derived Data, and Read Models

March 22, 2026

A practical lesson on how to distinguish authoritative data from copied, derived, and query-optimized data so duplication can be judged correctly.

The distinctions among reference data, derived data, and read models matter because teams often label all duplication as either “bad redundancy” or “completely fine.” Neither reaction is precise enough. In distributed systems, some duplication is necessary and healthy. The real question is what kind of duplication is happening and whether the authoritative boundary is still clear.

The most useful categories are:

authoritative data: the source of truth
reference data: copied for lookup or enrichment
derived data: computed from authoritative facts or events
read models: projections optimized for query or reporting

These categories help teams discuss duplication without blurring authority.

    flowchart TD
        A["Authoritative service"] --> B["Reference copies"]
        A --> C["Derived calculations"]
        A --> D["Read models and projections"]
        B --> E["Not authoritative"]
        C --> E
        D --> E

Reference Data

Reference data is copied because another service needs local context or enrichment but does not own the underlying truth. Common examples include tenant display names, product titles, or currency metadata copied into another system for local use.

Reference data is usually safe when:

the authoritative owner is explicit
update expectations are clear
the consumer does not start editing the canonical meaning
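The three conditions above can be sketched in code. This is a minimal illustration, not a prescribed design: the names `TenantNameCopy`, `TenantNameCache`, `apply_owner_event`, and the owner name `tenant-directory` are all hypothetical, and the sketch assumes the owner publishes a monotonically increasing version with each update.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)  # frozen: consumers cannot edit the copy in place
class TenantNameCopy:
    """Local reference copy of a tenant display name (hypothetical shape)."""
    tenant_id: str
    display_name: str
    authoritative_owner: str  # the owner is recorded explicitly on every copy
    source_version: int       # assumed: version number supplied by the owner


class TenantNameCache:
    """Lookup store whose only write path is an update published by the owner."""

    def __init__(self, authoritative_owner: str) -> None:
        self._owner = authoritative_owner
        self._copies: Dict[str, TenantNameCopy] = {}

    def apply_owner_event(self, tenant_id: str, display_name: str, version: int) -> bool:
        """Store an owner-published update; return False for stale or repeated versions."""
        current = self._copies.get(tenant_id)
        if current is not None and version <= current.source_version:
            return False
        self._copies[tenant_id] = TenantNameCopy(
            tenant_id, display_name, self._owner, version
        )
        return True

    def lookup(self, tenant_id: str) -> Optional[TenantNameCopy]:
        """Return the local copy, or None if the owner has not published one yet."""
        return self._copies.get(tenant_id)


cache = TenantNameCache(authoritative_owner="tenant-directory")
cache.apply_owner_event("t_17", "Acme Logistics", version=1)
copy = cache.lookup("t_17")
print(copy.display_name, "owned by", copy.authoritative_owner)
```

The copy carries its owner and version so a reviewer can always tell where the value came from and how fresh it is.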

Derived Data

Derived data is computed from authoritative facts. Totals, summary views, risk scores, and analytics metrics often fall into this category. Derived data can be extremely useful, but teams should remember that it is a downstream result, not an upstream authority.

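If a derived value looks wrong, the review question should usually be “Was the source event or source state wrong?” rather than “Should we patch the derived table directly?” A direct patch leaves the source uncorrected, so the error can reappear the next time the value is recomputed.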
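A small sketch of that review habit, under assumed names: the event fields `eventId`, `occurredAt`, and `amount` and the function `daily_revenue_totals` are hypothetical. The derived total is always recomputed from the source events, so a correction is made by fixing the source and deriving again.

```python
from collections import defaultdict
from decimal import Decimal


def daily_revenue_totals(billing_events):
    """Derive per-day revenue totals from source events (assumed event shape)."""
    totals = defaultdict(Decimal)  # Decimal() starts each day at 0
    for event in billing_events:
        day = event["occurredAt"][:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
        totals[day] += Decimal(event["amount"])
    return dict(totals)


events = [
    {"eventId": "e1", "occurredAt": "2026-03-22T14:12:00Z", "amount": "199.50"},
    {"eventId": "e2", "occurredAt": "2026-03-22T16:40:00Z", "amount": "20.00"},
]
before = daily_revenue_totals(events)

# Correction: the source event e2 was wrong, so fix it at the source
# and recompute, instead of editing the derived total.
corrected = [dict(e, amount="25.00") if e["eventId"] == "e2" else e for e in events]
after = daily_revenue_totals(corrected)

print(before["2026-03-22"], after["2026-03-22"])
```

In a real system the correction would more likely arrive as a new correcting event from the owning service; the list rewrite here only keeps the sketch short.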

Read Models

Read models are projections built to serve queries well. They are common in event-driven and service-oriented systems because they reduce cross-service query chains and let teams shape query storage for specific use cases. A read model can be local to one service or shared for reporting.

The important warning is that read models should not silently become write paths for canonical state.

The event below is a safe kind of input for a read model.

    {
      "event": "OrderPlaced",
      "orderId": "ord_1042",
      "tenantId": "t_17",
      "totalAmount": 199.50,
      "currency": "USD",
      "placedAt": "2026-03-22T14:12:00Z"
    }

From this event, a reporting system can build timeline views, revenue summaries, and tenant dashboards. It should not redefine whether the order was validly placed.
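One way a reporting system might consume the `OrderPlaced` event is sketched here. The class `TenantRevenueProjection` and its methods are hypothetical, and the sketch assumes events can be delivered more than once, so it deduplicates on `orderId`. The projection only accumulates what the event states; it has no method for changing order state.

```python
from decimal import Decimal


class TenantRevenueProjection:
    """Read model: revenue per (tenant, currency), built only from events."""

    def __init__(self):
        self._seen_orders = set()  # order ids already applied (assumed redelivery)
        self._revenue = {}         # (tenant_id, currency) -> Decimal

    def apply(self, event):
        """Fold one event into the projection; ignore other types and repeats."""
        if event.get("event") != "OrderPlaced":
            return
        if event["orderId"] in self._seen_orders:
            return
        self._seen_orders.add(event["orderId"])
        key = (event["tenantId"], event["currency"])
        # str() first so the JSON float 199.50 is read as its decimal text
        amount = Decimal(str(event["totalAmount"]))
        self._revenue[key] = self._revenue.get(key, Decimal("0")) + amount

    def revenue_for(self, tenant_id, currency):
        """Query side: return the accumulated total, or 0 if nothing was seen."""
        return self._revenue.get((tenant_id, currency), Decimal("0"))


order_placed = {
    "event": "OrderPlaced",
    "orderId": "ord_1042",
    "tenantId": "t_17",
    "totalAmount": 199.50,
    "currency": "USD",
    "placedAt": "2026-03-22T14:12:00Z",
}

projection = TenantRevenueProjection()
projection.apply(order_placed)
projection.apply(order_placed)  # a redelivered event does not double count
print(projection.revenue_for("t_17", "USD"))
```

Because the projection can be rebuilt by replaying events, nothing canonical is lost if it is dropped and regenerated.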

A Small Classification Table

    data_classification:
      shipping_address_copy:
        type: reference_data
        authoritative_owner: customer-profile
      daily_revenue_total:
        type: derived_data
        authoritative_owner: billing-events
      order_reporting_projection:
        type: read_model
        authoritative_owner: checkout-and-billing-events

What this demonstrates:

classification makes duplication reviewable
not every copy is a boundary problem
the authoritative owner should still be explicit for downstream data
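A classification like this becomes more useful when it is checked automatically. The sketch below mirrors the entries above as a Python dictionary and flags any entry that lacks a known type or a named owner. The function name `classification_problems` and the idea of running it as a review check are assumptions for illustration.

```python
ALLOWED_TYPES = {"reference_data", "derived_data", "read_model"}


def classification_problems(data_classification):
    """Return human-readable problems; an empty list means the entries pass."""
    problems = []
    for name, entry in data_classification.items():
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"{name}: unknown or missing type")
        if not entry.get("authoritative_owner"):
            problems.append(f"{name}: missing authoritative_owner")
    return problems


data_classification = {
    "shipping_address_copy": {
        "type": "reference_data",
        "authoritative_owner": "customer-profile",
    },
    "daily_revenue_total": {
        "type": "derived_data",
        "authoritative_owner": "billing-events",
    },
    "order_reporting_projection": {
        "type": "read_model",
        "authoritative_owner": "checkout-and-billing-events",
    },
}

print(classification_problems(data_classification))
```

An entry such as `{"type": "read_model"}` with no owner would be reported, which is the case the third point above warns about.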

Common Mistakes

treating any duplicated data as automatically wrong
treating every read model as safe even after teams start editing it directly
forgetting to name the authoritative owner of reference or derived data
letting analytics or reporting stores become hidden write systems

The point is not to avoid all copies. The point is to stop vague duplication from becoming vague ownership.

Design Review Question

A team says its reporting database is “just a read model,” but operations staff regularly fix customer-visible order states there because it is the fastest place to update dashboards. Is it still just a read model?

No. The stronger answer is that the reporting store has become a hidden operational write path. Once that happens, the architecture has lost its distinction between projection and authority.


Revised on Wednesday, June 3, 2026
