Data Quality and Semantic Observability

How to observe correctness, completeness, and semantic drift so data remains trustworthy, not just available.

Data quality and semantic observability concern whether data still means what consumers think it means. Rows can arrive on time and pipelines can stay green while values drift, fields unexpectedly become null, category mappings change, or business definitions silently shift beneath dashboards and models. These failures are often harder to detect than runtime failures because, technically, the system keeps working.

That is why data observability needs semantic checks alongside transport and execution checks. Completeness, validity, uniqueness, distribution drift, schema evolution, and business-rule integrity all matter. Different consumers may tolerate different forms of imperfection, but the system should not hide those deviations.

    flowchart TD
	    A["Raw data arrives"] --> B["Schema checks"]
	    B --> C["Quality checks"]
	    C --> D["Business rule checks"]
	    D --> E["Trusted dataset"]
	    C --> F["Quality incident"]
	    D --> F
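The gating flow above can be sketched in a few lines. This is a minimal illustration, not any specific framework's API; `promote` and the three check callbacks are hypothetical names.

```python
# Sketch of the gating flow: each stage either passes the batch along
# or routes it to a quality incident. Check functions are supplied by
# the caller; names here are illustrative only.

def promote(batch, schema_ok, quality_ok, rules_ok):
    """Run the three gates in order; return (status, batch)."""
    if not schema_ok(batch):
        return ("quality_incident", batch)  # schema violation
    if not quality_ok(batch):
        return ("quality_incident", batch)  # e.g. null-rate breach
    if not rules_ok(batch):
        return ("quality_incident", batch)  # business-rule breach
    return ("trusted", batch)

# Usage with trivial stand-in checks:
status, _ = promote(
    [{"order_id": 1}],
    schema_ok=lambda b: all("order_id" in r for r in b),
    quality_ok=lambda b: all(r["order_id"] is not None for r in b),
    rules_ok=lambda b: True,
)
print(status)  # trusted
```

The point of the shape is that "trusted" is an outcome of explicit gates, not a default state.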

Semantic Health Needs Explicit Rules

A strong quality model often includes checks such as:

  • null-rate thresholds on critical columns
  • uniqueness on primary identifiers
  • accepted value ranges
  • cross-table reconciliation
  • distribution drift on important measures
  • business-rule checks such as “refunds cannot exceed settled payments”
    quality_checks:
      critical_columns:
        order_id:
          not_null: true
          unique: true
        payment_status:
          accepted_values: ["authorized", "captured", "refunded"]
      business_rules:
        - "refund_amount <= captured_amount"
      drift:
        - field: order_total
          compare_to: "last_14_days_distribution"

What to notice:

  • technical schema checks are only one layer
  • business-rule checks protect semantic meaning
  • drift checks help detect issues that are not obvious row-level failures
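A minimal sketch of how declarative checks like the ones above might be evaluated against rows. This is illustrative rather than any particular library's implementation; `run_checks` and its config shape are assumptions.

```python
# Hypothetical evaluator for column-level checks (not_null, unique,
# accepted_values), mirroring the YAML example. Returns a list of
# human-readable violation messages.

def run_checks(rows, column_checks):
    violations = []
    for col, checks in column_checks.items():
        values = [r.get(col) for r in rows]
        if checks.get("not_null") and any(v is None for v in values):
            violations.append(f"{col}: null values present")
        if checks.get("unique"):
            non_null = [v for v in values if v is not None]
            if len(non_null) != len(set(non_null)):
                violations.append(f"{col}: duplicate values")
        allowed = checks.get("accepted_values")
        if allowed is not None:
            bad = {v for v in values if v is not None and v not in allowed}
            if bad:
                violations.append(f"{col}: unexpected values {sorted(bad)}")
    return violations

rows = [
    {"order_id": 1, "payment_status": "captured"},
    {"order_id": 1, "payment_status": "disputed"},  # duplicate id, unmapped status
]
checks = {
    "order_id": {"not_null": True, "unique": True},
    "payment_status": {"accepted_values": ["authorized", "captured", "refunded"]},
}
print(run_checks(rows, checks))
# ["order_id: duplicate values", "payment_status: unexpected values ['disputed']"]
```

Note that the `accepted_values` check catches the new `"disputed"` status even though the row itself is perfectly well-formed.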

Data Can Be Present And Still Be Wrong

This is the central semantic observability challenge. Many failures do not look like missing data. They look like plausible but wrong data:

  • a field changes units
  • a join key becomes partially empty
  • a status mapping silently changes upstream
  • a valid-looking distribution shifts because of a bug

These failures are especially dangerous because they often pass through ordinary runtime monitoring unnoticed.
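One way to catch plausible-but-wrong data is a distribution-drift check. The sketch below flags a batch whose mean moves more than a threshold number of baseline standard deviations away from a reference window; it is a deliberately crude stand-in, and production systems often use tests such as Kolmogorov–Smirnov or PSI instead.

```python
# Crude drift check: compare a current batch's mean against a baseline
# window (e.g. the last 14 days of order_total). Threshold and function
# name are illustrative assumptions.
from statistics import mean, stdev

def mean_drifted(baseline, current, threshold=3.0):
    base_mean, base_std = mean(baseline), stdev(baseline)
    if base_std == 0:
        return mean(current) != base_mean
    return abs(mean(current) - base_mean) > threshold * base_std

baseline = [100.0, 102.0, 98.0, 101.0, 99.0]  # historical order_total
healthy = [100.5, 99.5, 101.0]
buggy = [10.0, 10.2, 9.8]  # plausible-looking values after a units bug

print(mean_drifted(baseline, healthy), mean_drifted(baseline, buggy))
# False True
```

Every row in the buggy batch would pass schema and null checks individually; only the comparison against history reveals the problem.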

Design Review Question

If a dashboard keeps updating on schedule but its business totals are wrong because one status mapping changed upstream, what missing observability layer is most responsible?

The stronger answer is semantic data-quality observability. Pipeline movement is visible, but correctness and business meaning are not.
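For the status-mapping scenario specifically, a semantic check can track the set of category values actually observed and alert when it changes. A minimal sketch, with illustrative names:

```python
# Compare observed category values against a reference set and report
# both additions and disappearances; either signals an upstream
# mapping change even when rows keep flowing normally.

def category_drift(reference, observed):
    """Return (new_values, missing_values) relative to the reference set."""
    ref, obs = set(reference), set(observed)
    return sorted(obs - ref), sorted(ref - obs)

new, missing = category_drift(
    reference=["authorized", "captured", "refunded"],
    observed=["authorized", "settled", "refunded"],  # "captured" renamed upstream
)
print(new, missing)  # ['settled'] ['captured']
```

A disappearing value is as suspicious as a new one: if no orders are ever "captured" anymore, downstream totals built on that status are silently wrong.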

Revised on Thursday, April 23, 2026