Data quality monitoring glossary

Plain-language definitions of the terms that come up most often in data quality monitoring and data observability. For a deeper walkthrough, see "What is data quality monitoring?"

Data quality

Data quality is the degree to which data is accurate, complete, consistent, timely, and fit for its intended use. High-quality data can be trusted for analytics, reporting, and machine learning; low-quality data quietly leads to wrong decisions.

Data quality monitoring

Data quality monitoring is the continuous, automated checking of tables for problems — missing values, stale data, volume anomalies, schema drift, broken rules, and statistical outliers — with alerts raised before bad data reaches downstream consumers.

Data observability

Data observability is the broader practice of understanding the health of an entire data system, typically spanning data quality, pipeline lineage, freshness, and infrastructure metrics. Data quality monitoring is one pillar of data observability.

Anomaly / outlier detection

Anomaly (or outlier) detection identifies values or patterns that deviate significantly from what is expected. Effective detection compares against a learned baseline of a metric's own history rather than a fixed threshold, so it flags genuine deviations instead of normal variation.
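
As a rough illustration of comparing against a learned baseline rather than a fixed threshold, the sketch below scores a new value against the mean and standard deviation of the metric's own history. The function name `is_anomaly`, the z-score technique, and the threshold of 3 are assumptions chosen for the example, not a description of any particular product's detector.

```python
from statistics import mean, stdev

def is_anomaly(history, value, z_threshold=3.0):
    """Flag value if it sits more than z_threshold standard deviations
    from the mean of the metric's own history (hypothetical helper)."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return value != baseline  # flat history: any change is a deviation
    return abs(value - baseline) / spread > z_threshold

# Illustrative data: a week of daily row counts.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]
normal_day = is_anomaly(daily_row_counts, 1012)  # within normal variation
broken_day = is_anomaly(daily_row_counts, 400)   # far outside the baseline
```

A fixed threshold such as "alert below 900 rows" would need retuning whenever the table grows; the baseline above adapts as the history changes.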

Data freshness

Data freshness measures how recently a dataset was updated relative to its expected schedule. A freshness check fails when the newest row is older than the expected schedule allows, which is a strong early signal that an upstream pipeline has stalled.
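
A freshness check can be sketched in a few lines, assuming the newest row's timestamp and the expected update interval are already known. The names `is_fresh`, `expected_interval`, and `grace` are hypothetical, introduced only for this example.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(newest_row_at, expected_interval, now, grace=timedelta(0)):
    """Return True if the newest row is no older than the expected
    update interval plus an optional grace period (hypothetical helper)."""
    return now - newest_row_at <= expected_interval + grace

# Illustrative timestamps for a table expected to update hourly.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
recent = datetime(2024, 1, 2, 11, 30, tzinfo=timezone.utc)
stale = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)

recent_ok = is_fresh(recent, timedelta(hours=1), now)  # 30 minutes old
stale_ok = is_fresh(stale, timedelta(hours=1), now)    # 3 hours old
```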

Schema drift

Schema drift is an unannounced change to a table's structure — a column renamed, dropped, retyped, or added upstream — that silently breaks downstream queries and consumers.
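
One simple way to detect schema drift is to diff the current column-to-type mapping against the last known one. The sketch below assumes schemas are available as plain dictionaries; `diff_schema` and the sample column names are hypothetical. Note that a rename shows up as one dropped column plus one added column.

```python
def diff_schema(expected, actual):
    """Compare two {column_name: type_name} mappings and report
    added, dropped, and retyped columns (hypothetical helper)."""
    expected_cols, actual_cols = set(expected), set(actual)
    return {
        "added": sorted(actual_cols - expected_cols),
        "dropped": sorted(expected_cols - actual_cols),
        "retyped": sorted(
            col for col in expected_cols & actual_cols
            if expected[col] != actual[col]
        ),
    }

# Illustrative schemas: "email" was renamed and "amount" was retyped upstream.
expected_schema = {"id": "int", "email": "str", "amount": "float"}
actual_schema = {"id": "int", "email_address": "str", "amount": "str"}
drift = diff_schema(expected_schema, actual_schema)
```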

Completeness / null checks

Completeness checks verify that columns that should always be populated contain no null or empty values. A spike in nulls usually signals a broken join, a partial load, or an upstream change.
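
The metric behind a completeness check is usually a null rate. The sketch below assumes rows are dictionaries and treats both `None` and the empty string as missing; `null_rate` and the sample data are hypothetical.

```python
def null_rate(rows, column):
    """Fraction of rows where column is absent, None, or an empty string
    (hypothetical helper; what counts as empty is an assumption)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)

# Illustrative rows: two of four have no usable email.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": ""},
    {"id": 4, "email": "d@example.com"},
]
email_null_rate = null_rate(rows, "email")
```

Tracking this rate over time, rather than checking it once, is what makes a spike visible.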

Volume / row-count checks

Volume checks watch the number of rows a table or load produces. A sharp drop can mean a partial load; a sharp rise can mean duplicated data. Both are caught by comparing each load's row count against its expected volume.
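
A volume check can be sketched as a ratio against the expected row count. The function `check_volume` and the 50% tolerance are assumptions for illustration; in practice the expected volume would more likely come from the table's own history.

```python
def check_volume(row_count, expected, tolerance=0.5):
    """Classify a load as 'drop', 'rise', or 'ok' by comparing its row
    count with the expected volume (hypothetical helper)."""
    if expected <= 0:
        raise ValueError("expected volume must be positive")
    ratio = row_count / expected
    if ratio < 1 - tolerance:
        return "drop"  # possible partial load
    if ratio > 1 + tolerance:
        return "rise"  # possible duplicated data
    return "ok"

partial_load = check_volume(400, expected=1000)
duplicated_load = check_volume(2100, expected=1000)
normal_load = check_volume(1050, expected=1000)
```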

Distribution drift

Distribution drift is a change in the statistical shape of a column — its mean, spread, or category mix — even when no single value is obviously invalid. It is a common, hard-to-spot cause of degraded model and report quality.
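
For a categorical column, one way to quantify a shift in category mix is the total variation distance between two samples: 0 means identical proportions, 1 means no overlap. This is only one of several possible drift measures, and the function names and sample data below are hypothetical.

```python
from collections import Counter

def category_mix(values):
    """Return each category's share of the sample."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}

def total_variation_distance(baseline, current):
    """Half the sum of absolute differences in category shares."""
    p, q = category_mix(baseline), category_mix(current)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

# Illustrative data: every value is valid, but the mix has shifted.
last_month = ["card"] * 8 + ["cash"] * 2
this_month = ["card"] * 5 + ["cash"] * 5
drift_score = total_variation_distance(last_month, this_month)
```

Every value in `this_month` would pass a validity rule, which is why this kind of drift is easy to miss without a distribution-level metric.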

Data contract

A data contract is an agreed specification of the structure, semantics, and quality guarantees of a dataset between its producer and its consumers. Monitoring enforces that the data continues to honor the contract over time.
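
As a minimal sketch, a contract's structural and nullability guarantees can be written down as data and checked against incoming rows. Real contracts typically cover more (semantics, ownership, service levels); `ColumnSpec`, `violations`, and the sample columns are hypothetical names used only here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnSpec:
    """One column's guarantees in a hypothetical contract."""
    name: str
    dtype: type
    nullable: bool = False

def violations(rows, contract):
    """List every way the rows fail to honor the contract."""
    problems = []
    for i, row in enumerate(rows):
        for spec in contract:
            if spec.name not in row:
                problems.append(f"row {i}: missing column {spec.name}")
                continue
            value = row[spec.name]
            if value is None:
                if not spec.nullable:
                    problems.append(f"row {i}: {spec.name} is null")
            elif not isinstance(value, spec.dtype):
                problems.append(f"row {i}: {spec.name} is not {spec.dtype.__name__}")
    return problems

# Illustrative contract and rows.
contract = [ColumnSpec("order_id", int), ColumnSpec("coupon", str, nullable=True)]
rows = [
    {"order_id": 1, "coupon": None},
    {"order_id": None, "coupon": "X"},
    {"coupon": 5},
]
found = violations(rows, contract)
```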

Data downtime

Data downtime is any period in which data is missing, late, or wrong. Reducing data downtime — detecting and resolving issues quickly — is the practical goal of data quality monitoring.

Data profiling

Data profiling is the analysis of a dataset's structure and statistics — column types, distributions, null rates, cardinality — to understand its content. AIMO profiles tables to generate suitable monitors and to set sensible bounds.
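
To make the idea concrete, the sketch below profiles a single column held as a Python list. It is an illustration of profiling in general, not a description of how AIMO profiles tables; `profile_column` is a hypothetical name, and the min/max step assumes the non-null values are mutually comparable.

```python
def profile_column(values):
    """Summarize one column: size, null rate, cardinality, observed
    types, and range (hypothetical helper)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_rate": (len(values) - len(present)) / len(values) if values else 0.0,
        "cardinality": len(set(present)),
        "types": sorted({type(v).__name__ for v in present}),
        # Assumes non-null values can be compared with each other.
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

profile = profile_column([3, None, 7, 3])
```

A profile like this is the kind of input from which bounds for a numeric check could be derived.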

Monitor

A monitor is a single, repeatable check bound to a table and column(s) — for example a null check, a numeric-bounds check, or a cross-column KPI — that runs on a schedule and produces a time series AIMO can evaluate for outliers.
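
The shape of a monitor can be sketched as a small object that binds a metric to a table and its columns and appends one point to a time series on each run. This is an assumed structure for illustration only, not AIMO's API; `Monitor`, `run`, and `email_null_rate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Monitor:
    """A repeatable check bound to a table and column(s) (hypothetical)."""
    table: str
    columns: tuple
    metric: Callable[[list], float]
    series: list = field(default_factory=list)

    def run(self, rows, at):
        """Compute the metric for this run and record it in the time series."""
        value = self.metric(rows)
        self.series.append((at, value))
        return value

def email_null_rate(rows):
    # Share of rows whose email is None; assumes rows is non-empty.
    return sum(row.get("email") is None for row in rows) / len(rows)

monitor = Monitor(table="orders", columns=("email",), metric=email_null_rate)
monitor.run([{"email": "a@example.com"}, {"email": None}], at="2024-01-01")
monitor.run([{"email": "b@example.com"}], at="2024-01-02")
```

The resulting `series` is what an outlier detector would then evaluate, one point per scheduled run.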