Data Observability

Data observability is the capability to monitor the health and quality of data systems, detect anomalies, and diagnose root causes using metadata about data freshness, completeness, distribution, and lineage.

Data observability extends traditional system observability to data. While system observability monitors CPU, memory, and latency, data observability monitors data quality, freshness, and anomalies. It answers questions such as: Is my data up to date? Are there unexpected gaps or spikes? Did quality degrade? Data observability combines data quality metrics (null rates, duplicates, outliers), operational metrics (freshness, latency), and lineage to enable diagnosis.
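As a concrete illustration, the core metrics named above can be computed directly. The sketch below uses hypothetical rows from an imaginary orders table (the schema and values are assumptions for illustration, not any particular platform's API):

```python
from datetime import datetime, timezone

# Hypothetical rows from an orders table (assumed schema for illustration)
rows = [
    {"order_id": 1, "amount": 100.0, "updated_at": datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)},
    {"order_id": 2, "amount": None,  "updated_at": datetime(2024, 5, 1, 12, 5, tzinfo=timezone.utc)},
    {"order_id": 2, "amount": None,  "updated_at": datetime(2024, 5, 1, 12, 5, tzinfo=timezone.utc)},
]

# Data quality metrics: null rate and duplicate rate
null_rate = sum(r["amount"] is None for r in rows) / len(rows)
ids = [r["order_id"] for r in rows]
duplicate_rate = (len(ids) - len(set(ids))) / len(ids)

# Operational metric: freshness = time since the most recent update
now = datetime(2024, 5, 1, 13, 0, tzinfo=timezone.utc)  # fixed "now" for reproducibility
freshness_lag = now - max(r["updated_at"] for r in rows)

print(null_rate, duplicate_rate, freshness_lag)
```

In a real platform these metrics would be collected on a schedule and stored as time series, so that deviations from historical baselines can be detected automatically.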

Data observability emerged from the realization that data systems can appear healthy (pipelines running, no errors) while data quality degrades invisibly. A pipeline might complete successfully but deliver duplicate rows, null columns, or stale data. Data observability makes these issues visible and actionable. When anomalies occur, observability systems help diagnose root causes: Did an upstream table change? Did a data quality rule fail? Did the schema change?

Data observability platforms monitor data continuously, establish baselines (what's normal), and alert when deviations occur. They use techniques like statistical anomaly detection (when is a value unusual?) and rule-based validation (are nulls within an acceptable range?). More capable observability systems also correlate metrics: when a revenue metric spikes unexpectedly, the system correlates the spike with upstream changes to identify the root cause.
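A minimal sketch of the statistical approach, assuming a hypothetical history of daily row counts for a monitored table: a baseline mean and standard deviation are computed from past values, and a new observation is flagged when it deviates beyond a threshold (a simple z-score rule, not any vendor's detection algorithm):

```python
import statistics

# Hypothetical daily row counts forming the baseline (assumed data)
history = [1000, 1020, 980, 1010, 990, 1005, 1015, 995, 1000, 1025, 985, 1010, 1000, 990]
today = 1400  # new observation to check

# Establish the baseline: what is "normal" for this metric
mean = statistics.mean(history)
stdev = statistics.stdev(history)

# Rule: flag values more than 3 standard deviations from the baseline mean
z = (today - mean) / stdev
is_anomaly = abs(z) > 3.0
print(is_anomaly)
```

Production systems typically go further (seasonality-aware baselines, trend adjustment, adaptive thresholds), but the principle is the same: learn what is normal, then alert on deviation.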

Key Characteristics

  • Continuously monitors data quality and freshness
  • Detects anomalies based on baselines or rules
  • Traces root causes using lineage and metadata
  • Combines descriptive and diagnostic information
  • Enables proactive issue detection
  • Integrates with alerting and incident systems

Why It Matters

  • Reliability: Early detection prevents bad data from affecting decisions
  • Trust: Users know data is monitored and issues are caught
  • Efficiency: Automatic anomaly detection replaces manual monitoring
  • Diagnosis: Lineage-aware diagnosis speeds issue resolution
  • Governance: Validates that quality and freshness SLAs are met

Example

Data observability detects that a revenue metric has increased 20% unexpectedly. It correlates the spike with three observations: a product table update (a new product was added), no changes to upstream transformations, and a slight (5%) increase in order volume. The diagnosis: the new product accounts for 15 percentage points of the increase, and the remaining 5 points reflect real growth. Without observability, this correlation analysis would require slow, manual investigation across teams.
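The decomposition in this example can be verified with simple arithmetic. The figures below are hypothetical values consistent with the scenario (a 20% spike split into a 15-point new-product contribution and 5 points of organic growth):

```python
# Hypothetical revenue figures matching the scenario above
baseline = 100_000.0
observed = 120_000.0          # 20% above baseline
new_product = 15_000.0        # revenue from the newly added product (15 points)
organic_growth = 5_000.0      # 5% growth on existing products (5 points)

total_increase = observed - baseline
attributed = new_product + organic_growth

# The spike is fully explained when attributed sources cover the increase
fully_explained = total_increase == attributed
print(fully_explained)
```

When the attributed sources do not cover the observed increase, the residual is exactly the unexplained portion that warrants an incident investigation.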

Coginiti Perspective

Coginiti contributes to data observability through several mechanisms. Query tags on Snowflake, BigQuery, and Redshift allow teams to monitor execution patterns, costs, and performance by project or department. CoginitiScript #+test blocks can assert data freshness, completeness, and value distribution as part of pipeline runs. Publication lifecycle hooks (beforeAll, beforeEach, afterEach, afterAll) provide instrumentation points where observability checks can be embedded directly into data pipeline execution.

Related Concepts

  • Data Quality
  • Data Observability Platforms
  • Anomaly Detection
  • Data Lineage
  • Operational Metadata
  • Data Contracts
  • Data SLAs
  • Incident Management
