
Data Validation

Data validation is the automated checking of data against rules to ensure it meets quality standards, catching errors before they propagate to downstream consumers.

Data validation runs automated tests against data: checking that required columns are not null, that values fall within expected ranges, that formats are correct, and that relationships between tables hold. For example, a schema might specify that the customers table has columns (customer_id: integer, name: string, created_at: timestamp), and validation ensures every row conforms. When validation fails, the pipeline can raise alerts or remediate automatically: blocking the data load, marking records as invalid, or filling missing values with defaults.
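The checks described above can be sketched as a small rule engine. This is an illustrative example, not any particular platform's API; the rule names and the customers schema fields follow the example in the text.

```python
from datetime import datetime, timezone

# Row-level checks mirroring the customers schema in the text:
# (customer_id: integer, name: string, created_at: timestamp).
RULES = [
    ("customer_id is a non-null integer",
     lambda r: isinstance(r.get("customer_id"), int)),
    ("name is a non-empty string",
     lambda r: isinstance(r.get("name"), str) and r["name"].strip() != ""),
    ("created_at is a timestamp not in the future",
     lambda r: isinstance(r.get("created_at"), datetime)
               and r["created_at"] <= datetime.now(timezone.utc)),
]

def validate(rows):
    """Return (row_index, failed_rule) pairs; an empty list means valid."""
    failures = []
    for i, row in enumerate(rows):
        for name, check in RULES:
            if not check(row):
                failures.append((i, name))
    return failures

rows = [
    {"customer_id": 1, "name": "Ada", "created_at": datetime.now(timezone.utc)},
    {"customer_id": None, "name": "", "created_at": datetime.now(timezone.utc)},
]
print(validate(rows))  # the second row fails two rules
```

Returning failures rather than raising immediately lets the caller decide whether to alert, block the load, or quarantine the offending rows.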

Data validation emerged because manual quality checks don't scale. With thousands of tables refreshed daily, checking quality by hand is impossible. Automated validation runs tests continuously, catching issues quickly. Early detection prevents bad data from propagating: if validation fails at ingestion, the bad data never reaches downstream consumers.

Validation rules are typically defined using domain knowledge: analysts and engineers understand what valid data looks like. Rules might be simple (customer_id must be non-null) or complex (if an order's status is shipped, its shipping_date must match the ship date recorded in the fulfillment system). Some validation rules are business-specific (a subscription start date cannot be after the contract end date); others are structural (foreign key references must exist). Modern data platforms support declarative validation: writing rules in YAML or similar configuration files rather than custom code.
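A minimal sketch of declarative validation: the rule set below is written as plain data, as it might look after parsing a YAML file, and a generic engine interprets it. The column names and rule keywords (`not_null`, `min`, `allowed`) are assumptions for illustration, not a real platform's schema.

```python
# Declarative rule set, as it might appear after parsing a YAML config.
rules = {
    "customer_id": {"not_null": True},
    "order_amount": {"min": 0.01},
    "status": {"allowed": ["pending", "shipped", "delivered"]},
}

def check_row(row, rules):
    """Evaluate one row against declarative rules; return failed rule labels."""
    failures = []
    for column, constraints in rules.items():
        value = row.get(column)
        if constraints.get("not_null") and value is None:
            failures.append(f"{column}: not_null")
        if value is None:
            continue  # remaining constraints apply only to present values
        if "min" in constraints and value < constraints["min"]:
            failures.append(f"{column}: min {constraints['min']}")
        if "allowed" in constraints and value not in constraints["allowed"]:
            failures.append(f"{column}: allowed values")
    return failures

print(check_row({"customer_id": 7, "order_amount": -5, "status": "shipped"}, rules))
# the negative order_amount fails its minimum-value rule
```

Keeping rules as data means analysts can add or adjust checks without touching the engine code, which is the core appeal of the declarative approach.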

Key Characteristics

  • Automated checks against defined rules
  • Runs at ingestion, transformation, or production stages
  • Detects issues before they propagate
  • Supports simple and complex validation logic
  • Tracks failure metrics and trends
  • Integrates with data quality platforms

Why It Matters

  • Prevention: Catches errors early before propagation
  • Confidence: Regular validation builds trust in data
  • Speed: Automated tests replace manual checking
  • Compliance: Validates conformance to regulations
  • Cost: Early detection prevents expensive downstream failures

Example

Validation rules for an orders table: (1) order_id is non-null and unique, (2) customer_id exists in customers table, (3) order_amount > 0, (4) order_date <= current_date, (5) if status is shipped, shipping_date is not null. If any rule fails, the data load pauses for investigation.
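The five rules above could be implemented as a single check over the orders table. This is a hedged sketch in plain Python (a real pipeline would express these as SQL tests or platform-native rules); the function and field names are illustrative.

```python
from datetime import date

def validate_orders(orders, customer_ids):
    """Apply the five orders-table rules; return (rule, row) violations."""
    violations = []
    seen_ids = set()
    for o in orders:
        oid = o.get("order_id")
        if oid is None or oid in seen_ids:        # rule 1: non-null, unique
            violations.append(("order_id non-null and unique", o))
        seen_ids.add(oid)
        if o.get("customer_id") not in customer_ids:  # rule 2: FK exists
            violations.append(("customer_id exists in customers", o))
        if not (o.get("order_amount") or 0) > 0:      # rule 3: positive amount
            violations.append(("order_amount > 0", o))
        if o.get("order_date") and o["order_date"] > date.today():  # rule 4
            violations.append(("order_date <= current_date", o))
        if o.get("status") == "shipped" and o.get("shipping_date") is None:
            violations.append(("shipped orders have shipping_date", o))  # rule 5
    return violations  # a non-empty result would pause the load for investigation
```

A caller would treat any returned violations as a signal to halt the load, mirroring the "pause for investigation" behavior described above.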

Coginiti Perspective

CoginitiScript #+test blocks implement data validation rules as SQL queries that return violations. A query returning zero rows validates successfully; any returned rows represent validation failures. These validation blocks can be embedded in publication pipelines to check data before or after materialization, using lifecycle hooks (beforeEach, afterEach) to position checks at the right point. SMDL dimension typing provides implicit validation by constraining operations to type-compatible uses.
