
Schema Validation

Schema validation is automated verification that data conforms to expected structure, including column names, data types, nullability, and constraints.

Schema validation checks that incoming data matches its expected structure: required columns exist, data types are correct, null constraints are enforced, and relationships are valid. For example, a schema might declare that the customers table has columns (customer_id: integer, name: string, created_at: timestamp), and validation ensures every row conforms. Schema validation catches structural errors: missing columns, type mismatches (a string where an integer is expected), or unexpected columns.
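The checks above can be sketched in a few lines of plain Python. This is a minimal illustration, not any specific library's API; the expected schema mirrors the customers example, and timestamps are kept as strings for simplicity.

```python
# Minimal sketch of row-level schema validation for the customers example.
# Schema definition and check logic are illustrative assumptions.
EXPECTED_SCHEMA = {
    "customer_id": int,
    "name": str,
    "created_at": str,  # ISO-8601 timestamp kept as a string here
}

def validate_row(row: dict) -> list[str]:
    """Return a list of structural errors for one row (empty list = valid)."""
    errors = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            errors.append(f"type mismatch in {column}: expected "
                          f"{expected_type.__name__}, got {type(row[column]).__name__}")
    # Flag columns that appear in the data but not in the schema.
    for column in row:
        if column not in EXPECTED_SCHEMA:
            errors.append(f"unexpected column: {column}")
    return errors

print(validate_row({"customer_id": 1, "name": "Ada",
                    "created_at": "2024-01-01T00:00:00"}))  # → []
print(validate_row({"customer_id": "1", "name": "Ada"}))
```

The second call reports both a type mismatch (customer_id arrived as a string) and a missing created_at column, which is exactly the class of error that would otherwise surface downstream as a failed query.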

Schema validation emerged because data structures change and break downstream systems. A source system adds a column, a pipeline developer removes a column by accident, or a schema migration adds a constraint. Without validation, these changes propagate downstream undetected, causing query failures or silent data corruption. Schema validation detects changes immediately.

Schema validation can be strict (any schema change must be approved before data is accepted) or flexible (new columns are tolerated, while missing columns are still flagged). It can be enforced at different stages: at ingestion (reject data that doesn't match the schema), during transformation (enforce that outputs match the expected schema), or continuously in production (monitor for schema drift). Tools like Great Expectations, dbt, and data quality platforms offer schema validation capabilities.
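The strict/flexible distinction comes down to how column-set differences are treated. A hypothetical sketch, comparing expected and actual column names as sets (the function and column names are illustrative):

```python
# Strict vs. flexible schema comparison: both modes flag missing columns,
# but only strict mode fails on unexpected new ones. Illustrative sketch.
def compare_columns(expected: set[str], actual: set[str], strict: bool) -> dict:
    issues = {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }
    # Strict: any difference fails. Flexible: new columns are tolerated
    # but still reported, so drift remains visible.
    failed = bool(issues["missing"]) or (strict and bool(issues["unexpected"]))
    return {"failed": failed, **issues}

expected = {"order_id", "customer_id", "amount"}
actual = {"order_id", "customer_id", "amount", "discount_code"}

print(compare_columns(expected, actual, strict=True))   # fails: unexpected column
print(compare_columns(expected, actual, strict=False))  # passes; drift still reported
```

Flexible mode is common at ingestion from third-party sources that add fields freely; strict mode suits transformation outputs, where an unexpected column usually means a pipeline bug.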

Key Characteristics

  • Validates column names, types, and constraints
  • Detects schema changes and drift
  • Can enforce strict or flexible schema requirements
  • Runs at ingestion, transformation, or production stages
  • Integrates with data quality platforms
  • Prevents propagation of structural errors

Why It Matters

  • Stability: Detects breaking schema changes early
  • Reliability: Prevents queries from failing due to missing columns
  • Clarity: Schema definitions document structure
  • Automation: Validates without manual inspection
  • Governance: Enforces change control for schemas

Example

Schema validation for the orders table expects: order_id (integer, non-null), customer_id (integer, non-null, FK to customers), amount (decimal, non-null), created_at (timestamp, non-null). If a data load adds a column (new_field: string) or removes one (amount missing), validation flags the discrepancy.
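This check can run against a live table's metadata rather than individual rows. A sketch using SQLite's table metadata as a stand-in for a warehouse's information_schema; table and column names follow the example, and the simulated load deliberately drops amount and adds new_field:

```python
# Sketch: compare a physical table's columns against the expected orders
# schema, using SQLite's PRAGMA table_info as a stand-in for querying
# information_schema in a warehouse. Schema and table names are illustrative.
import sqlite3

EXPECTED = {"order_id", "customer_id", "amount", "created_at"}

conn = sqlite3.connect(":memory:")
# Simulate a bad load: 'amount' was dropped and 'new_field' was added.
conn.execute("""CREATE TABLE orders (
    order_id    INTEGER   NOT NULL,
    customer_id INTEGER   NOT NULL,
    created_at  TIMESTAMP NOT NULL,
    new_field   TEXT)""")

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
actual = {name for _, name, *_ in conn.execute("PRAGMA table_info(orders)")}

missing = sorted(EXPECTED - actual)
unexpected = sorted(actual - EXPECTED)
print("missing:", missing)        # ['amount']
print("unexpected:", unexpected)  # ['new_field']
```

The FK relationship to customers would need a separate referential check (a join against the parent table), which this structural comparison does not cover.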

Coginiti Perspective

SMDL provides a form of schema validation at the semantic layer. Entity definitions declare expected dimensions with specific types (text, number, date, datetime, bool), and Semantic SQL enforces type compatibility at query time. CoginitiScript #+test blocks can validate physical schemas by querying information_schema or system catalogs to assert expected columns, types, and constraints exist before pipeline logic executes. Publication with schema specification ensures materialized outputs conform to declared structure.
