Schema Evolution
Schema evolution is the capability to add, remove, or modify columns in a table without rewriting existing data or breaking downstream queries.
Traditional data warehouses require complex migration processes to alter table schemas, often involving downtime and full data rewrites. Schema evolution solves this problem through versioned schemas and metadata-driven query interpretation. When a column is added, existing records lack a value for it, and the system returns nulls transparently. When a column is removed, the metadata tracks which columns exist at which schema version, so old data remains intact while new writes use the updated structure.
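The null-handling mechanism described above can be sketched in a few lines of Python. This is a simplified illustrative model, not any specific table format's implementation: records written before a column existed simply lack the key, and the reader fills the gap with a null on projection.

```python
# Simplified sketch: a record written under schema v1 lacks the
# "event_type" column added in schema v2; the reader projects every
# record onto the requested schema, filling missing columns with None.
def read_record(record, schema_columns):
    return {col: record.get(col) for col in schema_columns}

v1_record = {"event_id": 1, "user_id": 42}          # written before the change
v2_schema = ["event_id", "user_id", "event_type"]   # current schema

print(read_record(v1_record, v2_schema))
# {'event_id': 1, 'user_id': 42, 'event_type': None}
```

No stored file is touched; only the projection logic at read time changes.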
Modern table formats implement schema evolution through schema versioning in metadata layers. Each table maintains a schema version history with timestamps or version numbers. Queries automatically adapt to available columns based on the snapshot being read. Writers can add columns with defaults, and readers see those defaults when accessing earlier snapshots. This approach is far more efficient than rewriting all data files.
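The schema version history described above can be modeled as a small registry. The structure below is a hypothetical sketch (the dictionary layout and function names are illustrative, not drawn from any real format's metadata): each version records its columns and defaults, and a reader resolves any record against the schema of the snapshot it is reading.

```python
# Hypothetical metadata layer: each schema version lists its columns
# and any column defaults, so a reader can resolve any snapshot.
SCHEMA_HISTORY = {
    1: {"columns": ["event_id", "user_id"], "defaults": {}},
    2: {"columns": ["event_id", "user_id", "event_type"], "defaults": {}},
    3: {"columns": ["event_id", "user_id", "event_type", "region"],
        "defaults": {"region": "US"}},
}

def read_as_of(record, version):
    """Project a stored record onto the schema at `version`,
    applying column defaults where the record has no value."""
    schema = SCHEMA_HISTORY[version]
    return {
        col: record.get(col, schema["defaults"].get(col))
        for col in schema["columns"]
    }

old = {"event_id": 7, "user_id": 3}   # written under version 1
print(read_as_of(old, 3))
# {'event_id': 7, 'user_id': 3, 'event_type': None, 'region': 'US'}
```

A version-1 record read under version 3 picks up a NULL for the plain added column and the declared default for the defaulted one, exactly the behavior the paragraph describes, with no data file rewritten.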
Schema evolution is critical for long-lived analytical systems where requirements change. Data sources add new fields, business logic requires new dimensions, and regulatory changes demand new tracking columns. Supporting evolution reduces operational burden and enables rapid iteration on data models without staging periods or migration windows.
Key Characteristics
- Add columns without rewriting existing data files
- Remove or rename columns with transparent handling for older data
- Change column data types with validation and conversion rules
- Maintain query compatibility across schema versions
- Track schema history for regulatory and debugging purposes
- Apply defaults to new columns when reading historical data
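Safe rename support, listed above, is commonly implemented by keying stored values on stable column IDs rather than names (Apache Iceberg takes this approach). A minimal sketch of the idea, with illustrative names and IDs:

```python
# Sketch of rename-safe columns: data files store values keyed by a
# stable field ID; a rename only changes the metadata's ID-to-name map.
data_file_row = {1: 101, 2: "login"}          # field ID -> value, never rewritten

schema_v1 = {1: "evt_id", 2: "type"}          # ID -> column name
schema_v2 = {1: "event_id", 2: "event_type"}  # after renaming both columns

def project(row, id_to_name):
    return {name: row[field_id] for field_id, name in id_to_name.items()}

print(project(data_file_row, schema_v1))
# {'evt_id': 101, 'type': 'login'}
print(project(data_file_row, schema_v2))
# {'event_id': 101, 'event_type': 'login'}
```

Because the data file never references names, old files remain readable under any later schema, and a rename is a pure metadata operation.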
Why It Matters
- Eliminates downtime and disruption from schema changes
- Reduces data engineering overhead for model updates and new requirements
- Supports agile analytics development with rapid iteration on schemas
- Enables clean column additions without manual null-filling operations
- Simplifies compliance changes like adding audit columns without data migration
- Improves data quality by enforcing nullable constraints only on new data
Example
```sql
-- Version 1: Original schema
CREATE TABLE events (
  event_id INT,
  user_id INT,
  timestamp TIMESTAMP
);

-- Time passes, a new requirement arrives
-- Version 2: Add a column (no rewrite)
ALTER TABLE events ADD COLUMN event_type STRING;

-- Queries work against the current schema
SELECT * FROM events WHERE event_type = 'login';

-- Queries against the old snapshot see NULL for the new column
SELECT * FROM events VERSION AS OF 'v1';
-- event_type returns NULL for all rows

-- Version 3: Add a column with a default
ALTER TABLE events ADD COLUMN region STRING DEFAULT 'US';
```
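Type changes, mentioned under Key Characteristics, are usually restricted to safe widenings that cannot lose data. A hedged sketch of such a promotion check, with rules loosely modeled on common table-format behavior (the rule set here is illustrative, not any format's exact specification):

```python
# Hypothetical promotion rules: only widenings that cannot lose data
# are allowed (e.g. int -> long, float -> double); narrowings are rejected.
SAFE_PROMOTIONS = {
    ("int", "long"),
    ("float", "double"),
    ("decimal(10,2)", "decimal(20,2)"),  # widen precision, keep scale
}

def can_change_type(old, new):
    return old == new or (old, new) in SAFE_PROMOTIONS

print(can_change_type("int", "long"))   # True
print(can_change_type("long", "int"))   # False: narrowing would lose data
```

Validating changes against a rule set like this is what lets old files, written with the narrower type, remain readable after the schema changes.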
Coginiti Perspective
When underlying table schemas evolve, Coginiti's semantic layer provides a stability buffer. SMDL entity definitions map physical columns to business dimensions and measures, so adding a column to a source table does not break existing semantic queries. The semantic model can be updated to expose new columns when ready, and CoginitiScript's #+test blocks can validate that schema changes do not introduce data quality regressions before the semantic model is updated.
Related Concepts
More in Open Table Formats
Apache Hudi
Apache Hudi is an open-source data lake framework providing incremental processing, ACID transactions, and fast ingestion for analytical and operational workloads.
Apache Iceberg
Apache Iceberg is an open-source table format that organizes data files with a metadata layer enabling ACID transactions, schema evolution, and time travel capabilities for data lakes.
Data Compaction
Data compaction is a maintenance process that combines small data files into larger ones, improving query performance and reducing storage overhead without changing data or schema.
Delta Lake
Delta Lake is an open-source storage layer providing ACID transactions, schema governance, and data versioning to data lakes built on cloud object storage.
Hidden Partitioning
Hidden partitioning is a table format feature that partitions data logically for query optimization without encoding partition values in file paths or requiring file reorganization during partition scheme changes.
Open Table Format
An open table format is a vendor-neutral specification for organizing and managing data files and metadata in data lakes, enabling ACID transactions and multi-engine interoperability.