Glossary/Open Table Formats

Delta Lake

Delta Lake is an open-source storage layer that brings ACID transactions, schema enforcement, and data versioning to data lakes built on cloud object storage.

Delta Lake solves the problem of unreliable data lakes by adding a transactional metadata layer on top of Parquet files. Originally developed by Databricks, Delta Lake maintains a transaction log that records all changes to table data, enabling atomic writes and consistent reads. This design prevents issues like partial writes during failures or lost updates during concurrent operations.

The transaction log, implemented as a series of JSON files, provides complete lineage of all modifications. Delta Lake supports delete and update operations at scale, traditionally difficult in immutable data lakes. Unified batch and streaming workloads can write to the same Delta table concurrently, with transactions ensuring no data loss or corruption.
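The mechanics above can be sketched in a few lines. The following is a simplified, hypothetical model of the transaction log (real Delta commit files carry additional action types such as metaData, protocol, and commitInfo): each commit is a list of add/remove actions, and replaying the commits in order yields the table's current set of data files. Replaying only a prefix of the log is also the essence of time travel, since old versions are reconstructed rather than stored as separate copies.

```python
# Simplified model of a Delta transaction log for illustration only;
# real _delta_log commit files contain more action types and fields.
commits = [
    [{"add": {"path": "part-000.parquet"}},
     {"add": {"path": "part-001.parquet"}}],   # version 0
    [{"remove": {"path": "part-000.parquet"}},
     {"add": {"path": "part-002.parquet"}}],   # version 1
]

def active_files(commits, as_of_version=None):
    """Replay commits in order to compute the table's live file set.

    Passing as_of_version replays only a prefix of the log,
    reconstructing an earlier snapshot (time travel).
    """
    if as_of_version is not None:
        commits = commits[: as_of_version + 1]
    files = set()
    for actions in commits:
        for action in actions:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files

print(sorted(active_files(commits)))                    # current snapshot
print(sorted(active_files(commits, as_of_version=0)))   # snapshot at version 0
```

Because readers derive the file set from the log rather than by listing storage, a reader and a concurrent writer never observe a half-committed state: a commit is visible only once its log entry exists.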

Schema enforcement is built-in, preventing incompatible data from being written. Time travel functionality allows queries to access historical data versions without maintaining separate copies. The format is widely adopted in the Databricks ecosystem and compatible with Apache Spark, though other engines have added support through community initiatives.
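Schema-on-write enforcement can be illustrated with a small sketch. The schema, column names, and `validate` helper below are hypothetical, not Delta APIs; the point is simply that a record whose columns or types disagree with the declared table schema is rejected before it reaches storage.

```python
# Hypothetical table schema: column name -> expected Python type.
TABLE_SCHEMA = {"id": int, "amount": float, "status": str}

def validate(record, schema=TABLE_SCHEMA):
    """Reject records that do not match the declared schema."""
    if set(record) != set(schema):
        raise ValueError(f"column mismatch: {set(record) ^ set(schema)}")
    for col, expected in schema.items():
        if not isinstance(record[col], expected):
            raise ValueError(f"{col}: expected {expected.__name__}")
    return record

validate({"id": 1, "amount": 9.99, "status": "processed"})  # accepted
```

Delta applies the same idea at write time against the table's stored schema, so a malformed batch fails the transaction instead of silently corrupting the table.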

Key Characteristics

  • Maintains a transaction log for all table modifications
  • Enforces schema constraints before writing data
  • Supports ACID-compliant delete and update operations
  • Enables concurrent batch and streaming writes to the same table
  • Provides data versioning and point-in-time recovery
  • Optimized for cloud object storage, avoiding expensive file-listing operations by tracking data files in the log

Why It Matters

  • Eliminates costly data quality issues from incomplete writes and race conditions
  • Unifies batch and streaming pipelines without custom conflict resolution
  • Reduces storage overhead through efficient compaction and cleanup
  • Supports GDPR and other compliance requirements through reliable delete operations and auditable change history
  • Improves query performance through statistics and file pruning
  • Enables fine-grained audit trails for regulatory compliance

Example

`
-- Create a Delta table
CREATE TABLE customer_transactions USING delta AS
SELECT * FROM parquet.`s3://data-lake/raw-transactions/`;

-- ACID update operation
UPDATE customer_transactions 
SET status = 'processed' 
WHERE process_date < current_date();

-- Time travel to a previous table version
SELECT * FROM customer_transactions VERSION AS OF 123;

-- Schema enforcement prevents bad data
INSERT INTO customer_transactions VALUES (null, ...); -- Fails if the column is NOT NULL
`

Coginiti Perspective

Coginiti connects to Delta Lake through its Databricks connector with full CoginitiScript support, meaning teams can develop, test, and publish transformations against Delta tables using the same governed workflow they use for warehouse-based analytics. CoginitiScript's incremental publication strategies (append, merge, merge_conditionally) align with Delta Lake's transactional capabilities, ensuring that governed updates are applied atomically.
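The difference between the append and merge strategies mentioned above can be sketched abstractly. The function names and key column below are illustrative only (not CoginitiScript or Delta APIs): append adds all incoming rows, while merge upserts, replacing rows that match on a key and appending the rest.

```python
# Illustrative sketch of append vs. merge publication semantics
# over a keyed table; names here are hypothetical.
def publish_append(table, rows):
    """Append: all incoming rows are added as-is."""
    return table + rows

def publish_merge(table, rows, key="id"):
    """Merge (upsert): matching rows are replaced, new rows appended."""
    merged = {r[key]: r for r in table}
    for r in rows:
        merged[r[key]] = r
    return list(merged.values())

table = [{"id": 1, "status": "raw"}]
table = publish_merge(table, [{"id": 1, "status": "processed"},
                              {"id": 2, "status": "raw"}])
```

When the target is a Delta table, the whole merge is applied as one transaction, so downstream readers see either the old rows or the fully merged result, never a mixture.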
