Apache Iceberg

Apache Iceberg is an open-source table format that organizes data files with a metadata layer enabling ACID transactions, schema evolution, and time travel capabilities for data lakes.

Apache Iceberg addresses limitations of traditional data lake architectures where data files lack transactional guarantees and metadata management. The format separates data files from metadata, maintaining a manifest of file references that ensures consistency across concurrent reads and writes. This architecture enables atomic updates at the table level, preventing partial writes or corrupted reads during failures.
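The atomic, table-level commit described above can be pictured with a small in-memory model. This is only a sketch of the idea, not Iceberg's implementation: `ToyTable`, `Snapshot`, and `commit` are hypothetical names, and a real catalog performs the pointer swap against durable metadata files rather than a Python dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: Tuple[str, ...]  # immutable set of file references


class ToyTable:
    """Hypothetical stand-in for a table's current-metadata pointer."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {0: Snapshot(0, ())}
        self.current_id = 0

    def current(self) -> Snapshot:
        return self.snapshots[self.current_id]

    def commit(self, base_id: int, added: List[str]) -> bool:
        """Publish a new snapshot only if the table has not moved since base_id."""
        if base_id != self.current_id:
            return False  # another writer committed first; the caller must retry
        new_id = self.current_id + 1
        files = self.snapshots[base_id].data_files + tuple(added)
        self.snapshots[new_id] = Snapshot(new_id, files)
        self.current_id = new_id  # one pointer swap makes the whole write visible
        return True


table = ToyTable()
reader_view = table.current()                  # a reader pins snapshot 0
ok_a = table.commit(0, ["a.parquet"])          # writer A commits first
ok_b = table.commit(0, ["b.parquet"])          # writer B used a stale base
ok_b_retry = table.commit(table.current_id, ["b.parquet"])

print(ok_a, ok_b, ok_b_retry)                  # True False True
print(reader_view.data_files)                  # ()
print(table.current().data_files)              # ('a.parquet', 'b.parquet')
```

The reader that pinned snapshot 0 never sees a half-applied write, and the writer that lost the race is rejected as a whole instead of leaving partial state behind.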

Iceberg's metadata tracks snapshots, allowing queries to reference data as it existed at specific points in time. Schema evolution is recorded in table metadata, where each column is identified by a unique ID rather than by name or position, so columns can be added, dropped, renamed, or reordered without rewriting existing data. The format supports hidden partitioning, which derives partition values from column transforms (for example, the month of a date column) and tracks them in metadata rather than relying on directory paths or extra user-maintained partition columns, improving query efficiency and simplifying maintenance.
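One way to see why schema changes need no data rewrite: the Iceberg specification identifies columns by unique field IDs, so a rename or an added column only changes metadata. The sketch below assumes a simplified model in which stored rows are keyed by field ID. The `read` helper and the two schema dictionaries are hypothetical and are not part of any Iceberg API.

```python
from typing import Any, Dict, List, Optional

# Rows as written to an older "data file", keyed by field ID, not column name.
old_file: List[Dict[int, Any]] = [{1: 100, 2: 19.99}]

# Each schema version maps field IDs to the column names currently in use.
schema_v1: Dict[int, str] = {1: "order_id", 2: "amount"}
# Version 2 renames field 2 and adds field 3; the old file is left untouched.
schema_v2: Dict[int, str] = {1: "order_id", 2: "total_amount", 3: "customer_segment"}


def read(rows: List[Dict[int, Any]], schema: Dict[int, str]) -> List[Dict[str, Optional[Any]]]:
    """Project stored rows through a schema; IDs missing from a file read as None."""
    return [{name: row.get(field_id) for field_id, name in schema.items()} for row in rows]


v1 = read(old_file, schema_v1)
v2 = read(old_file, schema_v2)

print(v1)  # [{'order_id': 100, 'amount': 19.99}]
print(v2)  # [{'order_id': 100, 'total_amount': 19.99, 'customer_segment': None}]
```

Because the file is addressed by ID, the renamed column still resolves to the same stored values, and the new column reads as null for rows written before it existed.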

Its open-source nature and adoption by engines such as Apache Spark, Flink, and Trino have made Iceberg a standard for building reliable, queryable data lakes that compete with traditional data warehouse capabilities.

Key Characteristics

  • Supports ACID transactions across distributed reads and writes
  • Maintains manifest files tracking all active and historical data files
  • Enables point-in-time query access to any previous snapshot
  • Handles schema evolution without breaking existing queries
  • Implements hidden partitioning for optimized query planning
  • Operates independently of compute engines through an open table specification and open file formats such as Parquet, ORC, and Avro
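The hidden partitioning item above can be illustrated with a toy scan planner. The month transform here follows the definition in the Iceberg specification (months since 1970-01), but everything else is an assumption made for illustration: `files_by_partition`, `write`, and `plan_scan` are hypothetical, and real planning works from manifest metadata rather than a dictionary.

```python
from datetime import date
from typing import Dict, List


def month_transform(d: date) -> int:
    """Months since 1970-01, the idea behind a month partition transform."""
    return (d.year - 1970) * 12 + (d.month - 1)


# Hypothetical file index: derived partition value -> data files.
files_by_partition: Dict[int, List[str]] = {}


def write(order_date: date, path: str) -> None:
    """The writer supplies only order_date; the partition value is derived."""
    files_by_partition.setdefault(month_transform(order_date), []).append(path)


def plan_scan(start: date, end: date) -> List[str]:
    """Keep only files whose month partition can hold dates in [start, end]."""
    lo, hi = month_transform(start), month_transform(end)
    return [
        path
        for month, paths in sorted(files_by_partition.items())
        if lo <= month <= hi
        for path in paths
    ]


write(date(2024, 1, 15), "f1.parquet")
write(date(2024, 2, 3), "f2.parquet")
write(date(2024, 2, 20), "f3.parquet")
write(date(2024, 3, 9), "f4.parquet")

# The query filters on order_date itself, never on a separate partition column.
selected = plan_scan(date(2024, 2, 1), date(2024, 2, 29))
print(selected)  # ['f2.parquet', 'f3.parquet']
```

The filter is expressed on the source column, and the planner translates it into partition values, which is why queries stay correct even if the partition layout is changed later.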

Why It Matters

  • Reduces the risk of partial writes and inconsistent reads common in traditional data lakes
  • Reduces query costs by pruning partitions and data files from metadata instead of scanning the full table
  • Enables time travel for audit trails, testing, and data recovery without copying data
  • Supports schema changes during production without downtime or ETL rebuilds
  • Ensures correctness in concurrent analytics and data operations
  • Provides portability across multiple compute engines

Example

```sql
-- Create an Iceberg table
CREATE TABLE sales_data (
  order_id INT,
  product_id INT,
  amount DECIMAL(10, 2),
  order_date DATE
) USING iceberg
PARTITIONED BY (month(order_date));

-- Time travel query
SELECT * FROM sales_data VERSION AS OF 12345;

-- Schema evolution
ALTER TABLE sales_data ADD COLUMN customer_segment STRING;
```

Coginiti Perspective

Coginiti has direct Iceberg support through CoginitiScript publication. Teams can materialize transformation results as Iceberg tables on Snowflake, Databricks, BigQuery, Trino, and Athena, producing open table format outputs from governed, version-controlled logic. This means Iceberg tables created through Coginiti carry the same semantic governance as any other publication target, and any engine that reads Iceberg can consume the output independently.
