Glossary/OLTP, OLAP & Workload Types

OLAP (Online Analytical Processing)

OLAP is a database workload class optimized for rapid execution of complex queries that aggregate and analyze large datasets across multiple dimensions.

OLAP systems complement OLTP by specializing in analytical queries that aggregate millions of rows across business dimensions. Rather than answering "What is order #12345?", OLAP answers "What is total revenue by region and product for Q2 2024?" This requires scanning billions of rows, computing aggregates, and joining across dimension tables.

OLAP systems use columnar storage, storing each column separately instead of grouping columns per row. When aggregating revenue by region, columnar storage reads only the revenue and region columns, ignoring others. This dramatically reduces I/O compared to row-oriented systems. Compression algorithms exploit column similarities, further reducing storage and bandwidth.

Data is organized in star or snowflake schemas, separating slowly-changing dimensions (products, customers, dates) from frequently-changing facts (sales, events). Denormalization trades storage for query speed, pre-computing common aggregates. OLAP systems like Snowflake, BigQuery, and Redshift use distributed processing, parallel I/O, and query optimization to handle analytics on terabyte datasets in seconds.

Key Characteristics

  • Optimize for complex queries aggregating large datasets
  • Use columnar storage for efficient scanning and compression
  • Support denormalized schemas (star/snowflake) for performance
  • Scan millions to billions of rows per query
  • Tolerate seconds to minutes of latency
  • Enable distributed parallel processing for scalability

Why It Matters

  • Enable business analytics on large historical datasets
  • Reduce query latency from hours to seconds through architectural optimization
  • Support interactive dashboards and reports on fresh data
  • Scale cost-effectively by processing massive datasets
  • Provide insights from complete historical data, not samples
  • Enable sophisticated analysis (cohorts, trends, segments)

Example

`
-- OLAP query: Revenue analysis by multiple dimensions

SELECT 
  d.year,
  d.quarter,
  p.product_category,
  r.region,
  SUM(f.revenue) as total_revenue,
  COUNT(DISTINCT f.customer_id) as unique_customers,
  AVG(f.revenue) as avg_transaction_value
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_region r ON f.region_id = r.region_id
WHERE d.year >= 2023
GROUP BY d.year, d.quarter, p.product_category, r.region
ORDER BY total_revenue DESC;

-- Columnar storage benefit:
-- Query only reads: revenue column, date columns, product columns, region columns
-- Ignores: customer detail, transaction ID, payment method, etc.
-- 100 billion rows: 50 TB in row-oriented, 5 TB in columnar

-- Distributed execution:
-- Node 1: Process 2023 data
-- Node 2: Process 2024 data
-- Nodes combine and return results
`

Coginiti Perspective

Coginiti targets OLAP workloads directly. The SMDL semantic layer models OLAP concepts natively: entities map to tables, dimensions define the axes of analysis, and measures with 12 aggregation types (including median, stdev, and variance) encode business calculations once. Semantic SQL's MEASURE() function ensures correct aggregation at any dimensional grain, and the Apache DataFusion engine translates queries to platform-specific SQL for Snowflake, BigQuery, Redshift, and other OLAP-optimized systems.

See Semantic Intelligence in Action

Coginiti operationalizes business meaning across your entire data estate.