Glossary/Data Storage & Compute

Predicate Pushdown

Predicate Pushdown is a query optimization technique that moves filter conditions (WHERE clauses) as close as possible to data sources, reducing the volume of data that must be processed.

Predicate pushdown applies WHERE conditions at the earliest possible point: if a query selects customers where age > 30, pushing the predicate down means applying the filter at the table scan (only read rows matching age > 30) rather than reading all customers then filtering. This dramatically reduces data movement: a 1 billion row customer table filtered to 100 million matching rows avoids moving 900 million unnecessary rows. Pushdown is especially valuable in distributed systems: filtering at the source node reduces network traffic between nodes and reduces memory requirements.

Predicate pushdown is particularly powerful in columnar systems: filters on one column avoid reading other columns entirely. A query filtering on age but selecting name/email reads only age column from storage, never touches name/email columns. Modern query optimizers automatically implement pushdown: users write intuitive WHERE conditions, optimizers push them down automatically.

In practice, predicate pushdown occurs at multiple levels: source database applies filters before sending data, columnar stores apply column filters during scan, distributed systems apply filters at partition level. Effective optimization requires indexes and statistics: if optimizer doesn't know that age > 30 filters to 10% of rows, it may not choose to filter early.

Key Characteristics

  • Moves filter conditions (predicates) to earliest point in execution
  • Reduces data volume processed by downstream steps
  • Applies at table scan when possible
  • Works with indexes and statistics for optimization
  • Especially valuable in columnar and distributed systems
  • Reduces network traffic and memory usage

Why It Matters

  • Dramatically reduces query latency by processing fewer rows
  • Reduces network bandwidth in distributed systems
  • Reduces memory requirements
  • Reduces CPU usage through less data processing
  • Improves scalability for large datasets
  • Is automatically handled by modern query optimizers

Example

Query "SELECT customer_name, email FROM customers WHERE age > 30 AND region='west'" on billion-row customer table: without pushdown, read all billion rows to memory, filter to matching rows (expensive). With pushdown, apply filters at table scan: read only age and region columns, apply filters during read (only 50 million matching rows), project to name and email columns. Reduction from billion rows to 50 million rows processed means query runs 20x faster and uses 1/20th the memory.

Coginiti Perspective

Coginiti's semantic layer translates Semantic SQL WHERE clauses into platform-native filters that the target engine can push down to the storage layer. Because CoginitiScript generates SQL that executes on the target platform's engine, predicate pushdown is handled by each platform's optimizer (Snowflake, BigQuery, Redshift, etc.) rather than by Coginiti itself. The semantic model's dimension definitions and relationship joins are structured to preserve pushdown opportunities in the generated SQL.

Related Concepts

Query OptimizationWHERE ClauseFilter PushdownColumnar StorageDistributed QueryExecution PlanIndexStatistics

See Semantic Intelligence in Action

Coginiti operationalizes business meaning across your entire data estate.