Glossary/Data Storage & Compute

Query Federation

Query Federation is a database capability that executes queries across multiple heterogeneous data sources, transparently joining and aggregating data from different systems.

Query federation enables a single query to access data across many sources without copying data: a federated query can join PostgreSQL tables with S3 files and Snowflake data in a single query. The federated query engine coordinates execution: determine which data lives where, push predicates to each source (filter at source when possible), fetch required data, combine results. Execution plan might push down SELECT and WHERE to database, handle JOIN at query level, combine with S3 data. The goal is minimizing data movement: execute as much computation as possible at each source.

Query federation became practical with query engines like Trino and Presto that support multiple connectors. Traditional federated databases required explicit setup; modern approaches dynamically connect to sources. Trade-offs include: queries may be slower than querying single system (network overhead, lack of cross-source optimization), dependency on all sources being available (one slow source slows entire query).

In practice, federation is useful for specific scenarios: accessing rarely-used data in specialized systems, connecting to third-party data without replicating, or joining operational and analytical data. For frequently-accessed data, replication is usually more efficient than federation.

Key Characteristics

  • Executes queries across multiple heterogeneous sources
  • Transparently joins data from different systems
  • Pushes computation to sources when possible
  • Handles schema mapping between sources
  • Minimizes data movement through predicate pushdown
  • Requires source availability during query execution

Why It Matters

  • Enables queries across diverse systems without consolidation
  • Reduces latency and cost for rarely-accessed data
  • Maintains single view of distributed data
  • Simplifies integration of siloed systems
  • Reduces need for centralized data copies
  • Enables real-time access to operational systems

Example

Insurance company federated query: customer data in Salesforce, policies in an on-premises database, claims history in S3 data lake, and payment info in Stripe. Federated query "find high-value customers with multiple claims in last year" runs: Salesforce connector filters customers by value, on-premises connector filters policies by customer, S3 connector aggregates claims by customer (reads only 2024 data), Stripe connector filters customers with active payments. Results join at query level. Without federation, would require replicating all data to central warehouse.

Coginiti Perspective

Coginiti provides a form of governed federation through its 24+ native connectors and semantic layer. Rather than relying on a federation engine to join data across systems at query time, Coginiti lets teams develop and govern analytics logic against each platform independently, with the semantic layer providing consistent definitions across all connected sources. For platforms that support native federation (Trino, Athena, BigQuery with external tables), CoginitiScript works directly with their federated query capabilities.

Related Concepts

Data VirtualizationQuery EngineData IntegrationDistributed QueryHeterogeneous DataSchema MappingPredicate PushdownMulti-Database Query

See Semantic Intelligence in Action

Coginiti operationalizes business meaning across your entire data estate.