
Cloud Data Warehouse

Cloud Data Warehouse is a managed analytics database service hosted in cloud infrastructure, providing elastic scaling, separated compute and storage, and usage-based pricing.

Cloud data warehouses (Snowflake, BigQuery, Redshift) transformed analytics by eliminating infrastructure management: no provisioning servers, no capacity planning, no physical security concerns. They separate storage (how much data is kept) from compute (how much processing power is applied), enabling independent scaling: store terabytes while running small compute for light queries, then scale compute up temporarily for heavy workloads. Usage-based pricing replaces fixed infrastructure costs: organizations pay for storage consumed and compute-seconds used, not for reserved capacity sitting idle.
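The pricing difference can be sketched in a few lines. This is an illustrative cost model only: the storage and compute rates below are assumptions, not any vendor's actual pricing.

```python
# Sketch: usage-based warehouse pricing vs. reserved on-prem-style capacity.
# Rates and workload numbers are illustrative assumptions, not vendor prices.

STORAGE_PER_TB_MONTH = 23.0   # assumed $/TB-month of storage held
COMPUTE_PER_SECOND = 0.0008   # assumed $/compute-second per node

def monthly_usage_cost(storage_tb: float, compute_seconds: float) -> float:
    """Usage-based bill: pay for storage held and compute-seconds consumed."""
    return storage_tb * STORAGE_PER_TB_MONTH + compute_seconds * COMPUTE_PER_SECOND

# 10 TB stored; heavy queries run ~2 hours/day on an 8-node cluster.
seconds = 2 * 3600 * 8 * 30
bill = monthly_usage_cost(10, seconds)

# Reserved capacity must be sized for the peak and billed around the clock.
reserved = 8 * COMPUTE_PER_SECOND * 86400 * 30 + 10 * STORAGE_PER_TB_MONTH

print(f"usage-based: ${bill:,.2f}/month, reserved: ${reserved:,.2f}/month")
```

Even with identical per-second rates, paying only for active compute-seconds is roughly an order of magnitude cheaper here, because the reserved cluster bills for 24 hours a day while doing 2 hours of work.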

Cloud data warehouses are built on cloud infrastructure: compute and storage are distributed across clusters, queries are parallelized automatically, and infrastructure handles failures transparently. Cloud providers manage security, backups, updates, and scaling. The tradeoff is vendor lock-in: switching providers is difficult because of proprietary SQL extensions, data formats, and feature dependencies.

In practice, cloud warehouses have become the default for analytics: they're cheaper than on-premises for most organizations, require less operational effort, and scale elastically to handle growth. Organizations consolidate multiple systems (data marts, operational data stores) into single cloud warehouses, simplifying architecture and reducing costs.

Key Characteristics

  • Hosted in cloud infrastructure (AWS, GCP, Azure)
  • Separates compute and storage for independent scaling
  • Usage-based pricing for cost efficiency
  • Automatic scaling up and down based on workload
  • Fully managed: backups, updates, security handled by provider
  • Multi-user concurrency with query isolation

Why It Matters

  • Reduces capital costs by replacing purchased infrastructure
  • Reduces operational burden by eliminating infrastructure management
  • Enables elasticity: scale up for heavy workloads, down for light usage
  • Reduces total cost of ownership through usage-based pricing
  • Enables rapid deployment without waiting for infrastructure provisioning
  • Scales globally: access data from any cloud region

Example

A SaaS company uses Snowflake as its cloud warehouse: data ingested from customer applications ranges from 1GB on a typical day to 500GB at peak. Snowflake stores all data in S3 (cheap, durable) and scales compute per workload: a medium cluster for daily transformation jobs, a large cluster for monthly reporting, and an extra-large cluster for ad-hoc analyst queries. The company pays for storage (stable) and compute-seconds (varies with workload). On-premises infrastructure would have required provisioning for peak capacity, which is expensive and mostly idle.
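The example above can be sketched as a per-workload sizing table. The credits-per-hour ladder is assumed to follow Snowflake's published warehouse sizes; the price per credit and the monthly hours are illustrative assumptions.

```python
# Sketch: per-workload warehouse sizing with credit-based billing.
# Credit rates assumed from Snowflake's size ladder; $/credit is made up.

CREDITS_PER_HOUR = {"M": 4, "L": 8, "XL": 16}  # warehouse size -> credits/hr
PRICE_PER_CREDIT = 3.0                          # assumed $/credit

workloads = [
    ("daily transforms", "M", 1.5 * 30),  # ~1.5h per day, per month
    ("monthly reporting", "L", 6.0),
    ("ad-hoc analysis", "XL", 20.0),
]

total = 0.0
for name, size, hours in workloads:
    cost = CREDITS_PER_HOUR[size] * hours * PRICE_PER_CREDIT
    total += cost
    print(f"{name}: {size} warehouse, {hours:.0f}h -> ${cost:,.2f}")

print(f"total compute: ${total:,.2f}/month")
```

The point of the sketch is that each workload is billed only for the hours its right-sized cluster actually runs, rather than all workloads sharing one cluster sized for the heaviest.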

Coginiti Perspective

Coginiti's ELT approach leans on cloud data warehouses as the primary compute engine for transformations. CoginitiScript's query tags let teams annotate queries with department, project, and priority metadata that flows through to Snowflake's query_tag, BigQuery's query_label, and Redshift's query_group, enabling cost allocation and workload monitoring at the warehouse level. This means governance extends from the analytics catalog into the warehouse's own observability tools.
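As a rough illustration of how such tagging reaches the warehouse, the sketch below builds the session-setup SQL for two of the engines mentioned. The helper name and metadata keys are hypothetical, not CoginitiScript syntax; `ALTER SESSION SET QUERY_TAG` (Snowflake) and `SET query_group` (Redshift) are real statements.

```python
# Sketch: building per-engine workload-tagging statements from shared
# metadata. tag_statement and the meta keys are illustrative inventions.
import json

def tag_statement(engine: str, meta: dict) -> str:
    """Return session-setup SQL that attributes subsequent queries."""
    if engine == "snowflake":
        # Snowflake QUERY_TAG accepts a free-form string; JSON is a common choice
        return f"ALTER SESSION SET QUERY_TAG = '{json.dumps(meta)}'"
    if engine == "redshift":
        # Redshift query_group labels queries for workload management
        return f"SET query_group TO '{meta['department']}:{meta['project']}'"
    raise ValueError(f"unsupported engine: {engine}")

meta = {"department": "finance", "project": "q3-close", "priority": "high"}
print(tag_statement("snowflake", meta))
print(tag_statement("redshift", meta))
```

Once set on the session, the tag appears in the warehouse's own query history views, which is what makes cost allocation and monitoring possible on the warehouse side.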

Related Concepts

  • Data Warehouse
  • Cloud Computing
  • Analytics Database
  • Query Engine
  • Compute Warehouse
  • Serverless Compute
  • Distributed Compute
  • Separation of Compute and Storage
