Object Storage
Object Storage is a cloud storage system that manages data as individual, discrete objects with metadata, accessed via HTTP APIs rather than file systems or block storage.
Object storage (Amazon S3, Google Cloud Storage, Azure Blob Storage) treats data as discrete objects addressed by key names: instead of navigating file hierarchies, you request objects by key. Objects are immutable (you replace the entire object rather than editing it in place) and can hold any data: files, structured records, images, video. Object storage runs on distributed infrastructure: each object is stored redundantly across multiple availability zones (and optionally replicated across regions), yielding extreme durability (eleven nines, i.e. 99.999999999%) and high availability. Object storage is optimized for throughput and scalability rather than latency: bulk operations (scanning billions of objects) are efficient, but random access is slower than local disk.
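The access model above can be sketched in a few lines of Python. This is a hypothetical in-memory stand-in, not a real S3/GCS/Azure client: a flat key namespace, per-object metadata, whole-object replacement instead of in-place edits, and listing by key prefix rather than directory traversal.

```python
# Hypothetical in-memory sketch of object storage semantics
# (not a real S3/GCS/Azure client).
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> (bytes, metadata)

    def put(self, key, data, metadata=None):
        # Writes replace the entire object; there is no partial update.
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        data, _ = self._objects[key]
        return data

    def head(self, key):
        # Metadata lookup without fetching the object body.
        return self._objects[key][1]

    def list(self, prefix=""):
        # "Directories" are just key prefixes; listing is a prefix scan.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = ObjectStore()
store.put("videos/raw/clip1.mp4", b"\x00\x01", {"content-type": "video/mp4"})
store.put("videos/raw/clip2.mp4", b"\x02\x03", {"content-type": "video/mp4"})
print(store.list("videos/raw/"))  # all keys under a prefix, no directory tree
```

The real APIs (e.g. S3's `PutObject`/`GetObject`/`ListObjectsV2`) follow the same shape, but over HTTP and with the redundancy and durability handled server-side.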
Object storage became the foundation of cloud data architecture because it's cheap (a few cents per gigabyte per month), durable, and scales to exabytes. Data lakes, data warehouses, and backup systems all use object storage. Organizations consolidate data in object storage and add compute/analytics layers on top rather than moving data between systems.
In practice, object storage is transparent to analytics users: querying data in S3 through a data warehouse feels like querying local tables. Query engines handle fetching objects. The key insight is separation of storage from compute: data sits in object storage permanently while different compute engines (Spark, Snowflake, Athena) query the same data.
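The storage/compute separation can be illustrated with a minimal sketch, assuming a dict stands in for the object store and two plain functions stand in for independent query engines: one copy of the data is fetched and processed by each engine, with no engine owning or copying the data into its own storage.

```python
import csv
import io

# Hedged sketch of storage/compute separation: one copy of the data
# lives in "object storage" (here, a dict of bytes); two independent
# "engines" each fetch and process it without loading it anywhere first.
STORAGE = {"metrics/views.csv": b"video,views\na,100\nb,250\nc,50\n"}

def fetch(key):
    return STORAGE[key]  # stands in for an HTTP GET against the store

def engine_total_views(key):
    # "Engine" 1: aggregate total views across all videos.
    rows = csv.DictReader(io.StringIO(fetch(key).decode()))
    return sum(int(r["views"]) for r in rows)

def engine_top_video(key):
    # "Engine" 2: find the most-viewed video from the same stored bytes.
    rows = csv.DictReader(io.StringIO(fetch(key).decode()))
    return max(rows, key=lambda r: int(r["views"]))["video"]

print(engine_total_views("metrics/views.csv"))  # 400
print(engine_top_video("metrics/views.csv"))    # b
```

In production the same pattern holds with Spark, Snowflake, or Athena reading Parquet or CSV objects from S3: the data stays put, and each engine brings its own compute.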
Key Characteristics
- Stores data as individual objects with metadata
- Accessed via HTTP APIs (not file systems)
- Immutable: replace objects rather than edit in place
- Extremely durable through geographic redundancy
- Optimized for bulk throughput, not latency
- Scales to exabytes of data
Why It Matters
- Reduces storage costs to a few cents per gigabyte per month
- Provides durability suitable for long-term data retention
- Enables separation of storage from compute infrastructure
- Scales to massive data volumes without performance degradation
- Supports multiple compute engines querying the same data
- Provides global data distribution for compliance and performance
Example
A video platform stores all video content in S3: raw uploads land in one bucket (immutable), processed versions (transcoded, lower-bitrate renditions) in another, and analytics data (view counts, quality metrics) in a third. A CloudFront CDN caches frequently accessed videos, Spark jobs analyze view patterns against the S3 data directly (no copying), and backup copies are replicated to another region automatically. Total cost is dramatically lower than traditional storage because S3 costs a few cents per gigabyte per month, with infrequent-access tiers costing even less.
Coginiti Perspective
Coginiti has first-class object storage integration. The object store browser lets teams manage files on S3, Azure Blob, and GCS directly within the platform. CoginitiScript publishes results as Parquet or CSV to object storage via configured connections, and users can query data files on object storage without loading them into a warehouse first. This supports ELT patterns where raw data lands on object storage and is transformed in place or selectively loaded into analytical platforms.
Related Concepts
More in Data Storage & Compute
Cloud Data Warehouse
Cloud Data Warehouse is a managed analytics database service hosted in cloud infrastructure, providing elastic scaling, separated compute and storage, and usage-based pricing.
Columnar Storage
Columnar Storage is a data storage format that organizes data by column rather than by row, enabling efficient compression and fast analytical queries that access subsets of columns.
Compute Warehouse (e.g., Snowflake Virtual Warehouse)
Compute Warehouse is an elastic compute resource in a cloud data warehouse that allocates processing power for query execution, scaling up and down based on workload demands.
Data Caching
Data Caching is the storage of frequently accessed data in fast, temporary memory to reduce latency and computational cost by serving requests from cache rather than recomputing or refetching.
Data Lake
Data Lake is a large-scale storage system that retains data in its raw, original format from multiple sources, serving as a central repository for historical data and enabling diverse analytics and data science use cases.
Data Lakehouse
Data Lakehouse is an architecture that combines data lake storage advantages (cheap, flexible, scalable) with data warehouse query capabilities (schema, performance, governance).