Data Transformation

Data Transformation is the process of converting raw data from source systems into cleaned, standardized, and analysis-ready formats that align with business definitions and requirements.

Data transformation applies business logic and technical rules to raw data: filtering irrelevant records, standardizing formats (converting dates from MM/DD/YYYY to YYYY-MM-DD), enriching with calculations (computing customer lifetime value), and creating dimensions aligned with business metrics. Transformations bridge the gap between how source systems store data (optimized for transactions) and how analysts need data (organized by business concepts).
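As a minimal sketch of these rules, the Python function below applies a filter, a date-format standardization, and a derived field to one raw record. The field names (order_date, status, unit_price, quantity) and the cancelled-status rule are illustrative assumptions, not a specific system's schema.

```python
from datetime import datetime

def transform_record(raw):
    """Apply illustrative transformation rules to one raw source record.
    Field names (order_date, status, unit_price, quantity) are assumptions."""
    # Filter: drop records that are irrelevant for analysis
    if raw["status"] == "cancelled":
        return None
    # Standardize: convert MM/DD/YYYY source dates to ISO YYYY-MM-DD
    parsed = datetime.strptime(raw["order_date"], "%m/%d/%Y")
    # Enrich: derive a field the source system does not store
    return {
        "order_date": parsed.strftime("%Y-%m-%d"),
        "status": raw["status"],
        "order_total": round(raw["unit_price"] * raw["quantity"], 2),
    }

raw_rows = [
    {"order_date": "03/15/2024", "status": "shipped", "unit_price": 19.99, "quantity": 2},
    {"order_date": "03/16/2024", "status": "cancelled", "unit_price": 5.00, "quantity": 1},
]
clean_rows = [r for r in (transform_record(row) for row in raw_rows) if r is not None]
```

In production this logic would typically live in a SQL or dbt model rather than application code, but the shape is the same: filter, standardize, enrich.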

Modern data transformation has shifted from proprietary ETL tools toward SQL-based transformations (dbt, SQL stored procedures) that analytics engineers and analysts can understand and maintain. This democratization reduces dependency on specialized engineers and enables teams to respond quickly to changing business requirements. Transformations may occur in multiple layers: in-pipeline (bulk operations), in the warehouse (dimensional modeling), or at query time (computed columns).

In practice, a single transformation might filter out test orders, standardize currency to USD, merge purchase and refund events into deduplicated order records, and calculate metrics such as customer tenure and order frequency. Transformations are often the longest phase of data pipelines because the business logic they encode requires careful testing to ensure it is correctly applied.
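The combined steps above can be sketched as a small aggregation, assuming hypothetical fields (customer_id, order_date, is_test):

```python
from collections import defaultdict
from datetime import date

def customer_metrics(orders, as_of):
    """Filter, group, and enrich order events into per-customer metrics.
    Fields customer_id, order_date, and is_test are hypothetical."""
    by_customer = defaultdict(list)
    for order in orders:
        if order.get("is_test"):  # filter: drop test orders
            continue
        by_customer[order["customer_id"]].append(order["order_date"])
    # Enrich: compute tenure and order frequency per customer
    return {
        cid: {"tenure_days": (as_of - min(dates)).days, "order_count": len(dates)}
        for cid, dates in by_customer.items()
    }

orders = [
    {"customer_id": "c1", "order_date": date(2024, 1, 1), "is_test": False},
    {"customer_id": "c1", "order_date": date(2024, 3, 1), "is_test": False},
    {"customer_id": "c2", "order_date": date(2024, 2, 15), "is_test": True},
]
metrics = customer_metrics(orders, as_of=date(2024, 4, 1))
```

Note how the test customer drops out entirely: a single pass both filters and derives the metrics analysts will query.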

Key Characteristics

  • Applies business logic and standardization rules to raw data
  • Converts source formats to analytics-ready schemas
  • Creates reusable, maintainable code (SQL or Python)
  • Includes data quality checks and validations
  • Supports incremental and full-refresh patterns
  • Often organized in layers: staging, intermediate, and mart tables
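The staging/intermediate/mart layering in the last bullet can be illustrated with a toy in-memory pipeline. Real projects would implement each layer as a SQL or dbt model; the column names here are invented for the example.

```python
def staging(raw_rows):
    # Staging: rename source columns and cast types, one model per source
    return [{"customer_id": r["cust"], "amount": float(r["amt"])} for r in raw_rows]

def intermediate(stg_rows):
    # Intermediate: apply a business rule (exclude non-positive amounts)
    return [r for r in stg_rows if r["amount"] > 0]

def mart(int_rows):
    # Mart: aggregate into an analysis-ready, per-customer table
    totals = {}
    for r in int_rows:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]
    return totals

raw = [
    {"cust": "c1", "amt": "10.0"},
    {"cust": "c1", "amt": "-2.0"},
    {"cust": "c2", "amt": "5.5"},
]
mart_table = mart(intermediate(staging(raw)))
```

Each layer has one responsibility, which is what makes layered transformations testable and maintainable.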

Why It Matters

  • Ensures consistent definitions of business metrics across teams
  • Improves data quality by standardizing formats and validating values
  • Reduces analysis time by providing pre-computed, analysis-ready tables
  • Enables self-service analytics by making data intuitive for business users
  • Reduces query costs by computing results once and storing them, rather than repeatedly querying raw data
  • Enables compliance by normalizing data to match regulatory requirements

Example

An e-commerce transformation: raw purchases arrive with timestamps in inconsistent formats, currency codes that vary by source, and customer IDs duplicated across systems. The transformation standardizes timestamps to UTC, converts prices to USD at historical exchange rates, deduplicates customers by matching name, email, and domain, calculates order value net of tax and refunds, and creates a customer dimension with calculated fields such as lifetime value and days since last purchase. dbt tests then ensure every order has a valid customer ID and a positive amount.
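Two of these steps, UTC standardization and historical-rate currency conversion, can be sketched as follows; the rate table, field names, and rates are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical historical FX rates to USD, keyed by (currency, UTC date)
FX_TO_USD = {("EUR", "2024-03-15"): 1.09, ("USD", "2024-03-15"): 1.0}

def normalize_order(order):
    """Standardize one raw order: timestamp to UTC, price to USD.
    Field names and rates are illustrative, not a real schema."""
    ts = datetime.fromisoformat(order["ts"])  # source carries a local offset
    ts_utc = ts.astimezone(timezone.utc)
    # Convert at the rate in effect on the order's UTC date
    rate = FX_TO_USD[(order["currency"], ts_utc.date().isoformat())]
    return {
        "order_id": order["order_id"],
        "ts_utc": ts_utc.isoformat(),
        "amount_usd": round(order["amount"] * rate, 2),
    }

clean = normalize_order(
    {"order_id": "o1", "ts": "2024-03-15T10:00:00+02:00", "currency": "EUR", "amount": 100.0}
)
```

Keying the rate lookup to the order's UTC date (not the load date) is what makes conversions reproducible when the pipeline is re-run later.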

Coginiti Perspective

CoginitiScript enables modular, reusable transformations with templating, macros, conditionals, and loops. Unlike transformations embedded in pipeline code or BI tool logic, CoginitiScript blocks are stored in the analytics catalog, versioned, reviewed, and shared. When a transformation definition is updated, dependent analytics automatically inherit the change, eliminating the manual propagation that causes transformation drift across teams.
