Data Testing
Data testing is the systematic verification of data quality, transformation correctness, and business logic through automated tests that ensure data meets specifications.
Data testing applies software testing principles to data. It includes unit tests (does this transformation produce correct output on sample data?), integration tests (do upstream and downstream tables align?), and acceptance tests (does the data meet business requirements?). Tests can be rule-based (validate rows against explicit conditions), statistical (check that distributions match expectations), or comparative (compare results against a previous version). Data testing catches regressions: cases where code changes break transformations in subtle ways.
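The unit-test style described above can be sketched in plain Python. This is an illustrative example, not any particular framework's API: the transformation and its test are hypothetical, and the rule-based checks assert explicit conditions on known sample inputs.

```python
# Hypothetical sketch: a unit test for a simple pricing transformation,
# run against small in-memory sample data rather than production tables.

def apply_discount(price: float, discount_pct: float) -> float:
    """Transformation under test: apply a percentage discount to a price."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return round(price * (1 - discount_pct / 100), 2)

def test_apply_discount() -> None:
    # Rule-based checks: each asserts an explicit, known-good condition.
    assert apply_discount(100.0, 10) == 90.0    # ordinary case
    assert apply_discount(100.0, 0) == 100.0    # zero discount is a no-op
    assert apply_discount(100.0, 100) == 0.0    # full discount floors at zero

test_apply_discount()
```

In practice such tests run automatically (for example via pytest in CI), so a regression in the transformation fails the build instead of reaching production.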
Data testing emerged because data errors are common and expensive, yet data systems have historically lacked the testing rigor that software systems receive. A miscalculated metric in production can influence thousands of business decisions before anyone notices. Automated testing catches errors early and continuously, before they reach production, rather than leaving analysts to discover them mid-analysis.
Data testing operates at multiple levels: row-level tests (are individual rows valid?), column tests (do distributions match expectations?), table tests (do row counts match expectations?), and end-to-end tests (does data flow correctly from source to consumption?). Tests can be deterministic (specific conditions must hold) or probabilistic (anomaly detection flags unusual patterns). Transformation frameworks like dbt ship with built-in testing support, while specialized tools like Great Expectations focus entirely on data validation.
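The row, column, and table levels can be illustrated with deterministic checks over a small in-memory table. This is a minimal sketch with hypothetical field names and thresholds; real implementations would run equivalent assertions as SQL against the warehouse.

```python
# Illustrative sketch: deterministic checks at three levels over a tiny
# in-memory "orders" table (all names and thresholds are hypothetical).

orders = [
    {"order_id": 1, "revenue": 120.50},
    {"order_id": 2, "revenue": 75.00},
    {"order_id": 3, "revenue": 9.99},
]

# Row-level test: every individual row must satisfy validity rules.
row_failures = [
    r for r in orders if r["order_id"] is None or r["revenue"] < 0
]

# Column-level test: a summary statistic must fall in an expected range.
mean_revenue = sum(r["revenue"] for r in orders) / len(orders)
column_ok = 1.0 <= mean_revenue <= 10_000.0

# Table-level test: the overall row count must match expectations.
table_ok = len(orders) > 0

assert not row_failures and column_ok and table_ok
```

A probabilistic variant would replace the fixed range with bounds learned from historical runs, flagging a value as anomalous when it drifts outside them.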
Key Characteristics
- Automated verification of data correctness
- Includes unit, integration, and acceptance tests
- Tests transformations, business logic, and quality
- Provides early detection of regressions
- Integrates with CI/CD pipelines
- Tracks test results and trends
Why It Matters
- Reliability: Automated tests catch errors before production
- Regression Detection: Alerts when changes break existing functionality
- Confidence: Test coverage builds confidence in transformations
- Speed: Automated testing replaces manual verification
- Compliance: Demonstrates data transformation correctness
Example
Data tests for a revenue metric: (1) revenue is always non-negative, (2) the sum of item revenues equals the order revenue (no rounding errors), (3) today's revenue is not zero (flagging an unusual drop), and (4) new orders appear within 2 hours. All four tests run automatically after each ETL load.
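The four revenue tests can be sketched as plain Python assertions over sample data. The records, field names, and the fixed "now" timestamp below are hypothetical; in production these checks would typically be SQL queries run after each ETL load.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample data; "now" is pinned so the example is reproducible.
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
orders = [
    {"id": 1, "revenue": 30.00, "items": [10.00, 20.00],
     "created_at": now - timedelta(minutes=30)},
    {"id": 2, "revenue": 15.50, "items": [15.50],
     "created_at": now - timedelta(hours=5)},
]

# (1) Revenue is always non-negative.
assert all(o["revenue"] >= 0 for o in orders)

# (2) Item revenues sum to the order revenue, within rounding tolerance.
assert all(abs(sum(o["items"]) - o["revenue"]) < 0.01 for o in orders)

# (3) Today's total revenue is not zero (flags an unusual drop).
today_revenue = sum(
    o["revenue"] for o in orders if o["created_at"].date() == now.date()
)
assert today_revenue > 0

# (4) Freshness: at least one order arrived within the last 2 hours.
assert any(now - o["created_at"] <= timedelta(hours=2) for o in orders)
```

Wiring these assertions into the pipeline scheduler makes every ETL run self-verifying: a failed assertion halts or alerts before downstream consumers see bad numbers.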
Coginiti Perspective
CoginitiScript has built-in data testing through #+test blocks. A test block returns a result set: empty means pass, non-empty means fail. Tests can be run individually or programmatically via std/test with test.Run(), and the onFailure parameter controls behavior (test.Stop halts execution, test.Continue logs and proceeds). Tests run against real data on any of the 24+ connected platforms, so validation reflects actual production conditions rather than synthetic test datasets.
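The empty-result-set convention described above (a test query returning offending rows, with empty meaning pass) can be emulated in plain Python. This sketch is not CoginitiScript syntax; the function and parameter names are hypothetical, loosely mirroring the stop/continue behavior mentioned.

```python
# Emulation of the "empty result set = pass" testing convention
# (all names hypothetical; this is not CoginitiScript).

def run_test(name: str, failing_rows: list, on_failure: str = "continue") -> bool:
    """Report a test result; a non-empty result set means failure."""
    if failing_rows:
        print(f"FAIL {name}: {len(failing_rows)} offending row(s)")
        if on_failure == "stop":           # analogous to halting execution
            raise RuntimeError(f"test {name} failed; halting pipeline")
        return False                       # analogous to log-and-proceed
    print(f"PASS {name}")
    return True

orders = [{"id": 1, "revenue": 20.0}, {"id": 2, "revenue": -5.0}]

# The test "query": select rows that violate the rule.
negative = [o for o in orders if o["revenue"] < 0]
run_test("revenue_non_negative", negative)  # logs a failure and proceeds
```

The key design point carried over from the convention is that a test is just a query for bad rows, so the same engine and data access used for analysis also power validation.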
Related Concepts
More in Data Governance & Quality
Analytics Catalog
An analytics catalog is a specialized data catalog focused on analytics assets such as metrics, dimensions, dashboards, and saved queries, enabling discovery and governance of analytics-specific objects.
Business Metadata
Business metadata is contextual information that gives data meaning to business users, including definitions, descriptions, ownership, and guidance on appropriate use.
Data Catalog
A data catalog is a searchable repository of metadata about data assets that helps users discover available datasets, understand their content, and assess their quality and suitability for use.
Data Certification
Data certification is a formal process of validating and approving data quality, documenting that data meets governance standards and is safe for use in critical business decisions.
Data Contracts
A data contract is a formal agreement specifying the expectations between data producers and consumers, including schema, quality guarantees, freshness SLAs, and remediation obligations.
Data Governance
Data governance is a framework of policies, processes, and controls that define how data is managed, who is responsible for it, and how it should be used to ensure quality, security, and compliance.