Data Testing

Data testing is the systematic verification of data quality, transformation correctness, and business logic through automated tests that ensure data meets specifications.

Data testing applies software testing principles to data. It includes unit tests (does this transformation work correctly on sample data?), integration tests (do upstream and downstream tables align?), and acceptance tests (does the data meet business requirements?). Tests can be rule-based (validate rows against explicit conditions), statistical (check that distributions stay within expected ranges), or comparative (compare results with a previous version or run). Data testing also catches regressions, where a code change breaks a transformation in subtle ways.
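The three styles of test can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the sample rows, and the tolerance value are assumptions made for the example, not part of any particular framework.

```python
from statistics import mean

def rule_based_check(rows):
    """Rule-based: return the rows that violate 'amount must be non-negative'."""
    return [r for r in rows if r["amount"] < 0]

def statistical_check(values, expected_mean, tolerance):
    """Statistical: pass when the observed mean is within tolerance of the expected mean."""
    return abs(mean(values) - expected_mean) <= tolerance

def comparative_check(current, previous):
    """Comparative: return the keys whose values differ between two versions of a result."""
    keys = set(current) | set(previous)
    return sorted(k for k in keys if current.get(k) != previous.get(k))

# Hypothetical sample data for illustration.
rows = [{"id": 1, "amount": 20.0}, {"id": 2, "amount": -5.0}]
violations = rule_based_check(rows)
mean_ok = statistical_check([20.0, 22.0, 18.0], expected_mean=20.0, tolerance=1.0)
changed = comparative_check(
    {"2024-01": 100, "2024-02": 120},  # result after a code change
    {"2024-01": 100, "2024-02": 110},  # result before the change
)
```

Here the comparative check is what surfaces a regression: the February figure changed between versions even though no rule was violated.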

Data testing emerged because data errors are common and expensive, yet data systems have historically been built without the testing rigor that software systems receive. A miscalculated metric in production might influence thousands of business decisions before anyone notices. Testing catches errors early, ideally before they reach production, rather than leaving analysts to discover them during analysis; automated tests that run continuously flag the problems that do slip through.

Data testing operates at multiple levels: row-level tests (is each individual row valid?), column tests (does the distribution of values look as expected?), table tests (do row counts match expectations?), and end-to-end tests (does data flow correctly from source to consumption?). Tests can be deterministic (specific conditions must be true) or probabilistic (anomaly detection flags unusual patterns). Transformation frameworks like dbt include built-in testing, while specialized tools like Great Expectations focus on data validation itself.
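The difference between a deterministic and a probabilistic test can be sketched as follows. The row-level check is a fixed condition; the table-level check uses a simple z-score on historical row counts. The threshold of 3.0, the field names, and the sample history are assumptions chosen for illustration, and a production anomaly detector would likely be more sophisticated.

```python
from statistics import mean, stdev

def invalid_rows(rows):
    """Deterministic, row-level: every row must have a non-null customer_id."""
    return [r for r in rows if r.get("customer_id") is None]

def row_count_is_anomalous(history, today, z_threshold=3.0):
    """Probabilistic, table-level: flag today's row count when it is far from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No historical variation: any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > z_threshold

# Hypothetical daily row counts for the last five loads.
history = [1000, 1020, 980, 1010, 990]
normal_day = row_count_is_anomalous(history, today=1005)
bad_day = row_count_is_anomalous(history, today=400)
missing_ids = invalid_rows([{"customer_id": 7}, {"customer_id": None}])
```

A deterministic test either holds or it does not; the probabilistic one depends on a threshold, so it can produce false alarms and usually needs tuning.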

Key Characteristics

  • Automated verification of data correctness
  • Includes unit, integration, and acceptance tests
  • Tests transformations, business logic, and quality
  • Provides early detection of regressions
  • Integrates with CI/CD pipelines
  • Tracks test results and trends

Why It Matters

  • Reliability: Automated tests catch errors before production
  • Regression Detection: Alerts when changes break existing functionality
  • Confidence: Test coverage builds confidence in transformations
  • Speed: Automated testing replaces manual verification
  • Compliance: Demonstrates data transformation correctness

Example

Data tests for a revenue metric: (1) test that revenue is never negative, (2) test that the sum of item revenues equals the order revenue (no rounding errors), (3) test that today's revenue is not zero (flagging an unusual drop), (4) test that new orders appear within 2 hours (a freshness check). All four tests run automatically after each ETL run.
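The four revenue tests could look roughly like this in Python. The record layout (order_id, revenue, created_at, loaded_at) is an assumption made for the sketch, as is measuring the 2-hour window from order creation to load time. Decimal is used so that the item-sum comparison is exact rather than subject to floating-point error.

```python
from datetime import datetime, timedelta
from decimal import Decimal

def negative_revenue(orders):
    """(1) Order ids whose revenue is negative."""
    return [o["order_id"] for o in orders if o["revenue"] < 0]

def item_sum_mismatches(orders, items):
    """(2) Order ids whose item revenues do not sum exactly to the order revenue."""
    totals = {}
    for it in items:
        totals[it["order_id"]] = totals.get(it["order_id"], Decimal("0")) + it["revenue"]
    return [o["order_id"] for o in orders
            if totals.get(o["order_id"], Decimal("0")) != o["revenue"]]

def revenue_is_zero(orders, day):
    """(3) True when total revenue for the given day is zero."""
    total = sum((o["revenue"] for o in orders if o["created_at"].date() == day), Decimal("0"))
    return total == 0

def late_orders(orders, max_lag=timedelta(hours=2)):
    """(4) Order ids that took longer than max_lag to be loaded."""
    return [o["order_id"] for o in orders if o["loaded_at"] - o["created_at"] > max_lag]

# Hypothetical sample data.
orders = [
    {"order_id": 1, "revenue": Decimal("30.00"),
     "created_at": datetime(2024, 1, 15, 9, 0), "loaded_at": datetime(2024, 1, 15, 9, 30)},
    {"order_id": 2, "revenue": Decimal("12.50"),
     "created_at": datetime(2024, 1, 15, 10, 0), "loaded_at": datetime(2024, 1, 15, 13, 0)},
]
items = [
    {"order_id": 1, "revenue": Decimal("10.00")},
    {"order_id": 1, "revenue": Decimal("20.00")},
    {"order_id": 2, "revenue": Decimal("12.49")},  # one cent short of the order total
]

negatives = negative_revenue(orders)
mismatches = item_sum_mismatches(orders, items)
zero_today = revenue_is_zero(orders, datetime(2024, 1, 15).date())
late = late_orders(orders)
```

In this sample, order 2 fails two tests: its items are one cent short of the order total, and it was loaded three hours after it was created.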

Coginiti Perspective

CoginitiScript has built-in data testing through #+test blocks. A test block returns a result set: empty means pass, non-empty means fail. Tests can be run individually or programmatically via std/test with test.Run(), and the onFailure parameter controls behavior (test.Stop halts execution, test.Continue logs and proceeds). Tests run against real data on any of the 24+ connected platforms, so validation reflects actual production conditions rather than synthetic test datasets.
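The "empty result set means pass" convention can be illustrated outside CoginitiScript. The sketch below is not CoginitiScript and does not use std/test; it is a Python analogue built on the standard sqlite3 module, and the run_tests function and the STOP and CONTINUE constants are hypothetical names that only mimic the stop-or-continue behavior described above.

```python
import sqlite3

# Hypothetical stand-ins for a stop/continue failure policy.
STOP, CONTINUE = "stop", "continue"

def run_tests(conn, tests, on_failure=STOP):
    """Run each test query; any returned row counts as a failure.

    Returns a dict mapping test name to its failing rows (empty list = pass).
    With on_failure=STOP, execution halts at the first failing test.
    """
    results = {}
    for name, sql in tests.items():
        failing_rows = conn.execute(sql).fetchall()
        results[name] = failing_rows
        if failing_rows and on_failure == STOP:
            break
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 30.0), (2, -5.0), (3, None)])

# Each query selects the rows that violate the expectation.
tests = {
    "revenue_non_negative": "SELECT order_id FROM orders WHERE revenue < 0",
    "revenue_not_null": "SELECT order_id FROM orders WHERE revenue IS NULL",
    "order_id_unique": "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
}

stopped = run_tests(conn, tests, on_failure=STOP)
full = run_tests(conn, tests, on_failure=CONTINUE)
```

Writing each test as a query for violating rows has a practical benefit: a failure returns the offending records themselves, not just a pass or fail flag.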

Related Concepts

  • Data Validation
  • Data Quality
  • Data Observability
  • Schema Validation
  • Continuous Integration
  • Testing Framework
  • Regression Testing
  • Quality Assurance
