Data Quality
Data quality is the degree to which data is accurate, complete, timely, and conforms to business requirements, enabling confident use for decision-making and analysis.
Data quality encompasses multiple dimensions: accuracy (does the data represent reality?), completeness (are there unexpected nulls or missing values?), timeliness (is the data current or stale?), consistency (does the data align with business rules?), and validity (does it conform to specified formats and ranges?). Poor quality in any dimension undermines trust. High-quality data enables confident decision-making; low-quality data leads to incorrect conclusions.
Data quality deteriorates through multiple mechanisms: source system bugs (producing incorrect data), pipeline failures (losing or corrupting data), schema changes (breaking downstream expectations), or business rule violations (records that should not exist appearing anyway). Without active quality management, quality gradually degrades as systems age. Organizations often discover poor quality only after bad decisions have already been made on flawed data.
Data quality management involves defining quality standards, measuring quality, detecting issues, and remediating root causes. Standards vary by use case: financial data requires high accuracy; marketing attribution might tolerate 10% error. Quality is measured through metrics: null rates, duplicate rates, schema conformance. Detection uses validation rules and anomaly detection. Remediation involves fixing data or adjusting processes to prevent future issues.
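The metrics mentioned above can be made concrete. The sketch below, using only the Python standard library, computes two of them (null rate and duplicate rate) over a small record set; the field names and sample data are illustrative, not from any real schema.

```python
from collections import Counter

def null_rate(records, field):
    """Fraction of records where the field is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def duplicate_rate(records, key):
    """Fraction of records whose key value appears more than once."""
    if not records:
        return 0.0
    counts = Counter(r.get(key) for r in records)
    dupes = sum(c for c in counts.values() if c > 1)
    return dupes / len(records)

# Illustrative sample: one null email, one duplicated customer_id.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "b@example.com"},
]

print(f"email null rate: {null_rate(customers, 'email'):.2f}")            # 0.33
print(f"id duplicate rate: {duplicate_rate(customers, 'customer_id'):.2f}")  # 0.67
```

In practice these metrics would be tracked over time and compared against thresholds set per use case, since, as noted above, acceptable error rates differ between financial and marketing data.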
Key Characteristics
- Multidimensional: accuracy, completeness, timeliness, consistency
- Measured through quality metrics and tests
- Managed through validation rules and monitoring
- Tied to business requirements and use cases
- Requires root cause analysis and remediation
- Continuous process, not one-time effort
Why It Matters
- Confidence: High-quality data enables confident decisions
- Efficiency: Reduces time spent investigating and fixing bad data
- Compliance: Many regulations require demonstrating data quality
- Trust: Quality is foundational to analytics adoption
- Cost: Poor quality leads to wrong decisions and wasted resources
Example
A customer table should have: (1) unique customer IDs (no duplicates), (2) required fields like email (no nulls), (3) consistent phone number format, (4) valid registration dates (not in future). Data quality tests validate these. If tests fail, data stewards investigate and fix root causes.
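The four checks above can be sketched as validation rules that return the offending rows, so an empty result means the test passes (the same pass/fail convention as pipeline test blocks that fail when a query returns rows). This is a minimal illustration; the field names, phone format, and sample data are assumptions, not a prescribed schema.

```python
import re
from datetime import date

PHONE_RE = re.compile(r"^\d{3}-\d{3}-\d{4}$")  # assumed "consistent" phone format

def failing_rows(customers, today=None):
    """Return (row, reason) pairs for every quality-rule violation."""
    today = today or date.today()
    failures = []
    seen_ids = set()
    for row in customers:
        if row["customer_id"] in seen_ids:               # (1) unique customer IDs
            failures.append((row, "duplicate customer_id"))
        seen_ids.add(row["customer_id"])
        if not row.get("email"):                         # (2) required email, no nulls
            failures.append((row, "missing email"))
        if not PHONE_RE.match(row.get("phone") or ""):   # (3) consistent phone format
            failures.append((row, "bad phone format"))
        if row["registered"] > today:                    # (4) registration not in future
            failures.append((row, "registration date in future"))
    return failures

good = {"customer_id": 1, "email": "a@example.com",
        "phone": "555-123-4567", "registered": date(2023, 1, 5)}
bad  = {"customer_id": 1, "email": None,
        "phone": "5551234567", "registered": date(2999, 1, 1)}

print(failing_rows([good]))       # empty list: all tests pass
print(failing_rows([good, bad]))  # duplicate id, null email, bad phone, future date
```

Each failing row would then go to a data steward for root cause analysis rather than being silently dropped.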
Coginiti Perspective
Coginiti addresses data quality at multiple levels. CoginitiScript #+test blocks define quality assertions that run within pipelines, returning pass/fail based on whether results are empty (pass) or contain rows (fail), with onFailure options to stop or continue execution. SMDL enforces semantic quality by typing dimensions and declaring measure aggregation rules, preventing misuse at query time. The Analytics Catalog's promotion workflow ensures that only reviewed and tested logic produces the data that downstream consumers rely on.
Related Concepts
More in Data Governance & Quality
Analytics Catalog
An analytics catalog is a specialized data catalog focused on analytics assets such as metrics, dimensions, dashboards, and saved queries, enabling discovery and governance of analytics-specific objects.
Business Metadata
Business metadata is contextual information that gives data meaning to business users, including definitions, descriptions, ownership, and guidance on appropriate use.
Data Catalog
A data catalog is a searchable repository of metadata about data assets that helps users discover available datasets, understand their content, and assess their quality and suitability for use.
Data Certification
Data certification is a formal process of validating and approving data quality, documenting that data meets governance standards and is safe for use in critical business decisions.
Data Contracts
A data contract is a formal agreement specifying the expectations between data producers and consumers, including schema, quality guarantees, freshness SLAs, and remediation obligations.
Data Governance
Data governance is a framework of policies, processes, and controls that define how data is managed, who is responsible for it, and how it should be used to ensure quality, security, and compliance.