
Data Semantics

Data semantics refers to the documented meaning, business context, and valid usage of data elements, including definitions, relationships, constraints, and governance rules.

Data semantics answers the question: what does this data mean and how should it be used? A column labeled "status" has no semantic meaning without context: does it mean order status (pending, shipped, delivered) or account status (active, suspended, closed)? Data semantics provides that context. It includes: business definitions (what the data represents), valid values, relationships to other data elements, constraints (what combinations are invalid), and usage guidance (which analyses are appropriate).
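As a minimal sketch of the "status" example above, the semantics of each column can be written down as a small structured record. The class name `ColumnSemantics`, its fields, the table names, and the usage notes are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSemantics:
    """Hypothetical record of what a column means and how it may be used."""
    table: str
    column: str
    definition: str               # business definition
    valid_values: frozenset       # allowed values for the column
    usage_notes: str = ""         # usage guidance (assumed wording)

    def accepts(self, value: str) -> bool:
        """Return True if the value is allowed for this column."""
        return value in self.valid_values


# Two columns share the name "status" but carry different semantics.
order_status = ColumnSemantics(
    table="orders",
    column="status",
    definition="Fulfillment state of an order",
    valid_values=frozenset({"pending", "shipped", "delivered"}),
    usage_notes="Assumed: appropriate for fulfillment analysis.",
)

account_status = ColumnSemantics(
    table="accounts",
    column="status",
    definition="Lifecycle state of a customer account",
    valid_values=frozenset({"active", "suspended", "closed"}),
    usage_notes="Assumed: appropriate for retention analysis.",
)
```

With the context made explicit, `order_status.accepts("shipped")` is true while `account_status.accepts("shipped")` is false, even though both columns are named "status".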

The need for explicit data semantics arose because raw schemas and data dictionaries often lack business context. A developer sees a string column "status"; a business user needs to know it represents subscription state. Bridging this gap requires explicit semantic documentation. In modern analytics, data semantics is embedded into semantic layers, metadata platforms, and knowledge graphs so that it is discoverable and enforceable rather than buried in standalone documents.

Data semantics includes formal elements (join relationships, cardinality, aggregation rules) and informal elements (natural language descriptions, caveats, appropriate use cases). It also captures constraints: if a customer has a given status, what other attributes must be true? If an order is shipped, how long should delivery take? The richer the data semantics, the more effectively a system can support data quality validation, impact analysis, and anomaly detection.
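Constraints of this kind can be expressed as executable checks. The sketch below assumes an order record with a status, a ship date, and a delivery date. The function name `check_order`, the field names, and the 10-day delivery window are hypothetical choices made for illustration.

```python
from datetime import date, timedelta
from typing import List, Optional

MAX_DELIVERY_DAYS = 10  # assumed business rule, not taken from any real system


def check_order(
    status: str,
    shipped_on: Optional[date],
    delivered_on: Optional[date],
) -> List[str]:
    """Return the semantic violations found in one order record."""
    problems = []
    # A shipped or delivered order must carry a ship date.
    if status in {"shipped", "delivered"} and shipped_on is None:
        problems.append("shipped or delivered order has no ship date")
    # A pending order must not have shipped yet.
    if status == "pending" and shipped_on is not None:
        problems.append("pending order already has a ship date")
    # A delivered order must carry a delivery date.
    if status == "delivered" and delivered_on is None:
        problems.append("delivered order has no delivery date")
    # When both dates exist, check their order and the expected window.
    if shipped_on is not None and delivered_on is not None:
        if delivered_on < shipped_on:
            problems.append("delivered before shipped")
        elif delivered_on - shipped_on > timedelta(days=MAX_DELIVERY_DAYS):
            problems.append("delivery exceeded expected window")
    return problems
```

An empty list means the record satisfies every rule encoded here. A non-empty list can feed a data quality report or an anomaly alert.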

Key Characteristics

  • Defines business meaning for data elements
  • Specifies valid values, ranges, and combinations
  • Documents relationships and dependencies
  • Includes usage guidance and constraints
  • Tracks lineage and data ownership
  • Discoverable in catalogs or semantic layers

Why It Matters

  • Accuracy: Users understand what data represents, reducing misinterpretation
  • Quality: Constraints enable validation that data conforms to business rules
  • Governance: Clear semantics facilitate access control and compliance
  • Efficiency: Analysts don't repeat exploratory work because semantics are already documented
  • Integration: Systems can reason about compatibility across data domains

Example

The column "customer_id" has the following semantics: it is a unique identifier for customers, it identifies customer records only (not orders), it must be a non-null, positive integer, and it joins to the customers table on customers.id. If the same identifier appears under a different name in another table, the two columns must be semantically aligned before they can be joined correctly.
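The rules in this example can be sketched as a small validator. The function names and the alias set (`cust_id`, `client_id`) are hypothetical. The set of known IDs stands in for a lookup against customers.id.

```python
from typing import Any, List, Optional, Set

# Hypothetical aliases: other tables may label the same identifier differently.
CUSTOMER_ID_ALIASES = {"customer_id", "cust_id", "client_id"}


def validate_customer_id(value: Any, known_customer_ids: Set[int]) -> List[str]:
    """Check one value against the customer_id semantics described above."""
    if value is None:
        return ["customer_id is null"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return ["customer_id is not an integer"]
    problems = []
    if value <= 0:
        problems.append("customer_id is not positive")
    if value not in known_customer_ids:
        problems.append("customer_id has no match in customers.id")
    return problems


def find_customer_id_column(columns: List[str]) -> Optional[str]:
    """Return the first column whose name is a known alias, if any."""
    for name in columns:
        if name.lower() in CUSTOMER_ID_ALIASES:
            return name
    return None
```

For instance, `find_customer_id_column(["order_id", "CUST_ID"])` returns `"CUST_ID"`, which is the kind of alignment step needed before joining a differently labeled column to customers.id.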

Coginiti Perspective

SMDL captures data semantics formally in .smdl files. Each dimension carries a type (text, number, date, datetime, bool) that constrains valid operations, while measures specify aggregation semantics (sum, avg, count_distinct, and 9 others) so the engine enforces correct calculations. Relationship definitions encode the semantic connection between entities with explicit cardinality. The #+meta block in CoginitiScript adds documentation, versioning, and authoring metadata to transformation logic, preserving semantic context alongside the code.
