Entity Resolution

Entity Resolution is the process of identifying and matching records that represent the same real-world entity across databases, data sources, or versions, enabling unified views and accurate analytics.

When organizations integrate data from multiple sources, they face a fundamental challenge: the same entity appears in different systems with different identifiers and variations. "John Smith" in the CRM might be "J. Smith" in accounting and "John Edward Smith" in HR. These are likely the same person, but systems treat them as different entities. Entity resolution (also called entity matching, record linkage, or deduplication) automatically identifies these matches using techniques like string similarity, machine learning, and domain rules.
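As a rough illustration of the string-similarity idea, the sketch below compares the first and last tokens of two names and treats a bare initial as weaker evidence than a full match. The function names, the 0.7 score for an initial, and the decision to ignore middle names are assumptions made for this example, not part of any particular product or library.

```python
from difflib import SequenceMatcher

# Assumed score for "J" vs "John": compatible, but weaker than a full match.
INITIAL_MATCH = 0.7

def normalize(name: str) -> list[str]:
    """Lowercase the name, drop punctuation, and split it into tokens."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return cleaned.split()

def token_score(a: str, b: str) -> float:
    """Score two name tokens between 0.0 and 1.0."""
    if a == b:
        return 1.0
    if len(a) == 1 or len(b) == 1:
        # One side is an initial: compare first letters only.
        return INITIAL_MATCH if a[0] == b[0] else 0.0
    return SequenceMatcher(None, a, b).ratio()

def name_similarity(left: str, right: str) -> float:
    """Average the first-token and last-token scores; middle names are ignored."""
    lt, rt = normalize(left), normalize(right)
    if not lt or not rt:
        return 0.0
    return (token_score(lt[0], rt[0]) + token_score(lt[-1], rt[-1])) / 2

if __name__ == "__main__":
    for other in ["J. Smith", "John Edward Smith", "Mary Jones"]:
        print(other, round(name_similarity("John Smith", other), 2))
```

A score like this is only one signal. In practice it would be combined with other attributes (email, phone, address) before two records are declared the same entity.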

Entity resolution is critical for analytics accuracy. Without it, metrics are distorted: customer lifetime value is split across duplicate records, churn rates are overstated (customers appear to leave when their activity has simply moved to a duplicate record), and customer segmentation is unreliable. High-quality entity resolution enables unified customer views, accurate metrics, and reliable analytics. The challenge is balancing precision (avoiding false matches) against recall (finding all true matches).
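To make the precision and recall trade-off concrete, here is a minimal sketch that scores a set of predicted matches against a hand-labeled set of true matches. The record identifiers (`crm-1`, `acct-7`, and so on) and the helper names are hypothetical.

```python
def pair(a: str, b: str) -> frozenset[str]:
    """Represent an unordered match between two record IDs."""
    return frozenset({a, b})

def precision_recall(
    predicted: set[frozenset[str]], truth: set[frozenset[str]]
) -> tuple[float, float]:
    """Return (precision, recall) for predicted match pairs."""
    true_positives = len(predicted & truth)
    # Precision: what share of predicted matches are correct.
    precision = true_positives / len(predicted) if predicted else 1.0
    # Recall: what share of true matches were found.
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall

# Hypothetical labeled data: three records that are all the same person.
truth = {pair("crm-1", "acct-7"), pair("crm-1", "hr-3"), pair("acct-7", "hr-3")}
# Hypothetical matcher output: one correct pair and one false match.
predicted = {pair("crm-1", "acct-7"), pair("crm-1", "crm-9")}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here one of the two predicted pairs is correct (precision 0.5) and one of the three true pairs was found (recall about 0.33). Loosening the matching threshold usually raises recall at the cost of precision, and tightening it does the reverse.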

Modern entity resolution combines multiple techniques: string matching for similar names, machine learning models trained on labeled examples, knowledge graphs that capture entity relationships, and manual review for ambiguous cases. Advances in AI are changing the field quickly: language models can take context into account, which reduces the false matches that simpler algorithms would make.
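One common way to combine these techniques is to compute a single match score from several signals and route only the uncertain middle band to human reviewers. The sketch below uses hand-set weights as a stand-in for a trained model; the weights, thresholds, and field names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    """Signals collected for one candidate pair of records."""
    name_similarity: float  # 0.0 to 1.0, from any string-similarity measure
    same_email: bool
    same_phone: bool

def match_score(e: Evidence) -> float:
    """Weighted sum of signals; the weights are illustrative, not tuned."""
    return 0.5 * e.name_similarity + 0.3 * e.same_email + 0.2 * e.same_phone

def triage(e: Evidence, auto_match: float = 0.8, auto_reject: float = 0.4) -> str:
    """Auto-decide confident pairs and send the rest to manual review."""
    score = match_score(e)
    if score >= auto_match:
        return "match"
    if score <= auto_reject:
        return "non-match"
    return "review"

if __name__ == "__main__":
    print(triage(Evidence(0.95, True, True)))    # strong evidence on every signal
    print(triage(Evidence(0.90, True, False)))   # similar name, same email only
    print(triage(Evidence(0.30, False, False)))  # little evidence of a match
```

Raising `auto_match` favors precision, lowering `auto_reject` favors recall, and the gap between the two controls how much work lands in the review queue.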

Key Characteristics

  • Identifies and matches records representing the same entity across sources
  • Uses techniques including string similarity, machine learning, and domain rules
  • Handles variations in naming, formatting, and identifier schemes
  • Balances precision (correct matches only) and recall (finding all true matches)
  • Produces unified entity identifiers enabling single-customer-view analytics
  • Often includes manual review workflows for ambiguous or high-value matches

Why It Matters

  • Enables accurate analytics by eliminating duplicate entities that distort metrics
  • Supports unified customer views essential for customer analytics and personalization
  • Improves data quality by surfacing inconsistencies in entity representation
  • Reduces customer service issues where systems treat the same customer differently
  • Enables compliance by establishing canonical customer identities
  • Facilitates fraud detection by connecting entities that appear disconnected

Example

An e-commerce company integrates web sales (keyed by customer email), in-store transactions (keyed by name), and customer service (keyed by phone number). Entity resolution links the web customer "john.smith@email.com" to the in-store customer "John Smith" by comparing email, phone, and address fields, and to the service customer "+1-555-123-4567" by phone number. The result is one unified customer view instead of three separate records.
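The example can be sketched as a small clustering step: records that share a normalized email or phone number are linked, and linked records are merged into one entity with a union-find structure. The record layout is hypothetical, and the sketch assumes the web and in-store records carry an email or phone field to link on; matching on names alone would need a similarity measure rather than exact keys.

```python
from collections import defaultdict

# Hypothetical records from the three sources.
records = [
    {"id": "web-1", "email": "john.smith@email.com", "phone": "+1-555-123-4567"},
    {"id": "store-1", "name": "John Smith", "email": "John.Smith@email.com"},
    {"id": "service-1", "phone": "+1 (555) 123-4567"},
    {"id": "web-2", "email": "jane.doe@email.com"},
]

def keys(record: dict):
    """Yield normalized exact-match keys for a record."""
    if record.get("email"):
        yield ("email", record["email"].strip().lower())
    if record.get("phone"):
        yield ("phone", "".join(ch for ch in record["phone"] if ch.isdigit()))

def resolve(records: list[dict]) -> list[list[str]]:
    """Group record IDs that are connected through any shared key."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen: dict[tuple[str, str], str] = {}
    for r in records:
        for key in keys(r):
            if key in first_seen:
                union(r["id"], first_seen[key])
            else:
                first_seen[key] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())

if __name__ == "__main__":
    print(resolve(records))
```

Note that `service-1` and `store-1` share no field directly. They end up in the same cluster only because both link to `web-1`, which is the transitive behavior that turns pairwise matches into a single customer view.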

Coginiti Perspective

Coginiti supports entity resolution workflows through CoginitiScript and testing frameworks, enabling organizations to build and validate entity matching logic that feeds into semantic models. By formalizing entity resolution in code with test coverage, teams ensure that analytics consume deduplicated, canonical entities, producing reliable metrics and single-view analytics across integrated data sources.

Related Concepts

  • Entity
  • Master Data Management
  • Data Quality
  • Deduplication
  • Matching
  • Knowledge Graph
