Pei Li will give a talk on "Linking Records with Value Diversity".
In the real world, we often observe value diversity in entities. Many data
sets contain temporal records over a long period of time; values of the
same real-world entities can change over time (e.g., author information
in DBLP). In such cases, we often wish to identify records that describe
the same entity over time, which enables interesting longitudinal data
analysis. Many data sets contain records from
different groups (e.g., business listings of the same chain); records of
the same group can have local values (e.g., local phone numbers of the
same business chain). In such cases, we wish to link records that belong
to the same group. However, most existing record linkage techniques
assume that records describing the same real-world entities are fairly
consistent and often focus on different representations of the same
value; they can therefore fall short for records with value diversity.
This talk addresses the two sub-problems of linking records with value
diversity: temporal and group linkage. For temporal record linkage, we
will present two key components for leveraging temporal information in
linkage: (1) time decay that captures the effect of elapsed time on
entity value evolution, and (2) temporal clustering that, instead of
comparing each pair of records locally, considers the time order of the
records and makes global decisions. We show experimental results on
real-world data sets, followed by a demonstration of interesting
longitudinal data analysis on the DBLP data set. For group linkage, we
propose a two-stage linkage approach that leverages strong and weak
evidence in the data set, and show the efficiency and effectiveness of our
techniques on a real-world data set of 18 million records.
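
To give a rough feel for the time-decay idea mentioned above, here is a minimal sketch. It is not the speaker's method: the exponential half-life form, the `Record` type, and the names `disagreement_weight` and `decayed_score` are all assumptions made for illustration. The only point it shows is that a disagreeing value is penalised less when the two records are far apart in time.

```python
from dataclasses import dataclass


@dataclass
class Record:
    year: int    # time stamp of the record
    attrs: dict  # attribute name -> value


def disagreement_weight(years_apart: float, half_life: float = 5.0) -> float:
    """Hypothetical decay: the penalty for a mismatch halves every `half_life` years."""
    return 0.5 ** (abs(years_apart) / half_life)


def decayed_score(a: Record, b: Record, half_life: float = 5.0) -> float:
    """Average per-attribute score: +1 for agreement, -weight for disagreement."""
    shared = a.attrs.keys() & b.attrs.keys()
    if not shared:
        return 0.0
    w = disagreement_weight(a.year - b.year, half_life)
    total = 0.0
    for key in shared:
        total += 1.0 if a.attrs[key] == b.attrs[key] else -w
    return total / len(shared)


# Same name, different affiliation: the ten-year gap is penalised less
# than the one-year gap, because values are more likely to have changed.
r2000 = Record(2000, {"name": "A. Author", "affiliation": "Univ. X"})
r2001 = Record(2001, {"name": "A. Author", "affiliation": "Univ. Y"})
r2010 = Record(2010, {"name": "A. Author", "affiliation": "Univ. Y"})
near = decayed_score(r2000, r2001)
far = decayed_score(r2000, r2010)
```

In this toy setting `far` is larger than `near`, so a change of affiliation over ten years counts less against a match than the same change within one year.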
The meeting takes place at 1400h in BIN 1.D.07. The talk is held in English.