DBTG Meeting: 16.10.2012

Pei Li will give a talk on "Linking Records with Value Diversity".

In the real world, we often observe value diversity in entities. Many data sets contain temporal records spanning a long period of time, and the values of the same real-world entity can change over time (e.g., author information in DBLP). In such cases, we often wish to identify records that describe the same entity over time, so as to enable interesting longitudinal data analysis. Many data sets also contain records from different groups (e.g., business listings of the same chain), and records of the same group can have local values (e.g., different local phone numbers within the same business chain). In such cases, we wish to link records that belong to the same group. However, most existing record linkage techniques assume that records describing the same real-world entity are fairly consistent, and they often focus on different representations of the same value; thus they can fall short for records with value diversity.

This talk addresses two sub-problems of linking records with value diversity: temporal linkage and group linkage. For temporal record linkage, we will present two key components for leveraging temporal information in linkage: (1) time decay, which captures the effect of elapsed time on the evolution of entity values, and (2) temporal clustering, which, instead of comparing each pair of records locally, considers the time order of the records and makes global decisions. We will show experimental results on real-world data sets, followed by a demonstration of interesting longitudinal data analysis on the DBLP data set. For group linkage, we propose a two-stage linkage approach that leverages strong and weak evidence in the data set, and we show the efficiency and effectiveness of our techniques on a real-world data set of 18 million records.

The meeting takes place at 1400h in BIN 1.D.07. The talk will be held in English.