We study the problem of group linkage: linking records that refer to multiple entities in the same group. Applications for group linkage include finding businesses in the same chain, finding social network users from the same organization, and so on. Group linkage faces new challenges compared to traditional entity resolution. First, although different members in the same group can share some similar global values of an attribute, they represent different entities so can also have distinct local values for the same or different attributes, requiring a high tolerance for value diversity. Second, we need to be able to distinguish local values from erroneous values.
We present a robust two-stage algorithm: the first stage identifies pivots–maximal sets of records that are very likely to belong to the same group, while being robust to possible erroneous values; the second stage collects strong evidence from the pivots and leverages it for merging more records into the same group, while being tolerant to differences in local values of an attribute. Experimental results show the high effectiveness and efficiency of our algorithm on various real-world data sets.