Record Linkage and Data Integration
As data become more widely available and technology advances, demand has grown for efficient record linkage applications that combine data across multiple sources. Westat developed a tool that compares data records and identifies those that belong to the same entity. An entity may be a person, a point address, a business, or any other unit of analysis. The identifying data (e.g., name, date of birth, gender, and address) should, in principle, be the same for every record of a given entity. In practice, however, these values can change over time, differ across data sources, or contain inadvertent errors, so records for the same entity may not agree exactly.
To address this challenge, our linkage tool can deduplicate a single file or match records across multiple files using the well-known probabilistic linkage method developed by Fellegi and Sunter. It uses modern algorithms to handle big data efficiently, to resolve the difficulties of transitive linkage (if record A links to B and B links to C, all three should be assigned to the same entity even when A and C do not match directly), to match numeric data within a tolerance, and to compare text strings using fuzzy methods.
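The sketch below illustrates the general ideas named above. It is not Westat's implementation. In the Fellegi-Sunter approach, each field has an m-probability (chance the field agrees when two records belong to the same entity) and a u-probability (chance it agrees when they do not). Agreement adds log2(m/u) to a pair's weight, disagreement adds log2((1-m)/(1-u)), and two thresholds split pairs into links, possible links (for clerical review), and non-links. Everything specific here is a hypothetical assumption: the field names, the m/u values, the thresholds, the similarity cutoff, and the one-year numeric tolerance. Python's `difflib.SequenceMatcher` ratio stands in for fuzzy comparators such as Jaro-Winkler, and a union-find structure resolves transitive links. Real tools estimate m and u from the data (often with the EM algorithm) and typically use blocking instead of scoring every pair as done here.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical per-field parameters: m = P(agree | same entity),
# u = P(agree | different entities). Real values would be estimated
# from the data rather than hard-coded.
FIELDS = {
    "name": {"m": 0.95, "u": 0.01, "kind": "fuzzy", "cutoff": 0.8},
    "birth_year": {"m": 0.98, "u": 0.05, "kind": "numeric", "tol": 1},
    "gender": {"m": 0.99, "u": 0.50, "kind": "exact"},
}
UPPER, LOWER = 5.0, 0.0  # hypothetical decision thresholds


def agrees(a, b, spec):
    """Return True if two field values count as agreeing."""
    if spec["kind"] == "fuzzy":
        # difflib's similarity ratio stands in for e.g. Jaro-Winkler.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        return ratio >= spec["cutoff"]
    if spec["kind"] == "numeric":
        # Numeric values agree if they differ by at most the tolerance.
        return abs(a - b) <= spec["tol"]
    return a == b


def match_weight(r1, r2):
    """Sum the Fellegi-Sunter log2 likelihood ratios over all fields."""
    total = 0.0
    for field, spec in FIELDS.items():
        m, u = spec["m"], spec["u"]
        if agrees(r1[field], r2[field], spec):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight):
    """Apply the two-threshold decision rule."""
    if weight >= UPPER:
        return "link"
    if weight <= LOWER:
        return "non-link"
    return "possible link"


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster(records):
    """Deduplicate one file: link pairs, then take the transitive closure."""
    parent = list(range(len(records)))
    for i, j in combinations(range(len(records)), 2):
        if classify(match_weight(records[i], records[j])) == "link":
            parent[find(parent, i)] = find(parent, j)
    return [find(parent, i) for i in range(len(records))]


records = [
    {"name": "Jon Smith", "birth_year": 1980, "gender": "M"},
    {"name": "John Smith", "birth_year": 1981, "gender": "M"},
    {"name": "John Smith", "birth_year": 1982, "gender": "M"},
    {"name": "Maria Lopez", "birth_year": 1975, "gender": "F"},
]
labels = cluster(records)
print(labels)  # [2, 2, 2, 3]
```

Records 0 and 2 differ by two years in `birth_year`, so their direct weight (about 1.99) falls only in the possible-link band. Union-find still places them in the same cluster because each links to record 1, which is the transitive behavior described above. Whether such chains should be accepted automatically or sent to review is a design choice a real tool would need to make.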