Probabilistic Record Linkage to Measure Re-Identification Risk: Case Studies
Identity disclosure can occur if survey respondents can be matched to records in administrative data or other external data sources through variables common to both. As a standard step in our statistical disclosure control process, we measure re-identification risk before releasing data to the public. Probabilistic record linkage is an effective way to quantify that risk in a candidate public-use file.
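To make the idea concrete, the following is a minimal sketch of probabilistic linkage scoring in the Fellegi-Sunter style, a common formulation of the technique. It is an illustration under stated assumptions, not a description of the software or parameters used in the studies below: the linking variables, the m- and u-probabilities, and the prior are all hypothetical.

```python
import math

# Hypothetical parameters for each linking variable (illustrative values only):
#   m = P(values agree | the two records belong to the same person)
#   u = P(values agree | the two records belong to different people)
PARAMS = {
    "age_group": (0.95, 0.10),
    "sex": (0.98, 0.50),
    "zip3": (0.90, 0.02),
}


def match_weight(survey_rec, external_rec, params=PARAMS):
    """Sum the log2 likelihood ratios over all linking variables."""
    total = 0.0
    for field, (m, u) in params.items():
        if survey_rec[field] == external_rec[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def match_probability(weight, prior):
    """Convert a match weight to a posterior match probability.

    `prior` is the assumed chance that a random record pair is a true match.
    """
    odds = (prior / (1 - prior)) * 2 ** weight
    return odds / (1 + odds)


survey = {"age_group": "30-39", "sex": "F", "zip3": "208"}
same_person = {"age_group": "30-39", "sex": "F", "zip3": "208"}
other_person = {"age_group": "50-59", "sex": "F", "zip3": "100"}

w_same = match_weight(survey, same_person)
w_other = match_weight(survey, other_person)
```

A pair that agrees on every variable receives a positive weight and a pair that disagrees on most receives a negative one. Record pairs with high posterior match probability are the ones that signal re-identification risk.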
Federal Employee Viewpoint Survey (FEVS)
For FEVS, conducted for the Office of Personnel Management, Westat used probability-based matching to estimate the likelihood of correctly matching a person in a proposed partially synthetic public-use file to that person's record in the original data. The file-level risk was computed as the average of these record-level probabilities and used to determine whether the partially synthetic file was safe for release.
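The averaging step can be sketched as follows. This is a simplified illustration, not the procedure used for FEVS: it assumes an intruder links on a fixed set of key variables and picks at random among the original records that agree with a released record on those keys. The variable names and toy records are hypothetical.

```python
from collections import Counter


def match_probabilities(original, synthetic, keys):
    """Probability that each released record is linked to its true source.

    `original[i]` and `synthetic[i]` are assumed to describe the same person.
    """
    # How many original records share each combination of key values.
    counts = Counter(tuple(rec[k] for k in keys) for rec in original)
    probs = []
    for orig_rec, syn_rec in zip(original, synthetic):
        orig_key = tuple(orig_rec[k] for k in keys)
        syn_key = tuple(syn_rec[k] for k in keys)
        if syn_key == orig_key:
            # True source is among the candidates; intruder guesses uniformly.
            probs.append(1.0 / counts[syn_key])
        else:
            # Synthesis changed a key value, so the true source is not a candidate.
            probs.append(0.0)
    return probs


def file_level_risk(probs):
    """File-level risk as the average of the record-level probabilities."""
    return sum(probs) / len(probs) if probs else 0.0


keys = ["agency", "grade", "tenure"]  # hypothetical linking variables
original = [
    {"agency": "A", "grade": "GS-12", "tenure": "5-10"},
    {"agency": "A", "grade": "GS-12", "tenure": "5-10"},
    {"agency": "B", "grade": "GS-13", "tenure": "10+"},
    {"agency": "B", "grade": "GS-9", "tenure": "<5"},
]
synthetic = [
    {"agency": "A", "grade": "GS-12", "tenure": "5-10"},
    {"agency": "A", "grade": "GS-12", "tenure": "10+"},   # tenure synthesized
    {"agency": "B", "grade": "GS-13", "tenure": "10+"},
    {"agency": "B", "grade": "GS-11", "tenure": "<5"},    # grade synthesized
]

probs = match_probabilities(original, synthetic, keys)
risk = file_level_risk(probs)
```

In this toy example the record-level probabilities are 0.5, 0, 1, and 0, giving a file-level risk of 0.375. In practice the result would be compared against a release threshold chosen by the data owner.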
National Household Food Acquisition and Purchase Study
For the National Household Food Acquisition and Purchase Study for the U.S. Department of Agriculture, we evaluated the re-identification risk due to geographical clustering.
The public-use file included some stratification information that could potentially be used to identify records from the same U.S. county. Probabilistic record-linkage software was used to identify the counties that were subject to high re-identification risk.
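One way to summarize such results by geography is sketched below. It assumes the linkage step has already produced a match probability for each record along with a county label; the county names, probabilities, and the 0.2 threshold are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def county_risk(records, threshold=0.2):
    """Average record-level match probabilities by county and flag high-risk ones.

    `records` is an iterable of (county, match_probability) pairs.
    `threshold` is an illustrative cutoff, not a recommended value.
    """
    by_county = defaultdict(list)
    for county, prob in records:
        by_county[county].append(prob)
    means = {c: sum(ps) / len(ps) for c, ps in by_county.items()}
    flagged = sorted(c for c, r in means.items() if r > threshold)
    return means, flagged


# Toy input: placeholder county labels with made-up match probabilities.
records = [
    ("County A", 0.05),
    ("County A", 0.10),
    ("County B", 0.40),
    ("County B", 0.30),
    ("County C", 0.15),
]

means, flagged = county_risk(records)
```

Here only "County B" exceeds the illustrative threshold. Flagged counties would then be candidates for additional disclosure treatment, such as coarsening the stratification information.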