Protecting the privacy of people who respond to surveys and censuses is a critical responsibility in federal data collection. During the 2010 Decennial Census, the U.S. Census Bureau applied several safeguards to protect individual information. Later reviews, however, revealed that published tabular data could still be vulnerable to certain types of attacks, such as database reconstruction. In response, the Census Bureau adopted a new approach for the 2020 Census, using a formal, mathematically defined privacy framework that provides measurable guarantees about the level of protection applied to released statistics. Here to discuss lessons learned from the 2010 Census, challenges faced today, and Westat’s innovations and leadership in protecting confidentiality is Tom Krenzke, MS, a Vice President for Statistics and Data Science.
Q. What lessons should data producers and clients take from the Census Bureau’s 2010 experience regarding safety vulnerabilities and its development of new protections in 2020, and how has Westat applied those lessons in its own work?
A. Some people mistakenly believe that tabular data are always safe to share or use because tables only report aggregated information. However, if a large number of data tables are produced from a fixed set of records, it becomes possible for an attacker to reconstruct much of the underlying database, leading to breaches of confidentiality. The lesson we take from the Bureau’s evaluation of the 2010 Census is that such risks must be countered with enhanced privacy protections. One main strategy to consider is “noise infusion,” which obscures sensitive information to protect data privacy while maintaining analytical integrity. We add noise to data through techniques such as data perturbation or synthetic data generation. Noise infusion approaches have also been developed, evaluated, and applied to tabular results (or estimates).
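To make the risk concrete, here is a minimal sketch of how two harmless-looking tables can leak one person's value, and how added noise blurs that leak. The records, the `total_income` helper, and the noise scale of 5000 are hypothetical choices for illustration; the scale is not calibrated to any formal privacy guarantee and this is not the Census Bureau's or Westat's production method.

```python
import random

# Hypothetical toy microdata: (age_group, county, income). Illustrative only.
records = [
    ("18-34", "A", 41000),
    ("18-34", "A", 39000),
    ("35-64", "A", 72000),
    ("65+", "A", 250000),  # the only person aged 65+ in county A
]

def total_income(rows, exclude_age=None):
    """Published aggregate: total income, optionally leaving out one age group."""
    return sum(inc for age, _, inc in rows if age != exclude_age)

# Two aggregate tables that each look safe on their own ...
all_ages = total_income(records)
under_65 = total_income(records, exclude_age="65+")
# ... but their difference is exactly one person's income.
leaked = all_ages - under_65

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

# Noise infusion: perturb each published total independently (assumed scale).
rng = random.Random(0)
noisy_all = all_ages + laplace_noise(5000, rng)
noisy_under_65 = under_65 + laplace_noise(5000, rng)
noisy_diff = noisy_all - noisy_under_65  # no longer the exact income

print("exact difference:", leaked)
print("difference after noise:", round(noisy_diff))
```

The trade-off is visible even in this toy case: a larger noise scale hides the individual better but makes the published totals less accurate.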
Q. Because clients often worry that protecting confidentiality will erode the usefulness of the data, how does Westat strike the balance between reducing disclosure risk and maintaining analytical value?
A. To strike this balance, we have developed approaches that specifically target cases with higher disclosure risks. For example, we classify data values according to their risk of leading to re-identification, ranging from high to low. Values with higher risk are given a greater chance of being synthesized, while lower risk values are more likely to be retained. Based on these probabilities, we then select a sample of data values and generate the synthetic database selectively. This selective approach reduces disclosure risk while preserving much of the analytical value of the original data.
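The selective approach described above can be sketched roughly as follows. The risk tiers, the probabilities in `SYNTH_PROB`, the rarity rule in `risk_tier`, and the stand-in synthesizer (a draw from the observed values) are all assumptions made for illustration; a real application would use a fitted model to generate replacements and a more careful risk measure.

```python
import random
from collections import Counter

# Assumed probabilities of replacing a value, by risk tier (hypothetical).
SYNTH_PROB = {"high": 0.9, "medium": 0.5, "low": 0.1}

def risk_tier(count):
    """Assumed rule: the rarer a value is in the file, the riskier it is."""
    if count == 1:
        return "high"
    if count <= 3:
        return "medium"
    return "low"

def selectively_synthesize(values, rng):
    """Replace each value with probability tied to its risk tier.

    Returns the new list and a parallel list of flags marking replaced values.
    """
    counts = Counter(values)
    out, flags = [], []
    for v in values:
        if rng.random() < SYNTH_PROB[risk_tier(counts[v])]:
            # Stand-in synthesizer: draw from the empirical distribution.
            out.append(rng.choice(values))
            flags.append(True)
        else:
            out.append(v)
            flags.append(False)
    return out, flags

occupations = ["teacher"] * 6 + ["nurse"] * 3 + ["astronaut"]
protected, replaced = selectively_synthesize(occupations, random.Random(1))
print(sum(replaced), "of", len(occupations), "values were synthesized")
```

Because common values are mostly retained, most of the file is untouched, while the rare value ("astronaut" here) is very likely to be replaced.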
Q. What tools has Westat built to address confidentiality threats, and how do these tools work together to address different confidentiality threats?
A. Westat has developed a suite of tools and processes to quantify and mitigate disclosure risks. To quantify re-identification risk, we use probabilistic record linkage software to match sample files to population files. For situations where population files are not available, we developed the sdcnway R package, publicly available on CRAN, to estimate risks using log-linear modeling and exhaustive tabulation.
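As a rough illustration of the exhaustive tabulation idea, the sketch below counts, for each record, how many combinations of identifying variables make it unique in the sample. It is written in Python for illustration and is not the sdcnway package's interface; the function name `flag_unique_records`, the `max_way` parameter, and the toy records are hypothetical. Being unique in a sample does not mean being unique in the population, which is the gap the log-linear modeling step is meant to address.

```python
from collections import Counter
from itertools import combinations

def flag_unique_records(rows, key_vars, max_way=2):
    """Map record index -> number of variable combinations on which it is unique.

    Tabulates every combination of up to `max_way` key variables and counts
    the cells that contain exactly one record.
    """
    hits = Counter()
    for k in range(1, max_way + 1):
        for combo in combinations(key_vars, k):
            cells = Counter(tuple(r[v] for v in combo) for r in rows)
            for i, r in enumerate(rows):
                if cells[tuple(r[v] for v in combo)] == 1:
                    hits[i] += 1
    return dict(hits)

# Hypothetical sample file with three identifying variables.
rows = [
    {"age": "18-34", "sex": "F", "occ": "teacher"},
    {"age": "18-34", "sex": "M", "occ": "teacher"},
    {"age": "35-64", "sex": "F", "occ": "teacher"},
    {"age": "35-64", "sex": "F", "occ": "pilot"},
]
flags = flag_unique_records(rows, ["age", "sex", "occ"])
print(flags)
```

Records that are unique on many low-dimensional combinations (records 1 and 3 in this toy file) are natural candidates for closer review or treatment.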
Once the disclosure risks are identified, they can be mitigated through noise infusion. For example, the model-assisted constrained hot deck has been used to generate synthetic data, supporting the development of large-scale planning resources that included tables with over a billion cells for use by the transportation community. Westat has also developed other tools, such as the DataSwap software (publicly available on GitHub as Westat-Stats/DataSwap), to perform controlled random swapping. Together, these tools provide a comprehensive framework for safeguarding confidentiality while maintaining analytical value.
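Controlled random swapping can be pictured with a small sketch like the one below, in which a geography value is exchanged between randomly paired records that agree on a set of control variables. This is an assumption-laden illustration and not the DataSwap software's implementation; `swap_within_strata`, its parameters, and the toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict

def swap_within_strata(rows, swap_var, control_vars, rate, rng):
    """Swap `swap_var` between random pairs of records sharing control values.

    `rate` is the approximate share of records in each stratum to swap (0 to 1).
    The input rows are copied, not modified.
    """
    out = [dict(r) for r in rows]
    strata = defaultdict(list)
    for i, r in enumerate(out):
        strata[tuple(r[v] for v in control_vars)].append(i)
    for idx in strata.values():
        rng.shuffle(idx)
        n_pairs = int(len(idx) * rate / 2)
        for j in range(n_pairs):
            a, b = idx[2 * j], idx[2 * j + 1]
            out[a][swap_var], out[b][swap_var] = out[b][swap_var], out[a][swap_var]
    return out

# Hypothetical microdata: swap county among records in the same age group.
people = [
    {"age": "18-34", "county": "A", "income": 41000},
    {"age": "18-34", "county": "B", "income": 39000},
    {"age": "18-34", "county": "C", "income": 45000},
    {"age": "18-34", "county": "A", "income": 52000},
    {"age": "65+", "county": "B", "income": 30000},
    {"age": "65+", "county": "C", "income": 28000},
]
swapped = swap_within_strata(people, "county", ["age"], rate=1.0, rng=random.Random(7))
```

Because swaps happen only within a stratum, the age-by-county counts are unchanged, while the link between a given person's income and county becomes uncertain.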
Q. With data attacks growing more sophisticated—table differencing and reconstruction among them—what new approaches is Westat advancing to counter these risks?
A. We use noise infusion to obscure sensitive information while maintaining analytical integrity. To quantify re-identification risk, we use probabilistic record linkage software to match sample files with population files, or we estimate risk using a mathematical model when population files are not available. These tools also help pinpoint high-risk values in datasets. We also continue discussions on data privacy protections in WesLytics™, a company-wide platform for data management that covers acquisition, processing, engineering, analytics, and dissemination. Within this ecosystem, clients can have confidence in how we protect their data and the data of individuals. The unified data governance tool within the platform allows for highly granular data protections at the row and column levels, including column masking capabilities.
Q. After applying privacy-preserving methods, how does Westat ensure that the data still support valid, trustworthy inferences? What kinds of evaluation techniques give clients confidence in their data releases?
A. We use several methods to make sure that protected data are still useful for analysis. For example, we compare original and synthetic datasets to evaluate the similarity of the results and check whether the overall patterns and relationships are preserved. By evaluating the data with a variety of metrics and examining them as holistically as possible, we provide rigorous evidence that the released datasets maintain validity and support trustworthy inferences, giving clients confidence in their data releases.
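One simple form of such a comparison is sketched below: it contrasts means and a correlation between an original and a synthetic dataset. The metric names, the `utility_report` helper, and the numbers are hypothetical, and a real evaluation would use a much broader set of metrics.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def utility_report(orig, synth):
    """Absolute differences in simple statistics between two datasets.

    Each dataset is a dict with numeric lists under the keys 'x' and 'y'.
    Smaller values mean the synthetic data track the original more closely.
    """
    return {
        "mean_x_diff": abs(mean(orig["x"]) - mean(synth["x"])),
        "mean_y_diff": abs(mean(orig["y"]) - mean(synth["y"])),
        "corr_diff": abs(pearson(orig["x"], orig["y"])
                         - pearson(synth["x"], synth["y"])),
    }

# Hypothetical original and synthetic values for two variables.
original = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]}
synthetic = {"x": [1, 2, 3, 5, 5], "y": [2, 3, 5, 5, 5]}
report = utility_report(original, synthetic)
print(report)
```

Checking relationships such as correlations, and not only marginal statistics such as means, is what indicates whether overall patterns have been preserved.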
Q. With AI now generating synthetic health records, what opportunities and challenges do you foresee for privacy-preserving data science, and how does Westat continue to position itself as a trusted partner for clients in this space?
A. Certainly, AI has expanded our ability to create synthetic data that supports testing, simulation, and research without compromising individual privacy. At the same time, AI introduces new challenges for responsible data sharing. Such challenges include AI’s extensive reliance on data, which may sometimes be used in unexpected ways. This underscores the importance of effective data governance and privacy protection.
On the opportunity side, AI-driven models can enable richer data access for researchers while safeguarding personal information. Our innovative tools, deep statistical expertise, and proven track record ensure that we remain on the cutting edge of this field as a trusted partner for clients navigating this evolving landscape.
Q. How do these privacy preservation strategies affect everyday Americans?
A. These strategies are designed to protect individuals’ information and build public trust. Knowing that safeguards are in place gives Americans confidence that their privacy will be protected, which in turn encourages participation in government surveys and studies. Ultimately, these protections strengthen both the quality of the data collected and the public’s trust in the institutions gathering it, helping clients make more informed decisions that benefit the nation as a whole.
Expert Interview: Safeguarding Data Privacy: Innovations at Westat, October 2025