Tools for using commercial sources of residential histories for cancer research
Information about patients’ residential history is important for cancer research but is difficult and expensive to obtain. However, residential history data are becoming increasingly available from commercial vendors. The National Cancer Institute (NCI) contracted with Westat (contract #HHSN261201500371P) to examine the feasibility of using commercial sources of residential data to construct individual residential histories. The goal was to develop standardized processes for creating residential histories that can be linked to cancer surveillance data.
For this study, we
- Identified three commercial vendors of residential address data
- Collected a set of residential histories from volunteer participants at NCI and the National Institute of Environmental Health Sciences to assess the accuracy of the commercially provided data
- Developed an algorithm for deriving residential histories from the vendor data
- Developed methods to compare the accuracy of these derived residential histories with the survey-reported residential histories
- Compared the results to a residential history based on the assumption that people have always lived at their current address (this type of history is used in many health studies when no residential history is available)
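One simple way to score a derived history against a survey-reported one is to count the calendar years in which both histories place the person at the same address. This is offered only as an illustration, not as the accuracy measure used in the study. The sketch below assumes each history is a list of (address, first year, last year) tuples with inclusive years and already-standardized address strings. The function names are hypothetical. The sketch also builds the current-address baseline described above, which assigns the person's current address to every year of the study period.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (standardized address, first year, last year), years inclusive.
Residence = Tuple[str, int, int]


def address_by_year(history: List[Residence]) -> Dict[int, str]:
    """Map each calendar year to its recorded address (later entries overwrite earlier ones)."""
    years: Dict[int, str] = {}
    for address, start, end in history:
        for year in range(start, end + 1):
            years[year] = address
    return years


def current_address_baseline(current_address: str, first_year: int, last_year: int) -> List[Residence]:
    """Assume the person lived at their current address for the entire period."""
    return [(current_address, first_year, last_year)]


def year_agreement(derived: List[Residence], reported: List[Residence]) -> float:
    """Fraction of survey-reported years in which the derived history gives the same address."""
    derived_years = address_by_year(derived)
    reported_years = address_by_year(reported)
    if not reported_years:
        return 0.0
    matches = sum(1 for year, addr in reported_years.items() if derived_years.get(year) == addr)
    return matches / len(reported_years)


reported = [("12 OAK ST", 1990, 1999), ("5 ELM AVE", 2000, 2015)]
derived = [("12 OAK ST", 1992, 1999), ("5 ELM AVE", 2000, 2015)]
baseline = current_address_baseline("5 ELM AVE", 1990, 2015)
print(round(year_agreement(derived, reported), 3))   # 24 of 26 years match
print(round(year_agreement(baseline, reported), 3))  # 16 of 26 years match
```

A year-based score is easy to interpret and treats every year equally. A study focused on a specific exposure window could instead restrict the comparison to those years.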
What We Learned About Commercially Available Residential Data
- Of the three vendors we identified, LexisNexis® (Vendor 1 in the study’s technical report) had the most complete and accurate residential history data.
- Commercial vendors generally have data on previous residences going back to 1985.
- Data on deceased individuals are available, which is important for studies of highly fatal cancers.
- Only U.S. and military APO addresses are included.
- The data consist of a set of addresses associated with each individual rather than a residential history for the individual per se (i.e., they do not specify that a person lived at location A from time 1 to time 2, location B from time 2 to time 3, etc.). The timeframes in the vendor data were frequently missing or incorrect.
- Vendor data often included multiple addresses for a person for the same time period.
- The data contained many addresses that did not appear in the survey-reported residential histories, such as work addresses and family members’ addresses.
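Because the vendor data are a set of addresses with possibly missing or overlapping dates rather than a sequence of residences, a history has to be derived from them. The sketch below shows one simple way to do this. It is not the algorithm used in the study or in NCI's SAS programs. The record layout, the function name, and the tie-breaking rule are all hypothetical: records missing a date are dropped, and when several addresses cover the same year, the one first seen most recently is kept. The sketch does not attempt to filter out work or family members' addresses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VendorRecord:
    """One address row as a vendor might supply it (hypothetical layout)."""
    address: str
    first_seen: Optional[int]  # year; may be missing in vendor data
    last_seen: Optional[int]   # year; may be missing in vendor data


def derive_history(records: List[VendorRecord]) -> List[Tuple[str, int, int]]:
    """Collapse overlapping vendor records into one address per year.

    Hypothetical rule: drop records without usable dates; when several
    records cover the same year, keep the one with the latest first_seen.
    """
    dated = [r for r in records
             if r.first_seen is not None and r.last_seen is not None
             and r.first_seen <= r.last_seen]
    if not dated:
        return []
    start = min(r.first_seen for r in dated)
    end = max(r.last_seen for r in dated)

    history: List[Tuple[str, int, int]] = []
    for year in range(start, end + 1):
        covering = [r for r in dated if r.first_seen <= year <= r.last_seen]
        if not covering:
            continue  # no vendor address for this year; leave a gap
        chosen = max(covering, key=lambda r: r.first_seen).address
        if history and history[-1][0] == chosen and history[-1][2] == year - 1:
            history[-1] = (chosen, history[-1][1], year)  # extend the current interval
        else:
            history.append((chosen, year, year))  # start a new interval
    return history


records = [
    VendorRecord("12 OAK ST", 1990, 2001),
    VendorRecord("5 ELM AVE", 2000, 2015),   # overlaps the previous address in 2000-2001
    VendorRecord("99 WORK PLAZA", None, 2010),  # missing start date, dropped
]
print(derive_history(records))
```

Preferring the most recently first-seen address assumes that overlapping periods usually reflect a move that the vendor has not yet recorded as ended. Other rules, such as weighting addresses by how many sources report them, are equally plausible.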
We created a set of open-source SAS programs that researchers can use to reconcile residential data gathered by commercial vendors and create residential histories. For information on how to access and use those programs, see How to use NCI’s SAS residential history generation programs.
For more information on the study and the SAS programs, see NCI/SEER Residential History Project (1544KB PDF), the study’s technical report.