Tools for using commercial sources of residential histories for cancer research
How to use NCI’s SAS residential history generation programs
As part of the National Cancer Institute’s residential history pilot project, Westat created “ResHistGen,” a set of open-source SAS programs that help researchers and others reconcile data from commercial vendors and generate residential histories of study participants.
For more information on the residential history study and the development of the SAS programs, see NCI/SEER Residential History Project [1544kb PDF], the study’s technical report.
The steps for creating residential histories of research subjects with the ResHistGen programs can be performed by staff at the cancer registry, members of the research team, or staff at a third-party contractor. The programs, along with information on how to use them, are available in the GitHub repository.
- Individual patient identifiers are needed for this process. It is essential that the researcher follow established procedures to protect the privacy of human subjects.
- Submit subject names and identifiers for relevant cases to the vendor.
- Geocode the addresses received from the commercial vendor. All U.S. cancer registries have access to the North American Association of Central Cancer Registries (NAACCR) geocoder, but any batch geocoder can be used.
- Run the first SAS program (01_MatchAddresses1.sas [12kb text file]) to match common addresses. For a study with a small number of subjects, possible matches can be reviewed manually in a two-step process. For a study with a large number of subjects, matching can be done automatically in a single step.
- If a manual review is desired, edit the “LN_matchcombos_review.xlsx” file created by the first program by deleting rows that are not matches. This review can be guided by the NCI SEER Manual Address Comparison Guidelines [31kb Word file].
- Run the second SAS program (02_MatchAddresses2.sas [16kb text file]) to add any results from the manual review and combine matched addresses.
- Run the third SAS program (03_BuildResHistory.sas [13kb text file]) to reconcile addresses and generate a derived residential history.
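The run order in the steps above can be sketched as a small script that builds one batch command per program. This is a minimal illustration, not part of ResHistGen. It assumes a `sas` executable is on the PATH, that the standard SAS batch options `-sysin` and `-log` are available in your installation, and that the file locations at the top of each program have already been edited. The names `build_sas_commands`, `program_dir`, and `sas_exe` are hypothetical.

```python
from pathlib import Path

# Program file names as listed in the steps above.
PROGRAMS = [
    "01_MatchAddresses1.sas",
    "02_MatchAddresses2.sas",
    "03_BuildResHistory.sas",
]


def build_sas_commands(program_dir, sas_exe="sas"):
    """Return one batch command (as an argument list) per ResHistGen program.

    Assumption: SAS is invoked in batch mode as
    `sas -sysin <program> -log <logfile>`; adjust for your site.
    """
    commands = []
    for name in PROGRAMS:
        program = Path(program_dir) / name
        log = program.with_suffix(".log")  # write each log next to its program
        commands.append([sas_exe, "-sysin", str(program), "-log", str(log)])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so each step can be
    # launched deliberately (for example with subprocess.run).
    for command in build_sas_commands("reshistgen"):
        print(" ".join(command))
```

If a manual review is wanted, pause after the first command, edit “LN_matchcombos_review.xlsx”, and only then run the second and third commands.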
The current release of these programs is Version 2.1. For a summary of changes since the previous release, see Version 2.1 Changes.txt [3kb text file].
In the ResHistGen programs, local file locations are specified in the first few lines of each program to facilitate portability. The programs have been written to avoid any data conversion or divide-by-zero warning messages, so any such message in the log indicates an error. The programs also test for unexpected conditions and flag each one with a message marked by three asterisks (“***”).
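One way to act on these log conventions is to scan each SAS log after a run. The sketch below is an illustration only and is not shipped with ResHistGen. The “***” marker comes from the description above. The other two patterns assume the usual wording of SAS log notes for data conversion and division by zero, which may vary by SAS version. The names `PATTERNS` and `scan_log` are hypothetical.

```python
import re

# Assumed patterns: "***" is the marker described above; the other two match
# the usual SAS NOTE wording and may need adjusting for your SAS version.
PATTERNS = {
    "unexpected condition": re.compile(r"\*\*\*"),
    "data conversion": re.compile(r"values have been converted to", re.IGNORECASE),
    "divide by zero": re.compile(r"division by zero", re.IGNORECASE),
}


def scan_log(lines):
    """Return (line_number, label, text) for every log line worth reviewing."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label, line.rstrip()))
    return findings


if __name__ == "__main__":
    # A made-up log excerpt, used only to show the output format.
    sample = [
        "NOTE: The data set WORK.MATCHES has 120 observations.",
        "NOTE: Character values have been converted to numeric values.",
        "*** Unexpected condition encountered",
        "NOTE: Division by zero detected at line 42 column 10.",
    ]
    for number, label, text in scan_log(sample):
        print(f"line {number}: {label}: {text}")
```

To check a real log, pass an open file, for example `scan_log(open("01_MatchAddresses1.log"))`. Because a SAS log also echoes the program source, a line containing “***” may be the statement that writes the message rather than the message itself, so treat each hit as a prompt for review.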
The ResHistGen programs are released under the GNU General Public License [34kb text file]. For questions, limited support is available by email at NCI.ResidentialHistory@westat.com. Enhancements may also be shared via this email address and, if found to be beneficial, will be included in a future release. By the terms of the license, you may distribute your changes on your own, provided you include a prominent notice that you have modified the original.
If you publish results based on these programs, please include the following citation: ResHistGen Residential History Generation Programs, Version 2.1 – October 2020; Surveillance Research Program, National Cancer Institute.