The Joint Statistical Meetings (JSM), to be held virtually August 8-12, 2021, is one of the largest statistical events in the world. More than 13 professional societies participate, and the conference brings together 6,500+ attendees from 52 countries, 80+ exhibitors, 1,000+ student attendees, and 75+ employers hiring for 200+ positions. Whether virtual or in person, Westat continues to deliver leading-edge research for the event.
“Each year, JSM provides an excellent opportunity for all to come together and network with our colleagues from across the nation and around the globe,” notes Westat Vice President Jeri Mulrow, Statistics and Evaluation Sciences Director. “With the unprecedented challenges of 2020 behind us, Westat’s statisticians and data scientists are sharing innovative ideas and methods that have helped us push the field forward, address client research needs, and enhance statistical research for all.”
Learn more about how we can help you meet your project challenges. Check out Experts Guiding the Way (PDF), which illustrates our multimode capabilities and innovative ways to advance data collection science.
This year, in the spirit of the event’s theme of Statistics, Data, and the Stories They Tell, we share a selection of our statistics and data stories below. (Note: * indicates presenter.)
Sunday, August 8, 2021: 3:30-5:20 pm (ET): An Approach to Estimate the Re-Identification Risk in Longitudinal Survey Microdata: Jianzhu Li*, Lin Li, Tom Krenzke
Protecting survey respondents’ confidential data is paramount. Before releasing statistical data products, a risk assessment needs to take place to ensure that the disclosure risk is at an acceptably low level. In longitudinal surveys, because the same respondents participate in more than one wave of a survey, the re-identification risk is usually higher than the risk in cross-sectional data. Common variables that do not change over time, or that change in predictable patterns, may allow users to link records across individual files to form longitudinal records. Here, we share a survey example to demonstrate a log-linear modeling approach that incorporates the longitudinal nature of the data and measures the increase in longitudinal risk relative to cross-sectional risk.
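The log-linear risk measures the abstract describes are beyond a short sketch, but the underlying intuition, that linking waves produces rarer key combinations and therefore higher risk, can be illustrated with a crude sample-uniqueness proxy (this is not the log-linear approach itself, and all variables and values below are hypothetical):

```python
from collections import Counter

# Hypothetical key variables for the same six respondents in two survey
# waves: (sex, age_group, region). Values are purely illustrative.
wave1 = [("F", "30-39", "NE"), ("F", "30-39", "NE"), ("M", "40-49", "SW"),
         ("M", "40-49", "SW"), ("F", "50-59", "NE"), ("M", "30-39", "SW")]
wave2 = [("F", "30-39", "NE"), ("F", "40-49", "NE"), ("M", "40-49", "SW"),
         ("M", "50-59", "SW"), ("F", "50-59", "NE"), ("M", "30-39", "NE")]

def share_unique(keys):
    """Fraction of records that are sample-unique on the key variables,
    a crude stand-in for re-identification risk."""
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

cross_sectional_risk = share_unique(wave1)
# Linking the same respondents across waves concatenates their keys,
# producing rarer combinations and hence higher re-identification risk.
longitudinal_risk = share_unique([w1 + w2 for w1, w2 in zip(wave1, wave2)])
```

In this toy example only 2 of 6 records are unique within wave 1, but every linked two-wave record is unique, which is the longitudinal risk increase the presentation quantifies with formal models.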
Tuesday, August 10, 2021: 10-11:50 am (ET): Leveraging Administrative Data to Improve Child Passenger Safety: Elizabeth Petraglia*
Administrative, or “found,” data (tax records, sensor data, transaction receipts) have become common resources for research. These alternative data sources can help researchers draw more detailed inferences, often while improving statistical efficiency and decreasing cost. We introduce, for example, the National Digital Car Seat Check Form (NDCF), which collects detailed and user-friendly administrative data on child passenger safety (CPS) in the course of a typical car seat check. Here we explore the data offerings of the NDCF and some of the innovative methods, such as visualization-based and personalized dashboards, that are used to distribute NDCF data to users with varying levels of data literacy. We examine how the NDCF compares to selected CPS-related surveys and observational studies, assessing the strengths and limitations of each source in terms of coverage, detail, data quality, and sample size. We wind up this story by providing recommendations for practical applications in the field, including exploratory work on combining survey-based data with the NDCF.
Wednesday, August 11, 2021: 1:30-3:20 pm (ET): Creating Base Weights and Replicate Weights for a PPS Sample with a Supplemental Sample When the Eligibility Frame Information Is Available After Sampling: Jianru Chen*, Ismael Flores Cervantes, Mike Kwanisai
Supplemental samples are used in surveys to increase sample sizes when there is a low response rate or high ineligibility. There are no straightforward methods for drawing supplemental samples for a systematic probability proportional to size (PPS) sample design after the main sample has been selected. We examine a situation where a large number of ineligible units in the sampling frame became known after the main sample selection but before the supplemental sample was drawn. A non-overlapping supplemental sample was drawn by offsetting the random start of the main sample interval. We explore and evaluate several methods for creating the base and replicate weights that properly reflect the variance estimates for this design. Finally, we compare the empirical bias and variance of these methods using Monte Carlo simulations.
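As a rough illustration of the offset idea described above (a sketch with made-up measures of size, not the presenters' actual design or weighting methods), systematic PPS selection with a shifted random start might look like:

```python
import random

def systematic_pps(sizes, n, start):
    """Systematic PPS: select n units using equally spaced selection points
    start, start + I, start + 2I, ... on the cumulative size scale."""
    interval = sum(sizes) / n
    points = [start + i * interval for i in range(n)]
    cum, selected, j = 0.0, [], 0
    for idx, s in enumerate(sizes):
        cum += s
        while j < n and points[j] < cum:
            selected.append(idx)
            j += 1
    return selected

random.seed(7)
sizes = [5, 9, 3, 12, 7, 4, 10, 6, 8, 11]  # hypothetical measures of size
n = 4
interval = sum(sizes) / n
r = random.uniform(0, interval)            # random start for the main sample
main = systematic_pps(sizes, n, r)
# Supplemental sample: offset the random start within the same interval so
# the selection points differ from the main sample's (a sketch of the
# non-overlapping offset idea; overlap control in practice is more subtle).
offset = (r + interval / 2) % interval
supplemental = systematic_pps(sizes, n, offset)
```

The hard part the talk addresses, constructing base and replicate weights whose variance estimates properly reflect the combined design, is not shown here.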
Wednesday, August 11, 2021: 1:30-3:20 pm (ET): A Comparison of Two CHAID Packages for Modeling Survey Nonresponse: Tien-Huan Lin*, Carlos Arieira, Ismael Flores Cervantes, Mike Kwanisai
When it comes to unit nonresponse, it is common practice to lessen bias by modeling response propensity and adjusting weights to account for different response propensities. The CHAID (Chi-square Automatic Interaction Detector) algorithm is commonly used to produce weighting classes for this purpose, which brings us to the analysis of 2 popular software packages that implement the CHAID algorithm: SI-CHAID and HPSPLIT. We will describe the pros and cons of the 2 packages in terms of the resulting bias and variance of the weighted estimates, using simulations of a complex survey sample design to examine the packages’ interchangeability.
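Full CHAID also merges categories and applies Bonferroni-adjusted significance tests, but its core step, choosing the predictor most associated with response, can be sketched with a plain Pearson chi-square statistic (the counts below are invented for illustration):

```python
def chi_square(table):
    """Pearson chi-square statistic for a 2-D contingency table (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / grand
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical respondent/nonrespondent counts by two candidate predictors.
by_region = [[80, 20], [50, 50]]   # rows: region A, B; cols: resp, nonresp
by_tenure = [[70, 30], [60, 40]]   # rows: owner, renter; cols: resp, nonresp

# CHAID's splitting idea: split on the predictor with the strongest
# association with the response indicator.
best = max([("region", by_region), ("tenure", by_tenure)],
           key=lambda kv: chi_square(kv[1]))
```

Here region shows the stronger association, so a CHAID-style tree would split on it first, and the resulting leaves would serve as weighting classes.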
Wednesday, August 11, 2021: 1:30-3:20 pm (ET): Evaluation of Methods to Form Segments from Census Blocks in Area Sample Designs: Jennifer Kali*, Tom Krenzke, Ying Chen, Jianru Chen, Jim Green
In-person surveys often use a multistage sample design in which households are sampled within geographic areas called segments, improving cost efficiency by restricting the geographic range that data collectors travel. Often, segments are formed by grouping neighboring census blocks. A simple method to combine adjacent census blocks is to sort the census block file by the census block ID, but this often creates segments that are not contiguous, not complete (contain holes), and not compact. Issues with contiguity and completeness create challenges for data collectors in determining which housing units to include in the sample frame, and less compact segments increase interviewer travel costs. We will review alternative approaches to forming segments, evaluating the segments formed by each sorting method according to contiguity, completeness, compactness, and between-segment variance, and will present a segment formation algorithm that uses all 4 sorting methods.
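The sort-and-group idea can be sketched as a greedy pass over sorted blocks; this is only an illustration of why the sort key matters (block IDs, housing-unit counts, and coordinates below are hypothetical, and this is not the presenters' algorithm):

```python
def form_segments(blocks, target_hus, sort_key):
    """Greedily group blocks (in sort_key order) into segments until each
    segment reaches the target number of housing units."""
    segments, current, hus = [], [], 0
    for b in sorted(blocks, key=sort_key):
        current.append(b)
        hus += b["housing_units"]
        if hus >= target_hus:
            segments.append(current)
            current, hus = [], 0
    if current:
        segments.append(current)
    return segments

# Hypothetical census blocks; block 1002 is geographically far from the rest.
blocks = [
    {"id": "1001", "housing_units": 40, "lat": 38.90, "lon": -77.03},
    {"id": "1002", "housing_units": 35, "lat": 38.99, "lon": -77.20},
    {"id": "1003", "housing_units": 50, "lat": 38.90, "lon": -77.04},
    {"id": "1004", "housing_units": 30, "lat": 38.91, "lon": -77.03},
]

# Sorting by block ID (the simple method described above) chains 1001 with
# the distant 1002, yielding a non-compact segment.
by_id = form_segments(blocks, 75, lambda b: b["id"])
# A geography-based sort key keeps nearby blocks together.
by_geo = form_segments(blocks, 75, lambda b: (round(b["lat"], 1), b["lon"]))
```

The same greedy pass with different sort keys produces very different segments, which is why the evaluation above compares sorting methods on contiguity, completeness, compactness, and between-segment variance.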
Wednesday, August 11, 2021: 1:30-3:20 pm (ET): Modeling Survey Nonresponse Under a Cluster Sample Design: Classification and Regression Tree Methodologies Compared: Michael Jones*, William Everett Cecere, Tien-Huan Lin, Jennifer Kali, Ismael Flores Cervantes
When computing survey weights for use in analyzing complex sample survey data, an adjustment for nonresponse is often performed to reduce the bias of the estimates. Many algorithms and methodologies are available for modeling survey nonresponse for these adjustments. What’s the best approach? We dig down deep and compare select algorithms when working with a complex cluster sample design. We also evaluate the effect of the classification tree-based methods on the reduction of nonresponse bias in high-response and low-response settings, and investigate the performance of the methods when they are used to adjust survey weights. What are the benefits and limitations of using these methods for estimating response propensities in surveys that use a cluster sample? We discuss them, too.
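Once a tree or propensity model has defined weighting classes, the standard weighting-class adjustment redistributes nonrespondents' weight to respondents within each class; a minimal sketch, with hypothetical base weights and classes, looks like:

```python
def adjust_weights(records):
    """Weighting-class nonresponse adjustment: within each class, inflate
    respondent weights so class weight totals are preserved."""
    classes = {}
    for r in records:
        classes.setdefault(r["cls"], []).append(r)
    adjusted = {}
    for recs in classes.values():
        total = sum(r["w"] for r in recs)
        resp_total = sum(r["w"] for r in recs if r["resp"])
        factor = total / resp_total  # class-level adjustment factor
        for r in recs:
            if r["resp"]:
                adjusted[r["id"]] = r["w"] * factor
    return adjusted

# Hypothetical base weights; the classes "A" and "B" stand in for leaves of
# a classification tree fit to response propensity.
records = [
    {"id": 1, "cls": "A", "w": 100.0, "resp": True},
    {"id": 2, "cls": "A", "w": 100.0, "resp": False},
    {"id": 3, "cls": "A", "w": 100.0, "resp": True},
    {"id": 4, "cls": "B", "w": 200.0, "resp": True},
    {"id": 5, "cls": "B", "w": 200.0, "resp": False},
]
adj = adjust_weights(records)
```

The adjusted respondent weights sum to the full-sample weight total, which is the property the adjustment is designed to preserve; the comparisons in the talk concern how different tree methods form the classes, not this final step.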