How can text data be quickly and accurately processed?

Using natural language processing and deep learning to classify comments in MEPS

Client

Agency for Healthcare Research and Quality (AHRQ)

Challenge

The Medical Expenditure Panel Survey (MEPS), funded by the Agency for Healthcare Research and Quality (AHRQ), is a set of large-scale surveys of families and individuals, and of their medical providers, across the U.S.

Each year, interviewers enter more than 20,000 open-ended comments into the MEPS computer-assisted personal interviewing (CAPI) system to clarify respondents' answers.

A team of human coders then reviews each sentence, assigns it a topic label from 10 predefined classes, and follows the procedures associated with that class to process the data further. Because MEPS is a panel study, the comments must also be processed within a short window, usually about a week, so that the data can be returned to field staff for dependent interviewing in the next wave. This processing is labor intensive and time consuming.

Westat applied artificial intelligence (AI) to make the process more timely and efficient.

Solution

Westat uses natural language processing (NLP), machine learning (ML), and deep learning techniques to train a classification model that automatically assigns each comment to one of the 10 predefined classes.
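
The source does not describe the model architecture beyond NLP, ML, and deep learning. As a minimal stand-in that shows the shape of the task (a sentence goes in, a probability for each class comes out), the sketch below trains a multinomial Naive Bayes classifier. This is a classical technique swapped in for illustration, not the deep learning model Westat used, and the class labels and training sentences are hypothetical placeholders rather than the actual MEPS classes or data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing.

    Illustrative stand-in only; not the production deep learning model.
    """

    def fit(self, sentences: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, sentence: str) -> dict[str, float]:
        """Return a probability for every class; the values sum to 1."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for w in tokenize(sentence):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (self.total_words[c] + v)
                    )
            log_scores[c] = score
        # Normalize log scores into probabilities (log-sum-exp for stability).
        m = max(log_scores.values())
        exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
        total = sum(exp_scores.values())
        return {c: s / total for c, s in exp_scores.items()}


# Hypothetical toy training data; labels are placeholders, not MEPS classes.
train = [
    ("respondent changed insurance plan in march", "insurance"),
    ("premium was paid by employer insurance", "insurance"),
    ("doctor visit was with a new provider", "provider"),
    ("provider name was misspelled earlier", "provider"),
    ("prescription refilled twice this round", "medication"),
    ("medicine dosage was reduced by doctor", "medication"),
]
sentences, labels = zip(*train)
clf = NaiveBayesTextClassifier().fit(list(sentences), list(labels))
probs = clf.predict_proba("new insurance plan")
```

Returning a full probability distribution, rather than a single label, is what makes a ranked list of suggestions possible downstream.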

We then deploy the model in production as a RESTful API that runs in the back end of the system the human coders use. For each sentence, the model suggests the top 3 classes ranked by classification probability, so coders choose among 3 options rather than 10 when reviewing the comments.
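
The top-3 suggestion step can be sketched with Python's standard library. Everything here is an assumption for illustration: the `/suggest` path, the JSON field names, and `predict_proba` (a hypothetical stub returning fixed scores in place of a trained model) are not taken from the production system.

```python
import json
from http.server import BaseHTTPRequestHandler


def predict_proba(sentence: str) -> dict[str, float]:
    """Hypothetical stand-in for the trained model: returns fixed scores."""
    return {"insurance": 0.62, "provider": 0.21, "medication": 0.09,
            "employment": 0.05, "other": 0.03}


def top_k(probs: dict[str, float], k: int = 3) -> list[dict]:
    """Return the k most probable classes, highest probability first."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [{"label": label, "probability": p} for label, p in ranked[:k]]


def handle_request(body: bytes) -> bytes:
    """Parse {"sentence": "..."} and return the top-3 suggestions as JSON."""
    sentence = json.loads(body)["sentence"]
    return json.dumps({"suggestions": top_k(predict_proba(sentence))}).encode()


class SuggestHandler(BaseHTTPRequestHandler):
    """Hypothetical POST /suggest endpoint wrapping handle_request."""

    def do_POST(self):
        if self.path != "/suggest":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = handle_request(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):  # bad JSON or missing field
            self.send_error(400)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To serve it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), SuggestHandler)` and call `serve_forever()`. Keeping the ranking logic in plain functions (`top_k`, `handle_request`) separates it from the HTTP layer, so it can be tested without starting a server.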

Results

The tool has been in production for the past two data collection periods in 2020. In each round it processed more than 10,000 comments with over 95% classification accuracy for the top suggestion, improved efficiency by about 5%, and reduced the backlog to virtually zero.
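
The source does not define how accuracy for the top suggestion was measured. A common convention, assumed here, is top-k accuracy: the fraction of sentences whose coder-assigned label appears among the model's first k suggestions, with k = 1 for the top suggestion and k = 3 matching the three options shown to coders. The suggestions and labels below are made-up toy data, not MEPS results.

```python
def top_k_accuracy(ranked_suggestions: list[list[str]], true_labels: list[str], k: int) -> float:
    """Fraction of items whose coder-assigned label is among the first k suggestions."""
    if len(ranked_suggestions) != len(true_labels):
        raise ValueError("each item needs one ranked list and one true label")
    if not true_labels:
        raise ValueError("need at least one item")
    hits = sum(label in ranked[:k] for ranked, label in zip(ranked_suggestions, true_labels))
    return hits / len(true_labels)


# Hypothetical example: model suggestions (best first) and the labels coders chose.
suggestions = [
    ["insurance", "provider", "medication"],
    ["provider", "insurance", "other"],
    ["medication", "provider", "insurance"],
    ["other", "insurance", "provider"],
]
coder_labels = ["insurance", "provider", "provider", "employment"]

top1 = top_k_accuracy(suggestions, coder_labels, k=1)  # 2 of 4 correct -> 0.5
top3 = top_k_accuracy(suggestions, coder_labels, k=3)  # 3 of 4 covered -> 0.75
```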

Capabilities

Advanced Technologies
Data Collection
Data Science
Machine Learning and Artificial Intelligence
Natural Language Processing and Text Analytics
Statistical Methods

Senior Expert Contact

Kevin Wilson

Vice President

