How can coding of RECS ESS data be automated and accelerated?

Using natural language processing to match energy suppliers in the RECS ESS

Client

Energy Information Administration

Challenge

The Energy Information Administration (EIA) administers the Residential Energy Consumption Survey (RECS), a nationwide survey of energy-related characteristics, consumption, and expenditures for U.S. homes. Westat conducts the follow-up component, the RECS Energy Supplier Survey (ESS), for EIA. Data from RECS ESS allow for broader comparisons across sectors, as well as projections of future consumption trends.

In the current RECS ESS cycle, households reported 20,000+ open-ended entries about their energy suppliers for natural gas, electricity, fuel oil, and propane. Traditionally, human coders manually cleaned up typos and verified the existence and accuracy of each respondent’s input by checking it against a reference list of all the energy suppliers in the U.S. and by consulting internet resources. It was a labor-intensive and time-consuming effort.

To speed up the process and improve efficiency, Westat applied artificial intelligence (AI).

Solution

To automate this process, Westat used natural language processing (NLP), specifically a string-matching technique that finds strings closely matching a pattern, to calculate the edit distance between each respondent’s input and every lookup text in the reference list.
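As a rough illustration of the edit-distance idea, here is a minimal sketch in Python. It assumes the Levenshtein variant (insertions, deletions, and substitutions each cost 1); the source does not say which distance measure or text clean-up steps were actually used, so `levenshtein`, `normalize`, and the supplier names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the inner row as short as possible; the distance is symmetric.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # drop ch_a
                current[j - 1] + 1,      # add ch_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Assumed clean-up step: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


# Hypothetical respondent input compared with a hypothetical reference name.
respondent_input = "  Acme Powr and Light "
reference_name = "Acme Power and Light"
distance = levenshtein(normalize(respondent_input), normalize(reference_name))
print(distance)
```

A small distance suggests the input is a misspelling of the reference name, while a large one suggests a different supplier or an entry that needs human attention.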

We then developed an algorithm that identifies matched pairs, each consisting of one input and one lookup supplier, based on the edit distance and on data patterns we discovered during the proof-of-concept stage.

If no pair is detected for an input, the algorithm suggests the top 10 reference suppliers, ranked by distance, for human review.
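The match-or-suggest flow described above might look roughly like the sketch below. The acceptance rule (a distance threshold of 2 plus a unique best candidate) is an assumption for illustration only; the data patterns Westat actually used are not described in the source. The names `match_or_suggest`, `MatchResult`, `max_distance`, and the supplier list are all hypothetical.

```python
from typing import List, NamedTuple, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed metric) via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def clean(text: str) -> str:
    """Assumed clean-up: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


class MatchResult(NamedTuple):
    match: Optional[str]    # accepted reference supplier, or None
    candidates: List[str]   # ranked suggestions for human review


def match_or_suggest(entry: str, reference: List[str],
                     max_distance: int = 2, top_k: int = 10) -> MatchResult:
    """Accept the closest supplier if it is close and unambiguous;
    otherwise return the top_k closest suppliers for human review."""
    if not reference:
        return MatchResult(None, [])
    cleaned = clean(entry)
    # Sort by distance, breaking ties alphabetically for stable output.
    scored = sorted((edit_distance(cleaned, clean(name)), name)
                    for name in reference)
    best_distance, best_name = scored[0]
    unambiguous = len(scored) == 1 or scored[1][0] > best_distance
    if best_distance <= max_distance and unambiguous:
        return MatchResult(best_name, [])
    return MatchResult(None, [name for _, name in scored[:top_k]])


# Hypothetical reference list and respondent inputs.
suppliers = ["Acme Power and Light", "Acme Propane",
             "Northfield Gas Company", "Riverbend Electric Cooperative"]
auto = match_or_suggest("acme power and ligth", suppliers)
review = match_or_suggest("acme", suppliers, top_k=2)
```

In this sketch, a tight threshold favors precision: inputs that are not clearly a single supplier fall through to the ranked candidate list rather than being matched automatically.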

Results

The end product is a Python program that uses NLP to significantly accelerate data entry. It processed the first batch of 2021 respondent inputs, 4,000+ entries, in less than 30 minutes, detected 48% matched pairs with 100% precision, and passed 56% of the data to human review.

Capabilities

Data Collection
Data Science
Natural Language Processing and Text Analytics
Statistical Methods

Topics

Complex Surveys

Senior Expert Contact

Kevin Wilson

Vice President
