How can coding of RECS ESS data be automated and accelerated?
Using natural language processing to match energy suppliers in the RECS ESS
Challenge
The Energy Information Administration (EIA) administers the Residential Energy Consumption Survey (RECS), a nationwide survey of energy-related characteristics, consumption, and expenditures for U.S. homes. Westat conducts the follow-up component, the RECS Energy Supplier Survey (ESS), for EIA. Data from RECS ESS allow for broader comparisons across sectors, as well as projections of future consumption trends.
In the current RECS ESS cycle, households reported 20,000+ open-ended entries about their natural gas, electricity, fuel oil, and propane suppliers. Traditionally, human coders manually corrected typos and verified the existence and accuracy of each respondent's input by checking it against a reference list of all U.S. energy suppliers and consulting internet resources. The effort was labor-intensive and time-consuming.
To speed up the process and improve efficiency, Westat applied artificial intelligence (AI).
Solution
To automate this process, Westat used natural language processing (NLP), specifically a string-matching technique that finds strings closely matching a pattern, to calculate the edit distance between each respondent's input and every lookup text in the reference list.
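As an illustration of the distance calculation, here is a minimal sketch of one common edit distance, the Levenshtein distance, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The source does not say which distance variant or library Westat used, so the function name `edit_distance` and the choice of Levenshtein are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed variant, not confirmed by the source)."""
    # Keep the shorter string on the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]
```

A respondent entry with one dropped letter, such as "duke enrgy", sits at distance 1 from the lookup text "duke energy", while unrelated names are many edits apart.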
We then developed an algorithm that identifies matched pairs, each consisting of one input and one lookup supplier, based on the edit distance and on data patterns we discovered during the proof-of-concept stage.
If no pair can be detected for an input, the algorithm suggests the top 10 reference suppliers, ranked by distance, for human review.
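The match-or-review logic described above might be sketched as follows. The actual acceptance rules and the data patterns Westat relied on are not given in the source, so everything specific here is hypothetical: the `max_distance` threshold, the requirement that the best candidate be unique, the `normalize` helper, and the names `MatchResult` and `match_supplier`. Only the top-10 fallback ranked by distance comes from the text.

```python
from typing import List, NamedTuple, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed variant; the source does not name one)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Hypothetical cleanup: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


class MatchResult(NamedTuple):
    match: Optional[str]   # accepted reference supplier, or None
    candidates: List[str]  # ranked suggestions for human review


def match_supplier(entry: str, reference: List[str],
                   max_distance: int = 2, top_k: int = 10) -> MatchResult:
    """Accept a close, unambiguous match; otherwise return ranked candidates."""
    key = normalize(entry)
    # Sort by (distance, name) so ties are ordered deterministically.
    scored = sorted((edit_distance(key, normalize(name)), name)
                    for name in reference)
    if not scored:
        return MatchResult(None, [])
    best_distance, best_name = scored[0]
    # Hypothetical rule: the best candidate must be close AND strictly
    # closer than the runner-up, so ambiguous entries go to a human.
    unique_best = len(scored) == 1 or scored[1][0] > best_distance
    if best_distance <= max_distance and unique_best:
        return MatchResult(best_name, [])
    return MatchResult(None, [name for _, name in scored[:top_k]])
```

With an illustrative reference list such as `["Duke Energy", "Dominion Energy", "Con Edison"]`, the entry "duke enrgy" would be accepted as "Duke Energy", while a vague entry such as "energy" would come back unmatched with the candidates ranked for review. Requiring a unique best candidate trades some automatic matches for fewer wrong ones, which is one plausible way to keep precision high.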
Results
The end product is a Python program. Using NLP significantly accelerated data entry. The first batch of 2021 respondent inputs, 4,000+ entries, was processed in less than 30 minutes. The program detected 48% matched pairs with 100% precision and passed 56% of the data to human review.
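For readers unfamiliar with the metrics, the sketch below shows how a match rate and a precision figure of this kind are conventionally computed once automatic matches have been checked against human-verified answers. The function name `summarize` and the input layout are assumptions for illustration, and the sample data in the usage note is invented, not taken from RECS ESS.

```python
from typing import Dict, List, Optional, Tuple


def summarize(results: List[Tuple[Optional[str], str]]) -> Dict[str, float]:
    """Compute match rate and precision.

    Each item is (predicted, truth): predicted is the supplier chosen
    automatically, or None when the entry was left for human review;
    truth is the human-verified supplier.
    """
    total = len(results)
    auto = [(p, t) for p, t in results if p is not None]
    correct = sum(1 for p, t in auto if p == t)
    return {
        # Share of all entries that received an automatic match.
        "match_rate": len(auto) / total if total else 0.0,
        # Share of automatic matches that were correct.
        "precision": correct / len(auto) if auto else 0.0,
    }
```

For example, if two of four entries are matched automatically and both agree with the verified supplier, `summarize` reports a match rate of 0.5 and a precision of 1.0.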
Capabilities
Data Collection
Data Science
Natural Language Processing and Text Analytics
Statistical Methods
Topics
Complex Surveys
Senior Expert Contact
Kevin Wilson
Vice President