How can coding of RECS ESS data be automated and accelerated?
Using natural language processing to match energy suppliers in the RECS ESS
Challenge
The Energy Information Administration (EIA) administers the Residential Energy Consumption Survey (RECS), a nationwide survey of energy-related characteristics, consumption, and expenditures for U.S. homes. Westat conducts the follow-up component for EIA, the RECS Energy Supplier Survey (ESS). Data from RECS ESS allow for broader comparisons across sectors, as well as projections of future consumption trends.
In the current RECS ESS cycle, households reported more than 20,000 open-ended entries naming their suppliers of natural gas, electricity, fuel oil, and propane. Traditionally, human coders manually corrected typos and verified that each entry was real and accurate by checking it against a reference list of all U.S. energy suppliers and consulting internet resources. The effort was labor-intensive and time-consuming.
To speed up the process and improve efficiency, Westat applied artificial intelligence (AI).
Solution
To automate this process, Westat used natural language processing (NLP), specifically a string-matching technique that finds strings closely matching a pattern, to calculate the edit distance between each respondent's input and every lookup text in the reference list.
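As an illustration of the edit-distance calculation, here is a minimal Python sketch. It assumes the Levenshtein distance (the source does not say which distance measure was used) and a simple lowercase and whitespace cleanup step; the function names `normalize` and `edit_distance` and the sample supplier strings are hypothetical, not taken from Westat's program.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed cleanup step)."""
    return " ".join(text.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Illustrative input with two missing letters, compared with a reference name.
entry = normalize("Pacfic Gas  and Electrc")
lookup = normalize("Pacific Gas and Electric")
print(edit_distance(entry, lookup))  # 2
```

A small distance relative to the length of the name suggests a typo rather than a different supplier, which is why distance is a natural ranking key for candidate matches.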
We then developed an algorithm to identify matched pairs, each consisting of one input and one lookup supplier, based on the edit distance and data patterns we discovered during the proof-of-concept stage.
If no matched pair is detected for an input, the algorithm suggests the top 10 reference suppliers, ranked by distance, for human review.
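The match-or-suggest flow described above can be sketched as follows. This is an assumption-laden illustration, not Westat's algorithm: the acceptance rule (a distance of at most `max_distance` and a unique best candidate) stands in for the unspecified data patterns from the proof-of-concept stage, and `match_supplier`, `max_distance`, `top_k`, and the sample names are hypothetical.

```python
from typing import List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def match_supplier(
    entry: str,
    reference: List[str],
    max_distance: int = 2,  # hypothetical acceptance threshold
    top_k: int = 10,        # number of suggestions sent to human review
) -> Tuple[Optional[str], List[str]]:
    """Return (matched supplier, []) or (None, ranked suggestions)."""
    if not reference:
        return None, []
    cleaned = " ".join(entry.lower().split())
    # Score every reference name, then rank by distance (ties by name).
    scored = sorted(
        (edit_distance(cleaned, " ".join(name.lower().split())), name)
        for name in reference
    )
    best_distance, best_name = scored[0]
    # Assumed rule: accept only a close match with no tie for first place.
    unique_best = len(scored) == 1 or scored[1][0] > best_distance
    if best_distance <= max_distance and unique_best:
        return best_name, []
    return None, [name for _, name in scored[:top_k]]


reference_list = ["Pacific Gas and Electric", "Puget Sound Energy", "Duke Energy"]
print(match_supplier("Pacfic Gas and Electrc", reference_list))
print(match_supplier("xyz", reference_list))
```

Requiring a unique best candidate is one way to favor precision over coverage: ambiguous inputs go to reviewers with a ranked shortlist instead of being coded automatically.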
Results
The end product is a Python program. Using NLP, the coding of supplier entries was significantly accelerated. The first batch of 2021 respondent inputs, 4,000+ entries, was processed in less than 30 minutes. The program detected 48% matched pairs with 100% precision and passed 56% of the data to human review.
Capabilities
Data Collection, Data Science, Natural Language Processing and Text Analytics, Statistical Methods
Topics
Complex Surveys
Senior Expert Contact
Kevin Wilson
Vice President