Can machine learning techniques analyze qualitative focus group data effectively?

Using data science techniques to enhance perceptual learning


More than ever before, researchers have access to a variety of broad, rich, educationally relevant text data from several sources, such as literature databases (e.g., ERIC [Education Resources Information Center]), open-ended responses from online courses/surveys, online discussion forums, transcribed audio of face-to-face classes or focus groups, digital essays, and social media.

Such advances in data availability, coupled with emerging analytic techniques, can dramatically increase the possibilities for discovering new patterns, improving research efficiency, and testing new theories in educational contexts. For example, can emerging analytic techniques based in machine learning, such as topic modeling, be used to analyze qualitative focus group data effectively, and what limitations and/or recommendations can be identified?


We provided support to Worcester Polytechnic Institute by extending the use of topic modeling to data collected from focus groups with teachers who had implemented several math technology interventions with their students: From Here to There (FH2T), a game-based perceptual learning intervention; DragonBox 12+ (DragonBox), a widely used game-based technology application; and 2 versions of ASSISTments (Immediate Feedback and Active Control).

We compared the results of qualitative coding and topic modeling. Unlike the textual data typically used in text mining, the focus groups in this study involved dynamic human communication: teachers sharing thoughts and experiences in a meaningful way while taking in, processing, and responding to both the facilitator and other participants. In this process, actors (in this case, teachers) constantly take turns as speaker and listener, often listening while others speak (Watzlawick et al., 1967; DeVito, 2016). As such, the patterns of communication constantly evolve, and the direction and depth of information exchanged during a focus group can differ drastically depending on the group of teachers and the skills of the facilitator.

This investigation used information collected as part of a larger randomized controlled trial (RCT) conducted in partnership with Worcester Polytechnic Institute, the University of Maine, and Indiana University during the COVID-19 pandemic.

The larger study was an RCT across 10 middle schools (9 in-person and 1 virtual) that included more than 3,600 7th-grade students. It tested the impacts of 3 educational technology interventions on algebraic understanding among 7th-graders across 4 conditions: (a) FH2T, (b) DragonBox, (c) Immediate Feedback, and (d) Active Control. The FH2T and DragonBox conditions represent use of game-based applications. Immediate Feedback entails problem sets delivered through an online homework system, ASSISTments. For purposes of this study, the Active Control condition mimics traditional homework assignments while still using technology.

As part of the larger study, teachers participated in focus groups to discuss (a) their perspectives on students' reactions to the mathematics technologies, (b) challenges teachers and students encountered while using the mathematics technologies, (c) impacts of the various applications on student learning, (d) the unique impact of the pandemic on instructing students, and (e) suggestions for changes or improvements to the mathematics technologies. Of the 34 teachers who implemented the study, 16 (47%) participated in 1 of the 4 focus group sessions.

In this investigation, we examined whether topic modeling could extract patterns from teacher focus group data that were consistent with those identified through more traditional qualitative analysis approaches. Specific questions we explored in this study were:

  • What themes emerged from the qualitative coding approach and the topic modeling approach?
  • What limitations does topic modeling have regarding the analysis of the focus group data?
  • What recommendations do we have for other researchers who may attempt to use topic modeling on data collected from focus groups?

Unlike other forms of text data, such as literature, books, or essays, in which each type of information (e.g., information about actors, data sources, or findings) is organized within its own section, data generated from focus groups reflect the dynamic human communication described earlier: participants share thoughts and experiences in a meaningful way while taking in, processing, and responding to the facilitator and other participants. The facilitator can also steer participants back to the focus group questions or follow the direction of the discussion, depending on the research questions posed.
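Because turns alternate between the facilitator and participants, one common preprocessing choice is to treat each participant turn as a separate document before topic modeling. This is an assumption; the study does not describe its preprocessing. The minimal sketch below assumes a hypothetical transcript format in which each turn begins with a "Speaker:" label. The names `split_turns`, `participant_documents`, and the "Facilitator" label are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical transcript convention: a turn starts with "Speaker Name:" at line start.
# Note: a continuation line that itself contains "Word:" would be misread as a new turn.
TURN_PATTERN = re.compile(r"^([A-Za-z0-9 ]+):\s*(.*)$")


@dataclass
class Turn:
    speaker: str
    text: str


def split_turns(transcript: str) -> list[Turn]:
    """Split a transcript into speaker turns, merging consecutive lines by the same speaker."""
    turns: list[Turn] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TURN_PATTERN.match(line)
        if match:
            speaker, text = match.group(1).strip(), match.group(2).strip()
            if turns and turns[-1].speaker == speaker:
                # Same speaker continues: extend the previous turn.
                turns[-1].text += " " + text
            else:
                turns.append(Turn(speaker, text))
        elif turns:
            # Unlabeled line: treat it as a continuation of the previous turn.
            turns[-1].text += " " + line
    return turns


def participant_documents(turns: list[Turn], facilitator: str = "Facilitator") -> list[str]:
    """Keep only participant turns; facilitator prompts are excluded from the corpus."""
    return [t.text for t in turns if t.speaker != facilitator]


if __name__ == "__main__":
    sample = (
        "Facilitator: How did students react to FH2T?\n"
        "Teacher 1: They liked the game format.\n"
        "Teacher 1: Some got stuck on later levels.\n"
        "Teacher 2: Mine preferred DragonBox.\n"
    )
    print(participant_documents(split_turns(sample)))
    # ['They liked the game format. Some got stuck on later levels.', 'Mine preferred DragonBox.']
```

Dropping facilitator turns keeps repeated prompt wording from dominating the topics. The trade-off is that a participant answer loses the context of the question that prompted it.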


The topic modeling results showed a high degree of agreement with the qualitative coding results. The biggest difference between the 2 approaches was in how themes were organized and classified at the higher level (e.g., 5 themes vs. 3 themes). At the lower level, both approaches identified many of the same subthemes; therefore, the results tell similar stories about students' reactions to working with the different mathematics technologies. In examining the coding and narrative findings from the 2 methods, 2 researchers found that topic modeling was less effective than qualitative coding in capturing nuanced information. For example, the qualitative coding provided more detail about students' reactions to each specific learning tool and, thus, produced more nuanced findings than topic modeling. Similarly, the qualitative coding was better able to identify differences and information specific to student populations (i.e., students in special education and accelerated students) than topic modeling.
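One way to quantify this kind of agreement is to give each transcript segment both its dominant topic and its qualitative code, then match each topic to the code whose segments it overlaps most, measured by Jaccard similarity. This is not necessarily how the researchers compared the methods. The sketch below assumes one label per segment and uses hypothetical segment IDs and code names. Real qualitative coding often assigns several codes to a single segment.

```python
from collections import defaultdict


def segments_by_label(assignments: dict[str, str]) -> dict[str, set[str]]:
    """Invert {segment_id: label} into {label: set of segment_ids}."""
    groups: dict[str, set[str]] = defaultdict(set)
    for segment_id, label in assignments.items():
        groups[label].add(segment_id)
    return dict(groups)


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(topic_of: dict[str, str], code_of: dict[str, str]) -> dict[str, tuple[str, float]]:
    """For each topic, find the qualitative code with the highest Jaccard overlap."""
    topics = segments_by_label(topic_of)
    codes = segments_by_label(code_of)
    result = {}
    for topic, topic_segments in sorted(topics.items()):
        code, score = max(
            ((c, jaccard(topic_segments, segs)) for c, segs in codes.items()),
            key=lambda pair: pair[1],
        )
        result[topic] = (code, round(score, 2))
    return result


if __name__ == "__main__":
    # Hypothetical labels: 3 topics collapse onto 2 qualitative codes.
    topic_of = {"s1": "T0", "s2": "T0", "s3": "T1", "s4": "T1", "s5": "T2"}
    code_of = {"s1": "engagement", "s2": "engagement", "s3": "challenges",
               "s4": "challenges", "s5": "challenges"}
    print(best_matches(topic_of, code_of))
    # {'T0': ('engagement', 1.0), 'T1': ('challenges', 0.67), 'T2': ('challenges', 0.33)}
```

Several topics mapping to one code (T1 and T2 above) mirrors the higher-level difference in theme counts. A low best score flags a topic that has no clear qualitative counterpart.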

This study successfully extended the application of topic modeling to data collected from focus groups. Taken together, the study results demonstrate that topic modeling is a viable method for coding focus group data in a variety of ways. It can be used before qualitative analysis to identify nodes (i.e., themes), or in parallel with or after qualitative analysis to identify strengths and weaknesses that might be inherent in the qualitative analysis. Topic modeling also benefits qualitative coders through its speed, allowing data to be coded faster than by hand.
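The source does not name the topic model used. Assuming latent Dirichlet allocation (LDA), a common choice, the minimal pure-Python sketch below fits topics with collapsed Gibbs sampling. It then lists each topic's top words, which a coder could review as candidate nodes before qualitative analysis. The tokenizer, stopword list, and hyperparameters (`alpha`, `beta`, `iterations`) are illustrative assumptions. In practice, a maintained library would normally be used instead.

```python
import random
import re

# Illustrative stopword list (an assumption; real projects use a fuller list).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "they", "were", "was",
             "during", "made", "kept"}


def tokenize(text):
    """Lowercase, split into word tokens, and drop stopwords and very short tokens."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS and len(w) > 2]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to lists of tokens by collapsed Gibbs sampling; return vocab and count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    # doc_topic[d][k]: tokens in doc d assigned to topic k
    doc_topic = [[0] * num_topics for _ in docs]
    # topic_word[k][v]: times word v is assigned to topic k; topic_total[k]: row sums
    topic_word = [[0] * n_vocab for _ in range(num_topics)]
    topic_total = [0] * num_topics
    z = []  # z[d][i]: current topic of token i in doc d
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=5):
    """Return the n highest-count words per topic as candidate nodes for coders."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]


if __name__ == "__main__":
    turns = [
        "Students loved the game levels and the puzzles.",
        "Login problems and slow internet during remote class.",
        "The game puzzles kept students engaged.",
        "Internet problems made remote login slow.",
    ]
    docs = [tokenize(t) for t in turns]
    vocab, topic_word, _ = lda_gibbs(docs, num_topics=2)
    for k, words in enumerate(top_words(vocab, topic_word, n=3)):
        print(f"Topic {k}: {', '.join(words)}")
```

The top words are only a starting point. As the findings above suggest, a human coder still has to name the topics and check them against the transcripts for nuance, such as tool-specific or population-specific reactions.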


DeVito, J. A. (2016). The interpersonal communication book (14th ed.). London, England: Pearson Education Limited.

Watzlawick, P., Bavelas, J. B., & Jackson, D. D. (1967). Pragmatics of human communication: A study of interactional patterns, pathologies, and paradoxes. New York, NY: W. W. Norton & Company.

