Why the machine (dis)agrees: understanding uncertainty in natural language processing classifications
Abstract
As interest increases in using natural language processing methods (machine coding) to supplant labor-intensive human coding of survey responses, the physics education research community needs methods to determine the accuracy and reliability of machine coding. Existing literature uses measures of agreement between human and machine coding (e.g., Cohen's kappa) to assess machine coding. However, if we are ever to trust a machine learning algorithm's codes without a thorough comparison to human coding, we need to understand the underlying causes of agreement/disagreement, not simply its level. For response datasets from several survey questions, we will present data on the uncertainty levels of machine coding as a function of (i) training set characteristics and (ii) test set characteristics, and discuss the underlying causes of these uncertainty levels. We describe the conditions in which we can use these uncertainty measurements to form trustworthy conclusions from machine-coded data. This work is supported by NSF Grants #2000739 and #1808945.
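For illustration, the human-machine agreement referenced above is commonly quantified with Cohen's kappa, which compares observed agreement to the agreement expected by chance. The sketch below is a minimal example using scikit-learn's cohen_kappa_score; the variable names and response codes shown are hypothetical placeholders, not drawn from the datasets described in the abstract.

    # Minimal sketch: Cohen's kappa between human and machine codes.
    # The labels below are hypothetical placeholders, not real survey codes.
    from sklearn.metrics import cohen_kappa_score

    # Parallel lists of codes assigned to the same survey responses.
    human_codes   = ["energy", "force", "energy", "momentum", "force", "energy"]
    machine_codes = ["energy", "force", "momentum", "momentum", "force", "energy"]

    # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    kappa = cohen_kappa_score(human_codes, machine_codes)
    print(f"Cohen's kappa (human vs. machine): {kappa:.2f}")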
Conference Name
APS April Meeting 2023
URL
https://ui.adsabs.harvard.edu/abs/2023APS..APRB18002F