Applying machine learning models in multi-institutional studies can generate bias
Abstract
There is increasing interest in deploying machine learning models at scale for multi-institutional studies in physics education research. Here we investigate the efficacy of applying machine learning models to institutions outside of their training set, using natural language processing to code open-ended survey responses. We find that, in general, changing institutional contexts can affect machine learning estimates of code frequencies: previously documented sources of uncertainty increase in magnitude, new and unknown sources of uncertainty emerge, or both. We also find one case in which uncertainties do not differ between the institution represented in the training data and an institution outside it. These results suggest that attention to uncertainty is critical, especially when measuring student writing across multi-institutional data sets.
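The abstract does not specify the models or data involved, but the workflow it describes, training a text classifier on human-coded responses from one institution and using it to estimate code frequencies elsewhere, can be illustrated with a minimal sketch. The example below is not the authors' pipeline: it assumes a scikit-learn bag-of-words logistic regression and entirely hypothetical responses and labels, and adds a simple bootstrap to show one way uncertainty in the estimated frequency might be tracked.

```python
# Minimal sketch, assuming a TF-IDF + logistic regression classifier and
# hypothetical data; not the authors' pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical human-coded responses from institution A (1 = code present).
train_texts = [
    "we designed our own procedure to test the hypothesis",
    "i decided which variables to measure and why",
    "we followed the steps listed in the lab manual",
    "the instructions told us exactly what to record",
    "our group chose the equipment and planned the trials",
    "we copied the setup shown by the instructor",
]
train_labels = [1, 1, 0, 0, 1, 0]

# Hypothetical responses from institution B, outside the training set;
# human codes are kept only to compare against the machine estimate.
new_texts = [
    "we planned the experiment ourselves from scratch",
    "the manual gave us every step to follow",
    "i picked the measurements that seemed most useful",
    "we just did what the handout said",
]
new_labels = [1, 0, 1, 0]

# Train on institution A, predict on institution B.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)
pred = model.predict(new_texts)

# Machine-learning estimate of the code frequency at institution B, with a
# bootstrap over responses as a rough measure of sampling uncertainty.
rng = np.random.default_rng(0)
boot = [pred[rng.integers(0, len(pred), len(pred))].mean() for _ in range(1000)]
print(f"ML-estimated frequency: {np.mean(pred):.2f} +/- {np.std(boot):.2f}")
print(f"Human-coded frequency:  {np.mean(new_labels):.2f}")
```

A bootstrap of this kind captures only sampling variation; the abstract's point is that transferring a model to a new institution can introduce additional, and sometimes unanticipated, sources of uncertainty beyond what such an estimate reflects.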