Study finds anti-Asian racial bias in AI grading


A new study by the Educational Testing Service (ETS) has raised concerns about potential racial bias in AI essay grading systems. In the study, shared with the nonprofit newsroom The Hechinger Report, researchers compared human graders' scores with those of OpenAI's GPT-4o, the model behind ChatGPT, on more than 13,000 anonymized essays written between 2015 and 2019.

  • Discrepancy found: GPT-4o consistently scored essays lower than human graders, averaging 2.8 compared to 3.7. This discrepancy was most pronounced for Asian American students, who received an average of 3.2 from GPT-4o versus 4.3 from human graders, a difference of 1.1 points. The gap for white, Black and Hispanic students was smaller, averaging 0.9 points.

  • Cause of bias unknown: Researchers could not determine why the AI model exhibited this racial bias, noting only that the discrepancy is buried in the model's opaque, complex algorithms. ETS researcher Mo Zhang urged caution in deploying AI grading systems in classrooms, saying that “there are methods for doing this and you don’t want to take people who specialize in educational measurement out of the equation.”
