Automated Essay Scoring in the Presence of Biased Ratings
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Studies in Social Sciences have revealed that when people evaluate someone else, their evaluations often reflect their biases. As a result, rater bias may introduce highly subjective factors that make their evaluations inaccurate. This may affect automated essay scoring models in many ways, as these models are typically designed to model (potentially biased) essay raters. While there is sizeable literature on rater effects in general settings, it remains unknown how rater bias affects automated essay scoring. To this end, we present a new annotated corpus containing essays and their respective scores. Different from existing corpora, our corpus also contains comments provided by the raters in order to ground their scores. We present features to quantify rater bias based on their comments, and we found that rater bias plays an important role in automated essay scoring. We investigated the extent to which rater bias affects models based on hand-crafted features. Finally, we propose to rectify the training set by removing essays associated with potentially biased scores while learning the scoring model.
A Multi-aspect Analysis of Automatic Essay Scoring for Brazilian Portuguese
Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics
Several methods for automatic essay scoring (AES) for English language have been proposed. However, multi-aspect AES systems for other languages are unusual. Therefore, we propose a multi-aspect AES system to apply on a dataset of Brazilian Portuguese essays, which human experts evaluated according to five aspects defined by Brazilian Government to the National Exam to High School Student (ENEM). These aspects are skills that student must master and every skill is assessed apart from each other. Besides the prediction of each aspect, the feature analysis also was performed for each aspect. The AES system proposed employs several features already employed by AES systems for English language. Our results show that predictions for some aspects performed well with the features we employed, while predictions for other aspects performed poorly. Also, it is possible to note the difference between the five aspects in the detailed feature analysis we performed. Besides these contributions, the eight millions of enrollments every year for ENEM raise some challenge issues for future directions in our research.