Modelling Uncertainty in Collaborative Document Quality Assessment

Aili Shen, Daniel Beck, Bahar Salehi, Jianzhong Qi, Timothy Baldwin


Abstract
In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task. To imitate people’s disagreement over inherently subjective tasks such as rating the quality of a Wikipedia article, a document quality assessment system should provide not only a prediction of the article quality but also the uncertainty over its predictions. This motivates us to measure the uncertainty in document quality predictions, in addition to making the label prediction. Experimental results show that both Gaussian processes (GPs) and random forests (RFs) can yield competitive results in predicting the quality of Wikipedia articles, while providing an estimate of uncertainty when there is inconsistency in the quality labels from the Wikipedia contributors. We additionally evaluate our methods in the context of a semi-automated document quality class assignment decision-making process, where there is asymmetric risk associated with overestimates and underestimates of document quality. Our experiments suggest that GPs provide more reliable estimates in this context.
Anthology ID:
D19-5525
Volume:
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Month:
November
Year:
2019
Address:
Hong Kong, China
Venues:
EMNLP | WNUT | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
191–201
Language:
URL:
https://www.aclweb.org/anthology/D19-5525
DOI:
10.18653/v1/D19-5525
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/D19-5525.pdf