Generalizing Unmasking for Short Texts

Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast


Abstract
Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75–80%, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.
Anthology ID:
N19-1068
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
654–659
Language:
URL:
https://www.aclweb.org/anthology/N19-1068
DOI:
10.18653/v1/N19-1068
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/N19-1068.pdf
Video:
 https://vimeo.com/355735934