A Method for Human-Interpretable Paraphrasticality Prediction

Maria Moritz, Johannes Hellrich, Sven Büchel


Abstract
The detection of reused text is important in a wide range of disciplines. However, even as research in the field of plagiarism detection is constantly improving, heavily modified or paraphrased text is still challenging for current methodologies. For historical texts, these problems are even more severe, since text sources were often subject to stronger and more frequent modifications. Despite the need for tools to automate text criticism, e.g., tracing modifications in historical text, algorithmic support is still limited. While current techniques can tell if and how frequently a text has been modified, very little work has been done on determining the degree and kind of paraphrastic modification—despite such information being of substantial interest to scholars. We present a human-interpretable, feature-based method to measure paraphrastic modification. Evaluating our technique on three data sets, we find that our approach performs competitive to text similarity scores borrowed from machine translation evaluation, being much harder to interpret.
Anthology ID:
W18-4513
Volume:
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico
Venues:
COLING | LaTeCH | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
113–118
Language:
URL:
https://www.aclweb.org/anthology/W18-4513
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-4513.pdf