Computational Narratology: Extracting Tense Clusters from Narrative Texts

Thomas Bögel, Jannik Strötgen, Michael Gertz


Abstract
Computational Narratology is an emerging field within the Digital Humanities. In this paper, we tackle the problem of extracting temporal information as a basis for event extraction and ordering, as well as further investigations of complex phenomena in narrative texts. While most existing systems focus on news texts and extract explicit temporal information exclusively, we show that this approach is not feasible for narratives. Based on tense information of verbs, we define temporal clusters as an annotation task and validate the annotation schema by showing that the task can be performed with high inter-annotator agreement. To alleviate and reduce the manual annotation effort, we propose a rule-based approach to robustly extract temporal clusters using a multi-layered and dynamic NLP pipeline that combines off-the-shelf components in a heuristic setting. Comparing our results against human judgements, our system is capable of predicting the tense of verbs and sentences with very high reliability: for the most prevalent tense in our corpus, more than 95% of all verbs are annotated correctly.
Anthology ID:
L14-1256
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
950–955
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/280_Paper.pdf
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/280_Paper.pdf