An Annotated Corpus of Picture Stories Retold by Language Learners

Christine Köhn, Arne Köhn


Abstract
Corpora with language learner writing usually consist of essays, which are difficult to annotate reliably and to process automatically due to the high degree of freedom and the nature of learner language. We develop a task which mildly constrains learner utterances to facilitate consistent annotation and reliable automatic processing but at the same time does not prime learners with textual information. In this task, learners retell a comic strip. We present the resulting task-based corpus of stories written by learners of German. We designed the corpus to be able to serve multiple purposes: The corpus was manually annotated, including target hypotheses and syntactic structures. We achieve a very high inter-annotator agreement: κ = 0.765 for the annotation of minimal target hypotheses and κ = 0.507 for the extended target hypotheses. We attribute this to the design of our task and the annotation guidelines, which are based on those for the Falko corpus (Reznicek et al., 2012).
Anthology ID:
W18-4914
Volume:
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venues:
COLING | LAW | MWE | WS
SIGs:
SIGLEX | SIGANN
Publisher:
Association for Computational Linguistics
Note:
Pages:
121–132
Language:
URL:
https://www.aclweb.org/anthology/W18-4914
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-4914.pdf