Introducing the LCC Metaphor Datasets

Michael Mohler, Mary Brunson, Bryan Rink, Marc Tomlinson


Abstract
In this work, we present the Language Computer Corporation (LCC) annotated metaphor datasets, which represent the largest and most comprehensive resource for metaphor research to date. These datasets were produced over the course of three years by a staff of nine annotators working in four languages (English, Spanish, Russian, and Farsi). As part of these datasets, we provide (1) metaphoricity ratings for within-sentence word pairs on a four-point scale, (2) scored links to our repository of 114 source concept domains and 32 target concept domains, and (3) ratings for the affective polarity and intensity of each pair. Altogether, we provide 188,741 annotations in English (for 80,100 pairs), 159,915 annotations in Spanish (for 63,188 pairs), 99,740 annotations in Russian (for 44,632 pairs), and 137,186 annotations in Farsi (for 57,239 pairs). In addition, we are providing a large set of likely metaphors which have been independently extracted by our two state-of-the-art metaphor detection systems but which have not been analyzed by our team of annotators.
Anthology ID:
L16-1668
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
4221–4227
Language:
URL:
https://www.aclweb.org/anthology/L16-1668
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/L16-1668.pdf