A Dataset for Anaphora Analysis in French Emails

Hani Guenoune, Kevin Cousot, Mathieu Lafourcade, Melissa Mekaoui, Cédric Lopez


Abstract
In 2019, about 293 billion emails were sent worldwide every day. They are a valuable source of information and knowledge for professionals. Since the 1990s, many studies have been conducted on emails, highlighting the need for resources for numerous NLP tasks. Due to the lack of available resources for French, very few studies have been conducted on French emails. Anaphora resolution in emails is an unexplored area, and annotated resources are needed, at least to answer a first question: does email communication have specific characteristics that must be addressed to tackle the anaphora resolution task? To answer this question, 1) we build a French email corpus composed of 100 anonymized professional threads and make it freely available for scientific use, and 2) we provide annotations of anaphoric links in the email collection.
Anthology ID: 2020.crac-1.17
Volume: Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Month: December
Year: 2020
Address: Barcelona, Spain (online)
Venues: COLING | CRAC
Publisher: Association for Computational Linguistics
Pages: 165–175
URL: https://www.aclweb.org/anthology/2020.crac-1.17
PDF: http://aclanthology.lst.uni-saarland.de/2020.crac-1.17.pdf