A Dataset for Anaphora Analysis in French Emails
Hani Guenoune | Kevin Cousot | Mathieu Lafourcade | Melissa Mekaoui | Cédric Lopez
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
In 2019, about 293 billion emails were sent worldwide every day. Emails are a valuable source of information and knowledge for professionals. Since the 1990s, many studies have been conducted on emails and have highlighted the need for resources for numerous NLP tasks. Because few such resources are available for French, very few studies have been conducted on French emails. Anaphora resolution in emails remains an unexplored area, and annotated resources are needed, at least to answer a first question: does email communication have specific characteristics that must be addressed to tackle the anaphora resolution task? To answer this question, (1) we build a French email corpus composed of 100 anonymized professional threads and make it freely available for scientific use, and (2) we provide annotations of anaphoric links in the email collection.