MACAQ : A Multi Annotated Corpus to Study how we Adapt Answers to Various Questions

Anne Garcia-Fernandez, Sophie Rosset, Anne Vilnat


Abstract
This paper presents a corpus of human answers in natural language, collected to build a base of examples for generating natural language answers. We present the corpus and the way we acquired it. The answers correspond to questions with a fixed linguistic form, focus, and topic, and answers to a given question exist for two modalities of interaction: oral and written. The whole corpus of answers was annotated, manually and automatically, on several levels. These levels include the words of the question reused in the answer, the precise element that answers the question (the information-answer), and completions. A detailed description of the annotations is presented. Two example corpus analyses are described. The first shows some differences between the oral and written modalities, especially in answer length. The second concerns the reuse of the question focus in the answers.

Anthology ID: L10-1208
Volume: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month: May
Year: 2010
Address: Valletta, Malta
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/301_Paper.pdf