Analysing Data-To-Text Generation Benchmarks
Laura Perez-Beltrachini, Claire Gardent
Abstract
A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.
- Anthology ID:
- W17-3537
- Volume:
- Proceedings of the 10th International Conference on Natural Language Generation
- Month:
- September
- Year:
- 2017
- Address:
- Santiago de Compostela, Spain
- Venues:
- INLG | WS
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 238–242
- URL:
- https://www.aclweb.org/anthology/W17-3537
- DOI:
- 10.18653/v1/W17-3537
- PDF:
- http://aclanthology.lst.uni-saarland.de/W17-3537.pdf