How Far Can We Go with Data Selection? A Case Study on Semantic Sequence Tagging Tasks

Samuel Louvan, Bernardo Magnini


Abstract
Although several works have addressed the role of data selection in improving transfer learning for various NLP tasks, there is no consensus about its real benefits and, more generally, there is a lack of shared practices on how it can best be applied. We propose a systematic approach to evaluating data selection in scenarios of increasing complexity. Specifically, we compare the case in which source and target tasks are the same while source and target domains differ, against the more challenging scenario where both tasks and domains differ. We run a number of experiments on semantic sequence tagging tasks, which are relatively under-investigated in data selection research, and conclude that data selection is more beneficial when the source and target tasks are the same, while in the case of different (although related) tasks from distant domains, a combination of data selection and multi-task learning is ineffective in most cases.
Anthology ID:
2020.insights-1.3
Volume:
Proceedings of the First Workshop on Insights from Negative Results in NLP
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | insights
Publisher:
Association for Computational Linguistics
Note:
Pages:
15–21
URL:
https://www.aclweb.org/anthology/2020.insights-1.3
DOI:
10.18653/v1/2020.insights-1.3
PDF:
http://aclanthology.lst.uni-saarland.de/2020.insights-1.3.pdf
Optional supplementary material:
 2020.insights-1.3.OptionalSupplementaryMaterial.zip