Improving Sentence Classification by Multilingual Data Augmentation and Consensus Learning

Yanfei Wang, Yangdong Chen, Yuejie Zhang


Abstract
Neural network based models have achieved impressive results on the sentence classification task. However, most of previous work focuses on designing more sophisticated network or effective learning paradigms on monolingual data, which often suffers from insufficient discriminative knowledge for classification. In this paper, we investigate to improve sentence classification by multilingual data augmentation and consensus learning. Comparing to previous methods, our model can make use of multilingual data generated by machine translation and mine their language-share and language-specific knowledge for better representation and classification. We evaluate our model using English (i.e., source language) and Chinese (i.e., target language) data on several sentence classification tasks. Very positive classification performance can be achieved by our proposed model.
Anthology ID:
2020.ccl-1.78
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Venue:
CCL
SIG:
Publisher:
Chinese Information Processing Society of China
Note:
Pages:
842–852
Language:
English
URL:
https://www.aclweb.org/anthology/2020.ccl-1.78
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.ccl-1.78.pdf