Kyungjae Lee


2019

pdf bib
Categorical Metadata Representation for Customized Text Classification
Jihyeok Kim | Reinald Kim Amplayo | Kyungjae Lee | Sua Sung | Minji Seo | Seung-won Hwang
Transactions of the Association for Computational Linguistics, Volume 7

The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. This information has been used to modify parts of the model (e.g., word embeddings, attention mechanisms) such that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category, and even such context is often scarce (for tail category). To this end, we propose using basis vectors to effectively incorporate categorical metadata on various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various data sets with different properties are performed and show that through our method, we can represent categorical metadata more effectively to customize parts of the model, including unexplored ones, and increase the performance of the model greatly.

pdf bib
Learning with Limited Data for Multilingual Reading Comprehension
Kyungjae Lee | Sunghyun Park | Hojae Han | Jinyoung Yeo | Seung-won Hwang | Juho Lee
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper studies the problem of supporting question answering in a new language with limited training resources. As an extreme scenario, when no such resource exists, one can (1) transfer labels from another language, and (2) generate labels from unlabeled data, using translator and automatic labeling function respectively. However, these approaches inevitably introduce noises to the training data, due to translation or generation errors, which require a judicious use of data with varying confidence. To address this challenge, we propose a weakly-supervised framework that quantifies such noises from automatically generated labels, to deemphasize or fix noisy data in training. On reading comprehension task, we demonstrate the effectiveness of our model on low-resource languages with varying similarity to English, namely, Korean and French.

2018

pdf bib
Semi-supervised Training Data Generation for Multilingual Question Answering
Kyungjae Lee | Kyoungho Yoon | Sunghyun Park | Seung-won Hwang
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)