Akari Asai


2020

pdf bib
Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
Akari Asai | Hannaneh Hajishirzi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many natural language questions require qualitative, quantitative or logical comparisons between two entities or events. This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions by integrating logic rules and neural models. Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model. Improving the global consistency of predictions, our approach achieves large improvements over previous methods in a variety of question answering (QA) tasks, including multiple-choice qualitative reasoning, cause-effect reasoning, and extractive machine reading comprehension. In particular, our method significantly improves the performance of RoBERTa-based models by 1-5% across datasets. We advance state of the art by around 5-8% on WIQA and QuaRel and reduce consistency violations by 58% on HotpotQA. We further demonstrate that our approach can learn effectively from limited data.

pdf bib
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
Ikuya Yamada | Akari Asai | Hiroyuki Shindo | Hideaki Takeda | Yuji Matsumoto
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering). Our source code and pretrained representations are available at https://github.com/studio-ousia/luke.

pdf bib
Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia
Ikuya Yamada | Akari Asai | Jin Sakuma | Hiroyuki Shindo | Hideaki Takeda | Yoshiyasu Takefuji | Yuji Matsumoto
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. The proposed tool enables users to learn the embeddings efficiently by issuing a single command with a Wikipedia dump file as an argument. We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings. In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets. Furthermore, our tool has been used as a key component in various recent studies. We publicize the source code, demonstration, and the pretrained embeddings for 12 languages at https://wikipedia2vec.github.io/.

2018

pdf bib
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Akari Asai | Sara Evensen | Behzad Golshan | Alon Halevy | Vivian Li | Andrei Lopatenko | Daniela Stepanov | Yoshihiko Suhara | Wang-Chiew Tan | Yinzhan Xu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)