Masashi Toyoda


2020

uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems
Yuma Tsuta | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Because open-domain dialogues allow diverse responses, basic reference-based metrics such as BLEU do not work well unless we prepare a massive reference set of high-quality responses for input utterances. To reduce this burden, a human-aided, uncertainty-aware metric, ΔBLEU, has been proposed; it embeds human judgment on the quality of reference outputs into the computation of multiple-reference BLEU. In this study, we instead propose a fully automatic, uncertainty-aware evaluation method for open-domain dialogue systems, υBLEU. This method first collects diverse reference responses from massive dialogue data and then annotates their quality by using a neural network trained on automatically collected training data. Experimental results on massive Twitter data confirmed that υBLEU is comparable to ΔBLEU in terms of its correlation with human judgment and that the state-of-the-art automatic evaluation method, RUBER, is improved by integrating υBLEU.
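
The idea of folding reference quality into multiple-reference BLEU can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_ngram_precision`, the restriction of weights to the range 0 to 1, and the normalization by the largest reference weight are all assumptions made for illustration, and brevity penalty and the combination of n-gram orders are omitted.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_ngram_precision(hypothesis, references, n=1):
    """Clipped n-gram precision where each reference carries a quality weight.

    `references` is a list of (tokens, weight) pairs. The weights are assumed
    to lie in [0, 1]; in a setting like the one described above they would be
    human judgments or the output of a trained quality estimator.
    """
    if not references:
        return 0.0
    hyp = ngram_counts(hypothesis, n)
    refs = [(ngram_counts(tokens, n), weight) for tokens, weight in references]
    max_weight = max(weight for _, weight in refs)

    numerator = 0.0
    denominator = 0.0
    for gram, hyp_count in hyp.items():
        # Credit each n-gram with the best weighted, clipped match it gets
        # from any single reference.
        best = max(weight * min(hyp_count, counts[gram]) for counts, weight in refs)
        numerator += best
        denominator += max_weight * hyp_count
    return numerator / denominator if denominator > 0 else 0.0
```

With this sketch, a hypothesis that matches only a low-weight reference scores lower than one that matches a high-weight reference, which is the behavior an uncertainty-aware metric aims for.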

Vocabulary Adaptation for Domain Adaptation in Neural Machine Translation
Shoetsu Sato | Jin Sakuma | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Findings of the Association for Computational Linguistics: EMNLP 2020

Neural network methods exhibit strong performance only in a few resource-rich domains. Practitioners therefore employ domain adaptation from resource-rich domains that are, in most cases, distant from the target domain. Domain adaptation between distant domains (e.g., movie subtitles and research papers), however, cannot be performed effectively due to mismatches in vocabulary; it will encounter many domain-specific words (e.g., “angstrom”) and words whose meanings shift across domains (e.g., “conductor”). In this study, aiming to solve these vocabulary mismatches in domain adaptation for neural machine translation (NMT), we propose vocabulary adaptation, a simple method for effective fine-tuning that adapts embedding layers in a given pretrained NMT model to the target domain. Prior to fine-tuning, our method replaces the embedding layers of the NMT model by projecting general word embeddings induced from monolingual data in a target domain onto a source-domain embedding space. Experimental results indicate that our method improves the performance of conventional fine-tuning by 3.86 and 3.28 BLEU points in En-Ja and De-En translation, respectively.
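
The projection step can be sketched as follows. The abstract does not say how the projection is computed, so this example assumes a linear map fitted by least squares on the words shared between the two vocabularies; the function name `project_into_source_space` and the dictionary-based embedding format are hypothetical.

```python
import numpy as np


def project_into_source_space(target_emb, source_emb):
    """Map target-domain word vectors into the source-domain embedding space.

    Both arguments are dicts from word to 1-D NumPy vector. A linear map W is
    fitted by least squares on the shared words (an assumption of this
    sketch), then applied to every target-domain word, including
    domain-specific words that are absent from the source vocabulary.
    """
    shared = sorted(set(target_emb) & set(source_emb))
    if not shared:
        raise ValueError("the two vocabularies share no words")
    X = np.stack([target_emb[w] for w in shared])  # (num_shared, target_dim)
    Y = np.stack([source_emb[w] for w in shared])  # (num_shared, source_dim)
    # Solve min_W ||X @ W - Y|| in the least-squares sense.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return {word: vec @ W for word, vec in target_emb.items()}
```

The projected vectors could then be used to initialize the embedding layer of a pretrained model before fine-tuning; how that replacement is wired into a particular NMT toolkit is outside the scope of this sketch.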

A System for Worldwide COVID-19 Information Aggregation
Akiko Aizawa | Frederic Bergeron | Junjie Chen | Fei Cheng | Katsuhiko Hayashi | Kentaro Inui | Hiroyoshi Ito | Daisuke Kawahara | Masaru Kitsuregawa | Hirokazu Kiyomaru | Masaki Kobayashi | Takashi Kodama | Sadao Kurohashi | Qianying Liu | Masaki Matsubara | Yusuke Miyao | Atsuyuki Morishima | Yugo Murawaki | Kazumasa Omura | Haiyue Song | Eiichiro Sumita | Shinji Suzuki | Ribeka Tanaka | Yu Tanaka | Masashi Toyoda | Nobuhiro Ueda | Honai Ueoka | Masao Utiyama | Ying Zhong
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The global COVID-19 pandemic has made the public pay close attention to related news covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 situation differs greatly among countries (e.g., in policies and the development of the epidemic), and thus citizens would be interested in news from foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages, sorted by topic. Our dataset of reliable COVID-19-related websites, collected through crowdsourcing, ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic classifier trained on our article-topic pair dataset helps users find information of interest efficiently by sorting articles into different categories.

2019

Modeling Personal Biases in Language Use by Inducing Personalized Word Embeddings
Daisuke Oba | Naoki Yoshinaga | Shoetsu Sato | Satoshi Akasaki | Masashi Toyoda
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Biases exist in individuals’ language use; the same word (e.g., cool) is used to express different meanings (e.g., temperature range), or different words (e.g., cloudy, hazy) are used to describe the same meaning. In this study, we propose a method of modeling such personal biases in word meanings (hereafter, semantic variations) with personalized word embeddings obtained by solving a task on subjective text while regarding words used by different individuals as different words. To prevent personalized word embeddings from being contaminated by other irrelevant biases, we solve a task of identifying a review target (objective output) from a given review. To stabilize the training of this extreme multi-class classification, we perform multi-task learning with metadata identification. Experimental results with reviews retrieved from RateBeer confirmed that the obtained personalized word embeddings improved the accuracy of sentiment analysis as well as the target task. Analysis of the obtained personalized word embeddings revealed trends in semantic variations related to frequent words and adjectives.
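
One simple way to treat words used by different individuals as different words is to key each token by its author before building the vocabulary. The sketch below shows only that preprocessing idea; the `::` separator and the function names `personalize_tokens` and `build_vocab` are hypothetical, and the model and training tasks described in the abstract are not reproduced.

```python
def personalize_tokens(user_id, tokens):
    """Prefix each token with its author's ID so that the same surface word
    written by two users becomes two distinct vocabulary entries."""
    return [f"{user_id}::{token}" for token in tokens]


def build_vocab(reviews):
    """Assign an integer index to every personalized token.

    `reviews` is an iterable of (user_id, tokens) pairs. Each index would
    correspond to one row of a personalized embedding matrix.
    """
    vocab = {}
    for user_id, tokens in reviews:
        for token in personalize_tokens(user_id, tokens):
            vocab.setdefault(token, len(vocab))
    return vocab
```

Under this scheme, "cool" written by one reviewer and "cool" written by another receive separate embedding rows, so their meanings can drift apart during training.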

Learning to Describe Unknown Phrases with Local and Global Contexts
Shonosuke Ishiwatari | Hiroaki Hayashi | Naoki Yoshinaga | Graham Neubig | Shoetsu Sato | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

When reading a text, it is common to become stuck on unfamiliar words and phrases, such as polysemous words with novel senses, rarely used idioms, internet slang, or emerging entities. If we humans cannot figure out the meaning of those expressions from the immediate local context, we consult dictionaries for definitions or search documents or the web to find other global context to help in interpretation. Can machines help us do this work? Which type of context is more important for machines to solve the problem? To answer these questions, we undertake a task of describing a given phrase in natural language based on its local and global contexts. To solve this task, we propose a neural description model that consists of two context encoders and a description decoder. In contrast to the existing methods for non-standard English explanation [Ni+ 2017] and definition generation [Noraset+ 2017; Gadetsky+ 2018], our model appropriately takes important clues from both local and global contexts. Experimental results on three existing datasets (including WordNet, Oxford and Urban Dictionaries) and a dataset newly created from Wikipedia demonstrate the effectiveness of our method over previous work.

2017

Modeling Situations in Neural Chat Bots
Shoetsu Sato | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of ACL 2017, Student Research Workshop

A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size
Masato Neishi | Jin Sakuma | Satoshi Tohda | Shonosuke Ishiwatari | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

In this paper, we describe the team UT-IIS’s system and results for the WAT 2017 translation tasks. We further investigated several tricks, including a novel technique for initializing embedding layers using only the parallel corpus, which increased the BLEU score by 1.28; found a practical large batch size of 256; and gained insights regarding hyperparameter settings. Ultimately, our system obtained a better result than the state-of-the-art system of WAT 2016. Our code is available at https://github.com/nem6ishi/wat17.

2016

Kotonush: Understanding Concepts Based on Values behind Social Media
Tatsuya Iwanari | Kohei Ohara | Naoki Yoshinaga | Nobuhiro Kaji | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Kotonush, a system that clarifies people’s values on various concepts on the basis of what they write about on social media, is presented. The values are represented by ordering sets of concepts (e.g., London, Berlin, and Rome) in accordance with a common attribute intensity expressed by an adjective (e.g., entertaining). We exploit social media text written by different demographics and at different times in order to induce specific orderings for comparison. The system combines a text-to-ordering module with an interactive querying interface enabled by massive hyponymy relations and provides mechanisms to compare the induced orderings from various viewpoints. We empirically evaluate Kotonush and present some case studies, featuring real-world concept orderings with different domains on Twitter, to demonstrate the usefulness of our system.

2015

Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs
Shonosuke Ishiwatari | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

2013

Predicting and Eliciting Addressee’s Emotion in Online Dialogue
Takayuki Hasegawa | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Identifying Constant and Unique Relations by using Time-Series Text
Yohei Takaku | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Sentiment Classification in Resource-Scarce Languages by using Label Propagation
Yong Ren | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2009

A Combination of Active Learning and Semi-supervised Learning Starting with Positive and Unlabeled Examples for Word Sense Disambiguation: An Empirical Study on Japanese Web Search Query
Makoto Imamura | Yasuhiro Takayama | Nobuhiro Kaji | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers