Ichiro Yamada


2020

Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT
Hideya Mino | Hideki Tanaka | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems also suffer degraded performance when trained with mixed data having different features, such as noisy and clean data. Although existing methods try to overcome this problem by using tags to distinguish the corpora, this alone is not sufficient. We thus extend a tag-based domain-adaptation method to use multiple tags, so that an NMT model can be trained effectively on the clean corpus together with existing parallel news corpora that contain several types of noise. Experimental results show that our corpus improves translation quality and that our domain-adaptation method is more effective for learning from multiple types of corpora than existing domain-adaptation methods.
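
The tag-based domain adaptation described in this abstract amounts to marking every training pair with artificial tokens that identify the corpus it came from, so that a single NMT model can be trained on a mixture of clean and noisy data. The sketch below is a minimal illustration of that general idea only, not the authors' implementation; the corpus labels, tag names, and helper functions are assumptions made for the example.

# Minimal sketch (not the authors' code) of multi-tag data preparation for NMT:
# each training pair is prefixed with artificial tokens identifying its corpus,
# so one model can be trained on a mix of clean and noisy corpora.
# The corpus labels and tag names below are illustrative assumptions.

from typing import Iterable, List, Tuple

# Hypothetical per-corpus tags: one tag for cleanliness, one for domain.
CORPUS_TAGS = {
    "content_equivalent": ["<clean>", "<news>"],
    "jiji":               ["<noisy>", "<news>"],
    "web_crawl":          ["<noisy>", "<general>"],
}

def tag_source(corpus_id: str, src_tokens: List[str]) -> List[str]:
    """Prepend the corpus's tags to the source token sequence."""
    return CORPUS_TAGS[corpus_id] + src_tokens

def build_training_data(
    corpora: Iterable[Tuple[str, List[Tuple[List[str], List[str]]]]]
) -> List[Tuple[List[str], List[str]]]:
    """Merge several parallel corpora into one tagged training set."""
    merged = []
    for corpus_id, pairs in corpora:
        for src, tgt in pairs:
            merged.append((tag_source(corpus_id, src), tgt))
    return merged

if __name__ == "__main__":
    corpora = [
        ("content_equivalent", [(["首相", "が", "訪米", "した"],
                                 ["The", "prime", "minister", "visited", "the", "U.S."])]),
        ("jiji", [(["株価", "が", "上昇"],
                   ["Stock", "prices", "rose"])]),
    ]
    for src, tgt in build_training_data(corpora):
        print(" ".join(src), "|||", " ".join(tgt))

In practice, the tagged source sentences would then be passed to a standard NMT toolkit, with the tags typically kept as atomic tokens during subword segmentation.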

Effective Use of Target-side Context for Neural Machine Translation
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 28th International Conference on Computational Linguistics

2019

Neural Machine Translation System using a Content-equivalently Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Hideki Tanaka | Takenobu Tokunaga
Proceedings of the 6th Workshop on Asian Translation

This paper describes NHK and NHK Engineering System (NHK-ES)’s submission to the newswire translation tasks of WAT 2019 in both directions, Japanese→English and English→Japanese. In addition to the JIJI Corpus that was officially provided by the task organizer, we developed a corpus of 0.22M sentence pairs by manually translating Japanese news sentences into English in a content-equivalent manner. The content-equivalent corpus was effective for improving translation quality, and our systems achieved the best human evaluation scores in the newswire translation tasks at WAT 2019.

2017

Extracting Important Tweets for News Writers using Recurrent Neural Network with Attention Mechanism and Multi-task Learning
Taro Miyazaki | Shin Toriumi | Yuka Takei | Ichiro Yamada | Jun Goto
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

Tweet Extraction for News Production Considering Unreality
Yuka Takei | Taro Miyazaki | Ichiro Yamada | Jun Goto
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2012

Measuring the Similarity between TV Programs using Semantic Relations
Ichiro Yamada | Masaru Miyazaki | Hideki Sumiyoshi | Atsushi Matsui | Hironori Furumiya | Hideki Tanaka
Proceedings of COLING 2012

2011

Relation Acquisition using Word Classes and Partial Patterns
Stijn De Saeger | Kentaro Torisawa | Masaaki Tsuchida | Jun’ichi Kazama | Chikara Hashimoto | Ichiro Yamada | Jong Hoon Oh | Istvan Varga | Yulan Yan
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Extending WordNet with Hypernyms and Siblings Acquired from Wikipedia
Ichiro Yamada | Jong-Hoon Oh | Chikara Hashimoto | Kentaro Torisawa | Jun’ichi Kazama | Stijn De Saeger | Takuya Kawada
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Co-STAR: A Co-training Style Algorithm for Hyponymy Relation Acquisition from Structured and Unstructured Text
Jong-Hoon Oh | Ichiro Yamada | Kentaro Torisawa | Stijn De Saeger
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Hypernym Discovery Based on Distributional Similarity and Hierarchical Structures
Ichiro Yamada | Kentaro Torisawa | Jun’ichi Kazama | Kow Kuroda | Masaki Murata | Stijn De Saeger | Francis Bond | Asuka Sumida
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2004

Automatic Discovery of Telic and Agentive Roles from Corpus Data
Ichiro Yamada | Timothy Baldwin
Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation