Hideki Tanaka


2020

Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT
Hideya Mino | Hideki Tanaka | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.

2019

Neural Machine Translation System using a Content-equivalently Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Hideki Tanaka | Takenobu Tokunaga
Proceedings of the 6th Workshop on Asian Translation

This paper describes NHK and NHK Engineering System (NHK-ES)’s submission to the newswire translation tasks of WAT 2019 in both directions, Japanese→English and English→Japanese. In addition to the JIJI Corpus that was officially provided by the task organizer, we developed a corpus of 0.22M sentence pairs by manually translating Japanese news sentences into English content-equivalently. The content-equivalent corpus was effective for improving translation quality, and our systems achieved the best human evaluation scores in the newswire translation tasks at WAT 2019.

2017

Detecting Untranslated Content for Neural Machine Translation
Isao Goto | Hideki Tanaka
Proceedings of the First Workshop on Neural Machine Translation

Despite its promise, neural machine translation (NMT) has a serious problem in that source content may be mistakenly left untranslated. The ability to detect untranslated content is important for the practical use of NMT. We evaluate two types of probability with which to detect untranslated content: the cumulative attention (ATN) probability and back translation (BT) probability from the target sentence to the source sentence. Experiments on detecting untranslated content in Japanese-English patent translations show that ATN and BT are each more effective than random choice, BT is more effective than ATN, and the combination of the two provides further improvements. We also confirmed the effectiveness of using ATN and BT to rerank the n-best NMT outputs.

2015

The “News Web Easy” news service as a resource for teaching and learning Japanese: An assessment of the comprehension difficulty of Japanese sentence-end expressions
Hideki Tanaka | Tadashi Kumano | Isao Goto
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

2012

Measuring the Similarity between TV Programs using Semantic Relations
Ichiro Yamada | Masaru Miyazaki | Hideki Sumiyoshi | Atsushi Matsui | Hironori Furumiya | Hideki Tanaka
Proceedings of COLING 2012

2009

Syntax-Driven Sentence Revision for Broadcast News Summarization
Hideki Tanaka | Akinori Kinoshita | Takeshi Kobayakawa | Tadashi Kumano | Naoto Katoh
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

2005

Analysis and Modeling of Manual Summarization of Japanese Broadcast News
Hideki Tanaka | Tadashi Kumano | Masamichi Nishiwaki | Takayuki Itoh
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

Back Transliteration from Japanese to English using Target English Context
Isao Goto | Naoto Kato | Terumasa Ehara | Hideki Tanaka
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Word Selection for EBMT based on Monolingual Similarity and Translation Confidence
Eiji Aramaki | Sadao Kurohashi | Hideki Kashioka | Hideki Tanaka
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

Comparing the Sentence Alignment Yield from Two News Corpora Using a Dictionary-Based Alignment System
Stephen Nightingale | Hideki Tanaka
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

Construction and Analysis of Japanese-English Broadcast News Corpus with Named Entity Tags
Tadashi Kumano | Hideki Kashioka | Hideki Tanaka | Takahiro Fukusima
Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition

2002

Automatic Alignment of Japanese and English Newspaper Articles using an MT System and a Bilingual Company Name Dictionary
Kenji Matsumoto | Hideki Tanaka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

ATR-SLT System for SENSEVAL-2 Japanese Translation Task
Tadashi Kumano | Hideki Kashioka | Hideki Tanaka
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

1999

An Efficient Statistical Speech Act Type Tagging System for Speech Translation Systems
Hideki Tanaka | Akio Yokoo
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

Context Management with Topics for Spoken Dialogue Systems
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Context Management with Topics for Spoken Dialogue Systems
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Planning Dialogue Contributions With New Information
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
Natural Language Generation

1996

Decision Tree Learning Algorithm with Structured Attributes: Application to Verbal Case Frame Acquisition
Hideki Tanaka
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

Verbal Case Frame Acquisition From a Bilingual Corpus: Gradual Knowledge Acquisition
Hideki Tanaka
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

1992

A Method of Translating English Delexical Structures Into Japanese
Hideki Tanaka | Teruaki Aizawa | Yeun-Bae Kim | Nobuko Hatada
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

1990

A Machine Translation System for Foreign News in Satellite Broadcasting
Teruaki Aizawa | Terumasa Ehara | Noriyoshi Uratani | Hideki Tanaka | Naoto Kato | Sumio Nakase | Norikazu Aruga | Takeo Matsuda
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics