Masahiro Kaneko


pdf bib
Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction
Masahiro Kaneko | Masato Mita | Shun Kiyono | Jun Suzuki | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect because the previous common methods for incorporating a MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can be considerably different (erroneous, clumsy, etc.) from that of the corpora used for pre-training MLMs; however, this issue is not addressed in the previous methods. Our experiments show that our proposed method, where we first fine-tune a MLM with a given GEC corpus and then use the output of the fine-tuned MLM as additional features in the GEC model, maximizes the benefit of the MLM. The best-performing model achieves state-of-the-art performances on the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at:

pdf bib
English-to-Japanese Diverse Translation by Combining Forward and Backward Outputs
Masahiro Kaneko | Aizhan Imankulova | Tosho Hirasawa | Mamoru Komachi
Proceedings of the Fourth Workshop on Neural Generation and Translation

We introduce our TMU system that is submitted to The 4th Workshop on Neural Generation and Translation (WNGT2020) to English-to-Japanese (En→Ja) track on Simultaneous Translation And Paraphrase for Language Education (STAPLE) shared task. In most cases machine translation systems generate a single output from the input sentence, however, in order to assist language learners in their journey with better and more diverse feedback, it is helpful to create a machine translation system that is able to produce diverse translations of each input sentence. However, creating such systems would require complex modifications in a model to ensure the diversity of outputs. In this paper, we investigated if it is possible to create such systems in a simple way and whether it can produce desired diverse outputs. In particular, we combined the outputs from forward and backward neural translation models (NMT). Our system achieved third place in En→Ja track, despite adopting only a simple approach.

pdf bib
A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction
Masato Mita | Shun Kiyono | Masahiro Kaneko | Jun Suzuki | Kentaro Inui
Findings of the Association for Computational Linguistics: EMNLP 2020

Existing approaches for grammatical error correction (GEC) largely rely on supervised learning with manually created GEC datasets. However, there has been little focus on verifying and ensuring the quality of the datasets, and on how lower-quality data might affect GEC performance. We indeed found that there is a non-negligible amount of “noise” where errors were inappropriately edited or left uncorrected. To address this, we designed a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models, and outperformed strong denoising baseline methods. We further applied task-specific techniques and achieved state-of-the-art performance on the CoNLL-2014, JFLEG, and BEA-2019 benchmarks. We then analyzed the effect of the proposed denoising method, and found that our approach leads to improved coverage of corrections and facilitated fluency edits which are reflected in higher recall and overall performance.

pdf bib
Autoencoding Improves Pre-trained Word Embeddings
Masahiro Kaneko | Danushka Bollegala
Proceedings of the 28th International Conference on Computational Linguistics

Prior works investigating the geometry of pre-trained word embeddings have shown that word embeddings to be distributed in a narrow cone and by centering and projecting using principal component vectors one can increase the accuracy of a given set of pre-trained word embeddings. However, theoretically, this post-processing step is equivalent to applying a linear autoencoder to minimize the squared L2 reconstruction error. This result contradicts prior work (Mu and Viswanath, 2018) that proposed to remove the top principal components from pre-trained embeddings. We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings, without requiring access to additional linguistic resources or labeled data.

pdf bib
Generating Diverse Corrections with Local Beam Search for Grammatical Error Correction
Kengo Hotate | Masahiro Kaneko | Mamoru Komachi
Proceedings of the 28th International Conference on Computational Linguistics

In this study, we propose a beam search method to obtain diverse outputs in a local sequence transduction task where most of the tokens in the source and target sentences overlap, such as in grammatical error correction (GEC). In GEC, it is advisable to rewrite only the local sequences that must be rewritten while leaving the correct sequences unchanged. However, existing methods of acquiring various outputs focus on revising all tokens of a sentence. Therefore, existing methods may either generate ungrammatical sentences because they force the entire sentence to be changed or produce non-diversified sentences by weakening the constraints to avoid generating ungrammatical sentences. Considering these issues, we propose a method that does not rewrite all the tokens in a text, but only rewrites those parts that need to be diversely corrected. Our beam search method adjusts the search token in the beam according to the probability that the prediction is copied from the source sentence. The experimental results show that our proposed method generates more diverse corrections than existing methods without losing accuracy in the GEC task.

pdf bib
Cross-lingual Transfer Learning for Grammatical Error Correction
Ikumi Yamashita | Satoru Katsumata | Masahiro Kaneko | Aizhan Imankulova | Mamoru Komachi
Proceedings of the 28th International Conference on Computational Linguistics

In this study, we explore cross-lingual transfer learning in grammatical error correction (GEC) tasks. Many languages lack the resources required to train GEC models. Cross-lingual transfer learning from high-resource languages (the source models) is effective for training models of low-resource languages (the target models) for various tasks. However, in GEC tasks, the possibility of transferring grammatical knowledge (e.g., grammatical functions) across languages is not evident. Therefore, we investigate cross-lingual transfer learning methods for GEC. Our results demonstrate that transfer learning from other languages can improve the accuracy of GEC. We also demonstrate that proximity to source languages has a significant impact on the accuracy of correcting certain types of errors.

pdf bib
SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction
Ryoma Yoshimura | Masahiro Kaneko | Tomoyuki Kajiwara | Mamoru Komachi
Proceedings of the 28th International Conference on Computational Linguistics

We propose a reference-less metric trained on manual evaluations of system outputs for grammatical error correction (GEC). Previous studies have shown that reference-less metrics are promising; however, existing metrics are not optimized for manual evaluations of the system outputs because no dataset of the system output exists with manual evaluation. This study manually evaluates outputs of GEC systems to optimize the metrics. Experimental results show that the proposed metric improves correlation with the manual evaluation in both system- and sentence-level meta-evaluation. Our dataset and metric will be made publicly available.


pdf bib
Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019
Aizhan Imankulova | Masahiro Kaneko | Mamoru Komachi
Proceedings of the 6th Workshop on Asian Translation

We introduce our system that is submitted to the News Commentary task (Japanese<->Russian) of the 6th Workshop on Asian Translation. The goal of this shared task is to study extremely low resource situations for distant language pairs. It is known that using parallel corpora of different language pair as training data is effective for multilingual neural machine translation model in extremely low resource scenarios. Therefore, to improve the translation quality of Japanese<->Russian language pair, our method leverages other in-domain Japanese-English and English-Russian parallel corpora as additional training data for our multilingual NMT model.

pdf bib
Gender-preserving Debiasing for Pre-trained Word Embeddings
Masahiro Kaneko | Danushka Bollegala
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embeddings learnt from massive text collections have demonstrated significant levels of discriminative biases such as gender, racial or ethnic biases, which in turn bias the down-stream NLP applications that use those word embeddings. Taking gender-bias as a working example, we propose a debiasing method that preserves non-discriminative gender-related information, while removing stereotypical discriminative gender biases from pre-trained word embeddings. Specifically, we consider four types of information: feminine, masculine, gender-neutral and stereotypical, which represent the relationship between gender vs. bias, and propose a debiasing method that (a) preserves the gender-related information in feminine and masculine words, (b) preserves the neutrality in gender-neutral words, and (c) removes the biases from stereotypical words. Experimental results on several previously proposed benchmark datasets show that our proposed method can debias pre-trained word embeddings better than existing SoTA methods proposed for debiasing word embeddings while preserving gender-related but non-discriminative information.

pdf bib
Controlling Grammatical Error Correction Using Word Edit Rate
Kengo Hotate | Masahiro Kaneko | Satoru Katsumata | Mamoru Komachi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

When professional English teachers correct grammatically erroneous sentences written by English learners, they use various methods. The correction method depends on how much corrections a learner requires. In this paper, we propose a method for neural grammar error correction (GEC) that can control the degree of correction. We show that it is possible to actually control the degree of GEC by using new training data annotated with word edit rate. Thereby, diverse corrected sentences is obtained from a single erroneous sentence. Moreover, compared to a GEC model that does not use information on the degree of correction, the proposed method improves correction accuracy.

pdf bib
TMU Transformer System Using BERT for Re-ranking at BEA 2019 Grammatical Error Correction on Restricted Track
Masahiro Kaneko | Kengo Hotate | Satoru Katsumata | Mamoru Komachi
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

We introduce our system that is submitted to the restricted track of the BEA 2019 shared task on grammatical error correction1 (GEC). It is essential to select an appropriate hypothesis sentence from the candidates list generated by the GEC model. A re-ranker can evaluate the naturalness of a corrected sentence using language models trained on large corpora. On the other hand, these language models and language representations do not explicitly take into account the grammatical errors written by learners. Thus, it is not straightforward to utilize language representations trained from a large corpus, such as Bidirectional Encoder Representations from Transformers (BERT), in a form suitable for the learner’s grammatical errors. Therefore, we propose to fine-tune BERT on learner corpora with grammatical errors for re-ranking. The experimental results of the W&I+LOCNESS development dataset demonstrate that re-ranking using BERT can effectively improve the correction performance.

pdf bib
Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second Language
Mio Arai | Masahiro Kaneko | Mamoru Komachi
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Existing example retrieval systems do not include grammatically incorrect examples or present only a few examples, if any. Even if a retrieval system has a wide coverage of incorrect examples along with the correct counterpart, learners need to know whether their query includes errors or not. Considering the usability of retrieving incorrect examples, our proposed method uses a large-scale corpus and presents correct expressions along with incorrect expressions using a grammatical error detection system so that the learner do not need to be aware of how to search for the examples. Intrinsic and extrinsic evaluations indicate that our method improves accuracy of example sentence retrieval and quality of learner’s writing.

pdf bib
Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough?
Masato Mita | Tomoya Mizumoto | Masahiro Kaneko | Ryo Nagata | Kentaro Inui
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

This study explores the necessity of performing cross-corpora evaluation for grammatical error correction (GEC) models. GEC models have been previously evaluated based on a single commonly applied corpus: the CoNLL-2014 benchmark. However, the evaluation remains incomplete because the task difficulty varies depending on the test corpus and conditions such as the proficiency levels of the writers and essay topics. To overcome this limitation, we evaluate the performance of several GEC models, including NMT-based (LSTM, CNN, and transformer) and an SMT-based model, against various learner corpora (CoNLL-2013, CoNLL-2014, FCE, JFLEG, ICNALE, and KJ). Evaluation results reveal that the models’ rankings considerably vary depending on the corpus, indicating that single-corpus evaluation is insufficient for GEC models.


pdf bib
TMU System for SLAM-2018
Masahiro Kaneko | Tomoyuki Kajiwara | Mamoru Komachi
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We introduce the TMU systems for the second language acquisition modeling shared task 2018 (Settles et al., 2018). To model learner error patterns, it is necessary to maintain a considerable amount of information regarding the type of exercises learners have been learning in the past and the manner in which they answered them. Tracking an enormous learner’s learning history and their correct and mistaken answers is essential to predict the learner’s future mistakes. Therefore, we propose a model which tracks the learner’s learning history efficiently. Our systems ranked fourth in the English and Spanish subtasks, and fifth in the French subtask.


pdf bib
Grammatical Error Detection Using Error- and Grammaticality-Specific Word Embeddings
Masahiro Kaneko | Yuya Sakaizawa | Mamoru Komachi
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this study, we improve grammatical error detection by learning word embeddings that consider grammaticality and error patterns. Most existing algorithms for learning word embeddings usually model only the syntactic context of words so that classifiers treat erroneous and correct words as similar inputs. We address the problem of contextual information by considering learner errors. Specifically, we propose two models: one model that employs grammatical error patterns and another model that considers grammaticality of the target word. We determine grammaticality of n-gram sequence from the annotated error tags and extract grammatical error patterns for word embeddings from large-scale learner corpora. Experimental results show that a bidirectional long-short term memory model initialized by our word embeddings achieved the state-of-the-art accuracy by a large margin in an English grammatical error detection task on the First Certificate in English dataset.