Kazuaki Hanawa


2020

Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation
Takumi Gotou | Ryo Nagata | Masato Mita | Kazuaki Hanawa
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents performance measures for grammatical error correction that take into account the difficulty of error correction. To the best of our knowledge, no existing measure does so, even though some errors are easy to correct and others are not. The main purpose of this work is to provide a way of determining the difficulty of error correction and to motivate researchers in the field to tackle such difficult errors. The performance measures are based on the simple idea that the more systems successfully correct an error, the easier it is considered to be. This paper presents a set of algorithms that implement this idea. It evaluates the performance measures quantitatively and qualitatively on a wide variety of corpora and systems, showing that they agree with our intuition about correction difficulty. A scorer and difficulty weight data based on the algorithms have been made available on the web.
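
The core idea (an error corrected by more systems counts as easier) can be illustrated with a minimal sketch. It assumes, for illustration only, that an error's difficulty weight is the fraction of reference systems that fail to correct it, and that a system is scored by difficulty-weighted recall. The names `difficulty_weights` and `weighted_recall` and the toy data are hypothetical; the paper's algorithms and released scorer may define the weights differently.

```python
from typing import Dict, Set


def difficulty_weights(corrected_by: Dict[str, Set[str]], n_systems: int) -> Dict[str, float]:
    """Hypothetical weighting: difficulty = fraction of systems that fail to
    correct the error (0.0 = every system corrects it, 1.0 = none does)."""
    return {err: 1.0 - len(fixers) / n_systems for err, fixers in corrected_by.items()}


def weighted_recall(system_hits: Set[str], weights: Dict[str, float]) -> float:
    """Difficulty-weighted recall: correcting a hard error earns more credit."""
    total = sum(weights.values())
    if total == 0.0:
        return 0.0  # every error was corrected by every system
    return sum(weights[e] for e in system_hits if e in weights) / total


# Toy data: which of four hypothetical systems (A-D) correct each gold error.
corrected_by = {"e1": {"A", "B", "C", "D"}, "e2": {"A"}, "e3": set()}
weights = difficulty_weights(corrected_by, n_systems=4)
# weights == {"e1": 0.0, "e2": 0.75, "e3": 1.0}
score = weighted_recall({"e1", "e2"}, weights)  # 0.75 / 1.75
```

Under this assumed weighting, a system that fixes only the easy error e1 earns no credit, which illustrates how such a measure rewards correcting harder errors.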

PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents
Ryo Fujii | Masato Mita | Kaori Abe | Kazuaki Hanawa | Makoto Morishita | Jun Suzuki | Kentaro Inui
Proceedings of the 28th International Conference on Computational Linguistics

Neural Machine Translation (NMT) has improved drastically in quality when translating clean input, such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of input with considerable noise, such as User-Generated Contents (UGC) on the Internet. To make better use of NMT for cross-cultural communication, one of the most promising directions is to develop a model that correctly handles these expressions. Although the importance of this problem has been recognized, it remains unclear what causes the large performance gap between translating clean input and translating UGC. To answer this question, we present a new dataset, PheMT, for evaluating the robustness of MT systems against specific linguistic phenomena in Japanese-English translation. Our experiments with this dataset revealed that not only our in-house models but also widely used off-the-shelf systems are greatly disrupted by the presence of certain phenomena.

2019

The Sally Smedley Hyperpartisan News Detector at SemEval-2019 Task 4
Kazuaki Hanawa | Shota Sasaki | Hiroki Ouchi | Jun Suzuki | Kentaro Inui
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system submitted to the formal run of SemEval-2019 Task 4: Hyperpartisan news detection. Our system is a linear classifier that uses three kinds of features: 1) embedding features based on pre-trained BERT embeddings, 2) article length features, and 3) embedding features of informative phrases extracted from the by-publisher dataset. It achieved 80.9% accuracy on the test set of the formal run and placed 3rd out of 42 teams.

2018

Predicting Stances from Social Media Posts using Factorization Machines
Akira Sasaki | Kazuaki Hanawa | Naoaki Okazaki | Kentaro Inui
Proceedings of the 27th International Conference on Computational Linguistics

Social media provide platforms to express, discuss, and shape opinions about events and issues in the real world. Stance detection is an important step toward analyzing discussions on social media and supporting healthy decision-making. This paper presents an approach to detect the stance of a user toward a topic based on their stances toward other topics and the user’s social media posts. We apply factorization machines, a widely used method in item recommendation, to model user preferences toward topics from social media data. The experimental results demonstrate that users’ posts are useful for modeling topic preferences and therefore for predicting the stances of silent users.
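
For readers unfamiliar with factorization machines, the sketch below computes the standard second-order FM score, y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, using the usual O(kn) rewriting of the pairwise term. How the paper encodes users, topics, and posts into the feature vector x is not described here, so the inputs are arbitrary placeholders; `fm_predict` is a hypothetical name, and training is omitted.

```python
from typing import List


def fm_predict(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order factorization machine score for feature vector x.

    w0: global bias; w[i]: weight of feature i;
    V[i]: k-dimensional latent vector of feature i.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} v_if v_jf x_i x_j == 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


# Toy example: three binary features, latent dimension k = 2 (values are arbitrary).
x = [1.0, 0.0, 1.0]
score = fm_predict(x, w0=0.1, w=[0.2, 0.3, -0.1],
                   V=[[1.0, 0.5], [0.3, 0.2], [-0.4, 2.0]])
```

The factorized pairwise term is what lets an FM estimate interactions between features that rarely co-occur, which is why it is popular in item recommendation.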

Suspicious News Detection Using Micro Blog Text
Tsubasa Tagami | Hiroki Ouchi | Hiroki Asano | Kazuaki Hanawa | Kaori Uchiyama | Kaito Suzuki | Kentaro Inui | Atsushi Komiya | Atsuo Fujimura | Ryo Yamashita | Hitofumi Yanai | Akinori Machino
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2017

Other Topics You May Also Agree or Disagree: Modeling Inter-Topic Preferences using Tweets and Matrix Factorization
Akira Sasaki | Kazuaki Hanawa | Naoaki Okazaki | Kentaro Inui
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present our approach to modeling inter-topic preferences of Twitter users, for example, “those who agree with the Trans-Pacific Partnership (TPP) also agree with free trade”. This kind of knowledge is useful not only for stance detection across multiple topics but also for various real-world applications, including public opinion surveys, electoral prediction, electoral campaigns, and online debates. To extract users’ preferences on Twitter, we design linguistic patterns with which people express agreement and disagreement with specific topics (e.g., “A is completely wrong”). By applying these linguistic patterns to a collection of tweets, we extract statements agreeing and disagreeing with various topics. Inspired by previous work on item recommendation, we formalize the task of modeling inter-topic preferences as matrix factorization: we represent users’ preferences as a user-topic matrix and map both users and topics onto a latent feature space that abstracts the preferences. Our experimental results demonstrate both that our approach is useful for predicting users’ missing preferences and that the latent vector representations of topics successfully encode inter-topic preferences.
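
The matrix-factorization formulation can be sketched as follows, assuming stances are encoded as +1 (agree) and -1 (disagree) in a user-topic matrix with missing cells, and that the factors are fit by plain stochastic gradient descent with L2 regularization on the observed cells only. This is a generic illustration rather than the paper's training procedure; `factorize`, `predict`, the hyperparameter values, and the toy data are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def factorize(observed: Dict[Tuple[int, int], float], n_users: int, n_topics: int,
              k: int = 2, lr: float = 0.05, reg: float = 0.01,
              epochs: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit user factors P and topic factors Q so that P[u] . Q[t] ~ observed[(u, t)].

    Only observed cells contribute to the loss; missing cells are predicted later.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_topics)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (u, t), r in entries:
            err = r - predict(P, Q, u, t)
            for f in range(k):
                pu, qt = P[u][f], Q[t][f]
                # SGD step on squared error with L2 regularization
                P[u][f] += lr * (err * qt - reg * pu)
                Q[t][f] += lr * (err * pu - reg * qt)
    return P, Q


def predict(P: Matrix, Q: Matrix, u: int, t: int) -> float:
    """Predicted stance of user u toward topic t (> 0 leans agree, < 0 disagree)."""
    return sum(p * q for p, q in zip(P[u], Q[t]))


# Toy user-topic matrix: +1 = agree, -1 = disagree; cell (1, 1) is unobserved.
observed = {
    (0, 0): 1.0, (0, 1): 1.0, (0, 2): -1.0,
    (1, 0): 1.0,              (1, 2): -1.0,
    (2, 0): -1.0, (2, 1): -1.0, (2, 2): 1.0,
}
P, Q = factorize(observed, n_users=3, n_topics=3)
missing_guess = predict(P, Q, 1, 1)
```

The unobserved cell is filled in from the learned latent vectors, and topics with similar agree/disagree patterns tend to receive similar vectors in Q, which is the sense in which latent topic vectors can encode inter-topic preferences.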

A Crowdsourcing Approach for Annotating Causal Relation Instances in Wikipedia
Kazuaki Hanawa | Akira Sasaki | Naoaki Okazaki | Kentaro Inui
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation