Naoya Inoue


2020

pdf bib
R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason
Naoya Inoue | Pontus Stenetorp | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent studies have revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This prevents the community from reliably measuring the progress of RC systems. To address this issue, we introduce R4C, a new task for evaluating RC systems’ internal reasoning. R4C requires giving not only answers but also derivations: explanations that justify predicted answers. We present a reliable, crowdsourced framework for scalably annotating RC datasets with derivations. We create and publicly release the R4C dataset, the first, quality-assured dataset consisting of 4.6k questions, each of which is annotated with 3 reference derivations (i.e. 13.8k derivations). Experiments show that our automatic evaluation metrics using multiple reference derivations are reliable, and that R4C assesses different skills from an existing benchmark.

pdf bib
Modeling Event Salience in Narratives via Barthes’ Cardinal Functions
Takaki Otake | Sho Yokoi | Naoya Inoue | Ryo Takahashi | Tatsuki Kuribayashi | Kentaro Inui
Proceedings of the 28th International Conference on Computational Linguistics

Events in a narrative differ in salience: some are more important to the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without any annotations, we adopt Barthes’ definition of event salience and propose several unsupervised methods that require only a pre-trained language model. Evaluating the proposed methods on folktales with event salience annotation, we show that the proposed methods outperform baseline methods and find fine-tuning a language model on narrative texts is a key factor in improving the proposed methods.

2019

pdf bib
When Choosing Plausible Alternatives, Clever Hans can be Clever
Pride Kavumba | Naoya Inoue | Benjamin Heinzerling | Keshav Singh | Paul Reisert | Kentaro Inui
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

Pretrained language models, such as BERT and RoBERTa, have shown large improvements in the commonsense reasoning benchmark COPA. However, recent work found that many improvements in benchmarks of natural language understanding are not due to models learning the task, but due to their increasing ability to exploit superficial cues, such as tokens that occur more often in the correct answer than the wrong one. Are BERT’s and RoBERTa’s good performance on COPA also caused by this? We find superficial cues in COPA, as well as evidence that BERT exploits these cues.To remedy this problem, we introduce Balanced COPA, an extension of COPA that does not suffer from easy-to-exploit single token cues. We analyze BERT’s and RoBERTa’s performance on original and Balanced COPA, finding that BERT relies on superficial cues when they are present, but still achieves comparable performance once they are made ineffective, suggesting that BERT learns the task to a certain degree when forced to. In contrast, RoBERTa does not appear to rely on superficial cues.

pdf bib
Inject Rubrics into Short Answer Grading System
Tianqi Wang | Naoya Inoue | Hiroki Ouchi | Tomoya Mizumoto | Kentaro Inui
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Short Answer Grading (SAG) is a task of scoring students’ answers in examinations. Most existing SAG systems predict scores based only on the answers, including the model used as base line in this paper, which gives the-state-of-the-art performance. But they ignore important evaluation criteria such as rubrics, which play a crucial role for evaluating answers in real-world situations. In this paper, we present a method to inject information from rubrics into SAG systems. We implement our approach on top of word-level attention mechanism to introduce the rubric information, in order to locate information in each answer that are highly related to the score. Our experimental results demonstrate that injecting rubric information effectively contributes to the performance improvement and that our proposed model outperforms the state-of-the-art SAG model on the widely used ASAP-SAS dataset under low-resource settings.

pdf bib
Improving Evidence Detection by Leveraging Warrants
Keshav Singh | Paul Reisert | Naoya Inoue | Pride Kavumba | Kentaro Inui
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

Recognizing the implicit link between a claim and a piece of evidence (i.e. warrant) is the key to improving the performance of evidence detection. In this work, we explore the effectiveness of automatically extracted warrants for evidence detection. Given a claim and candidate evidence, our proposed method extracts multiple warrants via similarity search from an existing, structured corpus of arguments. We then attentively aggregate the extracted warrants, considering the consistency between the given argument and the acquired warrants. Although a qualitative analysis on the warrants shows that the extraction method needs to be improved, our results indicate that our method can still improve the performance of evidence detection.

pdf bib
An Empirical Study of Span Representations in Argumentation Structure Parsing
Tatsuki Kuribayashi | Hiroki Ouchi | Naoya Inoue | Paul Reisert | Toshinori Miyoshi | Jun Suzuki | Kentaro Inui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

For several natural language processing (NLP) tasks, span representation design is attracting considerable attention as a promising new technique; a common basis for an effective design has been established. With such basis, exploring task-dependent extensions for argumentation structure parsing (ASP) becomes an interesting research direction. This study investigates (i) span representation originally developed for other NLP tasks and (ii) a simple task-dependent extension for ASP. Our extensive experiments and analysis show that these representations yield high performance for ASP and provide some challenging types of instances to be parsed.

pdf bib
Unsupervised Learning of Discourse-Aware Text Representation for Essay Scoring
Farjana Sultana Mim | Naoya Inoue | Paul Reisert | Hiroki Ouchi | Kentaro Inui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Existing document embedding approaches mainly focus on capturing sequences of words in documents. However, some document classification and regression tasks such as essay scoring need to consider discourse structure of documents. Although some prior approaches consider this issue and utilize discourse structure of text for document classification, these approaches are dependent on computationally expensive parsers. In this paper, we propose an unsupervised approach to capture discourse structure in terms of coherence and cohesion for document embedding that does not require any expensive parser or annotation. Extrinsic evaluation results show that the document representation obtained from our approach improves the performance of essay Organization scoring and Argument Strength scoring.

pdf bib
Distantly Supervised Biomedical Knowledge Acquisition via Knowledge Graph Based Attention
Qin Dai | Naoya Inoue | Paul Reisert | Ryo Takahashi | Kentaro Inui
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

The increased demand for structured scientific knowledge has attracted considerable attention in extracting scientific relation from the ever growing scientific publications. Distant supervision is widely applied approach to automatically generate large amounts of labelled data with low manual annotation cost. However, distant supervision inevitably accompanies the wrong labelling problem, which will negatively affect the performance of Relation Extraction (RE). To address this issue, (Han et al., 2018) proposes a novel framework for jointly training both RE model and Knowledge Graph Completion (KGC) model to extract structured knowledge from non-scientific dataset. In this work, we firstly investigate the feasibility of this framework on scientific dataset, specifically on biomedical dataset. Secondly, to achieve better performance on the biomedical dataset, we extend the framework with other competitive KGC models. Moreover, we proposed a new end-to-end KGC model to extend the framework. Experimental results not only show the feasibility of the framework on the biomedical dataset, but also indicate the effectiveness of our extensions, because our extended model achieves significant and consistent improvements on distant supervised RE as compared with baselines.

pdf bib
Annotating with Pros and Cons of Technologies in Computer Science Papers
Hono Shirai | Naoya Inoue | Jun Suzuki | Kentaro Inui
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

This paper explores a task for extracting a technological expression and its pros/cons from computer science papers. We report ongoing efforts on an annotated corpus of pros/cons and an analysis of the nature of the automatic extraction task. Specifically, we show how to adapt the targeted sentiment analysis task for pros/cons extraction in computer science papers and conduct an annotation study. In order to identify the challenges of the automatic extraction task, we construct a strong baseline model and conduct an error analysis. The experiments show that pros/cons can be consistently annotated by several annotators, and that the task is challenging due to domain-specific knowledge. The annotated dataset is made publicly available for research purposes.

2018

pdf bib
Feasible Annotation Scheme for Capturing Policy Argument Reasoning using Argument Templates
Paul Reisert | Naoya Inoue | Tatsuki Kuribayashi | Kentaro Inui
Proceedings of the 5th Workshop on Argument Mining

Most of the existing works on argument mining cast the problem of argumentative structure identification as classification tasks (e.g. attack-support relations, stance, explicit premise/claim). This paper goes a step further by addressing the task of automatically identifying reasoning patterns of arguments using predefined templates, which is called argument template (AT) instantiation. The contributions of this work are three-fold. First, we develop a simple, yet expressive set of easily annotatable ATs that can represent a majority of writer’s reasoning for texts with diverse policy topics while maintaining the computational feasibility of the task. Second, we create a small, but highly reliable annotated corpus of instantiated ATs on top of reliably annotated support and attack relations and conduct an annotation study. Third, we formulate the task of AT instantiation as structured prediction constrained by a feasible set of templates. Our evaluation demonstrates that we can annotate ATs with a reasonably high inter-annotator agreement, and the use of template-constrained inference is useful for instantiating ATs with only partial reasoning comprehension clues.

pdf bib
Improving Scientific Relation Classification with Task Specific Supersense
Qin Dai | Naoya Inoue | Paul Reisert | Kentaro Inui
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2017

pdf bib
Generating Stylistically Consistent Dialog Responses with Transfer Learning
Reina Akama | Kazuaki Inada | Naoya Inoue | Sosuke Kobayashi | Kentaro Inui
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We propose a novel, data-driven, and stylistically consistent dialog response generation system. To create a user-friendly system, it is crucial to make generated responses not only appropriate but also stylistically consistent. For leaning both the properties effectively, our proposed framework has two training stages inspired by transfer learning. First, we train the model to generate appropriate responses, and then we ensure that the responses have a specific style. Experimental results demonstrate that the proposed method produces stylistically consistent responses while maintaining the appropriateness of the responses learned in a general domain.

pdf bib
An RNN-based Binary Classifier for the Story Cloze Test
Melissa Roemmele | Sosuke Kobayashi | Naoya Inoue | Andrew Gordon
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

The Story Cloze Test consists of choosing a sentence that best completes a story given two choices. In this paper we present a system that performs this task using a supervised binary classifier on top of a recurrent neural network to predict the probability that a given story ending is correct. The classifier is trained to distinguish correct story endings given in the training data from incorrect ones that we artificially generate. Our experiments evaluate different methods for generating these negative examples, as well as different embedding-based representations of the stories. Our best result obtains 67.2% accuracy on the test set, outperforming the existing top baseline of 58.5%.

pdf bib
Handling Multiword Expressions in Causality Estimation
Shota Sasaki | Sho Takase | Naoya Inoue | Naoaki Okazaki | Kentaro Inui
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

2016

pdf bib
Modeling Context-sensitive Selectional Preference with Distributed Representations
Naoya Inoue | Yuichiroh Matsubayashi | Masayuki Ono | Naoaki Okazaki | Kentaro Inui
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper proposes a novel problem setting of selectional preference (SP) between a predicate and its arguments, called as context-sensitive SP (CSP). CSP models the narrative consistency between the predicate and preceding contexts of its arguments, in addition to the conventional SP based on semantic types. Furthermore, we present a novel CSP model that extends the neural SP model (Van de Cruys, 2014) to incorporate contextual information into the distributed representations of arguments. Experimental results demonstrate that the proposed CSP model successfully learns CSP and outperforms the conventional SP model in coreference cluster ranking.

2015

pdf bib
A Computational Approach for Generating Toulmin Model Argumentation
Paul Reisert | Naoya Inoue | Naoaki Okazaki | Kentaro Inui
Proceedings of the 2nd Workshop on Argumentation Mining

2014

pdf bib
An Example-Based Approach to Difficult Pronoun Resolution
Canasai Kruengkrai | Naoya Inoue | Jun Sugiura | Kentaro Inui
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2012

pdf bib
Coreference Resolution with ILP-based Weighted Abduction
Naoya Inoue | Ekaterina Ovchinnikova | Kentaro Inui | Jerry Hobbs
Proceedings of COLING 2012