Keisuke Sakaguchi


2020

pdf bib
Uncertain Natural Language Inference
Tongfei Chen | Zhengping Jiang | Adam Poliak | Keisuke Sakaguchi | Benjamin Van Durme
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce Uncertain Natural Language Inference (UNLI), a refinement of Natural Language Inference (NLI) that shifts away from categorical labels, targeting instead the direct prediction of subjective probability assessments. We demonstrate the feasibility of collecting annotations for UNLI by relabeling a portion of the SNLI dataset under a probabilistic scale, where items even with the same categorical label differ in how likely people judge them to be true given a premise. We describe a direct scalar regression modeling approach, and find that existing categorically-labeled NLI data can be used in pre-training. Our best models correlate well with humans, demonstrating models are capable of more subtle inferences than the categorical bin assignment employed in current NLI tasks.

pdf bib
The Universal Decompositional Semantics Dataset and Decomp Toolkit
Aaron Steven White | Elias Stengel-Eskin | Siddharth Vashishtha | Venkata Subrahmanyan Govindarajan | Dee Ann Reisinger | Tim Vieira | Keisuke Sakaguchi | Sheng Zhang | Francis Ferraro | Rachel Rudinger | Kyle Rawlins | Benjamin Van Durme
Proceedings of the 12th Language Resources and Evaluation Conference

We present the Universal Decompositional Semantics (UDS) dataset (v1.0), which is bundled with the Decomp toolkit (v0.1). UDS1.0 unifies five high-quality, decompositional semantics-aligned annotation sets within a single semantic graph specification—with graph structures defined by the predicative patterns produced by the PredPatt tool and real-valued node and edge attributes constructed using sophisticated normalization procedures. The Decomp toolkit provides a suite of Python 3 tools for querying UDS graphs using SPARQL. Both UDS1.0 and Decomp0.1 are publicly available at http://decomp.io.

pdf bib
A Dataset for Tracking Entities in Open Domain Procedural Text
Niket Tandon | Keisuke Sakaguchi | Bhavana Dalvi | Dheeraj Rajagopal | Peter Clark | Michal Guerquin | Kyle Richardson | Eduard Hovy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted), and large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from WikiHow.com. A current state-of-the-art generation model on this task achieves 16.1% F1 based on BLEU metric, leaving enough room for novel model architectures.

2019

pdf bib
WIQA: A dataset for “What if...” reasoning over procedural text
Niket Tandon | Bhavana Dalvi | Keisuke Sakaguchi | Peter Clark | Antoine Bosselut
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We introduce WIQA, the first large-scale dataset of “What if...” questions over procedural text. WIQA contains a collection of paragraphs, each annotated with multiple influence graphs describing how one change affects another, and a large (40k) collection of “What if...?” multiple-choice questions derived from these. For example, given a paragraph about beach erosion, would stormy weather hasten or decelerate erosion? WIQA contains three kinds of questions: perturbations to steps mentioned in the paragraph; external (out-of-paragraph) perturbations requiring commonsense knowledge; and irrelevant (no effect) perturbations. We find that state-of-the-art models achieve 73.8% accuracy, well below the human performance of 96.3%. We analyze the challenges, in particular tracking chains of influences, and present the dataset as an open challenge to the community.

2018

pdf bib
Efficient Online Scalar Annotation with Bounded Support
Keisuke Sakaguchi | Benjamin Van Durme
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We describe a novel method for efficiently eliciting scalar annotations for dataset construction and system quality estimation by human judgments. We contrast direct assessment (annotators assign scores to items directly), online pairwise ranking aggregation (scores derive from annotator comparison of items), and a hybrid approach (EASL: Efficient Annotation of Scalar Labels) proposed here. Our proposal leads to increased correlation with ground truth, at far greater annotator efficiency, suggesting this strategy as an improved mechanism for dataset creation and manual system evaluation.

2017

pdf bib
Error-repair Dependency Parsing for Ungrammatical Texts
Keisuke Sakaguchi | Matt Post | Benjamin Van Durme
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a new dependency parsing scheme which jointly parses a sentence and repairs grammatical errors by extending the non-directional transition-based formalism of Goldberg and Elhadad (2010) with three additional actions: SUBSTITUTE, DELETE, INSERT. Because these actions may cause an infinite loop in derivation, we also introduce simple constraints that ensure the parser termination. We evaluate our model with respect to dependency accuracy and grammaticality improvements for ungrammatical sentences, demonstrating the robustness and applicability of our scheme.

pdf bib
JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction
Courtney Napoles | Keisuke Sakaguchi | Joel Tetreault
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold standard to properly assess the current state of GEC.

pdf bib
Grammatical Error Correction with Neural Reinforcement Learning
Keisuke Sakaguchi | Matt Post | Benjamin Van Durme
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We propose a neural encoder-decoder model with reinforcement learning (NRL) for grammatical error correction (GEC). Unlike conventional maximum likelihood estimation (MLE), the model directly optimizes towards an objective that considers a sentence-level, task-specific evaluation metric, avoiding the exposure bias issue in MLE. We demonstrate that NRL outperforms MLE both in human and automated evaluation metrics, achieving the state-of-the-art on a fluency-oriented GEC corpus.

pdf bib
GEC into the future: Where are we going and how do we get there?
Keisuke Sakaguchi | Courtney Napoles | Joel Tetreault
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

The field of grammatical error correction (GEC) has made tremendous bounds in the last ten years, but new questions and obstacles are revealing themselves. In this position paper, we discuss the issues that need to be addressed and provide recommendations for the field to continue to make progress, and propose a new shared task. We invite suggestions and critiques from the audience to make the new shared task a community-driven venture.

2016

pdf bib
Universal Decompositional Semantics on Universal Dependencies
Aaron Steven White | Drew Reisinger | Keisuke Sakaguchi | Tim Vieira | Sheng Zhang | Rachel Rudinger | Kyle Rawlins | Benjamin Van Durme
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
There’s No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction
Courtney Napoles | Keisuke Sakaguchi | Joel Tetreault
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Phrase Structure Annotation and Parsing for Learner English
Ryo Nagata | Keisuke Sakaguchi
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality
Keisuke Sakaguchi | Courtney Napoles | Matt Post | Joel Tetreault
Transactions of the Association for Computational Linguistics, Volume 4

The field of grammatical error correction (GEC) has grown substantially in recent years, with research directed at both evaluation metrics and improved system performance against those metrics. One unvisited assumption, however, is the reliance of GEC evaluation on error-coded corpora, which contain specific labeled corrections. We examine current practices and show that GEC’s reliance on such corpora unnaturally constrains annotation and automatic evaluation, resulting in (a) sentences that do not sound acceptable to native speakers and (b) system rankings that do not correlate with human judgments. In light of this, we propose an alternate approach that jettisons costly error coding in favor of unannotated, whole-sentence rewrites. We compare the performance of existing metrics over different gold-standard annotations, and show that automatic evaluation with our new annotation scheme has very strong correlation with expert rankings (ρ = 0.82). As a result, we advocate for a fundamental and necessary shift in the goal of GEC, from correcting small, labeled error types, to producing text that has native fluency.

2015

pdf bib
Effective Feature Integration for Automated Short Answer Scoring
Keisuke Sakaguchi | Michael Heilman | Nitin Madnani
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Ground Truth for Grammatical Error Correction Metrics
Courtney Napoles | Keisuke Sakaguchi | Matt Post | Joel Tetreault
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
Efficient Elicitation of Annotations for Human Evaluation of Machine Translation
Keisuke Sakaguchi | Matt Post | Benjamin Van Durme
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

pdf bib
Construction of English MWE Dictionary and its Application to POS Tagging
Yutaro Shigeto | Ai Azuma | Sorami Hisamoto | Shuhei Kondo | Tomoya Kose | Keisuke Sakaguchi | Akifumi Yoshimoto | Frances Yung | Yuji Matsumoto
Proceedings of the 9th Workshop on Multiword Expressions

pdf bib
NAIST at the NLI 2013 Shared Task
Tomoya Mizumoto | Yuta Hayashibe | Keisuke Sakaguchi | Mamoru Komachi | Yuji Matsumoto
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
NAIST at 2013 CoNLL Grammatical Error Correction Shared Task
Ippei Yoshimoto | Tomoya Kose | Kensuke Mitsuzawa | Keisuke Sakaguchi | Tomoya Mizumoto | Yuta Hayashibe | Mamoru Komachi | Yuji Matsumoto
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners
Keisuke Sakaguchi | Yuki Arase | Mamoru Komachi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
NAIST at the HOO 2012 Shared Task
Keisuke Sakaguchi | Yuta Hayashibe | Shuhei Kondo | Lis Kanashiro | Tomoya Mizumoto | Mamoru Komachi | Yuji Matsumoto
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

pdf bib
Joint English Spelling Error Correction and POS Tagging for Language Learners Writing
Keisuke Sakaguchi | Tomoya Mizumoto | Mamoru Komachi | Yuji Matsumoto
Proceedings of COLING 2012