J. Edward Hu


2019

pdf bib
Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering
J. Edward Hu | Abhinav Singh | Nils Holzenberger | Matt Post | Benjamin Van Durme
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Producing diverse paraphrases of a sentence is a challenging task. Natural paraphrase corpora are scarce and limited, while existing large-scale resources are automatically generated via back-translation and rely on beam search, which tends to lack diversity. We describe ParaBank 2, a new resource that contains multiple diverse sentential paraphrases, produced from a bilingual corpus using negative constraints, inference sampling, and clustering.We show that ParaBank 2 significantly surpasses prior work in both lexical and syntactic diversity while being meaning-preserving, as measured by human judgments and standardized metrics. Further, we illustrate how such paraphrastic resources may be used to refine contextualized encoders, leading to improvements in downstream tasks.

pdf bib
Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting
J. Edward Hu | Huda Khayrallah | Ryan Culkin | Patrick Xia | Tongfei Chen | Matt Post | Benjamin Van Durme
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Lexically-constrained sequence decoding allows for explicit positive or negative phrase-based constraints to be placed on target output strings in generation tasks such as machine translation or monolingual text rewriting. We describe vectorized dynamic beam allocation, which extends work in lexically-constrained decoding to work with batching, leading to a five-fold improvement in throughput when working with positive constraints. Faster decoding enables faster exploration of constraint strategies: we illustrate this via data augmentation experiments with a monolingual rewriter applied to the tasks of natural language inference, question answering and machine translation, showing improvements in all three.

2018

pdf bib
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
Adam Poliak | Aparajita Haldar | Rachel Rudinger | J. Edward Hu | Ellie Pavlick | Aaron Steven White | Benjamin Van Durme
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources.

pdf bib
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
Adam Poliak | Aparajita Haldar | Rachel Rudinger | J. Edward Hu | Ellie Pavlick | Aaron Steven White | Benjamin Van Durme
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

We present a large scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation encoded by a neural network captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. Our collection of diverse datasets is available at http://www.decomp.net/, and will grow over time as additional resources are recast and added from novel sources.