Abhinav Singh


pdf bib
CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models
Abhinav Singh | Patrick Xia | Guanghui Qin | Mahsa Yarmohammadi | Benjamin Van Durme
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Copy mechanisms are employed in sequence to sequence (seq2seq) models to generate reproductions of words from the input to the output. These frameworks, operating at the lexical type level, fail to provide an explicit alignment that records where each token was copied from. Further, they require contiguous token sequences from the input (spans) to be copied individually. We present a model with an explicit token-level copy operation and extend it to copying entire spans. Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction. We demonstrate the approach on Nested Named Entity Recognition, achieving near state-of-the-art accuracy with an order of magnitude increase in decoding speed.


pdf bib
Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering
J. Edward Hu | Abhinav Singh | Nils Holzenberger | Matt Post | Benjamin Van Durme
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Producing diverse paraphrases of a sentence is a challenging task. Natural paraphrase corpora are scarce and limited, while existing large-scale resources are automatically generated via back-translation and rely on beam search, which tends to lack diversity. We describe ParaBank 2, a new resource that contains multiple diverse sentential paraphrases, produced from a bilingual corpus using negative constraints, inference sampling, and clustering.We show that ParaBank 2 significantly surpasses prior work in both lexical and syntactic diversity while being meaning-preserving, as measured by human judgments and standardized metrics. Further, we illustrate how such paraphrastic resources may be used to refine contextualized encoders, leading to improvements in downstream tasks.