Hua He


Improving Long Distance Slot Carryover in Spoken Dialogue Systems
Tongfei Chen | Chetan Naik | Hua He | Pushpendre Rastogi | Lambert Mathias
Proceedings of the First Workshop on NLP for Conversational AI

Tracking the state of the conversation is a central component in task-oriented spoken dialogue systems. One approach to tracking the dialogue state is slot carryover, where a model makes a binary decision on whether a slot from the context is relevant to the current turn. Previous work on the slot carryover task used models that made independent decisions for each slot. A close analysis of the results shows that this approach performs poorly on dialogues with longer contexts. In this paper, we propose to jointly model the slots. We propose two neural network architectures: one based on pointer networks that incorporates slot ordering information, and the other based on transformer networks that uses a self-attention mechanism to model the slot interdependencies. Our experiments on an internal dialogue benchmark dataset and on the public DSTC2 dataset demonstrate that our proposed models are able to resolve longer-distance slot references and achieve competitive performance.


A Continuously Growing Dataset of Sentential Paraphrases
Wuwei Lan | Siyu Qiu | Hua He | Wei Xu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

A major challenge in paraphrase research is the lack of parallel corpora. In this paper, we present a new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs. The main advantage of our method is its simplicity: it removes the classifier or human in the loop that previous work needed to select data before annotation and the subsequent application of paraphrase identification algorithms. We present the largest human-labeled paraphrase corpus to date of 51,524 sentence pairs and the first cross-domain benchmarking for automatic paraphrase identification. In addition, we show that more than 30,000 new sentential paraphrases can be easily and continuously captured every month at ~70% precision, and demonstrate their utility for downstream NLP tasks through phrasal paraphrase extraction. We make our code and data freely available.
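
The URL-linking idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' released code: the function name `candidate_pairs`, the URL regular expression, and the sample tweets are all assumptions made for illustration. It only produces candidate pairs; whether a pair is a true paraphrase would still have to be decided by annotation.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumption: a simple pattern that is good enough for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")


def candidate_pairs(tweets):
    """Pair up tweets that link to the same URL (hypothetical helper)."""
    by_url = defaultdict(list)
    for text in tweets:
        urls = set(URL_PATTERN.findall(text))
        # Keep only the sentence part of the tweet, without the link.
        sentence = URL_PATTERN.sub("", text).strip()
        for url in urls:
            if sentence and sentence not in by_url[url]:
                by_url[url].append(sentence)
    pairs = []
    for url in sorted(by_url):
        # Every two distinct sentences sharing a URL form one candidate pair.
        pairs.extend(combinations(by_url[url], 2))
    return pairs


sample_tweets = [
    "Storm hits coast http://t.co/a",
    "Coast battered by storm http://t.co/a",
    "Unrelated news http://t.co/b",
]
pairs = candidate_pairs(sample_tweets)
```

Here the two tweets that share `http://t.co/a` become a single candidate pair, while the tweet with a different URL is left unpaired.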

An Insight Extraction System on BioMedical Literature with Deep Neural Networks
Hua He | Kris Ganjam | Navendu Jain | Jessica Lundin | Ryen White | Jimmy Lin
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Mining biomedical text offers an opportunity to automatically discover important facts and infer associations among them. As new scientific findings appear across a large collection of biomedical publications, our aim is to tap into this literature to automate biomedical knowledge extraction and identify important insights from it. Toward that goal, we develop a system with novel deep neural networks to extract insights from biomedical literature. Evaluation shows that our system provides insights with competitive accuracy in terms of human acceptance, and that its relation extraction component outperforms previous work.


Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement
Hua He | Jimmy Lin
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

UMD-TTIC-UW at SemEval-2016 Task 1: Attention-Based Multi-Perspective Convolutional Neural Networks for Textual Similarity Measurement
Hua He | John Wieting | Kevin Gimpel | Jinfeng Rao | Jimmy Lin
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks
Hua He | Kevin Gimpel | Jimmy Lin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Gappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars
Hua He | Jimmy Lin | Adam Lopez
Transactions of the Association for Computational Linguistics, Volume 3

Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on general purpose graphics processing units (GPUs), but these algorithms do not work for hierarchical models, which require matching patterns that contain gaps. We address this limitation by presenting a novel GPU algorithm for on-demand hierarchical grammar extraction that is at least an order of magnitude faster than a comparable CPU algorithm when processing large batches of sentences. In terms of end-to-end translation, with decoding on the CPU, we increase throughput by roughly two thirds on a standard MT evaluation dataset. The GPU necessary to achieve these improvements increases the cost of a server by about a third. We believe that GPU-based extraction of hierarchical grammars is an attractive proposition, particularly for MT applications that demand high throughput.
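
The first step described in the abstract above, finding source phrases in an indexed corpus, can be sketched as follows. This is a CPU-only illustration for contiguous phrases, using a suffix array as the index (an assumption here). It is not the GPU algorithm from the paper, it does not handle patterns with gaps, and it omits the translation extraction step. The function names and the toy corpus are hypothetical.

```python
def build_suffix_array(tokens):
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction, fine for a toy corpus but too slow for real data.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens, suffix_array, phrase):
    """Return all corpus positions where the contiguous phrase occurs."""
    n = len(phrase)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    # Binary search for the first suffix whose n-token prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Binary search for the first suffix whose n-token prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


corpus = "the cat sat on the mat the cat ran".split()
index = build_suffix_array(corpus)
hits = find_phrase(corpus, index, ["the", "cat"])
```

Each lookup takes a logarithmic number of comparisons in the corpus size, which is why an index of this kind makes on-demand lookup feasible at all.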


Identification of Speakers in Novels
Hua He | Denilson Barbosa | Grzegorz Kondrak
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs
Hua He | Jimmy Lin | Adam Lopez
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


Predicting the Semantic Compositionality of Prefix Verbs
Shane Bergsma | Aditya Bhargava | Hua He | Grzegorz Kondrak
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing