Guanghui Qin


2020

pdf bib
CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models
Abhinav Singh | Patrick Xia | Guanghui Qin | Mahsa Yarmohammadi | Benjamin Van Durme
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Copy mechanisms are employed in sequence to sequence (seq2seq) models to generate reproductions of words from the input to the output. These frameworks, operating at the lexical type level, fail to provide an explicit alignment that records where each token was copied from. Further, they require contiguous token sequences from the input (spans) to be copied individually. We present a model with an explicit token-level copy operation and extend it to copying entire spans. Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction. We demonstrate the approach on Nested Named Entity Recognition, achieving near state-of-the-art accuracy with an order of magnitude increase in decoding speed.

2018

pdf bib
Learning Latent Semantic Annotations for Grounding Natural Language to Structured Data
Guanghui Qin | Jin-Ge Yao | Xuening Wang | Jinpeng Wang | Chin-Yew Lin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Previous work on grounded language learning did not fully capture the semantics underlying the correspondences between structured world state representations and texts, especially those between numerical values and lexical terms. In this paper, we attempt at learning explicit latent semantic annotations from paired structured tables and texts, establishing correspondences between various types of values and texts. We model the joint probability of data fields, texts, phrasal spans, and latent annotations with an adapted semi-hidden Markov model, and impose a soft statistical constraint to further improve the performance. As a by-product, we leverage the induced annotations to extract templates for language generation. Experimental results suggest the feasibility of the setting in this study, as well as the effectiveness of our proposed framework.

pdf bib
Data2Text Studio: Automated Text Generation from Structured Data
Longxu Dou | Guanghui Qin | Jinpeng Wang | Jin-Ge Yao | Chin-Yew Lin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Data2Text Studio is a platform for automated text generation from structured data. It is equipped with a Semi-HMMs model to extract high-quality templates and corresponding trigger conditions from parallel data automatically, which improves the interactivity and interpretability of the generated text. In addition, several easy-to-use tools are provided for developers to edit templates of pre-trained models, and APIs are released for developers to call the pre-trained model to generate texts in third-party applications. We conduct experiments on RotoWire datasets for template extraction and text generation. The results show that our model achieves improvements on both tasks.