Ernie Chang


2020

pdf bib
Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence
Xiaoyu Shen | Ernie Chang | Hui Su | Cheng Niu | Dietrich Klakow
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The neural attention model has achieved great success in data-to-text generation tasks. Though usually excelling at producing fluent text, it suffers from the problem of information missing, repetition and “hallucination”. Due to the black-box nature of the neural attention architecture, avoiding these problems in a systematic way is non-trivial. To address this concern, we propose to explicitly segment target text into fragment units and align them with their data correspondences. The segmentation and correspondence are jointly learned as latent variables without any human annotations. We further impose a soft statistical constraint to regularize the segmental granularity. The resulting architecture maintains the same expressive power as neural attention models, while being able to generate fully interpretable outputs with several times less computational cost. On both E2E and WebNLG benchmarks, we show the proposed model consistently outperforms its neural attention counterparts.

pdf bib
DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool
Ernie Chang | Jeriah Caplinger | Alex Marin | Xiaoyu Shen | Vera Demberg
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

We present a lightweight annotation tool, the Data AnnotatoR Tool (DART), for the general task of labeling structured data with textual descriptions. The tool is implemented as an interactive application that reduces human efforts in annotating large quantities of structured data, e.g. in the format of a table or tree structure. By using a backend sequence-to-sequence model, our system iteratively analyzes the annotated labels in order to better sample unlabeled data. In a simulation experiment performed on annotating large quantities of structured data, DART has been shown to reduce the total number of annotations needed with active learning and automatically suggesting relevant labels.

pdf bib
MovieChats: Chat like Humans in a Closed Domain
Hui Su | Xiaoyu Shen | Zhou Xiao | Zheng Zhang | Ernie Chang | Cheng Zhang | Cheng Niu | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Being able to perform in-depth chat with humans in a closed domain is a precondition before an open-domain chatbot can be ever claimed. In this work, we take a close look at the movie domain and present a large-scale high-quality corpus with fine-grained annotations in hope of pushing the limit of movie-domain chatbots. We propose a unified, readily scalable neural approach which reconciles all subtasks like intent prediction and knowledge retrieval. The model is first pretrained on the huge general-domain data, then finetuned on our corpus. We show this simple neural approach trained on high-quality data is able to outperform commercial systems replying on complex rules. On both the static and interactive tests, we find responses generated by our system exhibits remarkably good engagement and sensibleness close to human-written ones. We further analyze the limits of our work and point out potential directions for future work

2019

pdf bib
Improving Language Generation from Feature-Rich Tree-Structured Data with Relational Graph Convolutional Encoders
Xudong Hong | Ernie Chang | Vera Demberg
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

The Multilingual Surface Realization Shared Task 2019 focuses on generating sentences from lemmatized sets of universal dependency parses with rich features. This paper describes the results of our participation in the deep track. The core innovation in our approach is to use a graph convolutional network to encode the dependency trees given as input. Upon adding morphological features, our system achieves the third rank without using data augmentation techniques or additional components (such as a re-ranker).

2018

pdf bib
Generating E-Commerce Product Titles and Predicting their Quality
José G. Camargo de Souza | Michael Kozielski | Prashant Mathur | Ernie Chang | Marco Guerini | Matteo Negri | Marco Turchi | Evgeny Matusov
Proceedings of the 11th International Conference on Natural Language Generation

E-commerce platforms present products using titles that summarize product information. These titles cannot be created by hand, therefore an algorithmic solution is required. The task of automatically generating these titles given noisy user provided titles is one way to achieve the goal. The setting requires the generation process to be fast and the generated title to be both human-readable and concise. Furthermore, we need to understand if such generated titles are usable. As such, we propose approaches that (i) automatically generate product titles, (ii) predict their quality. Our approach scales to millions of products and both automatic and human evaluations performed on real-world data indicate our approaches are effective and applicable to existing e-commerce scenarios.

2017

pdf bib
Neobility at SemEval-2017 Task 1: An Attention-based Sentence Similarity Model
WenLi Zhuang | Ernie Chang
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes a neural-network model which performed competitively (top 6) at the SemEval 2017 cross-lingual Semantic Textual Similarity (STS) task. Our system employs an attention-based recurrent neural network model that optimizes the sentence similarity. In this paper, we describe our participation in the multilingual STS task which measures similarity across English, Spanish, and Arabic.