Michael Lyu


2020

pdf bib
Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading
Yifan Gao | Chien-Sheng Wu | Shafiq Joty | Caiming Xiong | Richard Socher | Irwin King | Michael Lyu | Steven C.H. Hoi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The goal of conversational machine reading is to answer user questions given a knowledge base text which may require asking clarification questions. Existing approaches are limited in their decision making due to struggles in extracting question-related rules and reasoning about them. In this paper, we present a new framework of conversational machine reading that comprises a novel Explicit Memory Tracker (EMT) to track whether conditions listed in the rule text have already been satisfied to make a decision. Moreover, our framework generates clarification questions by adopting a coarse-to-fine reasoning strategy, utilizing sentence-level entailment scores to weight token-level distributions. On the ShARC benchmark (blind, held-out) testset, EMT achieves new state-of-the-art results of 74.6% micro-averaged decision accuracy and 49.5 BLEU4. We also show that EMT is more interpretable by visualizing the entailment-oriented reasoning process as the conversation flows. Code and models are released at https://github.com/Yifan-Gao/explicit_memory_tracker.

pdf bib
Photon: A Robust Cross-Domain Text-to-SQL System
Jichuan Zeng | Xi Victoria Lin | Steven C.H. Hoi | Richard Socher | Caiming Xiong | Michael Lyu | Irwin King
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Natural language interfaces to databases(NLIDB) democratize end user access to relational data. Due to fundamental differences between natural language communication and programming, it is common for end users to issue questions that are ambiguous to the system or fall outside the semantic scope of its underlying query language. We present PHOTON, a robust, modular, cross-domain NLIDB that can flag natural language input to which a SQL mapping cannot be immediately determined. PHOTON consists of a strong neural semantic parser (63.2% structure accuracy on the Spider dev benchmark), a human-in-the-loop question corrector, a SQL executor and a response generator. The question corrector isa discriminative neural sequence editor which detects confusion span(s) in the input question and suggests rephrasing until a translatable input is given by the user or a maximum number of iterations are conducted. Experiments on simulated data show that the proposed method effectively improves the robustness of text-to-SQL system against untranslatable user input.The live demo of our system is available at http://www.naturalsql.com

pdf bib
Dialogue Generation on Infrequent Sentence Functions via Structured Meta-Learning
Yifan Gao | Piji Li | Wei Bi | Xiaojiang Liu | Michael Lyu | Irwin King
Findings of the Association for Computational Linguistics: EMNLP 2020

Sentence function is an important linguistic feature indicating the communicative purpose in uttering a sentence. Incorporating sentence functions into conversations has shown improvements in the quality of generated responses. However, the number of utterances for different types of fine-grained sentence functions is extremely imbalanced. Besides a small number of high-resource sentence functions, a large portion of sentence functions is infrequent. Consequently, dialogue generation conditioned on these infrequent sentence functions suffers from data deficiency. In this paper, we investigate a structured meta-learning (SML) approach for dialogue generation on infrequent sentence functions. We treat dialogue generation conditioned on different sentence functions as separate tasks, and apply model-agnostic meta-learning to high-resource sentence functions data. Furthermore, SML enhances meta-learning effectiveness by promoting knowledge customization among different sentence functions but simultaneously preserving knowledge generalization for similar sentence functions. Experimental results demonstrate that SML not only improves the informativeness and relevance of generated responses, but also can generate responses consistent with the target sentence functions. Code will be public to facilitate the research along this line.

pdf bib
Exploiting Unsupervised Data for Emotion Recognition in Conversations
Wenxiang Jiao | Michael Lyu | Irwin King
Findings of the Association for Computational Linguistics: EMNLP 2020

Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations, which is essentially a text classification task. Unlike the sentence-level text classification problem, the available supervised data for the ERC task is limited, which potentially prevents the models from playing their maximum effect. In this paper, we propose a novel approach to leverage unsupervised conversation data, which is more accessible. Specifically, we propose the Conversation Completion (ConvCom) task, which attempts to select the correct answer from candidate answers to fill a masked utterance in a conversation. Then, we Pre-train a basic COntext-Dependent Encoder (Pre-CODE) on the ConvCom task. Finally, we fine-tune the Pre-CODE on the datasets of ERC. Experimental results demonstrate that pre-training on unsupervised data achieves significant improvement of performance on the ERC datasets, particularly on the minority emotion classes.

pdf bib
Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation
Wenxiang Jiao | Xing Wang | Shilin He | Irwin King | Michael Lyu | Zhaopeng Tu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Large-scale training datasets lie at the core of the recent success of neural machine translation (NMT) models. However, the complex patterns and potential noises in the large-scale data make training NMT models difficult. In this work, we explore to identify the inactive training examples which contribute less to the model performance, and show that the existence of inactive examples depends on the data distribution. We further introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples. The proposed framework consists of three phases. First, we train an identification model on the original training data, and use it to distinguish inactive examples and active examples by their sentence-level output probabilities. Then, we train a rejuvenation model on the active examples, which is used to re-label the inactive examples with forward- translation. Finally, the rejuvenated examples and the active examples are combined to train the final NMT model. Experimental results on WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves performance for several strong NMT models. Extensive analyses reveal that our approach stabilizes and accelerates the training process of NMT models, resulting in final models with better generalization capability.

pdf bib
Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading
Yifan Gao | Chien-Sheng Wu | Jingjing Li | Shafiq Joty | Steven C.H. Hoi | Caiming Xiong | Irwin King | Michael Lyu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Document interpretation and dialog understanding are the two major challenges for conversational machine reading. In this work, we propose “Discern”, a discourse-aware entailment reasoning network to strengthen the connection and enhance the understanding of both document and dialog. Specifically, we split the document into clause-like elementary discourse units (EDU) using a pre-trained discourse segmentation model, and we train our model in a weakly-supervised manner to predict whether each EDU is entailed by the user feedback in a conversation. Based on the learned EDU and entailment representations, we either reply to the user our final decision “yes/no/irrelevant” of the initial question, or generate a follow-up question to inquiry more information. Our experiments on the ShARC benchmark (blind, held-out test set) show that Discern achieves state-of-the-art results of 78.3% macro-averaged accuracy on decision making and 64.0 BLEU1 on follow-up question generation. Code and models are released at https://github.com/Yifan-Gao/Discern.

pdf bib
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang | Jing Li | Michael Lyu | Irwin King
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Social media produces large amounts of contents every day. To help users quickly capture what they need, keyphrase prediction is receiving a growing attention. Nevertheless, most prior efforts focus on text modeling, largely ignoring the rich features embedded in the matching images. In this work, we explore the joint effects of texts and images in predicting the keyphrases for a multimedia post. To better align social media style texts and images, we propose: (1) a novel Multi-Modality MultiHead Attention (M3H-Att) to capture the intricate cross-media interactions; (2) image wordings, in forms of optical characters and image attributes, to bridge the two modalities. Moreover, we design a unified framework to leverage the outputs of keyphrase classification and generation and couple their advantages. Extensive experiments on a large-scale dataset newly collected from Twitter show that our model significantly outperforms the previous state of the art based on traditional attention mechanisms. Further analyses show that our multi-head attention is able to attend information from various aspects and boost classification or generation in diverse scenarios.

pdf bib
VD-BERT: A Unified Vision and Dialog Transformer with BERT
Yue Wang | Shafiq Joty | Michael Lyu | Irwin King | Caiming Xiong | Steven C.H. Hoi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Visual dialog is a challenging vision-language task, where a dialog agent needs to answer a series of questions through reasoning on the image content and dialog history. Prior work has mostly focused on various attention mechanisms to model such intricate interactions. By contrast, in this work, we propose VD-BERT, a simple yet effective framework of unified vision-dialog Transformer that leverages the pretrained BERT language models for Visual Dialog tasks. The model is unified in that (1) it captures all the interactions between the image and the multi-turn dialog using a single-stream Transformer encoder, and (2) it supports both answer ranking and answer generation seamlessly through the same architecture. More crucially, we adapt BERT for the effective fusion of vision and dialog contents via visually grounded training. Without the need of pretraining on external vision-language data, our model yields new state of the art, achieving the top position in both single-model and ensemble settings (74.54 and 75.35 NDCG scores) on the visual dialog leaderboard. Our code and pretrained models are released at https://github.com/salesforce/VD-BERT.

2019

pdf bib
Towards Understanding Neural Machine Translation with Word Importance
Shilin He | Zhaopeng Tu | Xing Wang | Longyue Wang | Michael Lyu | Shuming Shi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Although neural machine translation (NMT) has advanced the state-of-the-art on various language pairs, the interpretability of NMT remains unsatisfactory. In this work, we propose to address this gap by focusing on understanding the input-output behavior of NMT models. Specifically, we measure the word importance by attributing the NMT output to every input word through a gradient-based method. We validate the approach on a couple of perturbation operations, language pairs, and model architectures, demonstrating its superiority on identifying input words with higher influence on translation performance. Encouragingly, the calculated importance can serve as indicators of input words that are under-translated by NMT models. Furthermore, our analysis reveals that words of certain syntactic categories have higher importance while the categories vary across language pairs, which can inspire better design principles of NMT architectures for multi-lingual translation.