Fei Huang


2020

pdf bib
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Xinyu Wang | Yong Jiang | Nguyen Bach | Tao Wang | Fei Huang | Kewei Tu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multilingual sequence labeling is a task of predicting label sequences using a single unified model for multiple languages. Compared with relying on multiple monolingual models, using a multilingual model has the benefit of a smaller model size, easier in online serving, and generalizability to low-resource languages. However, current multilingual models still underperform individual monolingual models significantly due to model capacity limitations. In this paper, we propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) to the unified multilingual model (student). We propose two novel KD methods based on structure-level information: (1) approximately minimizes the distance between the student’s and the teachers’ structure-level probability distributions, (2) aggregates the structure-level knowledge to local distributions and minimizes the distance between two local probability distributions. Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.

pdf bib
A Joint Neural Model for Information Extraction with Global Features
Ying Lin | Heng Ji | Fei Huang | Lingfei Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Most existing joint neural models for Information Extraction (IE) use local task-specific classifiers to predict labels for individual instances (e.g., trigger, relation) regardless of their interactions. For example, a victim of a die event is likely to be a victim of an attack event in the same sentence. In order to capture such cross-subtask and cross-instance inter-dependencies, we propose a joint neural framework, OneIE, that aims to extract the globally optimal IE result as a graph from an input sentence. OneIE performs end-to-end IE in four stages: (1) Encoding a given sentence as contextualized word representations; (2) Identifying entity mentions and event triggers as nodes; (3) Computing label scores for all nodes and their pairwise links using local classifiers; (4) Searching for the globally optimal graph with a beam decoder. At the decoding stage, we incorporate global features to capture the cross-subtask and cross-instance interactions. Experiments show that adding global features improves the performance of our model and achieves new state of-the-art on all subtasks. In addition, as OneIE does not use any language-specific feature, we prove it can be easily applied to new languages or trained in a multilingual manner.

pdf bib
An Investigation of Potential Function Designs for Neural CRF
Zechuan Hu | Yong Jiang | Nguyen Bach | Tao Wang | Zhongqiang Huang | Fei Huang | Kewei Tu
Findings of the Association for Computational Linguistics: EMNLP 2020

The neural linear-chain CRF model is one of the most widely-used approach to sequence labeling. In this paper, we investigate a series of increasingly expressive potential functions for neural CRF models, which not only integrate the emission and transition functions, but also explicitly take the representations of the contextual words as input. Our extensive experiments show that the decomposed quadrilinear potential function based on the vector representations of two neighboring labels and two neighboring words consistently achieves the best performance.

pdf bib
More Embeddings, Better Sequence Labelers?
Xinyu Wang | Yong Jiang | Nguyen Bach | Tao Wang | Zhongqiang Huang | Fei Huang | Kewei Tu
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent work proposes a family of contextual embeddings that significantly improves the accuracy of sequence labelers over non-contextual embeddings. However, there is no definite conclusion on whether we can build better sequence labelers by combining different kinds of embeddings in various settings. In this paper, we conduct extensive experiments on 3 tasks over 18 datasets and 8 languages to study the accuracy of sequence labeling with various embedding concatenations and make three observations: (1) concatenating more embedding variants leads to better accuracy in rich-resource and cross-domain settings and some conditions of low-resource settings; (2) concatenating contextual sub-word embeddings with contextual character embeddings hurts the accuracy in extremely low-resource settings; (3) based on the conclusion of (1), concatenating additional similar contextual embeddings cannot lead to further improvements. We hope these conclusions can help people build stronger sequence labelers in various settings.

pdf bib
FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN
Ebrahim Ansari | Amittai Axelrod | Nguyen Bach | Ondřej Bojar | Roldano Cattoni | Fahim Dalvi | Nadir Durrani | Marcello Federico | Christian Federmann | Jiatao Gu | Fei Huang | Kevin Knight | Xutai Ma | Ajay Nagesh | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Xing Shi | Sebastian Stüker | Marco Turchi | Alexander Waibel | Changhan Wang
Proceedings of the 17th International Conference on Spoken Language Translation

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured this year six challenge tracks: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation. A total of teams participated in at least one of the tracks. This paper introduces each track’s goal, data and evaluation metrics, and reports the results of the received submissions.

pdf bib
Aspect Sentiment Classification with Aspect-Specific Opinion Spans
Lu Xu | Lidong Bing | Wei Lu | Fei Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aspect based sentiment analysis, predicting sentiment polarity of given aspects, has drawn extensive attention. Previous attention-based models emphasize using aspect semantics to help extract opinion features for classification. However, these works are either not able to capture opinion spans as a whole, or not able to capture variable-length opinion spans. In this paper, we present a neat and effective structured attention model by aggregating multiple linear-chain CRFs. Such a design allows the model to extract aspect-specific opinion spans and then evaluate sentiment polarity by exploiting the extracted opinion features. The experimental results on four datasets demonstrate the effectiveness of the proposed model, and our analysis demonstrates that our model can capture aspect-specific opinion spans.

pdf bib
AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network
Xinyu Wang | Yong Jiang | Nguyen Bach | Tao Wang | Zhongqiang Huang | Fei Huang | Kewei Tu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches. Exact probabilistic inference algorithms such as the forward-backward and Viterbi algorithms are typically applied in training and prediction stages of the CRF model. However, these algorithms require sequential computation that makes parallelization impossible. In this paper, we propose to employ a parallelizable approximate variational inference algorithm for the CRF model. Based on this algorithm, we design an approximate inference network that can be connected with the encoder of the neural CRF model to form an end-to-end network, which is amenable to parallelization for faster training and prediction. The empirical results show that our proposed approaches achieve a 12.7-fold improvement in decoding speed with long sentences and a competitive accuracy compared with the traditional CRF approach.

pdf bib
PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation
Bin Bi | Chenliang Li | Chen Wu | Ming Yan | Wei Wang | Songfang Huang | Fei Huang | Luo Si
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Self-supervised pre-training, such as BERT, MASS and BART, has emerged as a powerful technique for natural language understanding and generation. Existing pre-training techniques employ autoencoding and/or autoregressive objectives to train Transformer-based models by recovering original word tokens from corrupted text with some masked tokens. The training goals of existing techniques are often inconsistent with the goals of many language generation tasks, such as generative question answering and conversational response generation, for producing new text given context. This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. The new scheme alleviates the mismatch introduced by the existing denoising scheme between pre-training and fine-tuning where generation is more than reconstructing original text. An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks covering generative question answering (Rank 1 on the official MARCO leaderboard), abstractive summarization on CNN/DailyMail as well as Gigaword, question generation on SQuAD, and conversational response generation on Cornell Movie Dialogues.

pdf bib
OpenUE: An Open Toolkit of Universal Extraction from Text
Ningyu Zhang | Shumin Deng | Zhen Bi | Haiyang Yu | Jiacheng Yang | Mosha Chen | Fei Huang | Wei Zhang | Huajun Chen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Natural language processing covers a wide variety of tasks with token-level or sentence-level understandings. In this paper, we provide a simple insight that most tasks can be represented in a single universal extraction format. We introduce a prototype model and provide an open-source and extensible toolkit called OpenUE for various extraction tasks. OpenUE allows developers to train custom models to extract information from the text and supports quick model validation for researchers. Besides, OpenUE provides various functional modules to maintain sufficient modularity and extensibility. Except for the toolkit, we also deploy an online demo with restful APIs to support real-time extraction without training and deploying. Additionally, the online system can extract information in various tasks, including relational triple extraction, slot & intent detection, event extraction, and so on. We release the source code, datasets, and pre-trained models to promote future researches in http://github.com/zjunlp/openue.

pdf bib
A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation
Jian Guan | Fei Huang | Zhihao Zhao | Xiaoyan Zhu | Minlie Huang
Transactions of the Association for Computational Linguistics, Volume 8

Story generation, namely, generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we use multi-task learning, which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.

2019

pdf bib
ARAML: A Stable Adversarial Training Framework for Text Generation
Pei Ke | Fei Huang | Minlie Huang | Xiaoyan Zhu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Most of the existing generative adversarial networks (GAN) for text generation suffer from the instability of reinforcement learning training algorithms such as policy gradient, leading to unstable performance. To tackle this problem, we propose a novel framework called Adversarial Reward Augmented Maximum Likelihood (ARAML). During adversarial training, the discriminator assigns rewards to samples which are acquired from a stationary distribution near the data rather than the generator’s distribution. The generator is optimized with maximum likelihood estimation augmented by the discriminator’s rewards instead of policy gradient. Experiments show that our model can outperform state-of-the-art text GANs with a more stable training process.

2016

pdf bib
Semi-supervised Convolutional Networks for Translation Adaptation with Tiny Amount of In-domain Data
Boxing Chen | Fei Huang
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

pdf bib
Using Relevant Public Posts to Enhance News Article Summarization
Chen Li | Zhongyu Wei | Yang Liu | Yang Jin | Fei Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

A news article summary usually consists of 2-3 key sentences that reflect the gist of that news article. In this paper we explore using public posts following a new article to improve automatic summary generation for the news article. We propose different approaches to incorporate information from public posts, including using frequency information from the posts to re-estimate bigram weights in the ILP-based summarization model and to re-weight a dependency tree edge’s importance for sentence compression, directly selecting sentences from posts as the final summary, and finally a strategy to combine the summarization results generated from news articles and posts. Our experiments on data collected from Facebook show that relevant public posts provide useful information and can be effectively leveraged to improve news article summarization results.

2015

pdf bib
Improved Arabic Dialect Classification with Social Media Data
Fei Huang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Learning Representations for Weakly Supervised Natural Language Processing Tasks
Fei Huang | Arun Ahuja | Doug Downey | Yi Yang | Yuhong Guo | Alexander Yates
Computational Linguistics, Volume 40, Issue 1 - March 2014

pdf bib
Adaptive HTER Estimation for Document-Specific MT Post-Editing
Fei Huang | Jian-Ming Xu | Abraham Ittycheriah | Salim Roukos
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Improving Word Alignment Using Linguistic Code Switching Data
Fei Huang | Alexander Yates
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

pdf bib
Generalized Reordering Rules for Improved SMT
Fei Huang | Cezar Pendus
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Scoring Spoken Responses Based on Content Accuracy
Fei Huang | Lei Chen | Jana Sukkarieh
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

pdf bib
Biased Representation Learning for Domain Adaptation
Fei Huang | Alexander Yates
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Language Models as Representations for Weakly Supervised NLP Tasks
Fei Huang | Alexander Yates | Arun Ahuja | Doug Downey
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

pdf bib
Goodness: A Method for Measuring Machine Translation Confidence
Nguyen Bach | Fei Huang | Yaser Al-Onaizan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Exploring Representation-Learning Approaches to Domain Adaptation
Fei Huang | Alexander Yates
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

pdf bib
Feature-Rich Discriminative Phrase Rescoring for SMT
Fei Huang | Bing Xiang
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Open-Domain Semantic Role Labeling by Modeling Word Spans
Fei Huang | Alexander Yates
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

pdf bib
Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling
Fei Huang | Alexander Yates
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Confidence Measure for Word Alignment
Fei Huang
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
When Harry Met Harri: Cross-lingual Name Spelling Normalization
Fei Huang | Ahmad Emami | Imed Zitouni
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
Hierarchical System Combination for Machine Translation
Fei Huang | Kishore Papineni
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2005

pdf bib
Cluster-specific Named Entity Transliteration
Fei Huang
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Mining Key Phrase Translations from Web Corpora
Fei Huang | Ying Zhang | Stephan Vogel
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf bib
Improving Named Entity Translation Combining Phonetic and Semantic Similarities
Fei Huang | Stephan Vogel | Alex Waibel
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

2003

pdf bib
Automatic Extraction of Named Entity Translingual Equivalence Based on Multi-Feature Cost Minimization
Fei Huang | Stephan Vogel | Alex Waibel
Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition