Haifeng Wang


2020

PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
Siqi Bao | Huang He | Fan Wang | Hua Wu | Haifeng Wang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pre-training models have been proved effective for a wide range of natural language processing tasks. Inspired by this, we propose a novel dialogue generation pre-training framework to support various kinds of conversations, including chit-chat, knowledge grounded dialogues, and conversational question answering. In this framework, we adopt flexible attention mechanisms to fully leverage the bi-directional context and the uni-directional characteristic of language generation. We also introduce discrete latent variables to tackle the inherent one-to-many mapping problem in response generation. Two reciprocal tasks of response generation and latent act recognition are designed and carried out simultaneously within a shared network. Comprehensive experiments on three publicly available datasets verify the effectiveness and superiority of the proposed framework.

Towards Conversational Recommendation over Multi-Type Dialogs
Zeming Liu | Haifeng Wang | Zheng-Yu Niu | Hua Wu | Wanxiang Che | Ting Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We focus on the study of conversational recommendation in the context of multi-type dialogs, where the bots can proactively and naturally lead a conversation from a non-recommendation dialog (e.g., QA) to a recommendation dialog, taking into account user’s interests and feedback. To facilitate the study of this task, we create a human-to-human Chinese dialog dataset DuRecDial (about 10k dialogs, 156k utterances), where there are multiple sequential dialogs for a pair of a recommendation seeker (user) and a recommender (bot). In each dialog, the recommender proactively leads a multi-type dialog to approach recommendation targets and then makes multiple recommendations with rich interaction behavior. This dataset allows us to systematically investigate different parts of the overall problem, e.g., how to naturally lead a dialog, how to interact with users for recommendation. Finally we establish baseline results on DuRecDial for future studies.

Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
Jun Xu | Haifeng Wang | Zheng-Yu Niu | Hua Wu | Wanxiang Che | Ting Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

To address the challenge of policy learning in open-domain multi-turn conversation, we propose to represent prior information about dialog transitions as a graph and learn a graph grounded dialog policy, aimed at fostering a more coherent and controllable dialog. To this end, we first construct a conversational graph (CG) from dialog corpora, in which there are vertices to represent “what to say” and “how to say”, and edges to represent natural transition between a message (the last utterance in a dialog context) and its response. We then present a novel CG grounded policy learning framework that conducts dialog flow planning by graph traversal, which learns to identify a what-vertex and a how-vertex from the CG at each turn to guide response generation. In this way, we effectively leverage the CG to facilitate policy learning as follows: (1) it enables more effective long-term reward design, (2) it provides high-quality candidate actions, and (3) it gives us more control over the policy. Results on two benchmark corpora demonstrate the effectiveness of this framework.

SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis
Hao Tian | Can Gao | Xinyan Xiao | Hao Liu | Bolei He | Hua Wu | Haifeng Wang | Feng Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.

Leveraging Graph to Improve Abstractive Multi-Document Summarization
Wei Li | Xinyan Xiao | Jiachen Liu | Hua Wu | Haifeng Wang | Junping Du
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries. In this paper, we develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents such as similarity graph and discourse graph, to more effectively process multiple input documents and produce abstractive summaries. Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries. Furthermore, pre-trained language models can be easily combined with our model, which further improve the summarization performance significantly. Empirical results on the WikiSum and MultiNews dataset show that the proposed architecture brings substantial improvements over several strong baselines.

Learning Adaptive Segmentation Policy for Simultaneous Translation
Ruiqing Zhang | Chuanqiang Zhang | Zhongjun He | Hua Wu | Haifeng Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Balancing accuracy and latency is a great challenge for simultaneous translation. To achieve high accuracy, the model usually needs to wait for more streaming text before translation, which results in increased latency. However, keeping low latency would probably hurt accuracy. Therefore, it is essential to segment the ASR output into appropriate units for translation. Inspired by human interpreters, we propose a novel adaptive segmentation policy for simultaneous translation. The policy learns to segment the source text by considering possible translations produced by the translation model, maintaining consistency between the segmentation and translation. Experimental results on Chinese-English and German-English translation show that our method achieves a better accuracy-latency trade-off over recently proposed state-of-the-art methods.

DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset
Lijie Wang | Ao Zhang | Kun Wu | Ke Sun | Zhenghua Li | Hua Wu | Min Zhang | Haifeng Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Due to the lack of labeled data, previous research on text-to-SQL parsing mainly focuses on English. Representative English datasets include ATIS, WikiSQL, Spider, etc. This paper presents DuSQL, a large-scale and pragmatic Chinese dataset for the cross-domain text-to-SQL task, containing 200 databases, 813 tables, and 23,797 question/SQL pairs. Our new dataset has three major characteristics. First, by manually analyzing questions from several representative applications, we try to figure out the true distribution of SQL queries in real-life needs. Second, DuSQL contains a considerable proportion of SQL queries involving row or column calculations, motivated by our analysis on the SQL query distributions. Finally, we adopt an effective data construction framework via human-computer collaboration. The basic idea is automatically generating SQL queries based on the SQL grammar and constrained by the given database. This paper describes in detail the construction process and data statistics of DuSQL. Moreover, we present and compare performance of several open-source text-to-SQL parsers with minor modification to accommodate Chinese, including a simple yet effective extension to IRNet for handling calculation SQL queries.

2019

Multi-agent Learning for Neural Machine Translation
Tianchi Bi | Hao Xiong | Zhongjun He | Hua Wu | Haifeng Wang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Conventional Neural Machine Translation (NMT) models benefit from the training with an additional agent, e.g., dual learning, and bidirectional decoding with one agent decoding from left to right and the other decoding in the opposite direction. In this paper, we extend the training framework to the multi-agent scenario by introducing diverse agents in an interactive updating process. At training time, each agent learns advanced knowledge from others, and they work together to improve translation quality. Experimental results on NIST Chinese-English, IWSLT 2014 German-English, WMT 2014 English-German and large-scale Chinese-English translation tasks indicate that our approach achieves absolute improvements over the strong baseline systems and shows competitive performance on all tasks.

Knowledge Aware Conversation Generation with Explainable Reasoning over Augmented Graphs
Zhibin Liu | Zheng-Yu Niu | Hua Wu | Haifeng Wang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Two types of knowledge, triples from knowledge graphs and texts from documents, have been studied for knowledge aware open domain conversation generation, in which graph paths can narrow down vertex candidates for knowledge selection decision, and texts can provide rich information for response generation. Fusion of a knowledge graph and texts might yield mutually reinforcing advantages, but there is less study on that. To address this challenge, we propose a knowledge aware chatting machine with three components, an augmented knowledge graph with both triples and texts, knowledge selector, and knowledge aware response generator. For knowledge selection on the graph, we formulate it as a problem of multi-hop graph reasoning to effectively capture conversation flow, which is more explainable and flexible in comparison with previous works. To fully leverage long text information that differentiates our graph from others, we improve a state of the art reasoning algorithm with machine reading comprehension technology. We demonstrate the effectiveness of our system on two datasets in comparison with state-of-the-art models.

D-NET: A Pre-Training and Fine-Tuning Framework for Improving the Generalization of Machine Reading Comprehension
Hongyu Li | Xiyuan Zhang | Yibing Liu | Yiming Zhang | Quan Wang | Xiangyang Zhou | Jing Liu | Hua Wu | Haifeng Wang
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

In this paper, we introduce a simple system Baidu submitted for MRQA (Machine Reading for Question Answering) 2019 Shared Task that focused on generalization of machine reading comprehension (MRC) models. Our system is built on a framework of pretraining and fine-tuning, namely D-NET. The techniques of pre-trained language models and multi-task learning are explored to improve the generalization of MRC models and we conduct experiments to examine the effectiveness of these strategies. Our system is ranked at top 1 of all the participants in terms of averaged F1 score. Our codes and models will be released at PaddleNLP.

STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework
Mingbo Ma | Liang Huang | Hao Xiong | Renjie Zheng | Kaibo Liu | Baigong Zheng | Chuanqiang Zhang | Zhongjun He | Hairong Liu | Xing Li | Hua Wu | Haifeng Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Simultaneous translation, which translates sentences before they are finished, is useful in many scenarios but is notoriously difficult due to word-order differences. While the conventional seq-to-seq framework is only suitable for full-sentence translation, we propose a novel prefix-to-prefix framework for simultaneous translation that implicitly learns to anticipate in a single translation model. Within this framework, we present a very simple yet surprisingly effective “wait-k” policy trained to generate the target sentence concurrently with the source sentence, but always k words behind. Experiments show our strategy achieves low latency and reasonable quality (compared to full-sentence translation) on 4 directions: zh↔en and de↔en.
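The "always k words behind" schedule described in this abstract can be illustrated with a small sketch. It is only one reading of the abstract, not the authors' implementation: the function name `wait_k_schedule` and its parameters are hypothetical, and it assumes that the t-th target word is emitted once k + t - 1 source words have been read, capped at the source length.

```python
from typing import List


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[int]:
    """Number of source words read before emitting each target word.

    Assumption (illustrative, not taken from the paper's code): target
    word t (1-indexed) is written after min(k + t - 1, source_len)
    source words have been read, so the output lags the input by k words
    until the source is exhausted.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


if __name__ == "__main__":
    # A 5-word source, a 6-word target, and a lag of k = 2.
    print(wait_k_schedule(source_len=5, target_len=6, k=2))
```

A smaller k lowers latency because writing starts earlier, while a larger k gives the model more source context per target word. Once the whole source has been read, the remaining target words are produced without further waiting.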

Proactive Human-Machine Conversation with Explicit Conversation Goal
Wenquan Wu | Zhen Guo | Xiangyang Zhou | Hua Wu | Xiyuan Zhang | Rongzhong Lian | Haifeng Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Though great progress has been made for human-machine conversation, current dialogue system is still in its infancy: it usually converses passively and utters words more as a matter of response, rather than on its own initiatives. In this paper, we take a radical step towards building a human-like conversational agent: endowing it with the ability of proactively leading the conversation (introducing a new topic or maintaining the current topic). To facilitate the development of such conversation systems, we create a new dataset named Konv where one acts as a conversation leader and the other acts as the follower. The leader is provided with a knowledge graph and asked to sequentially change the discussion topics, following the given conversation goal, and meanwhile keep the dialogue as natural and engaging as possible. Konv enables a very challenging task as the model needs to both understand dialogue and plan over the given knowledge graph. We establish baseline results on this dataset (about 270K utterances and 30k dialogues) using several state-of-the-art models. Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations. The baseline systems along with the dataset are publicly available.

Baidu Neural Machine Translation Systems for WMT19
Meng Sun | Bojian Jiang | Hao Xiong | Zhongjun He | Hua Wu | Haifeng Wang
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this paper we introduce the systems Baidu submitted for the WMT19 shared task on Chinese<->English news translation. Our systems are based on the Transformer architecture with some effective improvements. Data selection, back translation, data augmentation, knowledge distillation, domain adaptation, model ensemble and re-ranking are employed and proven effective in our experiments. Our Chinese->English system achieved the highest case-sensitive BLEU score among all constrained submissions, and our English->Chinese system ranked the second in all submissions.

2018

Multi-Passage Machine Reading Comprehension with Cross-Passage Answer Verification
Yizhong Wang | Kai Liu | Jing Liu | Wei He | Yajuan Lyu | Hua Wu | Sujian Li | Haifeng Wang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Machine reading comprehension (MRC) on real web data usually requires the machine to answer a question by analyzing multiple passages retrieved by search engine. Compared with MRC on a single passage, multi-passage MRC is more challenging, since we are likely to get multiple confusing answer candidates from different passages. To address this problem, we propose an end-to-end neural model that enables those answer candidates from different passages to verify each other based on their content representations. Specifically, we jointly train three modules that can predict the final answer based on three factors: the answer boundary, the answer content and the cross-passage answer verification. The experimental results show that our method outperforms the baseline by a large margin and achieves the state-of-the-art performance on the English MS-MARCO dataset and the Chinese DuReader dataset, both of which are designed for MRC in real-world settings.

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications
Wei He | Kai Liu | Jing Liu | Yajuan Lyu | Shiqi Zhao | Xinyan Xiao | Yuan Liu | Yizhong Wang | Hua Wu | Qiaoqiao She | Xuan Liu | Tian Wu | Haifeng Wang
Proceedings of the Workshop on Machine Reading for Question Answering

This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao; answers are manually generated. (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, that leaves more opportunity for the research community. (3) scale: it contains 200K questions, 420K answers and 1M documents; it is the largest Chinese MRC dataset so far. Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and baseline systems have been posted online. We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there are significant improvements over the baselines.

2017

Multi-task Attention-based Neural Networks for Implicit Discourse Relationship Representation and Identification
Man Lan | Jianxiang Wang | Yuanbin Wu | Zheng-Yu Niu | Haifeng Wang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a novel multi-task attention based neural network model to address implicit discourse relationship representation and identification through two types of representation learning, an attention based neural network for learning discourse relationship representation with two arguments and a multi-task framework for learning knowledge from annotated and unannotated corpora. The extensive experiments have been performed on two benchmark corpora (i.e., PDTB and CoNLL-2016 datasets). Experimental results show that our proposed model outperforms the state-of-the-art systems on benchmark corpora.

2016

A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks
Jiang Guo | Wanxiang Che | Haifeng Wang | Ting Liu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Various treebanks have been released for dependency parsing. Despite that treebanks may belong to different languages or have different annotation schemes, they contain common syntactic knowledge that is potential to benefit each other. This paper presents a universal framework for transfer parsing across multi-typed treebanks with deep multi-task learning. We consider two kinds of treebanks as source: the multilingual universal treebanks and the monolingual heterogeneous treebanks. Knowledge across the source and target treebanks are effectively transferred through multi-level parameter sharing. Experiments on several benchmark datasets in various languages demonstrate that our approach can make effective use of arbitrary source treebanks to improve target parsing models.

Chinese Poetry Generation with Planning based Neural Network
Zhe Wang | Wei He | Hua Wu | Haiyang Wu | Wei Li | Haifeng Wang | Enhong Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generating method which first plans the sub-topics of the poem according to the user’s writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method can ensure that the generated poem is coherent and semantically consistent with the user’s intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms the state-of-the-art poetry generating methods and the poem quality is somehow comparable to human poets.

A Unified Architecture for Semantic Role Labeling and Relation Classification
Jiang Guo | Wanxiang Che | Haifeng Wang | Ting Liu | Jun Xu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper describes a unified neural architecture for identifying and classifying multi-typed semantic relations between words in a sentence. We investigate two typical and well-studied tasks: semantic role labeling (SRL) which identifies the relations between predicates and arguments, and relation classification (RC) which focuses on the relation between two entities or nominals. While mostly studied separately in prior work, we show that the two tasks can be effectively connected and modeled using a general architecture. Experiments on CoNLL-2009 benchmark datasets show that our SRL models significantly outperform state-of-the-art approaches. Our RC models also yield competitive performance with the best published records. Furthermore, we show that the two tasks can be trained jointly with multi-task learning, resulting in additive significant improvements for SRL.

Active Learning for Dependency Parsing with Partial Annotation
Zhenghua Li | Min Zhang | Yue Zhang | Zhanyi Liu | Wenliang Chen | Hua Wu | Haifeng Wang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Cross-lingual Dependency Parsing Based on Distributed Representations
Jiang Guo | Wanxiang Che | David Yarowsky | Haifeng Wang | Ting Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multi-Task Learning for Multiple Language Translation
Daxiang Dong | Hua Wu | Wei He | Dianhai Yu | Haifeng Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Learning Semantic Hierarchies via Word Embeddings
Ruiji Fu | Jiang Guo | Bing Qin | Wanxiang Che | Haifeng Wang | Ting Liu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources
Jiang Guo | Wanxiang Che | Haifeng Wang | Ting Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Policy Learning for Domain Selection in an Extensible Multi-domain Spoken Dialogue System
Zhuoran Wang | Hongliang Chen | Guanchun Wang | Hao Tian | Hua Wu | Haifeng Wang
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Revisiting Embedding Features for Simple Semi-supervised Learning
Jiang Guo | Wanxiang Che | Haifeng Wang | Ting Liu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Improve Statistical Machine Translation with Context-Sensitive Bilingual Semantic Embedding Model
Haiyang Wu | Daxiang Dong | Xiaoguang Hu | Dianhai Yu | Wei He | Hua Wu | Haifeng Wang | Ting Liu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Transformation from Discontinuous to Continuous Word Alignment Improves Translation Quality
Zhongjun He | Hua Wu | Haifeng Wang | Ting Liu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Improving Pivot-Based Statistical Machine Translation by Pivoting the Co-occurrence Count of Phrase Pairs
Xiaoning Zhu | Zhongjun He | Hua Wu | Conghui Zhu | Haifeng Wang | Tiejun Zhao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Improving Pivot-Based Statistical Machine Translation Using Random Walk
Xiaoning Zhu | Zhongjun He | Hua Wu | Haifeng Wang | Conghui Zhu | Tiejun Zhao
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Bootstrapping Large-scale Named Entities using URL-Text Hybrid Patterns
Chao Zhang | Shiqi Zhao | Haifeng Wang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

A Hierarchical Semantics-Aware Distributional Similarity Scheme
Shuqi Sun | Ke Sun | Shiqi Zhao | Haifeng Wang | Muyun Yang | Sheng Li
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Generalization of Words for Chinese Dependency Parsing
Xianchao Wu | Jie Zhou | Yu Sun | Zhanyi Liu | Dianhai Yu | Hua Wu | Haifeng Wang
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

2012

Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
Jinsong Su | Hua Wu | Haifeng Wang | Yidong Chen | Xiaodong Shi | Huailin Dong | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Improve SMT Quality with Automatically Extracted Paraphrase Rules
Wei He | Hua Wu | Haifeng Wang | Ting Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

User Behaviors Lend a Helping Hand: Learning Paraphrase Query Patterns from Search Log Sessions
Shiqi Zhao | Haifeng Wang | Ting Liu
Proceedings of COLING 2012

2011

Web-based Machine Translation
Haifeng Wang
Proceedings of the Fifth International Workshop On Cross Lingual Information Access

Proceedings of 5th International Joint Conference on Natural Language Processing
Haifeng Wang | David Yarowsky
Proceedings of 5th International Joint Conference on Natural Language Processing

Enriching SMT Training Data via Paraphrasing
Wei He | Shiqi Zhao | Haifeng Wang | Ting Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

Automatically Generating Questions from Queries for Community-based Question Answering
Shiqi Zhao | Haifeng Wang | Chao Li | Ting Liu | Yi Guan
Proceedings of 5th International Joint Conference on Natural Language Processing

Harvesting Related Entities with a Search Engine
Shuqi Sun | Shiqi Zhao | Muyun Yang | Haifeng Wang | Sheng Li
Proceedings of 5th International Joint Conference on Natural Language Processing

Reordering with Source Language Collocations
Zhanyi Liu | Haifeng Wang | Hua Wu | Ting Liu | Sheng Li
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Paraphrasing with Search Engine Query Logs
Shiqi Zhao | Haifeng Wang | Ting Liu
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Leveraging Multiple MT Engines for Paraphrase Generation
Shiqi Zhao | Haifeng Wang | Xiang Lan | Ting Liu
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Coling 2010: Paraphrases and Applications–Tutorial notes
Shiqi Zhao | Haifeng Wang
Coling 2010: Paraphrases and Applications–Tutorial notes

Paraphrases and Applications
Shiqi Zhao | Haifeng Wang
Coling 2010: Paraphrases and Applications–Tutorial notes

Improving Statistical Machine Translation with Monolingual Collocation
Zhanyi Liu | Haifeng Wang | Hua Wu | Sheng Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Lluís Màrquez | Haifeng Wang
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2009

Collocation Extraction Using Monolingual Word Alignment Method
Zhanyi Liu | Haifeng Wang | Hua Wu | Sheng Li
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Exploiting Heterogeneous Treebanks for Parsing
Zheng-Yu Niu | Haifeng Wang | Hua Wu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Revisiting Pivot Language Approach for Machine Translation
Hua Wu | Haifeng Wang
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Dependency Based Chinese Sentence Realization
Wei He | Haifeng Wang | Yuqing Guo | Ting Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora
Shiqi Zhao | Haifeng Wang | Ting Liu | Sheng Li
Proceedings of ACL-08: HLT

Accurate and Robust LFG-Based Generation for Chinese
Yuqing Guo | Haifeng Wang | Josef van Genabith
Proceedings of the Fifth International Natural Language Generation Conference

Dependency-Based N-Gram Models for General Purpose Sentence Realisation
Yuqing Guo | Josef van Genabith | Haifeng Wang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Prediction of Maximal Projection for Semantic Role Labeling
Weiwei Sun | Zhifang Sui | Haifeng Wang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Domain Adaptation for Statistical Machine Translation with Domain Dictionary and Monolingual Corpora
Hua Wu | Haifeng Wang | Chengqing Zong
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Pivot Language Approach for Phrase-Based Statistical Machine Translation
Hua Wu | Haifeng Wang
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Recovering Non-Local Dependencies for Chinese
Yuqing Guo | Haifeng Wang | Josef van Genabith
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Using RBMT Systems to Produce Bilingual Corpus for SMT
Xiaoguang Hu | Haifeng Wang | Hua Wu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation
Zhimao Lu | Haifeng Wang | Jianmin Yao | Ting Liu | Sheng Li
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval
Jiang Zhu | Haifeng Wang
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Discriminative Pruning of Language Models for Chinese Word Segmentation
Jianfeng Li | Haifeng Wang | Dengjun Ren | Guohua Li
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs
Haifeng Wang | Hua Wu | Zhanyi Liu
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Boosting Statistical Word Alignment Using Labeled and Unlabeled Data
Hua Wu | Haifeng Wang | Zhanyi Liu
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Improving Statistical Word Alignment with Ensemble Methods
Hua Wu | Haifeng Wang
Second International Joint Conference on Natural Language Processing: Full Papers

Alignment Model Adaptation for Domain-Specific Word Alignment
Hua Wu | Haifeng Wang | Zhanyi Liu
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Improving Domain-Specific Word Alignment for Computer Assisted Translation
Hua Wu | Haifeng Wang
Proceedings of the ACL Interactive Poster and Demonstration Sessions

Improving Statistical Word Alignment with a Rule-Based Machine Translation System
Hua Wu | Haifeng Wang
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics