Hai Zhao


2020

pdf bib
Bipartite Flat-Graph Network for Nested Named Entity Recognition
Ying Luo | Hai Zhao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for nested named entity recognition (NER), which contains two subgraph modules: a flat NER module for outermost entities and a graph module for all the entities located in inner layers. Bidirectional LSTM (BiLSTM) and graph convolutional network (GCN) are adopted to jointly learn flat entities and their inner dependencies. Different from previous models, which only consider the unidirectional delivery of information from innermost layers to outer ones (or outside-to-inside), our model effectively captures the bidirectional interaction between them. We first use the entities recognized by the flat NER module to construct an entity graph, which is fed to the next graph module. The richer representation learned from graph module carries the dependencies of inner entities and can be exploited to improve outermost entity predictions. Experimental results on three standard nested NER datasets demonstrate that our BiFlaG outperforms previous state-of-the-art models.

pdf bib
High-order Semantic Role Labeling
Zuchao Li | Hai Zhao | Rui Wang | Kevin Parnow
Findings of the Association for Computational Linguistics: EMNLP 2020

Semantic role labeling is primarily used to identify predicates, arguments, and their semantic relationships. Due to the limitations of modeling methods and the conditions of pre-identified predicates, previous work has focused on the relationships between predicates and arguments and the correlations between arguments at most, while the correlations between predicates have been neglected for a long time. High-order features and structure learning were very common in modeling such correlations before the neural network era. In this paper, we introduce a high-order graph structure for the neural semantic role labeling model, which enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs. Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models and further boost our baseline to achieve new state-of-the-art results.

pdf bib
Reference Language based Unsupervised Neural Machine Translation
Zuchao Li | Hai Zhao | Rui Wang | Masao Utiyama | Eiichiro Sumita
Findings of the Association for Computational Linguistics: EMNLP 2020

Exploiting a common language as an auxiliary for better translation has a long tradition in machine translation and lets supervised learning-based machine translation enjoy the enhancement delivered by the well-used pivot language in the absence of a source language to target language parallel corpus. The rise of unsupervised neural machine translation (UNMT) almost completely relieves the parallel corpus curse, though UNMT is still subject to unsatisfactory performance due to the vagueness of the clues available for its core back-translation training. Further enriching the idea of pivot translation by extending the use of parallel corpora beyond the source-target paradigm, we propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source, but this corpus still indicates a signal clear enough to help the reconstruction training of UNMT through a proposed reference agreement mechanism. Experimental results show that our methods improve the quality of UNMT over that of a strong baseline that uses only one auxiliary language, demonstrating the usefulness of the proposed reference language-based UNMT and establishing a good start for the community.

pdf bib
Parsing All: Syntax and Semantics, Dependencies and Spans
Junru Zhou | Zuchao Li | Hai Zhao
Findings of the Association for Computational Linguistics: EMNLP 2020

Both syntactic and semantic structures are key linguistic contextual clues, in which parsing the latter has been well shown beneficial from parsing the former. However, few works ever made an attempt to let semantic parsing help syntactic parsing. As linguistic representation formalisms, both syntax and semantics may be represented in either span (constituent/phrase) or dependency, on both of which joint learning was also seldom explored. In this paper, we propose a novel joint model of syntactic and semantic parsing on both span and dependency representations, which incorporates syntactic information effectively in the encoder of neural network and benefits from two representation formalisms in a uniform way. The experiments show that semantics and syntax can benefit each other by optimizing joint objectives. Our single model achieves new state-of-the-art or competitive results on both span and dependency semantic parsing on Propbank benchmarks and both dependency and constituent syntactic parsing on Penn Treebank.

pdf bib
LIMIT-BERT : Linguistics Informed Multi-Task BERT
Junru Zhou | Zhuosheng Zhang | Hai Zhao | Shuailiang Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020

In this paper, we present Linguistics Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistics tasks by Multi-Task Learning. LIMIT-BERT includes five key linguistics tasks: Part-Of-Speech (POS) tags, constituent and dependency syntactic parsing, span and dependency semantic role labeling (SRL). Different from recent Multi-Task Deep Neural Networks (MT-DNN), our LIMIT-BERT is fully linguistics motivated and thus is capable of adopting an improved masked training objective according to syntactic and semantic constituents. Besides, LIMIT-BERT takes a semi-supervised learning strategy to offer the same large amount of linguistics task data as that for the language model training. As a result, LIMIT-BERT not only improves linguistics tasks performance but also benefits from a regularization effect and linguistics information that leads to more general representations to help adapt to new tasks and domains. LIMIT-BERT outperforms the strong baseline Whole Word Masking BERT on both dependency and constituent syntactic/semantic parsing, GLUE benchmark, and SNLI task. Our practice on the proposed LIMIT-BERT also enables us to release a well pre-trained model for multi-purpose of natural language processing tasks once for all.

pdf bib
Attention Is All You Need for Chinese Word Segmentation
Sufeng Duan | Hai Zhao
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Taking greedy decoding algorithm as it should be, this work focuses on further strengthening the model itself for Chinese word segmentation (CWS), which results in an even more fast and more accurate CWS model. Our model consists of an attention only stacked encoder and a light enough decoder for the greedy segmentation plus two highway connections for smoother training, in which the encoder is composed of a newly proposed Transformer variant, Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer. With the effective encoder design, our model only needs to take unigram features for scoring. Our model is evaluated on SIGHAN Bakeoff benchmark datasets. The experimental results show that with the highest segmentation speed, the proposed model achieves new state-of-the-art or comparable performance against strong baselines in terms of strict closed test setting.

pdf bib
Named Entity Recognition Only from Word Embeddings
Ying Luo | Hai Zhao | Junlang Zhan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Deep neural network models have helped named entity recognition achieve amazing performance without handcrafting features. However, existing systems require large amounts of human annotated training data. Efforts have been made to replace human annotations with external knowledge (e.g., NE dictionary, part-of-speech tags), while it is another challenge to obtain such effective resources. In this work, we propose a fully unsupervised NE recognition model which only needs to take informative clues from pre-trained word embeddings.We first apply Gaussian Hidden Markov Model and Deep Autoencoding Gaussian Mixture Model on word embeddings for entity span detection and type prediction, and then further design an instance selector based on reinforcement learning to distinguish positive sentences from noisy sentences and then refine these coarse-grained annotations through neural networks. Extensive experiments on two CoNLL benchmark NER datasets (CoNLL-2003 English dataset and CoNLL-2002 Spanish dataset) demonstrate that our proposed light NE recognition model achieves remarkable performance without using any annotated lexicon or corpus.

2019

pdf bib
Syntax-aware Multilingual Semantic Role Labeling
Shexia He | Zuchao Li | Hai Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recently, semantic role labeling (SRL) has earned a series of success with even higher performance improvements, which can be mainly attributed to syntactic integration and enhanced word representation. However, most of these efforts focus on English, while SRL on multiple languages more than English has received relatively little attention so that is kept underdevelopment. Thus this paper intends to fill the gap on multilingual SRL with special focus on the impact of syntax and contextualized word representation. Unlike existing work, we propose a novel method guided by syntactic rule to prune arguments, which enables us to integrate syntax into multilingual SRL model simply and effectively. We present a unified SRL model designed for multiple languages together with the proposed uniform syntax enhancement. Our model achieves new state-of-the-art results on the CoNLL-2009 benchmarks of all seven languages. Besides, we pose a discussion on the syntactic role among different languages and verify the effectiveness of deep enhanced representation for multilingual SRL.

pdf bib
SJTU-NICT at MRP 2019: Multi-Task Learning for End-to-End Uniform Semantic Graph Parsing
Zuchao Li | Hai Zhao | Zhuosheng Zhang | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes our SJTU-NICT’s system for participating in the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference for Computational Language Learning (CoNLL). Our system uses a graph-based approach to model a variety of semantic graph parsing tasks. Our main contributions in the submitted system are summarized as follows: 1. Our model is fully end-to-end and is capable of being trained only on the given training set which does not rely on any other extra training source including the companion data provided by the organizer; 2. We extend our graph pruning algorithm to a variety of semantic graphs, solving the problem of excessive semantic graph search space; 3. We introduce multi-task learning for multiple objectives within the same framework. The evaluation results show that our system achieved second place in the overall F1 score and achieved the best F1 score on the DM framework.

pdf bib
SJTU at MRP 2019: A Transition-Based Multi-Task Parser for Cross-Framework Meaning Representation Parsing
Hongxiao Bai | Hai Zhao
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes the system of our team SJTU for our participation in the CoNLL 2019 Shared Task: Cross-Framework Meaning Representation Parsing. The goal of the task is to advance data-driven parsing into graph-structured representations of sentence meaning. This task includes five meaning representation frameworks: DM, PSD, EDS, UCCA, and AMR. These frameworks have different properties and structures. To tackle all the frameworks in one model, it is needed to find out the commonality of them. In our work, we define a set of the transition actions to once-for-all tackle all the frameworks and train a transition-based model to parse the meaning representation. The adopted multi-task model also can allow learning for one framework to benefit the others. In the final official evaluation of the shared task, our system achieves 42% F1 unified MRP metric score.

pdf bib
Open Vocabulary Learning for Neural Chinese Pinyin IME
Zhuosheng Zhang | Yafang Huang | Hai Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Pinyin-to-character (P2C) conversion is the core component of pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguities of Chinese characters corresponding to pinyin as well as the predefined fixed vocabularies. To alleviate such inconveniences, we propose a neural P2C conversion model augmented by an online updated vocabulary with a sampling mechanism to support open vocabulary learning during IME working. Our experiments show that the proposed method outperforms commercial IMEs and state-of-the-art traditional models on standard corpus and true inputting history dataset in terms of multiple metrics and thus the online updated vocabulary indeed helps our IME effectively follows user inputting behavior.

pdf bib
Head-Driven Phrase Structure Grammar Parsing on Penn Treebank
Junru Zhou | Hai Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Head-driven phrase structure grammar (HPSG) enjoys a uniform formalism representing rich contextual syntactic and even semantic meanings. This paper makes the first attempt to formulate a simplified HPSG by integrating constituent and dependency formal representations into head-driven phrase structure. Then two parsing algorithms are respectively proposed for two converted tree representations, division span and joint span. As HPSG encodes both constituent and dependency structure information, the proposed HPSG parsers may be regarded as a sort of joint decoder for both types of structures and thus are evaluated in terms of extracted or converted constituent and dependency parsing trees. Our parser achieves new state-of-the-art performance for both parsing tasks on Penn Treebank (PTB) and Chinese Penn Treebank, verifying the effectiveness of joint learning constituent and dependency structures. In details, we report 95.84 F1 of constituent parsing and 97.00% UAS of dependency parsing on PTB.

pdf bib
Lattice-Based Transformer Encoder for Neural Machine Translation
Fengshun Xiao | Jiangtong Li | Hai Zhao | Rui Wang | Kehai Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural machine translation (NMT) takes deterministic sequences for source representations. However, either word-level or subword-level segmentations have multiple choices to split a source sequence with different word segmentors or different subword vocabulary sizes. We hypothesize that the diversity in segmentations may affect the NMT performance. To integrate different segmentations with the state-of-the-art NMT model, Transformer, we propose lattice-based encoders to explore effective word or subword representation in an automatic way during training. We propose two methods: 1) lattice positional encoding and 2) lattice-aware self-attention. These two methods can be used together and show complementary to each other to further improve translation performance. Experiment results show superiorities of lattice-based encoders in word-level and subword-level representations over conventional Transformer encoder.

pdf bib
GAN Driven Semi-distant Supervision for Relation Extraction
Pengshuai Li | Xinsong Zhang | Weijia Jia | Hai Zhao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Distant supervision has been widely used in relation extraction tasks without hand-labeled datasets recently. However, the automatically constructed datasets comprise numbers of wrongly labeled negative instances due to the incompleteness of knowledge bases, which is neglected by current distant supervised methods resulting in seriously misleading in both training and testing processes. To address this issue, we propose a novel semi-distant supervision approach for relation extraction by constructing a small accurate dataset and properly leveraging numerous instances without relation labels. In our approach, we construct accurate instances by both knowledge base and entity descriptions determined to avoid wrong negative labeling and further utilize unlabeled instances sufficiently using generative adversarial network (GAN) framework. Experimental results on real-world datasets show that our approach can achieve significant improvements in distant supervised relation extraction over strong baselines.

pdf bib
Semantic Role Labeling with Associated Memory Network
Chaoyu Guan | Yuhao Cheng | Hai Zhao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Semantic role labeling (SRL) is a task to recognize all the predicate-argument pairs of a sentence, which has been in a performance improvement bottleneck after a series of latest works were presented. This paper proposes a novel syntax-agnostic SRL model enhanced by the proposed associated memory network (AMN), which makes use of inter-sentence attention of label-known associated sentences as a kind of memory to further enhance dependency-based SRL. In detail, we use sentences and their labels from train dataset as an associated memory cue to help label the target sentence. Furthermore, we compare several associated sentences selecting strategies and label merging methods in AMN to find and utilize the label of associated sentences while attending them. By leveraging the attentive memory from known training data, Our full model reaches state-of-the-art on CoNLL-2009 benchmark datasets for syntax-agnostic setting, showing a new effective research line of SRL enhancement other than exploiting external resources such as well pre-trained language models.

2018

pdf bib
SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings
Zhuosheng Zhang | Jiangtong Li | Hai Zhao | Bingjie Tang
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes a hypernym discovery system for our participation in the SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a pre-defined vocabulary. We introduce a neural network architecture for the concerned task and empirically study various neural network models to build the representations in latent space for words and phrases. The evaluated models include convolutional neural network, long-short term memory network, gated recurrent unit and recurrent convolutional neural network. We also explore different embedding methods, including word embedding and sense embedding for better performance.

pdf bib
Joint Learning of POS and Dependencies for Multilingual Universal Dependency Parsing
Zuchao Li | Shexia He | Zhuosheng Zhang | Hai Zhao
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper describes the system of team LeisureX in the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Our system predicts the part-of-speech tag and dependency tree jointly. For the basic tasks, including tokenization, lemmatization and morphology prediction, we employ the official baseline model (UDPipe). To train the low-resource languages, we adopt a sampling method based on other richresource languages. Our system achieves a macro-average of 68.31% LAS F1 score, with an improvement of 2.51% compared with the UDPipe.

pdf bib
Multilingual Universal Dependency Parsing from Raw Text with Low-Resource Language Enhancement
Yingting Wu | Hai Zhao | Jia-Jun Tong
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper describes the system of our team Phoenix for participating CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Given the annotated gold standard data in CoNLL-U format, we train the tokenizer, tagger and parser separately for each treebank based on an open source pipeline tool UDPipe. Our system reads the plain texts for input, performs the pre-processing steps (tokenization, lemmas, morphology) and finally outputs the syntactic dependencies. For the low-resource languages with no training data, we use cross-lingual techniques to build models with some close languages instead. In the official evaluation, our system achieves the macro-averaged scores of 65.61%, 52.26%, 55.71% for LAS, MLAS and BLEX respectively.

pdf bib
A Unified Syntax-aware Framework for Semantic Role Labeling
Zuchao Li | Shexia He | Jiaxun Cai | Zhuosheng Zhang | Hai Zhao | Gongshen Liu | Linlin Li | Luo Si
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Semantic role labeling (SRL) aims to recognize the predicate-argument structure of a sentence. Syntactic information has been paid a great attention over the role of enhancing SRL. However, the latest advance shows that syntax would not be so important for SRL with the emerging much smaller gap between syntax-aware and syntax-agnostic SRL. To comprehensively explore the role of syntax for SRL task, we extend existing models and propose a unified framework to investigate more effective and more diverse ways of incorporating syntax into sequential neural networks. Exploring the effect of syntactic input quality on SRL performance, we confirm that high-quality syntactic parse could still effectively enhance syntactically-driven SRL. Using empirically optimized integration strategy, we even enlarge the gap between syntax-aware and syntax-agnostic SRL. Our framework achieves state-of-the-art results on CoNLL-2009 benchmarks both for English and Chinese, substantially outperforming all previous models.

pdf bib
Chinese Pinyin Aided IME, Input What You Have Not Keystroked Yet
Yafang Huang | Hai Zhao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Chinese pinyin input method engine (IME) converts pinyin into character so that Chinese characters can be conveniently inputted into computer through common keyboard. IMEs work relying on its core component, pinyin-to-character conversion (P2C). Usually Chinese IMEs simply predict a list of character sequences for user choice only according to user pinyin input at each turn. However, Chinese inputting is a multi-turn online procedure, which can be supposed to be exploited for further user experience promoting. This paper thus for the first time introduces a sequence-to-sequence model with gated-attention mechanism for the core task in IMEs. The proposed neural P2C model is learned by encoding previous input utterance as extra context to enable our IME capable of predicting character sequence with incomplete pinyin input. Our model is evaluated in different benchmark datasets showing great user experience improvement compared to traditional models, which demonstrates the first engineering practice of building Chinese aided IME.

pdf bib
Exploring Recombination for Efficient Decoding of Neural Machine Translation
Zhisong Zhang | Rui Wang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations. This means that partial hypotheses with different prefixes will be regarded differently no matter how similar they are. However, this might be inefficient since some partial hypotheses can contain only local differences that will not influence future predictions. In this work, we introduce recombination in NMT decoding based on the concept of the “equivalence” of partial hypotheses. Heuristically, we use a simple n-gram suffix based equivalence function and adapt it into beam search decoding. Through experiments on large-scale Chinese-to-English and English-to-Germen translation tasks, we show that the proposed method can obtain similar translation quality with a smaller beam size, making NMT decoding more efficient.

pdf bib
One-shot Learning for Question-Answering in Gaokao History Challenge
Zhuosheng Zhang | Hai Zhao
Proceedings of the 27th International Conference on Computational Linguistics

Answering questions from university admission exams (Gaokao in Chinese) is a challenging AI task since it requires effective representation to capture complicated semantic relations between questions and answers. In this work, we propose a hybrid neural model for deep question-answering task from history examinations. Our model employs a cooperative gated neural network to retrieve answers with the assistance of extra labels given by a neural turing machine labeler. Empirical study shows that the labeler works well with only a small training dataset and the gated mechanism is good at fetching the semantic representation of lengthy answers. Experiments on question answering demonstrate the proposed model obtains substantial performance gains over various neural model baselines in terms of multiple evaluation metrics.

pdf bib
Deep Enhanced Representation for Implicit Discourse Relation Recognition
Hongxiao Bai | Hai Zhao
Proceedings of the 27th International Conference on Computational Linguistics

Implicit discourse relation recognition is a challenging task as the relation prediction without explicit connectives in discourse parsing needs understanding of text spans and cannot be easily derived from surface features from the input sentence pairs. Thus, properly representing the text is very crucial to this task. In this paper, we propose a model augmented with different grained text representations, including character, subword, word, sentence, and sentence pair levels. The proposed deeper model is evaluated on the benchmark treebank and achieves state-of-the-art accuracy with greater than 48% in 11-way and F1 score greater than 50% in 4-way classifications for the first time according to our best knowledge.

pdf bib
Subword-augmented Embedding for Cloze Reading Comprehension
Zhuosheng Zhang | Yafang Huang | Hai Zhao
Proceedings of the 27th International Conference on Computational Linguistics

Representation learning is the foundation of machine reading comprehension. In state-of-the-art models, deep learning methods broadly use word and character level representations. However, character is not naturally the minimal linguistic unit. In addition, with a simple concatenation of character and word embedding, previous models actually give suboptimal solution. In this paper, we propose to use subword rather than character for word embedding enhancement. We also empirically explore different augmentation strategies on subword-augmented embedding to enhance the cloze-style reading comprehension model (reader). In detail, we present a reader that uses subword-level representation to augment word embedding with a short list to handle rare words effectively. A thorough examination is conducted to evaluate the comprehensive performance and generalization ability of the proposed reader. Experimental results show that the proposed approach helps the reader significantly outperform the state-of-the-art baselines on various public datasets.

pdf bib
A Full End-to-End Semantic Role Labeler, Syntactic-agnostic Over Syntactic-aware?
Jiaxun Cai | Shexia He | Zuchao Li | Hai Zhao
Proceedings of the 27th International Conference on Computational Linguistics

Semantic role labeling (SRL) is to recognize the predicate-argument structure of a sentence, including subtasks of predicate disambiguation and argument labeling. Previous studies usually formulate the entire SRL problem into two or more subtasks. For the first time, this paper introduces an end-to-end neural model which unifiedly tackles the predicate disambiguation and the argument labeling in one shot. Using a biaffine scorer, our model directly predicts all semantic role labels for all given word pairs in the sentence without relying on any syntactic parse information. Specifically, we augment the BiLSTM encoder with a non-linear transformation to further distinguish the predicate and the argument in a given sentence, and model the semantic role labeling process as a word pair classification task by employing the biaffine attentional mechanism. Though the proposed model is syntax-agnostic with local decoder, it outperforms the state-of-the-art syntax-aware SRL systems on the CoNLL-2008, 2009 benchmarks for both English and Chinese. To our best knowledge, we report the first syntax-agnostic SRL model that surpasses all known syntax-aware models.

pdf bib
Seq2seq Dependency Parsing
Zuchao Li | Jiaxun Cai | Shexia He | Hai Zhao
Proceedings of the 27th International Conference on Computational Linguistics

This paper presents a sequence to sequence (seq2seq) dependency parser by directly predicting the relative position of head for each given word, which therefore results in a truly end-to-end seq2seq dependency parser for the first time. Enjoying the advantage of seq2seq modeling, we enrich a series of embedding enhancement, including firstly introduced subword and node2vec augmentation. Meanwhile, we propose a beam search decoder with tree constraint and subroot decomposition over the sequence to furthermore enhance our seq2seq parser. Our parser is evaluated on benchmark treebanks, being on par with the state-of-the-art parsers by achieving 94.11% UAS on PTB and 88.78% UAS on CTB, respectively.

pdf bib
Modeling Multi-turn Conversation with Deep Utterance Aggregation
Zhuosheng Zhang | Jiangtong Li | Pengfei Zhu | Hai Zhao | Gongshen Liu
Proceedings of the 27th International Conference on Computational Linguistics

Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation whose related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances for context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance and the final matching score is obtained after attentive turns aggregation. Experimental results show our model outperforms the state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.

pdf bib
Lingke: a Fine-grained Multi-turn Chatbot for Customer Service
Pengfei Zhu | Zhuosheng Zhang | Jiangtong Li | Yafang Huang | Hai Zhao
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

Traditional chatbots usually need a mass of human dialogue data, especially when using supervised machine learning method. Though they can easily deal with single-turn question answering, for multi-turn the performance is usually unsatisfactory. In this paper, we present Lingke, an information retrieval augmented chatbot which is able to answer questions based on given product introduction document and deal with multi-turn conversations. We will introduce a fine-grained pipeline processing to distill responses based on unstructured documents, and attentive sequential context-response matching for multi-turn conversations.

pdf bib
Syntax for Semantic Role Labeling, To Be, Or Not To Be
Shexia He | Zuchao Li | Hai Zhao | Hongxiao Bai
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Semantic role labeling (SRL) is dedicated to recognizing the predicate-argument structure of a sentence. Previous studies have shown syntactic information has a remarkable contribution to SRL performance. However, such perception was challenged by a few recent neural SRL models which give impressive performance without a syntactic backbone. This paper intends to quantify the importance of syntactic information to dependency SRL in deep learning framework. We propose an enhanced argument labeling model companying with an extended korder argument pruning algorithm for effectively exploiting syntactic information. Our model achieves state-of-the-art results on the CoNLL-2008, 2009 benchmarks for both English and Chinese, showing the quantitative significance of syntax to neural SRL together with a thorough empirical survey over existing models.

pdf bib
Automatic Article Commenting: the Task and Dataset
Lianhui Qin | Lemao Liu | Wei Bi | Yan Wang | Xiaojiang Liu | Zhiting Hu | Hai Zhao | Shuming Shi
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Comments of online articles provide extended views and improve user engagement. Automatically making comments thus become a valuable functionality for online forums, intelligent chatbots, etc. This paper proposes the new task of automatic article commenting, and introduces a large-scale Chinese dataset with millions of real comments and a human-annotated subset characterizing the comments’ varying quality. Incorporating the human bias of comment quality, we further develop automatic metrics that generalize a broad set of popular reference-based metrics and exhibit greatly improved correlations with human evaluations.

pdf bib
Moon IME: Neural-based Chinese Pinyin Aided Input Method with Customizable Association
Yafang Huang | Zuchao Li | Zhuosheng Zhang | Hai Zhao
Proceedings of ACL 2018, System Demonstrations

Chinese pinyin input method engine (IME) lets user conveniently input Chinese into a computer by typing pinyin through the common keyboard. In addition to offering high conversion quality, modern pinyin IME is supposed to aid user input with extended association function. However, existing solutions for such functions are roughly based on oversimplified matching algorithms at word-level, whose resulting products provide limited extension associated with user inputs. This work presents the Moon IME, a pinyin IME that integrates the attention-based neural machine translation (NMT) model and Information Retrieval (IR) to offer amusive and customizable association ability. The released IME is implemented on Windows via text services framework.

2017

pdf bib
Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification
Lianhui Qin | Zhisong Zhang | Hai Zhao | Zhiting Hu | Eric Xing
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit discourse relation classification is of great challenge due to the lack of connectives as strong linguistic cues, which motivates the use of annotated implicit connectives to improve the recognition. We propose a feature imitation framework in which an implicit relation network is driven to learn from another neural network with access to connectives, and thus encouraged to extract similarly salient features for accurate classification. We develop an adversarial model to enable an adaptive imitation scheme through competition between the implicit network and a rival feature discriminator. Our method effectively transfers discriminability of connectives to the implicit features, and achieves state-of-the-art performance on the PDTB benchmark.

pdf bib
Fast and Accurate Neural Word Segmentation for Chinese
Deng Cai | Hai Zhao | Zhisong Zhang | Yuan Xin | Yongjian Wu | Feiyue Huang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both training and working procedures of the current neural models are computationally inefficient. In this paper, we propose a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurate than state-of-the-art neural models on Chinese benchmark datasets.

pdf bib
A Transition-based System for Universal Dependency Parsing
Hao Wang | Hai Zhao | Zhisong Zhang
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper describes the system for our participation in the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In this work, we design a system based on UDPipe1 for universal dependency parsing, where multilingual transition-based models are trained for different treebanks. Our system directly takes raw texts as input, performing several intermediate steps like tokenizing and tagging, and finally generates the corresponding dependency trees. For the special surprise languages for this task, we adopt a delexicalized strategy and predict basing on transfer learning from other related languages. In the final evaluation of the shared task, our system achieves a result of 66.53% in macro-averaged LAS F1-score.

2016

pdf bib
A Stacking Gated Neural Architecture for Implicit Discourse Relation Classification
Lianhui Qin | Zhisong Zhang | Hai Zhao
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network
Peilu Wang | Yao Qian | Frank K. Soong | Lei He | Hai Zhao
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Constituent Syntactic Parse Tree Based Discourse Parser
Zhongyi Li | Hai Zhao | Chenxi Pang | Lili Wang | Huan Wang
Proceedings of the CoNLL-16 shared task

pdf bib
Shallow Discourse Parsing Using Convolutional Neural Network
Lianhui Qin | Zhisong Zhang | Hai Zhao
Proceedings of the CoNLL-16 shared task

pdf bib
Implicit Discourse Relation Recognition with Context-aware Character-enhanced Embeddings
Lianhui Qin | Zhisong Zhang | Hai Zhao
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

For the task of implicit discourse relation recognition, traditional models utilizing manual features can suffer from data sparsity problem. Neural models provide a solution with distributed representations, which could encode the latent semantic information, and are suitable for recognizing semantic relations between argument pairs. However, conventional vector representations usually adopt embeddings at the word level and cannot well handle the rare word problem without carefully considering morphological information at character level. Moreover, embeddings are assigned to individual words independently, which lacks of the crucial contextual information. This paper proposes a neural model utilizing context-aware character-enhanced embeddings to alleviate the drawbacks of the current word level representation. Our experiments show that the enhanced embeddings work well and the proposed model obtains state-of-the-art results.

pdf bib
Connecting Phrase based Statistical Machine Translation Adaptation
Rui Wang | Hai Zhao | Bao-Liang Lu | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Although more additional corpora are now available for Statistical Machine Translation (SMT), only the ones which belong to the same or similar domains of the original corpus can indeed enhance SMT performance directly. A series of SMT adaptation methods have been proposed to select these similar-domain data, and most of them focus on sentence selection. In comparison, phrase is a smaller and more fine grained unit for data selection, therefore we propose a straightforward and efficient connecting phrase based adaptation method, which is applied to both bilingual phrase pair and monolingual n-gram adaptation. The proposed method is evaluated on IWSLT/NIST data sets, and the results show that phrase based SMT performances are significantly improved (up to +1.6 in comparison with phrase based SMT baseline system and +0.9 in comparison with existing methods).

pdf bib
Neural Word Segmentation Learning for Chinese
Deng Cai | Hai Zhao
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Probabilistic Graph-based Dependency Parsing with Convolutional Neural Network
Zhisong Zhang | Hai Zhao | Lianhui Qin
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Shallow Discourse Parsing Using Constituent Parsing Tree
Changge Chen | Peilu Wang | Hai Zhao
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

pdf bib
Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Hai Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation
Hai Zhao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

pdf bib
High-order Graph-based Neural Dependency Parsing
Zhisong Zhang | Hai Zhao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

pdf bib
English to Chinese Translation: How Chinese Character Matters
Rui Wang | Hai Zhao | Bao-Liang Lu
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

pdf bib
Neural Network Language Model for Chinese Pinyin Input Method Engine
Shenyuan Chen | Hai Zhao | Rui Wang
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

pdf bib
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters
Hai Zhao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

pdf bib
A Light Rule-based Approach to English Subject-Verb Agreement Errors on the Third Person Singular Forms
Yuzhu Wang | Hai Zhao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

pdf bib
A Machine Learning Method to Distinguish Machine Translation from Human Translation
Yitong Li | Rui Wang | Hai Zhao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

2014

pdf bib
A Joint Graph Model for Pinyin-to-Chinese Conversion with Typo Correction
Zhongye Jia | Hai Zhao
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Grammatical Error Detection and Correction using a Single Maximum Entropy Model
Peilu Wang | Zhongye Jia | Hai Zhao
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
An Improved Graph Model for Chinese Spell Checking
Yang Xin | Hai Zhao | Yuzhu Wang | Zhongye Jia
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Learning Hierarchical Translation Spans
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Neural Network Based Bilingual Language Model Growing for Statistical Machine Translation
Rui Wang | Hai Zhao | Bao-Liang Lu | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
Rui Wang | Masao Utiyama | Isao Goto | Eiichro Sumita | Hai Zhao | Bao-Liang Lu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Labeled Alignment for Recognizing Textual Entailment
Xiaolin Wang | Hai Zhao | Bao-Liang Lu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
KySS 1.0: a Framework for Automatic Evaluation of Chinese Input Method Engines
Zhongye Jia | Hai Zhao
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Grammatical Error Correction as Multiclass Classification with Single Model
Zhongye Jia | Peilu Wang | Hai Zhao
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Graph Model for Chinese Spell Checking
Zhongye Jia | Peilu Wang | Hai Zhao
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

pdf bib
Vietnamese to Chinese Machine Translation via Chinese Character as Pivot
Hai Zhao | Tianjiao Yin | Jingyi Zhang
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

2012

pdf bib
Spell Checking for Chinese
Shaohua Yang | Hai Zhao | Xiaolin Wang | Bao-liang Lu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents some novel results on Chinese spell checking. In this paper, a concise algorithm based on minimized-path segmentation is proposed to reduce the cost and suit the needs of current Chinese input systems. The proposed algorithm is actually derived from a simple assumption that spelling errors often make the number of segments larger. The experimental results are quite positive and implicitly verify the effectiveness of the proposed assumption. Finally, all approaches work together to output a result much better than the baseline with 12% performance improvement.

pdf bib
Regression with Phrase Indicators for Estimating MT Quality
Chunyang Wu | Hai Zhao
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Chinese Coreference Resolution via Ordered Filtering
Xiaotian Zhang | Chunyang Wu | Hai Zhao
Joint Conference on EMNLP and CoNLL - Shared Task

pdf bib
System paper for CoNLL-2012 shared task: Hybrid Rule-based Algorithm for Coreference Resolution.
Heming Shou | Hai Zhao
Joint Conference on EMNLP and CoNLL - Shared Task

pdf bib
Towards a Semantic Annotation of English Television News - Building and Evaluating a Constraint Grammar FrameNet
Shaohua Yang | Hai Zhao | Bao-liang Lu
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

pdf bib
Fourth-Order Dependency Parsing
Xuezhe Ma | Hai Zhao
Proceedings of COLING 2012: Posters

pdf bib
Using Deep Linguistic Features for Finding Deceptive Opinion Spam
Qiongkai Xu | Hai Zhao
Proceedings of COLING 2012: Posters

pdf bib
A Machine Learning Approach to Convert CCGbank to Penn Treebank
Xiaotian Zhang | Hai Zhao | Cong Hui
Proceedings of COLING 2012: Demonstration Papers

2011

pdf bib
Enhance Top-down method with Meta-Classification for Very Large-scale Hierarchical Classification
Xiao-Lin Wang | Hai Zhao | Bao-Liang Lu
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
An Empirical Study on Development Set Selection Strategy for Machine Translation Learning
Cong Hui | Hai Zhao | Bao-Liang Lu | Yan Song
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Reranking with Multiple Features for Better Transliteration
Yan Song | Chunyu Kit | Hai Zhao
Proceedings of the 2010 Named Entities Workshop

pdf bib
Hedge Detection and Scope Finding by Sequence Labeling with Procedural Feature Selection
Shaodian Zhang | Hai Zhao | Guodong Zhou | Bao-Liang Lu
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

pdf bib
Dependency Parser for Chinese Constituent Parsing
Xuezhe Ma | Xiaotian Zhang | Hai Zhao | Bao-Liang Lu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
How Large a Corpus Do We Need: Statistical Method Versus Rule-based Method
Hai Zhao | Yan Song | Chunyu Kit
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We investigate the impact of input data scale in corpus-based learning using a study style of Zipf’s law. In our research, Chinese word segmentation is chosen as the study case and a series of experiments are specially conducted for it, in which two types of segmentation techniques, statistical learning and rule-based methods, are examined. The empirical results show that a linear performance improvement in statistical learning requires an exponential increasing of training corpus size at least. As for the rule-based method, an approximate negative inverse relationship between the performance and the size of the input lexicon can be observed.

2009

pdf bib
Semantic Dependency Parsing of NomBank and PropBank: An Efficient Integrated Approach via a Large-scale Feature Selection
Hai Zhao | Wenliang Chen | Chunyu Kit
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improving Nominal SRL in Chinese Language with Verbal SRL Information and Automatic Predicate Recognition
Junhui Li | Guodong Zhou | Hai Zhao | Qiaoming Zhu | Peide Qian
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Multilingual Dependency Learning: A Huge Feature Engineering Method to Semantic Dependency Parsing
Hai Zhao | Wenliang Chen | Chunyu Kit | Guodong Zhou
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

pdf bib
Multilingual Dependency Learning: Exploiting Rich Features for Tagging Syntactic and Semantic Dependencies
Hai Zhao | Wenliang Chen | Jun’ichi Kazama | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

pdf bib
Cross Language Dependency Parsing using a Bilingual Lexicon
Hai Zhao | Yan Song | Chunyu Kit | Guodong Zhou
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Character-Level Dependencies in Chinese: Usefulness and Learning
Hai Zhao
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

pdf bib
An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework
Hai Zhao | Chunyu Kit
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Unsupervised Segmentation Helps Supervised Learning of Character Tagging for Word Segmentation and Named Entity Recognition
Hai Zhao | Chunyu Kit
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

pdf bib
Parsing Syntactic and Semantic Dependencies with Two Single-Stage Maximum Entropy Models
Hai Zhao | Chunyu Kit
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

2006

pdf bib
An Improved Chinese Word Segmentation System with Conditional Random Field
Hai Zhao | Chang-Ning Huang | Mu Li
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

pdf bib
Which Is Essential for Chinese Word Segmentation: Character versus Word
Chang-Ning Huang | Hai Zhao
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

pdf bib
Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling
Hai Zhao | Chang-Ning Huang | Mu Li | Bao-Liang Lu
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation