Bowen Yu


Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention
Yanzeng Li | Bowen Yu | Xue Mengge | Tingwen Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Most Chinese pre-trained models take the character as the basic unit and learn representations according to each character’s external contexts, ignoring the semantics expressed in the word, which is the smallest meaningful utterance in Chinese. Hence, we propose a novel word-aligned attention to exploit explicit word information, which is complementary to various character-based Chinese pre-trained language models. Specifically, we devise a pooling mechanism to align the character-level attention to the word level and propose to alleviate the potential issue of segmentation error propagation by multi-source information fusion. As a result, word and character information are explicitly integrated during fine-tuning. Experimental results on five Chinese NLP benchmark tasks demonstrate that our method achieves significant improvements over BERT, ERNIE and BERT-wwm.
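The pooling step described in this abstract can be illustrated with a minimal sketch. It assumes a single row of character-level attention weights and a segmentation given as word lengths. Mean pooling is used purely for illustration, since the abstract does not say which pooling function is used, and the function name `word_align_attention` is hypothetical.

```python
def word_align_attention(char_weights, word_lengths):
    """Pool one row of character-level attention weights to the word level.

    char_weights: attention weights over characters (one query row).
    word_lengths: number of characters in each segmented word, in order.
    Returns a list of the same length in which all characters of a word
    share that word's pooled weight.
    """
    if any(length <= 0 for length in word_lengths):
        raise ValueError("every word must contain at least one character")
    if sum(word_lengths) != len(char_weights):
        raise ValueError("segmentation must cover every character exactly once")

    aligned = []
    start = 0
    for length in word_lengths:
        span = char_weights[start:start + length]
        pooled = sum(span) / length  # mean pooling: an assumption, not from the abstract
        aligned.extend([pooled] * length)  # broadcast back to the characters
        start += length
    return aligned


# Four characters segmented into words of length 2, 1 and 1.
weights = [0.1, 0.3, 0.2, 0.4]
aligned = word_align_attention(weights, [2, 1, 1])
```

With mean pooling the total attention mass is unchanged, so the aligned row is still a valid distribution. Running the same function over several segmenters' outputs and combining the results would be one way to approach the multi-source fusion the abstract mentions, though that part is not sketched here.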

Edge-Enhanced Graph Convolution Networks for Event Detection with Syntactic Relation
Shiyao Cui | Bowen Yu | Tingwen Liu | Zhenyu Zhang | Xuebin Wang | Jinqiao Shi
Findings of the Association for Computational Linguistics: EMNLP 2020

Event detection (ED), a key subtask of information extraction, aims to recognize instances of specific event types in text. Previous studies on the task have verified the effectiveness of integrating syntactic dependency into graph convolutional networks. However, these methods usually ignore dependency label information, which conveys rich and useful linguistic knowledge for ED. In this paper, we propose a novel architecture named Edge-Enhanced Graph Convolution Networks (EE-GCN), which simultaneously exploits syntactic structure and typed dependency label information to perform ED. Specifically, an edge-aware node update module is designed to generate expressive word representations by aggregating syntactically-connected words through specific dependency types. Furthermore, to fully explore clues hidden in dependency edges, a node-aware edge update module is introduced, which refines the relation representations with contextual information. These two modules are complementary and mutually reinforcing. We conduct experiments on the widely used ACE2005 dataset and the results show significant improvement over competitive baseline methods.

TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking
Yucheng Wang | Bowen Yu | Yueyang Zhang | Tingwen Liu | Hongsong Zhu | Limin Sun
Proceedings of the 28th International Conference on Computational Linguistics

Extracting entities and relations from unstructured text has attracted increasing attention in recent years but remains challenging, due to the intrinsic difficulty in identifying overlapping relations with shared entities. Prior works show that joint learning can result in a noticeable performance gain. However, they usually involve sequential interrelated steps and suffer from the problem of exposure bias. At training time, they predict with ground-truth conditions, while at inference they have to extract from scratch. This discrepancy leads to error accumulation. To mitigate the issue, we propose in this paper a one-stage joint extraction model, namely TPLinker, which is capable of discovering overlapping relations sharing one or both entities while being immune to exposure bias. TPLinker formulates joint extraction as a token pair linking problem and introduces a novel handshaking tagging scheme that aligns the boundary tokens of entity pairs under each relation type. Experimental results show that TPLinker performs significantly better on overlapping and multiple relation extraction, and achieves state-of-the-art performance on two public datasets.
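To make the token pair linking formulation concrete, here is a minimal sketch that enumerates token pairs and tags the pairs that link boundary tokens. It assumes pairs are restricted to i <= j and flattened in row-major order, and it uses a plain 0/1 tag that ignores link direction. The abstract specifies none of these details, and the helper names `token_pairs` and `tag_links` are hypothetical.

```python
from itertools import combinations_with_replacement


def token_pairs(n):
    """All (i, j) token index pairs with i <= j, in row-major order."""
    return list(combinations_with_replacement(range(n), 2))


def tag_links(n, links):
    """Return one 0/1 tag per token pair; 1 marks a linked boundary pair.

    links: iterable of (i, j) token indices. Direction is discarded here,
    which is a simplification made for this sketch.
    """
    index = {pair: k for k, pair in enumerate(token_pairs(n))}
    tags = [0] * len(index)
    for i, j in links:
        tags[index[(min(i, j), max(i, j))]] = 1
    return tags


tokens = ["New", "York", "is", "in", "USA"]

# Hypothetical annotation: entity spans as (head index, tail index).
entity_spans = [(0, 1), (4, 4)]
entity_tags = tag_links(len(tokens), entity_spans)

# Under one relation type, link the subject head (0) to the object head (4).
head_tags = tag_links(len(tokens), [(0, 4)])
```

For a sentence of n tokens this gives n(n+1)/2 pairs, so all entity and relation links can be predicted in a single pass over one flat sequence rather than in sequential, interdependent steps.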

Document-level Relation Extraction with Dual-tier Heterogeneous Graph
Zhenyu Zhang | Bowen Yu | Xiaobo Shu | Tingwen Liu | Hengzhu Tang | Wang Yubin | Li Guo
Proceedings of the 28th International Conference on Computational Linguistics

Document-level relation extraction (RE) poses new challenges over its sentence-level counterpart since it requires an adequate comprehension of the whole document and the multi-hop reasoning ability across multiple sentences to reach the final result. In this paper, we propose a novel graph-based model with Dual-tier Heterogeneous Graph (DHG) for document-level RE. In particular, DHG is composed of a structure modeling layer followed by a relation reasoning layer. The major advantage is that it is capable of not only capturing both the sequential and structural information of documents but also mixing them together to benefit multi-hop reasoning and final decision-making. Furthermore, we employ a Graph Neural Network (GNN) based message propagation strategy to accumulate information on DHG. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on two widely used datasets, and further analyses suggest that all the modules in our model are indispensable for document-level RE.

Porous Lattice Transformer Encoder for Chinese NER
Xue Mengge | Bowen Yu | Tingwen Liu | Yue Zhang | Erli Meng | Bin Wang
Proceedings of the 28th International Conference on Computational Linguistics

Incorporating lexicons into character-level Chinese NER by lattices is proven effective to exploit rich word boundary information. Previous work has extended RNNs to consume lattice inputs and achieved great success. However, due to the DAG structure and the inherently unidirectional sequential nature, this method precludes batched computation and sufficient semantic interaction. In this paper, we propose PLTE, an extension of transformer encoder that is tailored for Chinese NER, which models all the characters and matched lexical words in parallel with batch processing. PLTE augments self-attention with positional relation representations to incorporate lattice structure. It also introduces a porous mechanism to augment localness modeling and maintain the strength of capturing the rich long-term dependencies. Experimental results show that PLTE performs up to 11.4 times faster than state-of-the-art methods while realizing better performance. We also demonstrate that using BERT representations further substantially boosts the performance and brings out the best in PLTE.

Learning to Prune Dependency Trees with Rethinking for Neural Relation Extraction
Bowen Yu | Xue Mengge | Zhenyu Zhang | Tingwen Liu | Wang Yubin | Bin Wang
Proceedings of the 28th International Conference on Computational Linguistics

Dependency trees have been shown to be effective in capturing long-range relations between target entities. Nevertheless, how to selectively emphasize target-relevant information and remove irrelevant content from the tree is still an open problem. Existing approaches employing pre-defined rules to eliminate noise may not always yield optimal results due to the complexity and variability of natural language. In this paper, we present a novel architecture named Dynamically Pruned Graph Convolutional Network (DP-GCN), which learns to prune the dependency tree with rethinking in an end-to-end scheme. In each layer of DP-GCN, we employ a selection module to concentrate on nodes expressing the target relation by a set of binary gates, and then augment the pruned tree with a pruned semantic graph to ensure the connectivity. After that, we introduce a rethinking mechanism to guide and refine the pruning operation by feeding back the high-level learned features repeatedly. Extensive experimental results demonstrate that our model achieves impressive results compared to strong competitors.

Coarse-to-Fine Pre-training for Named Entity Recognition
Xue Mengge | Bowen Yu | Zhenyu Zhang | Tingwen Liu | Yue Zhang | Bin Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

More recently, Named Entity Recognition has achieved great advances aided by pre-training approaches such as BERT. However, current pre-training techniques focus on building language modeling objectives to learn a general representation, ignoring the named entity-related knowledge. To this end, we propose a NER-specific pre-training framework to inject coarse-to-fine automatically mined entity knowledge into pre-trained models. Specifically, we first warm up the model via an entity span identification task by training it with Wikipedia anchors, which can be deemed as general-typed entities. Then we leverage the gazetteer-based distant supervision strategy to train the model to extract coarse-grained typed entities. Finally, we devise a self-supervised auxiliary task to mine the fine-grained named entity knowledge via clustering. Empirical studies on three public NER datasets demonstrate that our framework achieves significant improvements against several pre-trained baselines, establishing the new state-of-the-art performance on three benchmarks. Besides, we show that our framework gains promising results without using human-labeled training data, demonstrating its effectiveness in label-few and low-resource scenarios.