Chunping Ma


2020

pdf bib
Hierarchy-Aware Global Model for Hierarchical Text Classification
Jie Zhou | Chunping Ma | Dingkun Long | Guangwei Xu | Ning Ding | Haoyu Zhang | Pengjun Xie | Gongshen Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Hierarchical text classification is an essential yet challenging subtask of multi-label text classification with a taxonomic hierarchy. Existing methods have difficulties in modeling the hierarchical label structure in a global view. Furthermore, they cannot make full use of the mutual interactions between the text feature space and the label space. In this paper, we formulate the hierarchy as a directed graph and introduce hierarchy-aware structure encoders for modeling label dependencies. Based on the hierarchy encoder, we propose a novel end-to-end hierarchy-aware global model (HiAGM) with two variants. A multi-label attention variant (HiAGM-LA) learns hierarchy-aware label embeddings through the hierarchy encoder and conducts inductive fusion of label-aware text features. A text feature propagation model (HiAGM-TP) is proposed as the deductive variant that directly feeds text features into hierarchy encoders. Compared with previous works, both HiAGM-LA and HiAGM-TP achieve significant and consistent improvements on three benchmark datasets.

2019

pdf bib
DM_NLP at SemEval-2018 Task 12: A Pipeline System for Toponym Resolution
Xiaobin Wang | Chunping Ma | Huafei Zheng | Chu Liu | Pengjun Xie | Linlin Li | Luo Si
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes DM-NLP’s system for toponym resolution task at Semeval 2019. Our system was developed for toponym detection, disambiguation and end-to-end resolution which is a pipeline of the former two. For toponym detection, we utilized the state-of-the-art sequence labeling model, namely, BiLSTM-CRF model as backbone. A lot of strategies are adopted for further improvement, such as pre-training, model ensemble, model averaging and data augment. For toponym disambiguation, we adopted the widely used searching and ranking framework. For ranking, we proposed several effective features for measuring the consistency between the detected toponym and toponyms in GeoNames. Eventually, our system achieved the best performance among all the submitted results in each sub task.

2018

pdf bib
DM_NLP at SemEval-2018 Task 8: neural sequence labeling with linguistic features
Chunping Ma | Huafei Zheng | Pengjun Xie | Chen Li | Linlin Li | Luo Si
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes our submissions for SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using NLP. The DM_NLP participated in two subtasks: SubTask 1 classifies if a sentence is useful for inferring malware actions and capabilities, and SubTask 2 predicts token labels (“Action”, “Entity”, “Modifier” and “Others”) for a given malware-related sentence. Since we leverage results of Subtask 2 directly to infer the result of Subtask 1, the paper focus on the system solving Subtask 2. By taking Subtask 2 as a sequence labeling task, our system relies on a recurrent neural network named BiLSTM-CNN-CRF with rich linguistic features, such as POS tags, dependency parsing labels, chunking labels, NER labels, Brown clustering. Our system achieved the highest F1 score in both token level and phrase level.