Baotian Hu


pdf bib
MedWriter: Knowledge-Aware Medical Text Generation
Youcheng Pan | Qingcai Chen | Weihua Peng | Xiaolong Wang | Baotian Hu | Xin Liu | Junying Chen | Wenxiu Zhou
Proceedings of the 28th International Conference on Computational Linguistics

To exploit the domain knowledge to guarantee the correctness of generated text has been a hot topic in recent years, especially for high professional domains such as medical. However, most of recent works only consider the information of unstructured text rather than structured information of the knowledge graph. In this paper, we focus on the medical topic-to-text generation task and adapt a knowledge-aware text generation model to the medical domain, named MedWriter, which not only introduces the specific knowledge from the external MKG but also is capable of learning graph-level representation. We conduct experiments on a medical literature dataset collected from medical journals, each of which has a set of topic words, an abstract of medical literature and a corresponding knowledge graph from CMeKG. Experimental results demonstrate incorporating knowledge graph into generation model can improve the quality of the generated text and has robust superiority over the competitor methods.

pdf bib
Towards Medical Machine Reading Comprehension with Structural Knowledge and Plain Text
Dongfang Li | Baotian Hu | Qingcai Chen | Weihua Peng | Anqi Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Machine reading comprehension (MRC) has achieved significant progress on the open domain in recent years, mainly due to large-scale pre-trained language models. However, it performs much worse in specific domains such as the medical field due to the lack of extensive training data and professional structural knowledge neglect. As an effort, we first collect a large scale medical multi-choice question dataset (more than 21k instances) for the National Licensed Pharmacist Examination in China. It is a challenging medical examination with a passing rate of less than 14.2% in 2018. Then we propose a novel reading comprehension model KMQA, which can fully exploit the structural medical knowledge (i.e., medical knowledge graph) and the reference medical plain text (i.e., text snippets retrieved from reference books). The experimental results indicate that the KMQA outperforms existing competitive models with a large margin and passes the exam with 61.8% accuracy rate on the test set.


pdf bib
Trigger Word Detection and Thematic Role Identification via BERT and Multitask Learning
Dongfang Li | Ying Xiong | Baotian Hu | Hanyang Du | Buzhou Tang | Qingcai Chen
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

The prediction of the relationship between the disease with genes and its mutations is a very important knowledge extraction task that can potentially help drug discovery. In this paper, we present our approaches for trigger word detection (task 1) and the identification of its thematic role (task 2) in AGAC track of BioNLP Open Shared Task 2019. Task 1 can be regarded as the traditional name entity recognition (NER), which cultivates molecular phenomena related to gene mutation. Task 2 can be regarded as relation extraction which captures the thematic roles between entities. For two tasks, we exploit the pre-trained biomedical language representation model (i.e., BERT) in the pipe of information extraction for the collection of mutation-disease knowledge from PubMed. And also, we design a fine-tuning technique and extra features by using multi-task learning. The experiment results show that our proposed approaches achieve 0.60 (ranks 1) and 0.25 (ranks 2) on task 1 and task 2 respectively in terms of F1 metric.


pdf bib
Sentence Simplification with Memory-Augmented Neural Networks
Tu Vu | Baotian Hu | Tsendsuren Munkhdalai | Hong Yu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.


pdf bib
LCSTS: A Large Scale Chinese Short Text Summarization Dataset
Baotian Hu | Qingcai Chen | Fangze Zhu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Context-Dependent Translation Selection Using Convolutional Neural Network
Baotian Hu | Zhaopeng Tu | Zhengdong Lu | Hang Li | Qingcai Chen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering
Xiaoqiang Zhou | Baotian Hu | Qingcai Chen | Buzhou Tang | Xiaolong Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
ICRC-HIT: A Deep Learning based Comment Sequence Labeling System for Answer Selection Challenge
Xiaoqiang Zhou | Baotian Hu | Jiaxin Lin | Yang Xiang | Xiaolong Wang
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)