An Yang


2019

pdf bib
Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension
An Yang | Quan Wang | Jing Liu | Kai Liu | Yajuan Lyu | Hua Wu | Qiaoqiao She | Sujian Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Machine reading comprehension (MRC) is a crucial and challenging task in NLP. Recently, pre-trained language models (LMs), especially BERT, have achieved remarkable success, presenting new state-of-the-art results in MRC. In this work, we investigate the potential of leveraging external knowledge bases (KBs) to further improve BERT for MRC. We introduce KT-NET, which employs an attention mechanism to adaptively select desired knowledge from KBs, and then fuses selected knowledge with BERT to enable context- and knowledge-aware predictions. We believe this would combine the merits of both deep LMs and curated KBs towards better MRC. Experimental results indicate that KT-NET offers significant and consistent improvements over BERT, outperforming competitive baselines on ReCoRD and SQuAD1.1 benchmarks. Notably, it ranks the 1st place on the ReCoRD leaderboard, and is also the best single model on the SQuAD1.1 leaderboard at the time of submission (March 4th, 2019).

2018

pdf bib
SciDTB: Discourse Dependency TreeBank for Scientific Abstracts
An Yang | Sujian Li
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Annotation corpus for discourse relations benefits NLP tasks such as machine translation and question answering. In this paper, we present SciDTB, a domain-specific discourse treebank annotated on scientific articles. Different from widely-used RST-DT and PDTB, SciDTB uses dependency trees to represent discourse structure, which is flexible and simplified to some extent but do not sacrifice structural integrity. We discuss the labeling framework, annotation workflow and some statistics about SciDTB. Furthermore, our treebank is made as a benchmark for evaluating discourse dependency parsers, on which we provide several baselines as fundamental work.

pdf bib
Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task
An Yang | Kai Liu | Jing Liu | Yajuan Lyu | Sujian Li
Proceedings of the Workshop on Machine Reading for Question Answering

Current evaluation metrics to question answering based machine reading comprehension (MRC) systems generally focus on the lexical overlap between candidate and reference answers, such as ROUGE and BLEU. However, bias may appear when these metrics are used for specific question types, especially questions inquiring yes-no opinions and entity lists. In this paper, we make adaptations on the metrics to better correlate n-gram overlap with the human judgment for answers to these two question types. Statistical analysis proves the effectiveness of our approach. Our adaptations may provide positive guidance for the development of real-scene MRC systems.

2016

pdf bib
Domain Ontology Learning Enhanced by Optimized Relation Instance in DBpedia
Liumingjing Xiao | Chong Ruan | An Yang | Junhao Zhang | Junfeng Hu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Ontologies are powerful to support semantic based applications and intelligent systems. While ontology learning are challenging due to its bottleneck in handcrafting structured knowledge sources and training data. To address this difficulty, many researchers turn to ontology enrichment and population using external knowledge sources such as DBpedia. In this paper, we propose a method using DBpedia in a different manner. We utilize relation instances in DBpedia to supervise the ontology learning procedure from unstructured text, rather than populate the ontology structure as a post-processing step. We construct three language resources in areas of computer science: enriched Wikipedia concept tree, domain ontology, and gold standard from NSFC taxonomy. Experiment shows that the result of ontology learning from corpus of computer science can be improved via the relation instances extracted from DBpedia in the same field. Furthermore, making distinction between the relation instances and applying a proper weighting scheme in the learning procedure lead to even better result.