Lijie Wen


2020

pdf bib
A Graph Representation of Semi-structured Data for Web Question Answering
Xingyao Zhang | Linjun Shou | Jian Pei | Ming Gong | Lijie Wen | Daxin Jiang
Proceedings of the 28th International Conference on Computational Linguistics

The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of semantic information hidden in structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial engine verify the effectiveness of our approach. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.

pdf bib
SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction
Xuming Hu | Lijie Wen | Yusong Xu | Chenwei Zhang | Philip Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Open relation extraction is the task of extracting open-domain relation facts from natural language sentences. Existing works either utilize heuristics or distant-supervised annotations to train a supervised classifier over pre-defined relations, or adopt unsupervised methods with additional assumptions that have less discriminative power. In this work, we propose a self-supervised framework named SelfORE, which exploits weak, self-supervised signals by leveraging large pretrained language model for adaptive clustering on contextualized relational features, and bootstraps the self-supervised signals by improving contextualized features in relation classification. Experimental results on three datasets show the effectiveness and robustness of SelfORE on open-domain Relation Extraction when comparing with competitive baselines.