Jie Tang


2020

pdf bib
MOOCCube: A Large-scale Data Repository for NLP Applications in MOOCs
Jifan Yu | Gan Luo | Tong Xiao | Qingyang Zhong | Yuquan Wang | Wenzheng Feng | Junyi Luo | Chenyu Wang | Lei Hou | Juanzi Li | Zhiyuan Liu | Jie Tang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The prosperity of Massive Open Online Courses (MOOCs) provides fodder for many NLP and AI research for education applications, e.g., course concept extraction, prerequisite relation discovery, etc. However, the publicly available datasets of MOOC are limited in size with few types of data, which hinders advanced models and novel attempts in related topics. Therefore, we present MOOCCube, a large-scale data repository of over 700 MOOC courses, 100k concepts, 8 million student behaviors with an external resource. Moreover, we conduct a prerequisite discovery task as an example application to show the potential of MOOCCube in facilitating relevant research. The data repository is now available at http://moocdata.cn/data/MOOCCube.

pdf bib
Blockwise Self-Attention for Long Document Understanding
Jiezhong Qiu | Hao Ma | Omer Levy | Wen-tau Yih | Sinong Wang | Jie Tang
Findings of the Association for Computational Linguistics: EMNLP 2020

We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training/inference time, which also enables attention heads to capture either short- or long-range contextual information. We conduct experiments on language model pre-training and several benchmark question answering datasets with various paragraph lengths. BlockBERT uses 18.7-36.1% less memory and 12.0-25.1% less time to learn the model. During testing, BlockBERT saves 27.8% inference time, while having comparable and sometimes better prediction accuracy, compared to an advanced BERT-based model, RoBERTa.

2019

pdf bib
Towards Knowledge-Based Recommender Dialog System
Qibin Chen | Junyang Lin | Yichang Zhang | Ming Ding | Yukuo Cen | Hongxia Yang | Jie Tang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we propose a novel end-to-end framework called KBRD, which stands for Knowledge-Based Recommender Dialog System. It integrates the recommender system and the dialog generation system. The dialog generation system can enhance the performance of the recommendation system by introducing information about users’ preferences, and the recommender system can improve that of the dialog generation system by providing recommendation-aware vocabulary bias. Experimental results demonstrate that our proposed model has significant advantages over the baselines in both the evaluation of dialog generation and recommendation. A series of analyses show that the two systems can bring mutual benefits to each other, and the introduced knowledge contributes to both their performances.

pdf bib
Cognitive Graph for Multi-Hop Reading Comprehension at Scale
Ming Ding | Chang Zhou | Qibin Chen | Hongxia Yang | Jie Tang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a new CogQA framework for multi-hop reading comprehension question answering in web-scale documents. Founded on the dual process theory in cognitive science, the framework gradually builds a cognitive graph in an iterative process by coordinating an implicit extraction module (System 1) and an explicit reasoning module (System 2). While giving accurate answers, our framework further provides explainable reasoning paths. Specifically, our implementation based on BERT and graph neural network efficiently handles millions of documents for multi-hop reasoning questions in the HotpotQA fullwiki dataset, achieving a winning joint F1 score of 34.9 on the leaderboard, compared to 23.1 of the best competitor.

pdf bib
Course Concept Expansion in MOOCs with External Knowledge and Interactive Game
Jifan Yu | Chenyu Wang | Gan Luo | Lei Hou | Juanzi Li | Zhiyuan Liu | Jie Tang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

As Massive Open Online Courses (MOOCs) become increasingly popular, it is promising to automatically provide extracurricular knowledge for MOOC users. Suffering from semantic drifts and lack of knowledge guidance, existing methods can not effectively expand course concepts in complex MOOC environments. In this paper, we first build a novel boundary during searching for new concepts via external knowledge base and then utilize heterogeneous features to verify the high-quality results. In addition, to involve human efforts in our model, we design an interactive optimization mechanism based on a game. Our experiments on the four datasets from Coursera and XuetangX show that the proposed method achieves significant improvements(+0.19 by MAP) over existing methods.

2017

pdf bib
Prerequisite Relation Learning for Concepts in MOOCs
Liangming Pan | Chengjiang Li | Juanzi Li | Jie Tang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

What prerequisite knowledge should students achieve a level of mastery before moving forward to learn subsequent coursewares? We study the extent to which the prerequisite relation between knowledge concepts in Massive Open Online Courses (MOOCs) can be inferred automatically. In particular, what kinds of information can be leverage to uncover the potential prerequisite relation between knowledge concepts. We first propose a representation learning-based method for learning latent representations of course concepts, and then investigate how different features capture the prerequisite relations between concepts. Our experiments on three datasets form Coursera show that the proposed method achieves significant improvements (+5.9-48.0% by F1-score) comparing with existing methods.

pdf bib
Course Concept Extraction in MOOCs via Embedding-Based Graph Propagation
Liangming Pan | Xiaochen Wang | Chengjiang Li | Juanzi Li | Jie Tang
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Massive Open Online Courses (MOOCs), offering a new way to study online, are revolutionizing education. One challenging issue in MOOCs is how to design effective and fine-grained course concepts such that students with different backgrounds can grasp the essence of the course. In this paper, we conduct a systematic investigation of the problem of course concept extraction for MOOCs. We propose to learn latent representations for candidate concepts via an embedding-based method. Moreover, we develop a graph-based propagation algorithm to rank the candidate concepts based on the learned representations. We evaluate the proposed method using different courses from XuetangX and Coursera. Experimental results show that our method significantly outperforms all the alternative methods (+0.013-0.318 in terms of R-precision; p<<0.01, t-test).

2015

pdf bib
Name List Only? Target Entity Disambiguation in Short Texts
Yixin Cao | Juanzi Li | Xiaofei Guo | Shuanhu Bai | Heng Ji | Jie Tang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning Topic Hierarchies for Wikipedia Categories
Linmei Hu | Xuzhong Wang | Mengdi Zhang | Juanzi Li | Xiaoli Li | Chao Shao | Jie Tang | Yongbin Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2013

pdf bib
Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
Zhigang Wang | Zhixing Li | Juanzi Li | Jie Tang | Jeff Z. Pan
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2007

pdf bib
A Unified Tagging Approach to Text Normalization
Conghui Zhu | Jie Tang | Hang Li | Hwee Tou Ng | Tiejun Zhao
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics