Fan Yang


Predicting Personal Opinion on Future Events with Fingerprints
Fan Yang | Eduard Dragut | Arjun Mukherjee
Proceedings of the 28th International Conference on Computational Linguistics

Predicting users’ opinions in their response to social events has important real-world applications, many of which have political and social impacts. Existing approaches derive a population’s opinion on an ongoing event from large volumes of user-generated content. In certain scenarios, we may not be able to acquire such content and thus cannot infer an unbiased opinion on those emerging events. To address this problem, we propose to explore opinion on unseen articles based on one’s fingerprinting: the prior reading and commenting history. This work presents a focused study on modeling and leveraging fingerprinting techniques to predict a user’s future opinion. We introduce a recurrent neural network based model that integrates fingerprinting. We collect a large dataset that consists of event-comment pairs from six news websites. We evaluate the proposed model on this dataset. The results show substantial performance gains, demonstrating the effectiveness of our approach.

Logic-guided Semantic Representation Learning for Zero-Shot Relation Classification
Juan Li | Ruoxu Wang | Ningyu Zhang | Wen Zhang | Fan Yang | Huajun Chen
Proceedings of the 28th International Conference on Computational Linguistics

Relation classification aims to extract semantic relations between entity pairs from sentences. However, most existing methods can only identify seen relation classes that occurred during training. To recognize unseen relations at test time, we explore the problem of zero-shot relation classification. Previous work regards the problem as reading comprehension or textual entailment, both of which have to rely on artificial descriptive information to improve the understandability of relation types. Thus, rich semantic knowledge of the relation labels is ignored. In this paper, we propose a novel logic-guided semantic representation learning model for zero-shot relation classification. Our approach builds connections between seen and unseen relations via implicit and explicit semantic representations with knowledge graph embeddings and logic rules. Extensive experimental results demonstrate that our method can generalize to unseen relation types and achieve promising improvements.

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
Yaobo Liang | Nan Duan | Yeyun Gong | Ning Wu | Fenfei Guo | Weizhen Qi | Ming Gong | Linjun Shou | Daxin Jiang | Guihong Cao | Xiaodong Fan | Ruofei Zhang | Rahul Agrawal | Edward Cui | Sining Wei | Taroon Bharti | Ying Qiao | Jiun-Hung Chen | Winnie Wu | Shuguang Liu | Fan Yang | Daniel Campos | Rangan Majumder | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English and includes natural language understanding tasks only, XGLUE has three main advantages: (1) it provides two corpora with different sizes for cross-lingual pre-training; (2) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (3) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model Unicoder (Huang et al., 2019) to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.


Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification
Fan Yang | Xiaochang Peng | Gargi Ghosh | Reshef Shilon | Hao Ma | Eider Moore | Goran Predovic
Proceedings of the Third Workshop on Abusive Language Online

Interactions among users on social network platforms are usually positive, constructive and insightful. However, sometimes people also get exposed to objectionable content such as hate speech, bullying, and verbal abuse. Most social platforms have explicit policies against hate speech because it creates an environment of intimidation and exclusion, and in some cases may promote real-world violence. As users’ interactions on today’s social networks involve multiple modalities, such as texts, images and videos, in this paper we explore the challenge of automatically identifying hate speech with deep multimodal technologies, extending previous research which mostly focuses on the text signal alone. We present a number of fusion approaches to integrate text and photo signals. We show that augmenting text with image embedding information immediately leads to a boost in performance, while applying additional attention fusion methods brings further improvement.


Attending Sentences to detect Satirical Fake News
Sohan De Sarkar | Fan Yang | Arjun Mukherjee
Proceedings of the 27th International Conference on Computational Linguistics

Satirical news detection is important in order to prevent the spread of misinformation over the Internet. Existing approaches to capture news satire use machine learning models such as SVM and hierarchical neural networks along with hand-engineered features, but do not explore the difference between the sentence level and the document level. This paper proposes a robust, hierarchical deep neural network approach for satire detection, which is capable of capturing satire both at the sentence level and at the document level. The architecture incorporates pluggable generic neural networks like CNN, GRU, and LSTM. Experimental results on a real-world news satire dataset show substantial performance gains, demonstrating the effectiveness of our proposed approach. An inspection of the learned models reveals the existence of key sentences that control the presence of satire in news.


Satirical News Detection and Analysis using Attention Mechanism and Linguistic Features
Fan Yang | Arjun Mukherjee | Eduard Dragut
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Satirical news is considered to be entertainment, but it is potentially deceptive and harmful. Despite the embedded genre in the article, not everyone can recognize the satirical cues, and some readers therefore believe the news to be true. We observe that satirical cues are often reflected in certain paragraphs rather than the whole document. Existing works only consider document-level features to detect the satire, which could be limited. We consider paragraph-level linguistic features to unveil the satire by incorporating a neural network and an attention mechanism. We investigate the difference between paragraph-level features and document-level features, and analyze them on a large satirical news dataset. The evaluation shows that the proposed model detects satirical news effectively and reveals what features are important at which level.


An Empirical Study of Automatic Chinese Word Segmentation for Spoken Language Understanding and Named Entity Recognition
Wencan Luo | Fan Yang
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Leveraging Multiple Domains for Sentiment Classification
Fan Yang | Arjun Mukherjee | Yifan Zhang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Sentiment classification becomes more and more important with the rapid growth of user-generated content. However, the sentiment classification task usually comes with two challenges: first, sentiment classification is highly domain-dependent and training a sentiment classifier for every domain is inefficient and often impractical; second, since the quantity of labeled data is important for assessing the quality of a classifier, it is hard to evaluate classifiers when labeled data is limited for certain domains. To address the challenges mentioned above, we focus on learning high-level features that are able to generalize across domains, so a global classifier can benefit from a simple combination of documents from multiple domains. In this paper, the proposed model incorporates both sentiment polarity and unlabeled data from multiple domains and learns new feature representations. Our model doesn’t require labels from every domain, which means the learned feature representation can be generalized for sentiment domain adaptation. In addition, the learned feature representation can be used as a classifier since our model defines the meaning of feature value and arranges high-level features in a prefixed order, so it is not necessary to train another classifier on top of the new features. Empirical evaluations demonstrate that our model outperforms baselines and yields results competitive with other state-of-the-art works on benchmark datasets.


Semi-Supervised Chinese Word Segmentation Using Partial-Label Learning With Conditional Random Fields
Fan Yang | Paul Vozila
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
Fan Yang | Paul Vozila
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


An Investigation of Interruptions and Resumptions in Multi-Tasking Dialogues
Fan Yang | Peter A. Heeman | Andrew L. Kun
Computational Linguistics, Volume 37, Issue 1 - March 2011


A Chinese-English Organization Name Translation System Using Heuristic Web Mining and Asymmetric Alignment
Fan Yang | Jun Zhao | Kang Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP


Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
Fan Yang | Jun Zhao | Bo Zou | Kang Liu | Feifan Liu
Proceedings of ACL-08: HLT

CRFs-Based Named Entity Recognition Incorporated with Heuristic Entity List Searching
Fan Yang | Jun Zhao | Bo Zou
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

Switching to Real-Time Tasks in Multi-Tasking Dialogue
Fan Yang | Peter A. Heeman | Andrew Kun
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)


Avoiding and Resolving Initiative Conflicts in Dialogue
Fan Yang | Peter A. Heeman
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference


A Data Driven Approach to Relevancy Recognition for Contextual Question Answering
Fan Yang | Junlan Feng | Giuseppe Di Fabbrizio
Proceedings of the Interactive Question Answering Workshop at HLT-NAACL 2006


DialogueView: an Annotation Tool for Dialogue
Fan Yang | Peter A. Heeman
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations


DialogueView - An Annotation Tool for Dialogue
Peter A. Heeman | Fan Yang | Susan E. Strayer
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue