Fangzhao Wu


2020

Fine-grained Interest Matching for Neural News Recommendation
Heyuan Wang | Fangzhao Wu | Zheng Liu | Xing Xie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Personalized news recommendation is a critical technology to improve users’ online news reading experience. The core of news recommendation is accurate matching between a user’s interests and candidate news. The same user usually has diverse interests that are reflected in the different news she has browsed. Meanwhile, important semantic features of news are implied in text segments of different granularities. Existing studies generally represent each user as a single vector and then match it against the candidate news vector, which may lose fine-grained information for recommendation. In this paper, we propose FIM, a Fine-grained Interest Matching method for neural news recommendation. Instead of aggregating all of a user’s historical browsed news into a unified vector, we hierarchically construct multi-level representations for each news via stacked dilated convolutions. Then we perform fine-grained matching between segment pairs of each browsed news and the candidate news at each semantic level. High-order salient signals are then identified by resembling the hierarchy of image recognition for final click prediction. Extensive experiments on a real-world dataset from MSN news validate the effectiveness of our model on news recommendation.

Attentive Pooling with Learnable Norms for Text Representation
Chuhan Wu | Fangzhao Wu | Tao Qi | Xiaohui Cui | Yongfeng Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pooling is an important technique for learning text representations in many neural NLP models. In conventional pooling methods such as average, max and attentive pooling, text representations are weighted summations of the L1 or L∞ norm of input features. However, their pooling norms are always fixed and may not be optimal for learning accurate text representations in different tasks. In addition, in many popular pooling methods such as max and attentive pooling some features may be over-emphasized, while other useful ones are not fully exploited. In this paper, we propose an Attentive Pooling with Learnable Norms (APLN) approach for text representation. Different from existing pooling methods that use a fixed pooling norm, we propose to learn the norm in an end-to-end manner to automatically find the optimal ones for text representation in different tasks. In addition, we propose two methods to ensure the numerical stability of the model training. The first one is scale limiting, which re-scales the input to ensure non-negativity and alleviate the risk of exponential explosion. The second one is re-formulation, which decomposes the exponent operation to avoid computing the real-valued powers of the input and further accelerate the pooling operation. Experimental results on four benchmark datasets show that our approach can effectively improve the performance of attentive pooling.
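
The idea of pooling under a tunable norm can be illustrated with a small sketch. It is not the APLN formulation from the paper: it assumes the inputs are already non-negative, uses a weighted power mean as the pooling rule, and treats the norm `p` as a plain argument rather than a learned parameter. The function name `attentive_norm_pooling` is hypothetical. With `p = 1` the result is the attention-weighted average, and as `p` grows it approaches the per-dimension maximum.

```python
import math


def attentive_norm_pooling(features, scores, p):
    """Pool non-negative feature vectors with attention weights and norm p.

    features: list of equal-length lists of non-negative floats (one per token)
    scores:   list of unnormalized attention scores (one per token)
    p:        pooling norm, p >= 1 (fixed here; it would be learned in a model)
    """
    # Softmax over attention scores, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(features[0])
    pooled = []
    for d in range(dim):
        # Weighted power mean: (sum_i w_i * x_i^p)^(1/p).
        acc = sum(w * (row[d] ** p) for w, row in zip(weights, features))
        pooled.append(acc ** (1.0 / p))
    return pooled
```

Raising inputs to a large power is exactly where overflow can occur, which is the kind of instability the abstract's scale limiting and re-formulation steps are meant to address; this sketch does not implement either of them.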

MIND: A Large-scale Dataset for News Recommendation
Fangzhao Wu | Ying Qiao | Jiun-Hung Chen | Chuhan Wu | Tao Qi | Jianxun Lian | Danyang Liu | Xing Xie | Jianfeng Gao | Winnie Wu | Ming Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

News recommendation is an important technique for personalized news service. Compared with product and movie recommendations which have been comprehensively studied, the research on news recommendation is much more limited, mainly due to the lack of a high-quality benchmark dataset. In this paper, we present a large-scale dataset named MIND for news recommendation. Constructed from the user click logs of Microsoft News, MIND contains 1 million users and more than 160k English news articles, each of which has rich textual content such as title, abstract and body. We demonstrate that MIND is a good testbed for news recommendation through a comparative study of several state-of-the-art news recommendation methods which were originally developed on different proprietary datasets. Our results show that the performance of news recommendation relies heavily on the quality of news content understanding and user interest modeling. Many natural language processing techniques such as effective text representation methods and pre-trained language models can effectively improve the performance of news recommendation. The MIND dataset will be available at https://msnews.github.io.

Named Entity Recognition with Context-Aware Dictionary Knowledge
Chuhan Wu | Fangzhao Wu | Tao Qi | Yongfeng Huang
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Named entity recognition (NER) is an important task in the natural language processing field. Existing NER methods heavily rely on labeled data for model training, and their performance on rare entities is usually unsatisfactory. Entity dictionaries can cover many entities including both popular ones and rare ones, and are useful for NER. However, many entity names are context-dependent and it is not optimal to directly apply dictionaries without considering the context. In this paper, we propose a neural NER approach which can exploit dictionary knowledge with contextual information. We propose to learn context-aware dictionary knowledge by modeling the interactions between the entities in dictionaries and their contexts via context-dictionary attention. In addition, we propose an auxiliary term classification task to predict the types of the matched entity names, and jointly train it with the NER model to fuse both contexts and dictionary knowledge into NER. Extensive experiments on the CoNLL-2003 benchmark dataset validate the effectiveness of our approach in exploiting entity dictionaries to improve the performance of various NER models.

Clickbait Detection with Style-aware Title Modeling and Co-attention
Chuhan Wu | Fangzhao Wu | Tao Qi | Yongfeng Huang
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Clickbait is a form of web content designed to attract attention and entice users to click on specific hyperlinks. The detection of clickbaits is an important task for online platforms to improve the quality of web content and the satisfaction of users. Clickbait detection is typically formulated as a binary classification task based on the title and body of a webpage, and existing methods are mainly based on the content of the title and the relevance between title and body. However, these methods ignore the stylistic patterns of titles, which can provide important clues for identifying clickbaits. In addition, they do not consider the interactions between the contexts within title and body, which are very important for measuring their relevance for clickbait detection. In this paper, we propose a clickbait detection approach with style-aware title modeling and co-attention. Specifically, we use Transformers to learn content representations of title and body, and respectively compute two content-based clickbait scores for title and body based on their representations. In addition, we propose to use a character-level Transformer to learn a style-aware title representation by capturing the stylistic patterns of title, and we compute a title stylistic score based on this representation. Besides, we propose to use a co-attention network to model the relatedness between the contexts within title and body, and further enhance their representations by encoding the interaction information. We compute a title-body matching score based on the representations of title and body enhanced by their interactions. The final clickbait score is predicted by a weighted summation of the aforementioned four kinds of scores. Extensive experiments on two benchmark datasets show that our approach can effectively improve the performance of clickbait detection and consistently outperform many baseline methods.

Privacy-Preserving News Recommendation Model Learning
Tao Qi | Fangzhao Wu | Chuhan Wu | Yongfeng Huang | Xing Xie
Findings of the Association for Computational Linguistics: EMNLP 2020

News recommendation aims to display news articles to users based on their personal interest. Existing news recommendation methods rely on centralized storage of user behavior data for model training, which may lead to privacy concerns and risks due to the privacy-sensitive nature of user behaviors. In this paper, we propose a privacy-preserving method for news recommendation model training based on federated learning, where the user behavior data is locally stored on user devices. Our method can leverage the useful information in the behaviors of a massive number of users to train accurate news recommendation models while removing the need for centralized storage of these behaviors. More specifically, on each user device we keep a local copy of the news recommendation model, and compute gradients of the local model based on the user behaviors on this device. The local gradients from a group of randomly selected users are uploaded to the server, where they are aggregated to update the global model. Since the model gradients may contain some implicit private information, we apply local differential privacy (LDP) to them before uploading for better privacy protection. The updated global model is then distributed to each user device for local model update. We repeat this process for multiple rounds. Extensive experiments on a real-world dataset show the effectiveness of our method in news recommendation model training with privacy protection.
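
One training round of the kind described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: gradients are passed in as precomputed lists rather than computed on devices, the privacy step is modeled as clipping followed by Laplace noise (one common way to realize LDP, assumed here), and all function and parameter names (`privatize`, `federated_round`, `bound`, `scale`) are hypothetical.

```python
import random


def clip(vec, bound):
    """Clip every gradient component to the range [-bound, bound]."""
    return [max(-bound, min(bound, v)) for v in vec]


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(grad, bound, scale, rng):
    """Clip a local gradient and add noise before it leaves the device."""
    return [g + laplace_noise(scale, rng) for g in clip(grad, bound)]


def federated_round(global_model, user_grads, num_selected, lr, bound, scale, rng):
    """Run one round: sample users, privatize, aggregate, update the model."""
    # Randomly select a group of users for this round.
    selected = rng.sample(user_grads, num_selected)
    # Each selected user uploads only a privatized gradient.
    uploads = [privatize(g, bound, scale, rng) for g in selected]
    # The server averages the uploaded gradients coordinate-wise.
    avg = [sum(col) / len(uploads) for col in zip(*uploads)]
    # Gradient descent step on the global model, which is then redistributed.
    return [w - lr * g for w, g in zip(global_model, avg)]
```

Calling `federated_round` repeatedly, feeding each returned model into the next call, corresponds to the multiple rounds mentioned in the abstract. A larger `scale` gives stronger privacy at the cost of noisier updates.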

PTUM: Pre-training User Model from Unlabeled User Behaviors via Self-supervision
Chuhan Wu | Fangzhao Wu | Tao Qi | Jianxun Lian | Yongfeng Huang | Xing Xie
Findings of the Association for Computational Linguistics: EMNLP 2020

User modeling is critical for many personalized web services. Many existing methods model users based on their behaviors and the labeled data of target tasks. However, these methods cannot exploit useful information in unlabeled user behavior data, and their performance may not be optimal when labeled data is scarce. Motivated by pre-trained language models which are pre-trained on large-scale unlabeled corpora to empower many downstream tasks, in this paper we propose to pre-train user models from large-scale unlabeled user behavior data. We propose two self-supervision tasks for user model pre-training. The first one is masked behavior prediction, which can model the relatedness between historical behaviors. The second one is next K behavior prediction, which can model the relatedness between past and future behaviors. The pre-trained user models are finetuned in downstream tasks to learn task-specific user representations. Experimental results on two real-world datasets validate the effectiveness of our proposed user model pre-training method.
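
The two self-supervision tasks named in this abstract can be made concrete by showing how training samples might be built from a raw behavior sequence. The sketch below covers only sample construction, not the user model or its loss; the `MASK` placeholder and both function names are hypothetical, and masking a single behavior per sample is an assumption made for brevity.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder standing in for a hidden behavior


def masked_behavior_sample(behaviors, rng):
    """Hide one behavior; a model would be trained to recover it from the rest."""
    idx = rng.randrange(len(behaviors))
    masked = list(behaviors)
    target = masked[idx]
    masked[idx] = MASK
    return masked, idx, target


def next_k_sample(behaviors, k):
    """Split a sequence into past behaviors and the next k behaviors to predict."""
    if not 0 < k < len(behaviors):
        raise ValueError("k must be between 1 and len(behaviors) - 1")
    return behaviors[:-k], behaviors[-k:]
```

For a sequence such as `["a", "b", "c", "d", "e"]`, `next_k_sample(seq, 2)` yields the past `["a", "b", "c"]` and the future `["d", "e"]`, while `masked_behavior_sample` returns the sequence with one position replaced by `MASK` together with the hidden item.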

2019

Neural News Recommendation with Heterogeneous User Behavior
Chuhan Wu | Fangzhao Wu | Mingxiao An | Tao Qi | Jianqiang Huang | Yongfeng Huang | Xing Xie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

News recommendation is important for online news platforms to help users find news of interest and alleviate information overload. Existing news recommendation methods usually rely on the news click history to model user interest. However, these methods may suffer from the data sparsity problem, since the news click behaviors of many users in online news platforms are usually very limited. Fortunately, some other kinds of user behaviors such as webpage browsing and search queries can also provide useful clues about users’ news reading interest. In this paper, we propose a neural news recommendation approach which can exploit heterogeneous user behaviors. Our approach contains two major modules, i.e., news representation and user representation. In the news representation module, we learn representations of news from their titles via CNN networks, and apply attention networks to select important words. In the user representation module, we propose an attentive multi-view learning framework to learn unified representations of users from their heterogeneous behaviors such as search queries, clicked news and browsed webpages. In addition, we use word- and record-level attentions to select informative words and behavior records. Experiments on a real-world dataset validate the effectiveness of our approach.

Reviews Meet Graphs: Enhancing User and Item Representations for Recommendation with Hierarchical Attentive Graph Neural Network
Chuhan Wu | Fangzhao Wu | Tao Qi | Suyu Ge | Yongfeng Huang | Xing Xie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

User and item representation learning is critical for recommendation. Many existing recommendation methods learn representations of users and items based on their ratings and reviews. However, the user-user and item-item relatedness is usually not considered in these methods, which may be insufficient. In this paper, we propose a neural recommendation approach which can utilize useful information from both review content and user-item graphs. Since reviews and graphs have different characteristics, we propose to use a multi-view learning framework to incorporate them as different views. In the review content-view, we propose to use a hierarchical model to first learn sentence representations from words, then learn review representations from sentences, and finally learn user/item representations from reviews. In addition, we propose to incorporate a three-level attention network into this view to select important words, sentences and reviews for learning informative user and item representations. In the graph-view, we propose a hierarchical graph neural network to jointly model the user-item, user-user and item-item relatedness by capturing the first- and second-order interactions between users and items in the user-item graph. In addition, we apply an attention mechanism to model the importance of these interactions to learn informative user and item representations. Extensive experiments on four benchmark datasets validate the effectiveness of our approach.

Neural News Recommendation with Multi-Head Self-Attention
Chuhan Wu | Fangzhao Wu | Suyu Ge | Tao Qi | Yongfeng Huang | Xing Xie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

News recommendation can help users find news of interest and alleviate information overload. Precisely modeling news and users is critical for news recommendation, and capturing the contexts of words and news is important to learn news and user representations. In this paper, we propose a neural news recommendation approach with multi-head self-attention (NRMS). The core of our approach is a news encoder and a user encoder. In the news encoder, we use multi-head self-attentions to learn news representations from news titles by modeling the interactions between words. In the user encoder, we learn representations of users from their browsed news and use multi-head self-attention to capture the relatedness between the news. Besides, we apply additive attention to learn more informative news and user representations by selecting important words and news. Experiments on a real-world dataset validate the effectiveness and efficiency of our approach.
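
The additive attention step mentioned in this abstract, which selects important words or news by weighting them, can be sketched in plain Python. This uses the standard additive attention form (a score `q · tanh(W h + b)` per vector, softmax over scores, weighted sum); whether NRMS uses exactly this parameterization is an assumption, and the parameters `W`, `b` and `q` would be learned in a real model rather than passed in by hand.

```python
import math


def additive_attention(vectors, W, b, q):
    """Pool a list of vectors into one vector using additive attention.

    vectors: list of input vectors (e.g. word or news representations)
    W, b:    projection matrix (list of rows) and bias of the hidden layer
    q:       query vector that scores each projected input
    """
    scores = []
    for h in vectors:
        # Hidden projection: tanh(W h + b).
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)
        ]
        # Scalar importance score: q . hidden.
        scores.append(sum(qi * hi for qi, hi in zip(q, hidden)))

    # Softmax over scores, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the input vectors.
    dim = len(vectors[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights
```

With a zero query vector every input receives the same weight and the output is the plain average; a non-zero query shifts weight toward the inputs whose projections align with it.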

Neural News Recommendation with Long- and Short-term User Representations
Mingxiao An | Fangzhao Wu | Chuhan Wu | Kun Zhang | Zheng Liu | Xing Xie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Personalized news recommendation is important to help users find news of interest and improve reading experience. A key problem in news recommendation is learning accurate user representations to capture their interests. Users usually have both long-term preferences and short-term interests. However, existing news recommendation methods usually learn single representations of users, which may be insufficient. In this paper, we propose a neural news recommendation approach which can learn both long- and short-term user representations. The core of our approach is a news encoder and a user encoder. In the news encoder, we learn representations of news from their titles and topic categories, and use an attention network to select important words. In the user encoder, we propose to learn long-term user representations from the embeddings of their IDs. In addition, we propose to learn short-term user representations from their recently browsed news via a GRU network. Besides, we propose two methods to combine long-term and short-term user representations. The first one is using the long-term user representation to initialize the hidden state of the GRU network in short-term user representation. The second one is concatenating both long- and short-term user representations as a unified user vector. Extensive experiments on a real-world dataset show our approach can effectively improve the performance of neural news recommendation.

Neural News Recommendation with Topic-Aware News Representation
Chuhan Wu | Fangzhao Wu | Mingxiao An | Yongfeng Huang | Xing Xie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

News recommendation can help users find news of interest and alleviate information overload. The topic information of news is critical for learning accurate news and user representations for news recommendation. However, it is not considered in many existing news recommendation methods. In this paper, we propose a neural news recommendation approach with topic-aware news representations. The core of our approach is a topic-aware news encoder and a user encoder. In the news encoder we learn representations of news from their titles via CNN networks and apply attention networks to select important words. In addition, we propose to learn topic-aware news representations by jointly training the news encoder with an auxiliary topic classification task. In the user encoder we learn the representations of users from their browsed news and use attention networks to select informative news for user representation learning. Extensive experiments on a real-world dataset validate the effectiveness of our approach.

Exploring Sequence-to-Sequence Learning in Aspect Term Extraction
Dehong Ma | Sujian Li | Fangzhao Wu | Xing Xie | Houfeng Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Aspect term extraction (ATE) aims at identifying all aspect terms in a sentence and is usually modeled as a sequence labeling problem. However, sequence labeling based methods cannot make full use of the overall meaning of the whole sentence and are limited in processing dependencies between labels. To tackle these problems, we first explore formalizing ATE as a sequence-to-sequence (Seq2Seq) learning task where the source sequence and target sequence are composed of words and labels respectively. At the same time, to make Seq2Seq learning suited to ATE, where labels correspond to words one by one, we design gated unit networks to incorporate the corresponding word representation into the decoder, and position-aware attention to pay more attention to the adjacent words of a target word. The experimental results on two datasets show that Seq2Seq learning is effective in ATE when accompanied by our proposed gated unit networks and position-aware attention mechanism.

Hierarchical User and Item Representation with Three-Tier Attention for Recommendation
Chuhan Wu | Fangzhao Wu | Junxin Liu | Yongfeng Huang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Utilizing reviews to learn user and item representations is useful for recommender systems. Existing methods usually merge all reviews from the same user or for the same item into a long document. However, different reviews, sentences and even words usually have different informativeness for modeling users and items. In this paper, we propose a hierarchical user and item representation model with three-tier attention to learn user and item representations from reviews for recommendation. Our model contains three major components, i.e., a sentence encoder to learn sentence representations from words, a review encoder to learn review representations from sentences, and a user/item encoder to learn user/item representations from reviews. In addition, we incorporate a three-tier attention network in our model to select important words, sentences and reviews. Besides, we combine the user and item representations learned from the reviews with user and item embeddings based on IDs as the final representations to capture the latent factors of individual users and items. Extensive experiments on four benchmark datasets validate the effectiveness of our approach.

2018

THU_NGN at SemEval-2018 Task 3: Tweet Irony Detection with Densely connected LSTM and Multi-task Learning
Chuhan Wu | Fangzhao Wu | Sixing Wu | Junxin Liu | Zhigang Yuan | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Detecting irony is an important task to mine fine-grained information from social web messages. Therefore, SemEval-2018 Task 3 aims to detect ironic tweets (subtask A) and their ironic types (subtask B). In order to address this task, we propose a system based on a densely connected LSTM network with a multi-task learning strategy. In our dense LSTM model, each layer will take all outputs from previous layers as input. The last LSTM layer will output the hidden representations of texts, and they will be used in three classification tasks. In addition, we incorporate several types of features to improve the model performance. Our model achieved an F-score of 70.54 (ranked 2/43) in subtask A and 49.47 (ranked 3/29) in subtask B. The experimental results validate the effectiveness of our system.

THU_NGN at SemEval-2018 Task 1: Fine-grained Tweet Sentiment Intensity Analysis with Attention CNN-LSTM
Chuhan Wu | Fangzhao Wu | Junxin Liu | Zhigang Yuan | Sixing Wu | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Traditional sentiment analysis approaches mainly focus on classifying the sentiment polarities or emotion categories of texts. However, they cannot exploit the sentiment intensity information. Therefore, SemEval-2018 Task 1 aims to automatically determine the intensity of emotions or sentiment of tweets to mine fine-grained sentiment information. In order to address this task, we propose a system based on an attention CNN-LSTM model. In our model, LSTM is used to extract the long-term contextual information from texts. We apply attention techniques to select this information. A CNN layer with kernels of different sizes is used to extract local features. The dense layers take the pooled CNN feature maps and predict the intensity scores. Our system reaches an average Pearson correlation score of 0.722 (ranked 12/48) in the emotion intensity regression task, and 0.810 in the valence regression task (ranked 15/38). This indicates that our system can be further extended.

THU_NGN at SemEval-2018 Task 2: Residual CNN-LSTM Network with Attention for English Emoji Prediction
Chuhan Wu | Fangzhao Wu | Sixing Wu | Zhigang Yuan | Junxin Liu | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Emojis are widely used by social media and social network users when posting their messages. It is important to study the relationships between messages and emojis. Thus, in SemEval-2018 Task 2 an interesting and challenging task is proposed, i.e., predicting which emojis are evoked by text-based tweets. We propose a residual CNN-LSTM with attention (RCLA) model for this task. Our model combines CNN and LSTM layers to capture both local and long-range contextual information for tweet representation. In addition, attention mechanism is used to select important components. Besides, residual connection is applied to CNN layers to facilitate the training of neural networks. We also incorporated additional features such as POS tags and sentiment features extracted from lexicons. Our model achieved 30.25% macro-averaged F-score in the first subtask (i.e., emoji prediction in English), ranking 7th out of 48 participants.

THU_NGN at SemEval-2018 Task 10: Capturing Discriminative Attributes with MLP-CNN model
Chuhan Wu | Fangzhao Wu | Sixing Wu | Zhigang Yuan | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Existing semantic models are capable of identifying the semantic similarity of words. However, it is hard for these models to discriminate between a word and another similar word. Thus, the aim of SemEval-2018 Task 10 is to predict whether a word is a discriminative attribute between two concepts. In this task, we apply a multilayer perceptron (MLP)-convolutional neural network (CNN) model to identify whether an attribute is discriminative. The CNNs are used to extract low-level features from the inputs. The MLP takes both the flattened CNN maps and inputs to predict the labels. The evaluation F-score of our system on the test set is 0.629 (ranked 15th), which indicates that our system still needs to be improved. However, the behaviours of our system in our experiments provide useful information, which can help to improve the collective understanding of this novel task.

Neural Metaphor Detecting with CNN-LSTM Model
Chuhan Wu | Fangzhao Wu | Yubo Chen | Sixing Wu | Zhigang Yuan | Yongfeng Huang
Proceedings of the Workshop on Figurative Language Processing

Metaphors are a form of figurative language widely used in daily life and literature. It is an important task to detect the metaphors evoked by texts. Thus, the metaphor shared task aims to extract metaphors from plain texts at word level. We propose to use a CNN-LSTM model for this task. Our model combines CNN and LSTM layers to utilize both local and long-range contextual information for identifying metaphorical information. In addition, we compare the performance of the softmax classifier and conditional random field (CRF) for sequential labeling in this task. We also incorporated some additional features such as part of speech (POS) tags and word clusters to improve the performance of the model. Our best model achieved 65.06% F-score in the all POS testing subtask and 67.15% in the verbs testing subtask.

Detecting Tweets Mentioning Drug Name and Adverse Drug Reaction with Hierarchical Tweet Representation and Multi-Head Self-Attention
Chuhan Wu | Fangzhao Wu | Junxin Liu | Sixing Wu | Yongfeng Huang | Xing Xie
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

This paper describes our system for the first and third shared tasks of the third Social Media Mining for Health Applications (SMM4H) workshop, which aims to detect the tweets mentioning drug names and adverse drug reactions. In our system we propose a neural approach with hierarchical tweet representation and multi-head self-attention (HTR-MSA) for both tasks. Our system achieved first place in both the first and third shared tasks of SMM4H with F-scores of 91.83% and 52.20%, respectively.

2017

Active Sentiment Domain Adaptation
Fangzhao Wu | Yongfeng Huang | Jun Yan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Domain adaptation is an important technology to handle the domain dependence problem in sentiment analysis. Existing methods usually rely on sentiment classifiers trained in source domains. However, their performance may heavily decline if the distributions of sentiment features in source and target domains differ significantly. In this paper, we propose an active sentiment domain adaptation approach to handle this problem. Instead of the source domain sentiment classifiers, our approach adapts the general-purpose sentiment lexicons to the target domain with the help of a small number of labeled samples which are selected and annotated in an active learning mode, as well as the domain-specific sentiment similarities among words mined from unlabeled samples of the target domain. A unified model is proposed to fuse different types of sentiment information and train a sentiment classifier for the target domain. Extensive experiments on benchmark datasets show that our approach can train an accurate sentiment classifier with fewer labeled samples.

THU_NGN at IJCNLP-2017 Task 2: Dimensional Sentiment Analysis for Chinese Phrases with Deep LSTM
Chuhan Wu | Fangzhao Wu | Yongfeng Huang | Sixing Wu | Zhigang Yuan
Proceedings of the IJCNLP 2017, Shared Tasks

Predicting valence-arousal ratings for words and phrases is very useful for constructing affective resources for dimensional sentiment analysis. Since the existing valence-arousal resources of Chinese are mainly at the word level and there is a lack of phrase-level ones, the Dimensional Sentiment Analysis for Chinese Phrases (DSAP) task aims to predict the valence-arousal ratings for Chinese affective words and phrases automatically. In this task, we propose an approach using a densely connected LSTM network and word features to identify dimensional sentiment on valence and arousal for words and phrases jointly. We use word embeddings as the major feature and choose part of speech (POS) and word clusters as additional features to train the dense LSTM network. The evaluation results of our submissions (1st and 2nd in average performance) validate the effectiveness of our system to predict valence and arousal dimensions for Chinese words and phrases.

2016

Sentiment Domain Adaptation with Multiple Sources
Fangzhao Wu | Yongfeng Huang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)