Lu Xiao


pdf bib
syrapropa at SemEval-2020 Task 11: BERT-based Models Design for Propagandistic Technique and Span Detection
Jinfen Li | Lu Xiao
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the BERT-based models proposed for two subtasks in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We first build the model for Span Identification (SI) based on SpanBERT, and facilitate the detection by a deeper model and a sentence-level representation. We then develop a hybrid model for the Technique Classification (TC). The hybrid model is composed of three submodels including two BERT models with different training methods, and a feature-based Logistic Regression model. We endeavor to deal with imbalanced dataset by adjusting cost function. We are in the seventh place in SI subtask (0.4711 of F1-measure), and in the third place in TC subtask (0.6783 of F1-measure) on the development set.

pdf bib
A Lexicon-Based Approach for Detecting Hedges in Informal Text
Jumayel Islam | Lu Xiao | Robert E. Mercer
Proceedings of the 12th Language Resources and Evaluation Conference

Hedging is a commonly used strategy in conversational management to show the speaker’s lack of commitment to what they communicate, which may signal problems between the speakers. Our project is interested in examining the presence of hedging words and phrases in identifying the tension between an interviewer and interviewee during a survivor interview. While there have been studies on hedging detection in the natural language processing literature, all existing work has focused on structured texts and formal communications. Our project thus investigated a corpus of eight unstructured conversational interviews about the Rwanda Genocide and identified hedging patterns in the interviewees’ responses. Our work produced three manually constructed lists of hedge words, booster words, and hedging phrases. Leveraging these lexicons, we developed a rule-based algorithm that detects sentence-level hedges in informal conversations such as survivor interviews. Our work also produced a dataset of 3000 sentences having the categories Hedge and Non-hedge annotated by three researchers. With experiments on this annotated dataset, we verify the efficacy of our proposed algorithm. Our work contributes to the further development of tools that identify hedges from informal conversations and discussions.

pdf bib
TV-AfD: An Imperative-Annotated Corpus from The Big Bang Theory and Wikipedia’s Articles for Deletion Discussions
Yimin Xiao | Zong-Ying Slaton | Lu Xiao
Proceedings of the 12th Language Resources and Evaluation Conference

In this study, we created an imperative corpus with speech conversations from dialogues in The Big Bang Theory and with the written comments in Wikipedia’s Articles for Deletion discussions. For the TV show data, 59 episodes containing 25,076 statements are used. We manually annotated imperatives based on the annotation guideline adapted from Condoravdi and Lauer’s study (2012) and used the retrieved data to assess the performance of syntax-based classification rules. For the Wikipedia AfD comments data, we first developed and leveraged a syntax-based classifier to extract 10,624 statements that may be imperative, and we manually examined the statements and then identified true positives. With this corpus, we also examined the performance of the rule-based imperative detection tool. Our result shows different outcomes for speech (dialogue) and written data. The rule-based classification performs better in the written data in precision (0.80) compared to the speech data (0.44). Also, the rule-based classification has a low-performance overall for speech data with the precision of 0.44, recall of 0.41, and f-1 measure of 0.42. This finding implies the syntax-based model may need to be adjusted for a speech dataset because imperatives in oral communication have greater syntactic varieties and are highly context-dependent.

pdf bib
Effects of Anonymity on Comment Persuasiveness in Wikipedia Articles for Deletion Discussions
Yimin Xiao | Lu Xiao
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

It has been shown that anonymity affects various aspects of online communications such as message credibility, the trust among communicators, and the participants’ accountability and reputation. Anonymity influences social interactions in online communities in these many ways, which can lead to influences on opinion change and the persuasiveness of a message. Prior studies also suggest that the effect of anonymity can vary in different online communication contexts and online communities. In this study, we focus on Wikipedia Articles for Deletion (AfD) discussions as an example of online collaborative communities to study the relationship between anonymity and persuasiveness in this context. We find that in Wikipedia AfD discussions, more identifiable users tend to be more persuasive. The higher persuasiveness can be related to multiple aspects, including linguistic features of the comments, the user’s motivation to participate, persuasive skills the user learns over time, and the user’s identity and credibility established in the community through participation.

pdf bib
Tree Representations in Transition System for RST Parsing
Jinfen Li | Lu Xiao
Proceedings of the 28th International Conference on Computational Linguistics

The transition-based systems in the past studies propose a series of actions, to build a right-heavy binarized tree for the RST parsing. However, the nodes of the binary-nuclear relations (e.g., Contrast) have the same nuclear type with those of the multi-nuclear relations (e.g., Joint) in the binary tree structure. In addition, the reduce action only construct binary trees instead of multi-branch trees, which is the original RST tree structure. In our paper, we design a new nuclear type for the multi-nuclear relations, and a new action to construct a multi-branch tree. We enrich the feature set by extracting additional refined dependency feature of texts from the Bi-Affine model. We also compare the performance of two approaches for RST parsing in the transition-based system: a joint action of reduce-shift and nuclear type (i.e., Reduce-SN) vs a separate one that applies Reduce action first and then assigns nuclear type. We find that the new devised nuclear type and action are more capable of capturing the multi-nuclear relation and the joint action is more suitable than the separate one. Our multi-branch tree structure obtains the state-of-the-art performance for all the 18 coarse relations.


pdf bib
Detection of Propaganda Using Logistic Regression
Jinfen Li | Zhihao Ye | Lu Xiao
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Various propaganda techniques are used to manipulate peoples perspectives in order to foster a predetermined agenda such as by the use of logical fallacies or appealing to the emotions of the audience. In this paper, we develop a Logistic Regression-based tool that automatically classifies whether a sentence is propagandistic or not. We utilize features like TF-IDF, BERT vector, sentence length, readability grade level, emotion feature, LIWC feature and emphatic content feature to help us differentiate these two categories. The linguistic and semantic features combination results in 66.16% of F1 score, which outperforms the baseline hugely.

pdf bib
Multi-Channel Convolutional Neural Network for Twitter Emotion and Sentiment Recognition
Jumayel Islam | Robert E. Mercer | Lu Xiao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The advent of micro-blogging sites has paved the way for researchers to collect and analyze huge volumes of data in recent years. Twitter, being one of the leading social networking sites worldwide, provides a great opportunity to its users for expressing their states of mind via short messages which are called tweets. The urgency of identifying emotions and sentiments conveyed through tweets has led to several research works. It provides a great way to understand human psychology and impose a challenge to researchers to analyze their content easily. In this paper, we propose a novel use of a multi-channel convolutional neural architecture which can effectively use different emotion and sentiment indicators such as hashtags, emoticons and emojis that are present in the tweets and improve the performance of emotion and sentiment identification. We also investigate the incorporation of different lexical features in the neural network model and its effect on the emotion and sentiment identification task. We analyze our model on some standard datasets and compare its effectiveness with existing techniques.


pdf bib
Identification and Disambiguation of Lexical Cues of Rhetorical Relations across Different Text Genres
Taraneh Khazaei | Lu Xiao | Robert Mercer
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics


pdf bib
Extracting Imperatives from Wikipedia Article for Deletion Discussions
Fiona Mao | Robert Mercer | Lu Xiao
Proceedings of the First Workshop on Argumentation Mining

pdf bib
The Use of Text Similarity and Sentiment Analysis to Examine Rationales in the Large-Scale Online Deliberations
Wanting Mao | Lu Xiao | Robert Mercer
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis