John Pavlopoulos


2020

Toxicity Detection: Does Context Really Matter?
John Pavlopoulos | Jeffrey Sorensen | Lucas Dixon | Nithum Thain | Ion Androutsopoulos
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Moderation is crucial to promoting healthy online discussions. Although several ‘toxicity’ detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improve the performance of toxicity detection systems? We experiment with Wikipedia conversations, limiting the notion of context to the previous post in the thread and the discussion title. We find that context can either amplify or mitigate the perceived toxicity of posts. Moreover, a small but significant subset of manually labeled posts (5% in one of our experiments) end up having the opposite toxicity labels if the annotators are not provided with context. Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers, having tried a range of classifiers and mechanisms to make them context aware. This points to the need for larger datasets of comments annotated in context. We make our code and data publicly available.

ERRANT: Assessing and Improving Grammatical Error Type Classification
Katerina Korre | John Pavlopoulos
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Grammatical Error Correction (GEC) is the task of correcting different types of errors in written texts. To manage this task, large amounts of annotated data that contain erroneous sentences are required. This data, however, is usually annotated according to each annotator’s standards, making it difficult to manage multiple sets of data at the same time. The recently introduced Error Annotation Toolkit (ERRANT) tackled this problem by presenting a way to automatically annotate data that contain grammatical errors, while also providing a standardisation for annotation. ERRANT extracts the errors and classifies them into error types, in the form of an edit that can be used in the creation of GEC systems, as well as for grammatical error analysis. However, we observe that certain errors are falsely or ambiguously classified. This could obstruct any qualitative or quantitative grammatical error type analysis, as the results would be inaccurate. In this work, we use a sample of the FCE corpus (Yannakoudakis et al., 2011) for secondary error type annotation and we show that up to 39% of the annotations of the most frequent type should be re-classified. Our corrections will be publicly released, so that they can serve as the starting point of a broader, collaborative, ongoing correction process.

2019

A Survey on Biomedical Image Captioning
John Pavlopoulos | Vasiliki Kougia | Ion Androutsopoulos
Proceedings of the Second Workshop on Shortcomings in Vision and Language

Image captioning applied to biomedical images can assist and accelerate the diagnosis process followed by clinicians. This article is the first survey of biomedical image captioning, discussing datasets, evaluation measures, and state-of-the-art methods. Additionally, we suggest two baselines, a weak and a stronger one; the latter outperforms all current state-of-the-art systems on one of the datasets.

ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT
John Pavlopoulos | Nithum Thain | Lucas Dixon | Ion Androutsopoulos
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media. PERSPECTIVE is an API that serves multiple machine learning models for the improvement of conversations online, as well as a toxicity detection system, trained on a wide variety of comments from platforms across the Internet. BERT is a recently popular language representation model, fine-tuned per task and achieving state-of-the-art performance in multiple NLP tasks. PERSPECTIVE performed better than BERT in detecting toxicity, but BERT was much better in categorizing the offensive type. Both baselines were ranked surprisingly high in the SEMEVAL-2019 OFFENSEVAL competition, PERSPECTIVE in detecting an offensive post (12th) and BERT in categorizing it (11th). The main contribution of this paper is the assessment of two strong baselines for the identification (PERSPECTIVE) and the categorization (BERT) of offensive language with little or no additional training data.

2017

Deep Learning for User Comment Moderation
John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the First Workshop on Abusive Language Online

Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism further improves the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.

Improved Abusive Comment Moderation with User Embeddings
John Pavlopoulos | Prodromos Malakasiotis | Juli Bakagianni | Ion Androutsopoulos
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

Experimenting with a dataset of approximately 1.6M user comments from a Greek sports news portal, we explore how a state-of-the-art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases. We observe improvements in all cases, with user embeddings leading to the biggest performance gains.

Deeper Attention to Abusive User Content Moderation
John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Experimenting with a new dataset of 1.6M user comments from a news portal and an existing dataset of 115K Wikipedia talk page comments, we show that an RNN operating on word embeddings outperforms the previous state of the art in moderation, which used logistic regression or an MLP classifier with character or word n-grams. We also compare against a CNN operating on word embeddings, and a word-list baseline. A novel, deep, classification-specific attention mechanism improves the performance of the RNN further, and can also highlight suspicious words for free, without including highlighted words in the training data. We consider both fully automatic and semi-automatic moderation.

2016

aueb.twitter.sentiment at SemEval-2016 Task 4: A Weighted Ensemble of SVMs for Twitter Sentiment Analysis
Stavros Giorgis | Apostolos Rousas | John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

AUEB-ABSA at SemEval-2016 Task 5: Ensembles of Classifiers and Embeddings for Aspect Based Sentiment Analysis
Dionysios Xenos | Panagiotis Theodorakakos | John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2014

SemEval-2014 Task 4: Aspect Based Sentiment Analysis
Maria Pontiki | Dimitris Galanis | John Pavlopoulos | Harris Papageorgiou | Ion Androutsopoulos | Suresh Manandhar
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

AUEB: Two Stage Sentiment Analysis of Social Network Messages
Rafael Michael Karampatsis | John Pavlopoulos | Prodromos Malakasiotis
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Multi-Granular Aspect Aggregation in Aspect-Based Sentiment Analysis
John Pavlopoulos | Ion Androutsopoulos
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

A Vague Sense Classifier for Detecting Vague Definitions in Ontologies
Panos Alexopoulos | John Pavlopoulos
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Aspect Term Extraction for Sentiment Analysis: New Datasets, New Evaluation Measures and an Improved Unsupervised Method
John Pavlopoulos | Ion Androutsopoulos
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)

2013

nlp.cs.aueb.gr: Two Stage Sentiment Analysis
Prodromos Malakasiotis | Rafael Michael Karampatsis | Konstantina Makrynioti | John Pavlopoulos
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)