Avi Arampatzis


2020

DUTH at SemEval-2020 Task 11: BERT with Entity Mapping for Propaganda Classification
Anastasios Bairaktaris | Symeon Symeonidis | Avi Arampatzis
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This report describes the methods employed by the Democritus University of Thrace (DUTH) team for participating in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. Our team dealt with Subtask 2: Technique Classification. We used shallow Natural Language Processing (NLP) preprocessing techniques to reduce the noise in the dataset, feature selection methods, and common supervised machine learning algorithms. Our final model uses BERT with entity mapping. To improve our model's accuracy, we mapped certain words into five distinct categories by employing word-classes and entity recognition.
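The entity-mapping idea (replacing specific words with category placeholders before the text reaches the model) can be sketched as a simple token-substitution pass. This is a minimal illustration, not the team's implementation: the abstract does not name its five categories, so the labels `[PERSON]`, `[LOCATION]`, `[ORGANIZATION]`, `[RELIGION]`, and `[NUMBER]`, the lexicon entries, and the helper `map_entities` are all hypothetical, and a hand-written lookup table stands in for a real entity recognizer.

```python
import re

# Hypothetical lexicon: these category labels and entries are illustrative
# assumptions, not the five categories used in the paper. A real system would
# obtain them from word classes and a named-entity recognizer.
CATEGORY_LEXICON = {
    "obama": "[PERSON]",
    "trump": "[PERSON]",
    "washington": "[LOCATION]",
    "fbi": "[ORGANIZATION]",
    "christian": "[RELIGION]",
}
# Integers and numbers with "." or "," separators, e.g. "3" or "1,000".
NUMBER_RE = re.compile(r"^\d+(?:[.,]\d+)*$")


def map_entities(text: str) -> str:
    """Replace known entity words and numbers with category placeholders."""
    out = []
    for token in text.split():
        # Compare on a lowercased token with surrounding punctuation removed.
        key = token.lower().strip(".,!?;:\"'")
        if key in CATEGORY_LEXICON:
            out.append(CATEGORY_LEXICON[key])
        elif NUMBER_RE.match(key):
            out.append("[NUMBER]")
        else:
            # Unmapped tokens are kept unchanged, punctuation included.
            out.append(token)
    return " ".join(out)


print(map_entities("Obama met the FBI in Washington on 3 occasions."))
```

The mapped string would then be passed to the BERT tokenizer in place of the raw sentence, so that different names of the same kind share one representation.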

2019

DUTH at SemEval-2019 Task 8: Part-Of-Speech Features for Question Classification
Anastasios Bairaktaris | Symeon Symeonidis | Avi Arampatzis
Proceedings of the 13th International Workshop on Semantic Evaluation

This report describes the methods employed by the Democritus University of Thrace (DUTH) team for participating in SemEval-2019 Task 8: Fact Checking in Community Question Answering Forums. Our team dealt only with Subtask A: Question Classification. Our approach was based on shallow natural language processing (NLP) pre-processing techniques to reduce the noise in data, feature selection methods, and supervised machine learning algorithms such as NearestCentroid, Perceptron, and LinearSVC. To determine the essential features, we were aided by exploratory data analysis and visualizations. To improve classification accuracy, we developed a customized list of stopwords, retaining some opinion- and fact-denoting common function words which would have been removed by standard stoplisting. Furthermore, we examined the usefulness of part-of-speech (POS) categories for the task; by experimenting with removing nouns and adjectives, we found some evidence that verbs are a valuable POS category for the opinion question class.
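A customized stoplist of this kind can be sketched as a standard stoplist minus a set of retained function words. The word sets below are a small hypothetical sample chosen for illustration; they are not the stoplist or the retained words from the paper, and `remove_stopwords` is an illustrative helper.

```python
# A small illustrative subset of a standard English stoplist (assumption).
STANDARD_STOPWORDS = {
    "a", "an", "the", "is", "are", "i", "you", "what", "how",
    "should", "do", "of", "in", "to", "and", "my",
}
# Hypothetical function words kept because they may help separate
# opinion-seeking questions from fact-seeking ones.
RETAINED = {"i", "you", "should", "what", "how", "my"}
# The customized stoplist removes everything standard except the retained words.
CUSTOM_STOPWORDS = STANDARD_STOPWORDS - RETAINED


def remove_stopwords(text: str, stopwords: set = CUSTOM_STOPWORDS) -> list:
    """Lowercase, split on whitespace, and drop tokens found in the stoplist."""
    return [t for t in text.lower().split() if t not in stopwords]


print(remove_stopwords("What should I do in Doha"))                      # custom stoplist
print(remove_stopwords("What should I do in Doha", STANDARD_STOPWORDS))  # standard stoplist
```

With the standard list, the question collapses to a single content word, while the customized list keeps the cues ("what", "should", "i") that may signal an opinion question.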

2018

DUTH at SemEval-2018 Task 2: Emoji Prediction in Tweets
Dimitrios Effrosynidis | Georgios Peikos | Symeon Symeonidis | Avi Arampatzis
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the approach developed by the DUTH team for SemEval-2018 Task 2 (Multilingual Emoji Prediction). First, we employed a combination of pre-processing techniques to reduce the noise of tweets and produce a number of features. Then, we built several N-grams to represent combinations of words and emojis. Finally, we trained our system with a tuned LinearSVC classifier. Our approach ranked 18th among 48 teams on the leaderboard.
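N-gram features that mix words and emojis can be sketched with a tokenizer that keeps each emoji as its own token. This is a minimal stdlib sketch under stated assumptions: the regex tokenizer and the helpers `tokenize`, `ngrams`, and `ngram_features` are illustrative, not the paper's pre-processing, and the regex treats each non-word, non-space code point as one token, so multi-code-point emojis (for example with skin-tone modifiers) would be split.

```python
import re
from collections import Counter

# Words are runs of \w characters; any other non-space character (punctuation
# or a single-code-point emoji) becomes a token of its own.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(tweet: str) -> list:
    """Lowercase a tweet and split it into word and symbol/emoji tokens."""
    return TOKEN_RE.findall(tweet.lower())


def ngrams(tokens: list, n: int) -> list:
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tweet: str, max_n: int = 2) -> Counter:
    """Count all 1..max_n grams, so word-emoji combinations become features."""
    tokens = tokenize(tweet)
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats


print(ngram_features("Love it \U0001F60D"))
```

Feature dictionaries like these could then be vectorized (for instance with scikit-learn's `DictVectorizer`) and fed to a linear classifier such as `LinearSVC`; the tuning the abstract mentions is not reproduced here.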

2017

DUTH at SemEval-2017 Task 4: A Voting Classification Approach for Twitter Sentiment Analysis
Symeon Symeonidis | Dimitrios Effrosynidis | John Kordonis | Avi Arampatzis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This report describes our participation in SemEval-2017 Task 4: Sentiment Analysis in Twitter, specifically in subtasks A, B, and C. Our approach to text sentiment classification is based on a majority vote scheme that combines supervised machine learning methods with classical linguistic resources, including bag-of-words and sentiment lexicon features.
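A majority vote (hard voting) scheme takes the label predicted by the most base classifiers. The sketch below assumes each classifier is simply a function from text to label; the toy lexicon classifier, its word lists, and the helper names `lexicon_clf` and `majority_vote` are hypothetical and only stand in for the trained models and sentiment lexicons used in the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier is modeled as any function mapping a text to a label.
Classifier = Callable[[str], str]

# Hypothetical toy sentiment lexicon (assumption, not the paper's resources).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def lexicon_clf(text: str) -> str:
    """Label a text by counting positive minus negative lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Return the label predicted by the most classifiers.

    Counter.most_common orders equal counts by first occurrence, so a tie
    goes to the label voted first (in classifier order).
    """
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


ensemble = [lexicon_clf, lambda t: "positive", lambda t: "negative"]
print(majority_vote(ensemble, "I love this great phone"))
```

In scikit-learn, `VotingClassifier(voting="hard")` provides the same behavior for fitted estimators; the explicit version above just makes the vote counting and tie-breaking visible.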

DUTH at SemEval-2017 Task 5: Sentiment Predictability in Financial Microblogging and News Articles
Symeon Symeonidis | John Kordonis | Dimitrios Effrosynidis | Avi Arampatzis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We present the system developed by the DUTH team for participation in SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News, subtasks A and B. Our approach to determining the sentiment of microblog messages and news statements and headlines is based on linguistic preprocessing, feature engineering, and supervised machine learning techniques. To train our model, we used Neural Network Regression, Linear Regression, Boosted Decision Tree Regression, and Decision Forest Regression models to forecast sentiment scores. Finally, we present an error measure intended to improve the forecasting performance of the system.

2007

Deriving a Domain Specific Test Collection from a Query Log
Avi Arampatzis | Jaap Kamps | Marijn Koolen | Nir Nussbaum
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)