Francisco Rangel


2018

LDR at SemEval-2018 Task 3: A Low Dimensional Text Representation for Irony Detection
Bilal Ghanem | Francisco Rangel | Paolo Rosso
Proceedings of The 12th International Workshop on Semantic Evaluation

In this paper, we describe our participation in SemEval-2018 Task 3, the shared task on irony detection. We have approached the task with our low dimensionality representation method (LDR), which exploits low-dimensional features extracted from text on the basis of the occurrence probability of words in each class. Our intuition is that words in ironic texts have a different probability of occurrence than in non-ironic ones. Our approach obtained acceptable results in both subtasks A and B. We have performed an error analysis that shows the differences between correctly and incorrectly classified tweets.

Cross-corpus Native Language Identification via Statistical Embedding
Francisco Rangel | Paolo Rosso | Julian Brooke | Alexandra Uitdenbogerd
Proceedings of the Second Workshop on Stylistic Variation

In this paper, we approach the task of native language identification in a realistic cross-corpus scenario where a model is trained with available data and has to predict the native language from data of a different corpus. The motivation behind this study is to investigate native language identification in the Australian academic scenario, where a majority of students come from China, Indonesia, and Arabic-speaking nations. We propose a statistical embedding representation that yields a significant improvement over common single-layer approaches of the state of the art, identifying Chinese, Arabic, and Indonesian in a cross-corpus scenario. The proposed approach is shown to be competitive even when the data is scarce and imbalanced.

Stance Detection in Fake News: A Combined Feature Representation
Bilal Ghanem | Paolo Rosso | Francisco Rangel
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

With the uncontrolled increase of fake news and rumors on the Web, different approaches have been proposed to address the problem. In this paper, we present an approach that combines lexical, word embedding, and n-gram features to detect the stance in fake news. Our approach has been tested on the Fake News Challenge (FNC-1) dataset. Given a news title-article pair, the FNC-1 task aims at determining the relevance of the article to the title. Using a simple feature representation, our proposed approach has achieved a macro F1 of 59.6%, which is within 0.013 of the state-of-the-art result. Furthermore, we have investigated the importance of different lexicons in the detection of the classification labels.

2015

Distributed Representations of Words and Documents for Discriminating Similar Languages
Marc Franco-Salvador | Paolo Rosso | Francisco Rangel
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

NLEL UPV Autoritas Participation at Discrimination between Similar Languages (DSL) 2015 Shared Task
Raül Fabra-Boluda | Francisco Rangel | Paolo Rosso
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects