Aiala Rosá


2020

HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish
Luis Chiruzzo | Santiago Castro | Aiala Rosá
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents the development of a corpus of 30,000 Spanish tweets that were crowd-annotated with humor value and funniness score. Approximately 38.6% of the tweets in the corpus are humorous, and the humorous tweets have an average score of 2.04 on a scale from 1 to 5. The corpus has been used in an automatic humor recognition and analysis competition, in which participants obtained encouraging results.

2018

A Crowd-Annotated Spanish Corpus for Humor Analysis
Santiago Castro | Luis Chiruzzo | Aiala Rosá | Diego Garat | Guillermo Moncecchi
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media

Computational Humor involves several tasks, such as humor recognition, humor generation, and humor scoring, for which it is useful to have human-curated data. In this work we present a corpus of 27,000 tweets written in Spanish and crowd-annotated with humor value and funniness score, with about four annotations per tweet, tagged by 1,300 people over the Internet. It is equally divided between tweets coming from humorous and non-humorous accounts. Inter-annotator agreement, measured with Krippendorff’s alpha, is 0.5710. The dataset is available for general usage and can serve as a basis for humor detection and as a first step to tackle subjectivity.

A High Coverage Method for Automatic False Friends Detection for Spanish and Portuguese
Santiago Castro | Jairo Bonanata | Aiala Rosá
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

False friends are words in two languages that look or sound similar but have different meanings. They are a common source of confusion among language learners. Methods to detect them automatically do exist; however, they either rely on large aligned bilingual corpora, which are hard to find and expensive to build, or have trouble with infrequent words. In this work we propose a high coverage method that uses word vector representations to build a false friends classifier for any pair of languages, which we apply to the particular case of Spanish and Portuguese. The required resources are a large corpus for each language and a small bilingual lexicon for the pair.

2016

Factuality Annotation and Learning in Spanish Texts
Dina Wonsever | Aiala Rosá | Marisa Malcuori
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a proposal for the annotation of factuality of event mentions in Spanish texts and a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, trying to reflect a casual reader's judgements about the realis / irrealis status of mentioned events. We also conducted some learning experiments (SVM and CRF), which showed encouraging results.

2010

Opinion Identification in Spanish Texts
Aiala Rosá | Dina Wonsever | Jean-Luc Minel
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas