Cristina Bosco


2020

pdf bib
Marking Irony Activators in a Universal Dependencies Treebank: The Case of an Italian Twitter Corpus
Alessandra Teresa Cignarella | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso
Proceedings of the 12th Language Resources and Evaluation Conference

The recognition of irony is a challenging task in the domain of Sentiment Analysis, and the availability of annotated corpora may be crucial for its automatic processing. In this paper we describe a fine-grained annotation scheme centered on irony, in which we highlight the tokens that are responsible for its activation, (irony activators) and their morpho-syntactic features. As our case study we therefore introduce a recently released Universal Dependencies treebank for Italian which includes ironic tweets: TWITTIRÒ-UD. For the purposes of this study, we enriched the existing annotation in the treebank, with a further level that includes irony activators. A description and discussion of the annotation scheme is provided with a definition of irony activators and the guidelines for their annotation. This qualitative study on the different layers of annotation applied on the same dataset can shed some light on the process of human annotation, and irony annotation in particular, and on the usefulness of this representation for developing computational models of irony to be used for training purposes.

pdf bib
Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Lauren Cassidy | Özlem Çetinoğlu | Alessandra Teresa Cignarella | Teresa Lynn | Ines Rehbein | Josef Ruppenhofer | Djamé Seddah | Amir Zeldes
Proceedings of the 12th Language Resources and Evaluation Conference

The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks - based on available literature - along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.

pdf bib
Multilingual Irony Detection with Dependency Syntax and Neural Models
Alessandra Teresa Cignarella | Valerio Basile | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso | Farah Benamara
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second scenario, two well-known types of word embeddings are trained on parsed data and tested against gold standard datasets. In the third setting, dependency-based syntactic features are combined into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.

2019

pdf bib
Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies
Alessandra Teresa Cignarella | Cristina Bosco | Paolo Rosso
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

pdf bib
SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter
Valerio Basile | Cristina Bosco | Elisabetta Fersini | Debora Nozza | Viviana Patti | Francisco Manuel Rangel Pardo | Paolo Rosso | Manuela Sanguinetti
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper describes the organization of the SemEval 2019 Task 5 about the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features in hateful contents such as the aggressive attitude and the target harassed, to distinguish if the incitement is against an individual rather than a group. HatEval has been one of the most popular tasks in SemEval-2019 with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B, from a total of 74 different teams. Data provided for the task are described by showing how they have been collected and annotated. Moreover, the paper provides an analysis and discussion about the participant systems and the results they achieved in both subtasks.

pdf bib
UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification
Bilal Ghanem | Alessandra Teresa Cignarella | Cristina Bosco | Paolo Rosso | Francisco Manuel Rangel Pardo
Proceedings of the 13th International Workshop on Semantic Evaluation

In the present paper we describe the UPV-28-UNITO system’s submission to the RumorEval 2019 shared task. The approach we applied for addressing both the subtasks of the contest exploits both classical machine learning algorithms and word embeddings, and it is based on diverse groups of features: stylistic, lexical, emotional, sentiment, meta-structural and Twitter-based. A novel set of features that take advantage of the syntactic information in texts is moreover introduced in the paper.

2018

pdf bib
PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alberto Lavelli | Alessandro Mazzei | Oronzo Antonelli | Fabio Tamburini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
An Italian Twitter Corpus of Hate Speech against Immigrants
Manuela Sanguinetti | Fabio Poletto | Cristina Bosco | Viviana Patti | Marco Stranisci
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ
Alessandra Teresa Cignarella | Cristina Bosco | Viviana Patti | Mirko Lai
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study
Jihen Karoui | Farah Benamara | Véronique Moriceau | Viviana Patti | Cristina Bosco | Nathalie Aussenac-Gilles
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper provides a linguistic and pragmatic analysis of the phenomenon of irony in order to represent how Twitter’s users exploit irony devices within their communication strategies for generating textual contents. We aim to measure the impact of a wide-range of pragmatic phenomena in the interpretation of irony, and to investigate how these phenomena interact with contexts local to the tweet. Informed by linguistic theories, we propose for the first time a multi-layered annotation schema for irony and its application to a corpus of French, English and Italian tweets. We detail each layer, explore their interactions, and discuss our results according to a qualitative and quantitative perspective.

pdf bib
Annotating Italian Social Media Texts in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alessandro Mazzei | Alberto Lavelli | Fabio Tamburini
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

pdf bib
Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
Cristina Bosco | Mirko Lai | Viviana Patti | Daniela Virone
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper introduces a new annotated French data set for Sentiment Analysis, which is a currently missing resource. It focuses on the collection from Twitter of data related to the socio-political debate about the reform of the bill for wedding in France. The design of the annotation scheme is described, which extends a polarity label set by making available tags for marking target semantic areas and figurative language devices. The annotation process is presented and the disagreement discussed, in particular, in the perspective of figurative language use and in that of the semantic oriented annotation, which are open challenges for NLP systems.

pdf bib
Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola
Marco Stranisci | Cristina Bosco | Delia Irazú Hernández Farías | Viviana Patti
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present the TWitterBuonaScuola corpus (TW-BS), a novel Italian linguistic resource for Sentiment Analysis, developed with the main aim of analyzing the online debate on the controversial Italian political reform “Buona Scuola” (Good school), aimed at reorganizing the national educational and training systems. We describe the methodologies applied in the collection and annotation of data. The collection has been driven by the detection of the hashtags mainly used by the participants to the debate, while the annotation has been focused on sentiment polarity and irony, but also extended to mark the aspects of the reform that were mainly discussed in the debate. An in-depth study of the disagreement among annotators is included. We describe the collection and annotation stages, and the in-depth analysis of disagreement made with Crowdflower, a crowdsourcing annotation platform.

pdf bib
SimpleNLG-IT: adapting SimpleNLG to Italian
Alessandro Mazzei | Cristina Battaglino | Cristina Bosco
Proceedings of the 9th International Natural Language Generation conference

2015

pdf bib
ValenTo: Sentiment Analysis of Figurative Language Tweets with Irony and Sarcasm
Delia Irazú Hernández Farías | Emilio Sulis | Viviana Patti | Giancarlo Ruffo | Cristina Bosco
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
Exploiting catenae in a parallel treebank alignment
Manuela Sanguinetti | Cristina Bosco | Loredana Cupi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper aims to introduce the issues related to the syntactic alignment of a dependency-based multilingual parallel treebank, ParTUT. Our approach to the task starts from a lexical mapping and then attempts to expand it using dependency relations. In developing the system, however, we realized that the only dependency relations between the individual nodes were not sufficient to overcome some translation divergences, or shifts, especially in the absence of a direct lexical mapping and a different syntactic realization. For this purpose, we explored the use of a novel syntactic notion introduced in dependency theoretical framework, i.e. that of catena (Latin for “chain”), which is intended as a group of words that are continuous with respect to dominance. In relation to the task of aligning parallel dependency structures, catenae can be used to explain and identify those cases of one-to-many or many-to-many correspondences, typical of several translation shifts, that cannot be detected by means of direct word-based mappings or bare syntactic relations. The paper presented here describes the overall structure of the alignment system as it has been currently designed, how catenae are extracted from the parallel resource, and their potential relevance to the completion of tree alignment in ParTUT sentences.

pdf bib
Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies
Maria Simi | Cristina Bosco | Simonetta Montemagni
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Stanford Dependencies (SD) represent nowadays a de facto standard as far as dependency annotation is concerned. The goal of this paper is to explore pros and cons of different strategies for generating SD annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank or MIDT+) and whose output was automatically converted to SD, with the results of the parser directly trained on ISDT. Experiments carried out to test reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependencies repertoire, whose output can be easily converted to SD, is slightly higher than the performance of a parser directly trained on ISDT. A non-negligible advantage of the first strategy for generating SD annotated texts is that semi-automatic extensions of the training resource are more easily and consistently carried out with respect to a reduced dependency tag set. Preliminary experiments carried out for generating the collapsed and propagated SD representation are also reported.

2013

pdf bib
Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank
Cristina Bosco | Simonetta Montemagni | Maria Simi
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Dependency and Constituency in Translation Shift Analysis
Manuela Sanguinetti | Cristina Bosco | Leonardo Lesmo
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

pdf bib
The Parallel-TUT: a multilingual and multiformat treebank
Cristina Bosco | Manuela Sanguinetti | Leonardo Lesmo
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper introduces an ongoing project for the development of a parallel treebank for Italian, English and French, i.e. Parallel--TUT, or simply ParTUT. For the development of this resource, both the dependency and constituency-based formats of the Italian Turin University Treebank (TUT) have been applied to a preliminary dataset, which includes the whole text of the Universal Declaration of Human Rights, and sentences from the JRC-Acquis Multilingual Parallel Corpus and the Creative Commons licence. The focus of the project is mainly on the quality of the annotation and the investigation of some issues related to the alignment of data that can be allowed by the TUT formats, also taking into account the availability of conversion tools for display data in standard ways, such as Tiger--XML and CoNLL formats. It is, in fact, our belief that increasing the portability of our treebank could give us the opportunity to access resources and tools provided by other research groups, especially at this stage of the project, where no particular tool -- compatible with the TUT format -- is available in order to tackle the alignment problems.

pdf bib
A treebank-based study on the influence of Italian word order on parsing performance
Anita Alicante | Cristina Bosco | Anna Corazza | Alberto Lavelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The aim of this paper is to contribute to the debate on the issues raised by Morphologically Rich Languages, and more precisely to investigate, in a cross-paradigm perspective, the influence of the constituent order on the data-driven parsing of one of such languages(i.e. Italian). It shows therefore new evidence from experiments on Italian, a language characterized by a rich verbal inflection, which leads to a widespread diffusion of the pro―drop phenomenon and to a relatively free word order. The experiments are performed by using state-of-the-art data-driven parsers (i.e. MaltParser and Berkeley parser) and are based on an Italian treebank available in formats that vary according to two dimensions, i.e. the paradigm of representation (dependency vs. constituency) and the level of detail of linguistic information.

2011

pdf bib
Building the multilingual TUT parallel treebank
Manuela Sanguinetti | Cristina Bosco
Proceedings of The Second Workshop on Annotation and Exploitation of Parallel Corpora

2010

pdf bib
Comparing the Influence of Different Treebank Annotations on Dependency Parsing
Cristina Bosco | Simonetta Montemagni | Alessandro Mazzei | Vincenzo Lombardo | Felice Dell’Orletta | Alessandro Lenci | Leonardo Lesmo | Giuseppe Attardi | Maria Simi | Alberto Lavelli | Johan Hall | Jens Nilsson | Joakim Nivre
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

As the interest of the NLP community grows to develop several treebanks also for languages other than English, we observe efforts towards evaluating the impact of different annotation strategies used to represent particular languages or with reference to particular tasks. This paper contributes to the debate on the influence of resources used for the training and development on the performance of parsing systems. It presents a comparative analysis of the results achieved by three different dependency parsers developed and tested with respect to two treebanks for the Italian language, namely TUT and ISST--TANL, which differ significantly at the level of both corpus composition and adopted dependency representations.

2008

pdf bib
Automatic extraction of subcategorization frames for Italian
Dino Ienco | Serena Villata | Cristina Bosco
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Subcategorization is a kind of knowledge which can be considered as crucial in several NLP tasks, such as Information Extraction or parsing, but the collection of very large resources including subcategorization representation is difficult and time-consuming. Various experiences show that the automatic extraction can be a practical and reliable solution for acquiring such a kind of knowledge. The aim of this paper is to investigate the relationships between subcategorization frame extraction and the nature of data from which the frames have to be extracted, e.g. how much the task can be influenced by the richness/poorness of the annotation. Therefore, we present some experiments that apply statistical subcategorization extraction methods, known in literature, on an Italian treebank that exploits a rich set of dependency relations that can be annotated at different degrees of specificity. Benefiting from the availability of relation sets that implement different granularity in the representation of relations, we evaluate our results with reference to previous works in a cross-linguistic perspective.

pdf bib
Comparing Italian parsers on a common Treebank: the EVALITA experience
Cristina Bosco | Alessandro Mazzei | Vincenzo Lombardo | Giuseppe Attardi | Anna Corazza | Alberto Lavelli | Leonardo Lesmo | Giorgio Satta | Maria Simi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The EVALITA 2007 Parsing Task has been the first contest among parsing systems for Italian. It is the first attempt to compare the approaches and the results of the existing parsing systems specific for this language using a common treebank annotated using both a dependency and a constituency-based format. The development data set for this parsing competition was taken from the Turin University Treebank, which is annotated both in dependency and constituency format. The evaluation metrics were those standardly applied in CoNLL and PARSEVAL. The results of the parsing results are very promising and higher than the state-of-the-art for dependency parsing of Italian. An analysis of such results is provided, which takes into account other experiences in treebank-driven parsing for Italian and for other Romance languages (in particular, the CoNLL X & 2007 shared tasks for dependency parsing). It focuses on the characteristics of data sets, i.e. type of annotation and size, parsing paradigms and approaches applied also to languages other than Italian.

pdf bib
Evaluation of Natural Language Tools for Italian: EVALITA 2007
Bernardo Magnini | Amedeo Cappelli | Fabio Tamburini | Cristina Bosco | Alessandro Mazzei | Vincenzo Lombardo | Francesca Bertagna | Nicoletta Calzolari | Antonio Toral | Valentina Bartalesi Lenzi | Rachele Sprugnoli | Manuela Speranza
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework where participants’ systems had the possibility to be evaluated on five different tasks, namely Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind, in fact, are a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English, but also for other languages.

2007

pdf bib
Multiple-step Treebank Conversion: From Dependency to Penn Format
Cristina Bosco
Proceedings of the Linguistic Annotation Workshop

2006

pdf bib
Comparing linguistic information in treebank annotations
Cristina Bosco | Vincenzo Lombardo
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper investigates the issue of portability of methods and results over treebanks in different languages and annotation formats. In particular, it addresses the problem of converting an Italian treebank, the Turin University Treebank (TUT), developed in dependency format, into the Penn Treebank format, in order to possibly exploit the tools and methods already developed and compare the adequacy of information encoding in the two formats. We describe the procedures for converting the two annotation formats and we present an experiment that evaluates some linguistic knowledge extracted from the two formats, namely sub-categorization frames.

2004

pdf bib
Dependency and relational structure in treebank annotation
Cristina Bosco | Vincenzo Lombardo
Proceedings of the Workshop on Recent Advances in Dependency Grammar

2000

pdf bib
Building a Treebank for Italian: a Data-driven Annotation Schema
Cristina Bosco | Vincenzo Lombardo | Daniela Vassallo | Leonardo Lesmo
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)