Marco Del Tredici


2020

Analysing Lexical Semantic Change with Contextualised Word Representations
Mario Giulianelli | Marco Del Tredici | Raquel Fernández
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper presents the first unsupervised approach to lexical semantic change that makes use of contextualised word representations. We propose a novel method that exploits the BERT neural language model to obtain representations of word usages, clusters these representations into usage types, and measures change along time with three proposed metrics. We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements. Our extensive qualitative analysis demonstrates that our method captures a variety of synchronic and diachronic linguistic phenomena. We expect our work to inspire further research in this direction.
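
The abstract above does not name the three change metrics, so the following is only an illustrative sketch: it assumes each word usage has already been assigned to a usage-type cluster, and it uses the Jensen-Shannon divergence between the usage-type distributions of two time periods as one plausible measure of change. The function names `usage_distribution` and `jensen_shannon_divergence` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def usage_distribution(labels, n_types):
    """Relative frequency of each usage type (cluster id 0..n_types-1) in one time period."""
    if not labels:
        raise ValueError("at least one usage label is required")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_types)]


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions of equal length."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical cluster assignments for the usages of one word in two periods.
early = usage_distribution([0, 0, 0, 1], n_types=2)
late = usage_distribution([0, 1, 1, 1], n_types=2)
change_score = jensen_shannon_divergence(early, late)
```

With base-2 logarithms the score lies between 0 (identical usage-type distributions) and 1 (no usage type shared between the two periods).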

Words are the Window to the Soul: Language-based User Representations for Fake News Detection
Marco Del Tredici | Raquel Fernández
Proceedings of the 28th International Conference on Computational Linguistics

Cognitive and social traits of individuals are reflected in language use. Moreover, individuals who are prone to spread fake news online often share common traits. Building on these ideas, we introduce a model that creates representations of individuals on social media based only on the language they produce, and use them to detect fake news. We show that language-based user representations are beneficial for this task. We also present an extended analysis of the language of fake news spreaders, showing that its main features are mostly domain independent and consistent across two English datasets. Finally, we exploit the relation between language use and connections in the social graph to assess the presence of the Echo Chamber effect in our data.

2019

You Shall Know a User by the Company It Keeps: Dynamic Representations for Social Media Users in NLP
Marco Del Tredici | Diego Marcheggiani | Sabine Schulte im Walde | Raquel Fernández
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Information about individuals can help to better understand what they say, particularly in social media where texts are short. Current approaches to modelling social media users pay attention to their social connections, but exploit this information in a static way, treating all connections uniformly. This ignores the fact, well known in sociolinguistics, that an individual may be part of several communities which are not equally relevant in all communicative situations. We present a model based on Graph Attention Networks that captures this observation. It dynamically explores the social graph of a user, computes a user representation given the most relevant connections for a target task, and combines it with linguistic information to make a prediction. We apply our model to three different tasks, evaluate it against alternative models, and analyse the results extensively, showing that it significantly outperforms other current methods.

A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains
Dominik Schlechtweg | Anna Hätty | Marco Del Tredici | Sabine Schulte im Walde
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. Our work addresses the superficiality and lack of comparison in assessing models of diachronic lexical change, by bringing together and extending benchmark models on a common state-of-the-art evaluation task. In addition, we demonstrate that the same evaluation task and modelling approaches can successfully be utilised for the synchronic detection of domain-specific sense divergences in the field of term extraction.

Short-Term Meaning Shift: A Distributional Exploration
Marco Del Tredici | Raquel Fernández | Gemma Boleda
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present the first exploration of meaning shift over short periods of time in online communities using distributional representations. We create a small annotated dataset and use it to assess the performance of a standard model for meaning shift detection on short-term meaning shift. We find that the model has problems distinguishing meaning shift from referential phenomena, and propose a measure of contextual variability to remedy this.

Abusive Language Detection with Graph Convolutional Networks
Pushkar Mishra | Marco Del Tredici | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Abuse on the Internet represents a significant societal problem of our time. Previous research on automated abusive language detection in Twitter has shown that community-based profiling of users is a promising technique for this task. However, existing approaches only capture shallow properties of online communities by modeling follower–following relationships. In contrast, working with graph convolutional networks (GCNs), we present the first approach that captures not only the structure of online communities but also the linguistic behavior of the users within them. We show that such a heterogeneous graph-structured modeling of communities significantly advances the current state of the art in abusive language detection.

2018

Author Profiling for Abuse Detection
Pushkar Mishra | Marco Del Tredici | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 27th International Conference on Computational Linguistics

The rapid growth of social media in recent years has fed into some highly undesirable phenomena such as the proliferation of hateful and offensive language on the Internet. Previous research suggests that such abusive content tends to come from users who share a set of common stereotypes and form communities around them. The current state-of-the-art approaches to abuse detection are oblivious to user and community information and rely entirely on textual (i.e., lexical and semantic) cues. In this paper, we propose a novel approach to this problem that incorporates community-based profiling features of Twitter users. Experimenting with a dataset of 16k tweets, we show that our methods significantly outperform the current state of the art in abuse detection. Further, we conduct a qualitative analysis of model characteristics. We release our code, pre-trained models and all the resources used in the public domain.

The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities
Marco Del Tredici | Raquel Fernández
Proceedings of the 27th International Conference on Computational Linguistics

We investigate the birth and diffusion of lexical innovations in a large dataset of online social communities. We build on sociolinguistic theories and focus on the relation between the spread of a novel term and the social role of the individuals who use it, uncovering characteristics of innovators and adopters. Finally, we perform a prediction task that allows us to anticipate whether an innovation will successfully spread within a community.

2017

Semantic Variation in Online Communities of Practice
Marco Del Tredici | Raquel Fernández
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

2016

Assessing the Potential of Metaphoricity of verbs using corpus data
Marco Del Tredici | Núria Bel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper investigates the relation between metaphoricity and distributional characteristics of verbs, introducing POM, a corpus-derived index that can be used to define the upper bound of metaphoricity of any expression in which a given verb occurs. The work starts from the observation that while some verbs can be used to create highly metaphoric expressions, others cannot. We conjecture that this fact is related to the number of contexts in which a verb occurs and to the frequency of each context. This intuition is modelled by introducing a method in which each context of a verb in a corpus is assigned a vector representation, and a clustering algorithm is employed to identify similar contexts. Finally, the standard deviation of the relative frequency values of the clusters is computed and taken as the POM of the target verb. We tested POM in two experimental settings, obtaining accuracy values of 84% and 92%. Since we are convinced, along with Shutova (2015), that metaphor detection systems should be concerned only with the identification of highly metaphoric expressions, we believe that POM could be profitably employed by these systems to exclude a priori expressions that, due to the verb they include, can only have low degrees of metaphoricity.
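
The last step of the POM computation described above can be sketched as follows. This is a minimal illustration, assuming the contexts of a verb have already been vectorised and clustered, so that the input is simply one cluster label per context. The function name `pom_index` is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the abstract does not specify which one is meant.

```python
from collections import Counter
from statistics import pstdev


def pom_index(cluster_labels):
    """Standard deviation of the relative frequencies of a verb's context clusters.

    cluster_labels holds one cluster id per context of the target verb.
    Assumption: population standard deviation (statistics.pstdev).
    """
    if not cluster_labels:
        raise ValueError("at least one context is required")
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    relative_frequencies = [count / total for count in counts.values()]
    return pstdev(relative_frequencies)


# Hypothetical example: contexts spread evenly over two clusters,
# versus contexts concentrated mostly in one cluster.
even_spread = pom_index([0, 0, 1, 1])
concentrated = pom_index([0, 0, 0, 1])
```

Under this reading, a verb whose contexts are spread evenly across clusters receives a value of 0, and the value grows as the contexts concentrate in a few clusters.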

2015

A Word-Embedding-based Sense Index for Regular Polysemy Representation
Marco Del Tredici | Núria Bel
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

2014

A Modular System for Rule-based Text Categorisation
Marco Del Tredici | Malvina Nissim
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce a modular rule-based approach to text categorisation which is more flexible and less time-consuming to build than a standard rule-based system because it works with a hierarchical structure and allows for re-usability of rules. When compared on a case study to machine learning models, which are currently more widespread, our modular system shows competitive results. It also has the advantage of reducing manual effort over time: fewer rules must be written when moving to a (partially) new domain, whereas machine learning models always require the same amount of annotated training data.