Federico Nanni


2020

pdf bib
Living Machines: A study of atypical animacy
Mariona Coll Ardanuy | Federico Nanni | Kaspar Beelen | Kasra Hosseini | Ruth Ahnert | Jon Lawrence | Katherine McDonough | Giorgia Tolfo | Daniel CS Wilson | Barbara McGillivray
Proceedings of the 28th International Conference on Computational Linguistics

This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it, we have created the first dataset for atypical animacy detection, based on nineteenth-century sentences in English, with machines represented as either animate or inanimate. Our method builds on recent innovations in language modeling, specifically BERT contextualized word embeddings, to better capture fine-grained contextual properties of words. We present a fully unsupervised pipeline, which can be easily adapted to different contexts, and report its performance on an established animacy dataset and our newly introduced resource. We show that our method provides a substantially more accurate characterization of atypical animacy, especially when applied to highly complex forms of language use.

pdf bib
DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching
Kasra Hosseini | Federico Nanni | Mariona Coll Ardanuy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach is especially useful where only limited training examples are available. The learned DeezyMatch models can be used to generate rich vector representations from string inputs. The candidate ranker component in DeezyMatch uses these vector representations to find, for a given query, the best matching candidates in a knowledge base. It uses an adaptive searching algorithm applicable to large knowledge bases and query sets. We describe DeezyMatch’s functionality, design and implementation, accompanied by a use case in toponym matching and candidate ranking in realistic noisy datasets.

2019

pdf bib
Policy Preference Detection in Parliamentary Debate Motions
Gavin Abercrombie | Federico Nanni | Riza Batista-Navarro | Simone Paolo Ponzetto
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Debate motions (proposals) tabled in the UK Parliament contain information about the stated policy preferences of the Members of Parliament who propose them, and are key to the analysis of all subsequent speeches given in response to them. We attempt to automatically label debate motions with codes from a pre-existing coding scheme developed by political scientists for the annotation and analysis of political parties’ manifestos. We develop annotation guidelines for the task of applying these codes to debate motions at two levels of granularity and produce a dataset of manually labelled examples. We evaluate the annotation process and the reliability and utility of the labelling scheme, finding that inter-annotator agreement is comparable with that of other studies conducted on manifesto data. Moreover, we test a variety of ways of automatically labelling motions with the codes, ranging from similarity matching to neural classification methods, and evaluate them against the gold standard labels. From these experiments, we note that established supervised baselines are not always able to improve over simple lexical heuristics. At the same time, we detect a clear and evident benefit when employing BERT, a state-of-the-art deep language representation model, even in classification scenarios with over 30 different labels and limited amounts of training data.

pdf bib
Computational Analysis of Political Texts: Bridging Research Efforts Across Communities
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

In the last twenty years, political scientists started adopting and developing natural language processing (NLP) methods more actively in order to exploit text as an additional source of data in their analyses. Over the last decade the usage of computational methods for analysis of political texts has drastically expanded in scope, allowing for a sustained growth of the text-as-data community in political science. In political science, NLP methods have been extensively used for a number of analyses types and tasks, including inferring policy position of actors from textual evidence, detecting topics in political texts, and analyzing stylistic aspects of political texts (e.g., assessing the role of language ambiguity in framing the political agenda). Just like in numerous other domains, much of the work on computational analysis of political texts has been enabled and facilitated by the development of resources such as, the topically coded electoral programmes (e.g., the Manifesto Corpus) or topically coded legislative texts (e.g., the Comparative Agenda Project). Political scientists created resources and used available NLP methods to process textual data largely in isolation from the NLP community. At the same time, NLP researchers addressed closely related tasks such as election prediction, ideology classification, and stance detection. In other words, these two communities have been largely agnostic of one another, with NLP researchers mostly unaware of interesting applications in political science and political scientists not applying cutting-edge NLP methodology to their problems. The main goal of this tutorial is to systematize and analyze the body of research work on political texts from both communities. We aim to provide a gentle, all-round introduction to methods and tasks related to computational analysis of political texts. Our vision is to bring the two research communities closer to each other and contribute to faster and more significant developments in this interdisciplinary research area.

2017

pdf bib
Unsupervised Cross-Lingual Scaling of Political Texts
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models scale texts based on relative word usage and cannot be used for cross-lingual analyses. Additionally, there is little quantitative evidence that the output of these models correlates with common political dimensions like left-to-right orientation. Experimental results show that the semantically-informed scaling models better predict the party positions than the existing word-based models in two different political dimensions. Furthermore, the proposed models exhibit no drop in performance in the cross-lingual compared to monolingual setting.

pdf bib
Cross-Lingual Classification of Topics in Political Texts
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the Second Workshop on NLP and Computational Social Science

In this paper, we propose an approach for cross-lingual topical coding of sentences from electoral manifestos of political parties in different languages. To this end, we exploit continuous semantic text representations and induce a joint multilingual semantic vector spaces to enable supervised learning using manually-coded sentences across different languages. Our experimental results show that classifiers trained on multilingual data yield performance boosts over monolingual topic classification.

pdf bib
Topic-Based Agreement and Disagreement in US Electoral Manifestos
Stefano Menini | Federico Nanni | Simone Paolo Ponzetto | Sara Tonelli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a topic-based analysis of agreement and disagreement in political manifestos, which relies on a new method for topic detection based on key concept clustering. Our approach outperforms both standard techniques like LDA and a state-of-the-art graph-based method, and provides promising initial results for this new task in computational social science.

2016

pdf bib
Unsupervised Text Segmentation Using Semantic Relatedness Graphs
Goran Glavaš | Federico Nanni | Simone Paolo Ponzetto
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics