Beatrice Portelli


pdf bib
Distilling the Evidence to Augment Fact Verification Models
Beatrice Portelli | Jason Zhao | Tal Schuster | Giuseppe Serra | Enrico Santus
Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER)

The alarming spread of fake news in social media, together with the impossibility of scaling manual fact verification, motivated the development of natural language processing techniques to automatically verify the veracity of claims. Most approaches perform a claim-evidence classification without providing any insights about why the claim is trustworthy or not. We propose, instead, a model-agnostic framework that consists of two modules: (1) a span extractor, which identifies the crucial information connecting claim and evidence; and (2) a classifier that combines claim, evidence, and the extracted spans to predict the veracity of the claim. We show that the spans are informative for the classifier, improving performance and robustness. Tested on several state-of-the-art models over the Fever dataset, the enhanced classifiers consistently achieve higher accuracy while also showing reduced sensitivity to artifacts in the claims.

pdf bib
Keyphrase Generation with GANs in Low-Resources Scenarios
Giuseppe Lancioni | Saida S.Mohamed | Beatrice Portelli | Giuseppe Serra | Carlo Tasso
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Keyphrase Generation is the task of predicting Keyphrases (KPs), short phrases that summarize the semantic meaning of a given document. Several past studies provided diverse approaches to generate Keyphrases for an input document. However, all of these approaches still need to be trained on very large datasets. In this paper, we introduce BeGanKP, a new conditional GAN model to address the problem of Keyphrase Generation in a low-resource scenario. Our main contribution relies in the Discriminator’s architecture: a new BERT-based module which is able to distinguish between the generated and humancurated KPs reliably. Its characteristics allow us to use it in a low-resource scenario, where only a small amount of training data are available, obtaining an efficient Generator. The resulting architecture achieves, on five public datasets, competitive results with respect to the state-of-the-art approaches, using less than 1% of the training data.