Luisa Bentivogli


2020

Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
Luisa Bentivogli | Beatrice Savoldi | Matteo Negri | Mattia A. Di Gangi | Roldano Cattoni | Marco Turchi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained by the fact that the input sentence does not always contain clues about the gender identity of the referred human entities. But what happens with speech translation, where the input is an audio signal? Can audio provide additional information to reduce gender bias? We present the first thorough investigation of gender bias in speech translation, contributing with: i) the release of a benchmark useful for future studies, and ii) the comparison of different technologies (cascade and end-to-end) on two language directions (English-Italian/French).

Breeding Gender-aware Direct Speech Translation Systems
Marco Gaido | Beatrice Savoldi | Luisa Bentivogli | Matteo Negri | Marco Turchi
Proceedings of the 28th International Conference on Computational Linguistics

In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker’s vocal characteristics) that is otherwise lost in the cascade framework. Although such ability proved to be useful for gender translation, direct ST is nonetheless affected by gender bias just like its cascade counterpart, as well as machine translation and numerous other natural language processing applications. Moreover, direct ST systems that exclusively rely on vocal biometric features as a gender cue can be unsuitable or even potentially problematic for certain users. Going beyond speech signals, in this paper we compare different approaches to inform direct ST models about the speaker’s gender and test their ability to handle gender translation from English into Italian and French. To this aim, we manually annotated large datasets with speakers’ gender information and used them for experiments reflecting different possible real-world scenarios. Our results show that gender-aware direct ST solutions can significantly outperform strong – but gender-unaware – direct ST models. In particular, the translation of gender-marked words can increase up to 30 points in accuracy while preserving overall translation quality.

CEF Data Marketplace: Powering a Long-term Supply of Language Data
Amir Kamran | Dace Dzeguze | Jaap van der Meer | Milica Panic | Alessandro Cattelan | Daniele Patrioli | Luisa Bentivogli | Marco Turchi
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

We describe the CEF Data Marketplace project, which focuses on the development of a trading platform of translation data for language professionals: translators, machine translation (MT) developers, language service providers (LSPs), translation buyers and government bodies. The CEF Data Marketplace platform will be designed and built to manage and trade data for all languages and domains. This project will open a continuous and long-term supply of language data for MT and other machine learning applications.

2019

Machine Translation for Machines: the Sentiment Classification Use Case
Amirhossein Tebbifakhr | Luisa Bentivogli | Matteo Negri | Marco Turchi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a neural machine translation (NMT) approach that, instead of pursuing adequacy and fluency (“human-oriented” quality criteria), aims to generate translations that are best suited as input to a natural language processing component designed for a specific downstream task (a “machine-oriented” criterion). Towards this objective, we present a reinforcement learning technique based on a new candidate sampling strategy, which exploits the results obtained on the downstream task as weak feedback. Experiments in sentiment classification of Twitter data in German and Italian show that feeding an English classifier with “machine-oriented” translations significantly improves its performance. Classification results outperform those obtained with translations produced by general-purpose NMT models as well as by an approach based on reinforcement learning. Moreover, our results on both languages approximate the classification accuracy computed on gold standard English tweets.

MAGMATic: A Multi-domain Academic Gold Standard with Manual Annotation of Terminology for Machine Translation Evaluation
Randy Scansani | Luisa Bentivogli | Silvia Bernardini | Adriano Ferraresi
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

Do translator trainees trust machine translation? An experiment on post-editing and revision
Randy Scansani | Silvia Bernardini | Adriano Ferraresi | Luisa Bentivogli
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

MuST-C: a Multilingual Speech Translation Corpus
Mattia A. Di Gangi | Roldano Cattoni | Luisa Bentivogli | Matteo Negri | Marco Turchi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Current research on spoken language translation (SLT) has to contend with the scarcity of sizeable and publicly available training corpora. This problem hinders the adoption of neural end-to-end approaches, which represent the state of the art in the two parent tasks of SLT: automatic speech recognition and machine translation. To fill this gap, we created MuST-C, a multilingual speech translation corpus whose size and quality will facilitate the training of end-to-end systems for SLT from English into 8 languages. For each target language, MuST-C comprises at least 385 hours of audio recordings from English TED Talks, which are automatically aligned at the sentence level with their manual transcriptions and translations. Together with a description of the corpus creation methodology (scalable to add new data and cover new languages), we provide an empirical verification of its quality and SLT results computed with a state-of-the-art approach on each language direction.

2016

WAGS: A Beautiful English-Italian Benchmark Supporting Word Alignment Evaluation on Rare Words
Luisa Bentivogli | Mauro Cettolo | M. Amin Farajian | Marcello Federico
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents WAGS (Word Alignment Gold Standard), a novel benchmark which allows extensive evaluation of WA tools on out-of-vocabulary (OOV) and rare words. WAGS is a subset of the Common Test section of the Europarl English-Italian parallel corpus, and is specifically tailored to OOV and rare words. WAGS is composed of 6,715 sentence pairs containing 11,958 occurrences of OOV and rare words up to frequency 15 in the Europarl Training set (5,080 English words and 6,878 Italian words), representing almost 3% of the whole text. Since WAGS is focused on OOV/rare words, manual alignments are provided for these words only, and not for the whole sentences. Two off-the-shelf word aligners have been evaluated on WAGS, and results have been compared to those obtained on an existing benchmark tailored to full text alignment. The results obtained confirm that WAGS is a valuable resource, which allows a statistically sound evaluation of WA systems’ performance on OOV and rare words, as well as extensive data analyses. WAGS is publicly released under a Creative Commons Attribution license.

Neural versus Phrase-Based Machine Translation Quality: a Case Study
Luisa Bentivogli | Arianna Bisazza | Mauro Cettolo | Marcello Federico
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2014

SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
Marco Marelli | Luisa Bentivogli | Marco Baroni | Raffaella Bernardi | Stefano Menini | Roberto Zamparelli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

MT-EQuAl: a Toolkit for Human Assessment of Machine Translation Output
Christian Girardi | Luisa Bentivogli | Mohammad Amin Farajian | Marcello Federico
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

Assessing the Impact of Translation Errors on Machine Translation Quality with Mixed-effects Models
Marcello Federico | Matteo Negri | Luisa Bentivogli | Marco Turchi
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

A SICK cure for the evaluation of compositional distributional semantic models
Marco Marelli | Stefano Menini | Marco Baroni | Luisa Bentivogli | Raffaella Bernardi | Roberto Zamparelli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Shared and internationally recognized benchmarks are fundamental for the development of any computational system. We aim to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large English benchmark tailored for them. SICK consists of about 10,000 English sentence pairs that include many examples of the lexical, syntactic and semantic phenomena that CDSMs are expected to account for, but do not require dealing with other aspects of existing sentential data sets (idiomatic multiword expressions, named entities, telegraphic language) that are not within the scope of CDSMs. By means of crowdsourcing techniques, each pair was annotated for two crucial semantic tasks: relatedness in meaning (with a 5-point rating scale as gold score) and entailment relation between the two elements (with three possible gold labels: entailment, contradiction, and neutral). The SICK data set was used in SemEval-2014 Task 1, and it is freely available for research purposes.

2013

Semeval-2013 Task 8: Cross-lingual Textual Entailment for Content Synchronization
Matteo Negri | Alessandro Marchetti | Yashar Mehdad | Luisa Bentivogli | Danilo Giampiccolo
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge
Myroslava Dzikovska | Rodney Nielsen | Chris Brew | Claudia Leacock | Danilo Giampiccolo | Luisa Bentivogli | Peter Clark | Ido Dagan | Hoa Trang Dang
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Crowd-based MT Evaluation for non-English Target Languages
Michael Paul | Eiichiro Sumita | Luisa Bentivogli | Marcello Federico
Proceedings of the 16th Annual conference of the European Association for Machine Translation

Semeval-2012 Task 8: Cross-lingual Textual Entailment for Content Synchronization
Matteo Negri | Alessandro Marchetti | Yashar Mehdad | Luisa Bentivogli | Danilo Giampiccolo
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Chinese Whispers: Cooperative Paraphrase Acquisition
Matteo Negri | Yashar Mehdad | Alessandro Marchetti | Danilo Giampiccolo | Luisa Bentivogli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a framework for the acquisition of sentential paraphrases based on crowdsourcing. The proposed method maximizes the lexical divergence between an original sentence s and its valid paraphrases by running a sequence of paraphrasing jobs carried out by a crowd of non-expert workers. Instead of collecting direct paraphrases of s, at each step of the sequence workers manipulate semantically equivalent reformulations produced in the previous round. We applied this method to paraphrase English sentences extracted from Wikipedia. Our results show that, keeping at each round n the most promising paraphrases (i.e. the most lexically dissimilar from those acquired at round n-1), the monotonic increase of divergence makes it possible to collect good-quality paraphrases in a cost-effective manner.

The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Marcello Federico | Sebastian Stüker | Luisa Bentivogli | Michael Paul | Mauro Cettolo | Teresa Herrmann | Jan Niehues | Giovanni Moretti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series. The IWSLT 2011 evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 evaluation campaign, and describes the data supplied, the evaluation infrastructure made available to participants, and the subjective evaluation carried out.

2011

Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora
Matteo Negri | Luisa Bentivogli | Yashar Mehdad | Danilo Giampiccolo | Alessandro Marchetti
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Extending English ACE 2005 Corpus Annotation with Ground-truth Links to Wikipedia
Luisa Bentivogli | Pamela Forner | Claudio Giuliano | Alessandro Marchetti | Emanuele Pianta | Kateryna Tymoshenko
Proceedings of the 2nd Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

Building Textual Entailment Specialized Data Sets: a Methodology for Isolating Linguistic Phenomena Relevant to Inference
Luisa Bentivogli | Elena Cabrio | Ido Dagan | Danilo Giampiccolo | Medea Lo Leggio | Bernardo Magnini
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a methodology for the creation of specialized data sets for Textual Entailment, made of monothematic Text-Hypothesis pairs (i.e. pairs in which only one linguistic phenomenon relevant to the entailment relation is highlighted and isolated). The expected benefits derive from the intuition that investigating the linguistic phenomena separately, i.e. decomposing the complexity of the TE problem, would yield an improvement in the development of specific strategies to cope with them. The annotation procedure assumes that humans have knowledge about the linguistic phenomena relevant to inference, and a classification of such phenomena both into fine-grained and macro categories is suggested. We experimented with the proposed methodology over a sample of pairs taken from the RTE-5 data set, and investigated critical issues arising when entailment, contradiction or unknown pairs are considered. The result is a new resource, which can be profitably used both to advance the comprehension of the linguistic phenomena relevant to entailment judgments and to make a first step towards the creation of large-scale specialized data sets.

A Resource for Investigating the Impact of Anaphora and Coreference on Inference.
Azad Abad | Luisa Bentivogli | Ido Dagan | Danilo Giampiccolo | Shachar Mirkin | Emanuele Pianta | Asher Stern
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Discourse phenomena play a major role in text processing tasks. However, so far relatively little study has been devoted to the relevance of discourse phenomena for inference. Therefore, an experimental study was carried out to assess the relevance of anaphora and coreference for Textual Entailment (TE), a prominent inference framework. First, the annotation of anaphoric and coreferential links in the RTE-5 Search data set was performed according to a specifically designed annotation scheme. As a result, a new data set was created where all anaphora and coreference instances in the entailing sentences which are relevant to the entailment judgment are resolved and annotated. A by-product of the annotation is a new “augmented” data set, where all the referring expressions which need to be resolved in the entailing sentences are replaced by explicit expressions. Starting from the final output of the annotation, the actual impact of discourse phenomena on inference engines was investigated, identifying the kind of operations that the systems need to apply to address discourse phenomena and trying to find direct mappings between these operations and annotation types.

2006

Representing and Accessing Multilevel Linguistic Annotation using the MEANING Format
Emanuele Pianta | Luisa Bentivogli | Christian Girardi | Bernardo Magnini
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing

2004

Evaluating Cross-Language Annotation Transfer in the MultiSemCor Corpus
Luisa Bentivogli | Pamela Forner | Emanuele Pianta
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Knowledge Intensive Word Alignment with KNOWA
Emanuele Pianta | Luisa Bentivogli
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Revising the Wordnet Domains Hierarchy: semantics, coverage and balancing
Luisa Bentivogli | Pamela Forner | Bernardo Magnini | Emanuele Pianta
Proceedings of the Workshop on Multilingual Linguistic Resources

2003

Beyond Lexical Units: Enriching WordNets with Phrasets
Luisa Bentivogli | Emanuele Pianta
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

Opportunistic Semantic Tagging
Luisa Bentivogli | Emanuele Pianta
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Coping with Lexical Gaps when Building Aligned Multilingual Wordnets
Luisa Bentivogli | Emanuele Pianta | Fabio Pianesi
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)