Carolina Scarton

Also published as: Carolina Evaristo Scarton


2020

pdf bib
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
Fernando Alva-Manchego | Louis Martin | Antoine Bordes | Carolina Scarton | Benoît Sagot | Lucia Specia
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In order to simplify a sentence, human editors perform multiple rewriting transformations: they split it into several shorter sentences, paraphrase words (i.e. replacing complex words or phrases by simpler synonyms), reorder components, and/or delete information deemed unnecessary. Despite these varied range of possible text alterations, current models for automatic sentence simplification are evaluated using datasets that are focused on a single transformation, such as lexical paraphrasing or splitting. This makes it impossible to understand the ability of simplification models in more realistic settings. To alleviate this limitation, this paper introduces ASSET, a new dataset for assessing sentence simplification in English. ASSET is a crowdsourced multi-reference corpus where each simplification was produced by executing several rewriting transformations. Through quantitative and qualitative experiments, we show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task. Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.

pdf bib
Measuring the Impact of Readability Features in Fake News Detection
Roney Santos | Gabriela Pedro | Sidney Leal | Oto Vale | Thiago Pardo | Kalina Bontcheva | Carolina Scarton
Proceedings of the 12th Language Resources and Evaluation Conference

The proliferation of fake news is a current issue that influences a number of important areas of society, such as politics, economy and health. In the Natural Language Processing area, recent initiatives tried to detect fake news in different ways, ranging from language-based approaches to content-based verification. In such approaches, the choice of the features for the classification of fake and true news is one of the most important parts of the process. This paper presents a study on the impact of readability features to detect fake news for the Brazilian Portuguese language. The results show that such features are relevant to the task (achieving, alone, up to 92% classification accuracy) and may improve previous classification results.

pdf bib
Data-Driven Sentence Simplification: Survey and Benchmark
Fernando Alva-Manchego | Carolina Scarton | Lucia Specia
Computational Linguistics, Volume 46, Issue 1 - March 2020

Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. In order to do so, several rewriting transformations can be performed such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output, is a challenging and still far from solved problem. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common data sets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.

pdf bib
Revisiting Rumour Stance Classification: Dealing with Imbalanced Data
Yue Li | Carolina Scarton
Proceedings of the 3rd International Workshop on Rumours and Deception in Social Media (RDSM)

Correctly classifying stances of replies can be significantly helpful for the automatic detection and classification of online rumours. One major challenge is that there are considerably more non-relevant replies (comments) than informative ones (supports and denies), making the task highly imbalanced. In this paper we revisit the task of rumour stance classification, aiming to improve the performance over the informative minority classes. We experiment with traditional methods for imbalanced data treatment with feature- and BERT-based classifiers. Our models outperform all systems in RumourEval 2017 shared task and rank second in RumourEval 2019.

2019

pdf bib
EASSE: Easier Automatic Sentence Simplification Evaluation
Fernando Alva-Manchego | Louis Martin | Carolina Scarton | Lucia Specia
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality estimation features (e.g. compression ratio), and standard test data for SS evaluation (e.g. TurkCorpus). Finally, EASSE generates easy-to-visualise reports on the various metrics and features above and on how a particular SS output fares against reference simplifications. Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.

bib
Cross-Sentence Transformations in Text Simplification
Fernando Alva-Manchego | Carolina Scarton | Lucia Specia
Proceedings of the 2019 Workshop on Widening NLP

Current approaches to Text Simplification focus on simplifying sentences individually. However, certain simplification transformations span beyond single sentences (e.g. joining and re-ordering sentences). In this paper, we motivate the need for modelling the simplification task at the document level, and assess the performance of sequence-to-sequence neural models in this setup. We analyse parallel original-simplified documents created by professional editors and show that there are frequent rewriting transformations that are not restricted to sentence boundaries. We also propose strategies to automatically evaluate the performance of a simplification model on these cross-sentence transformations. Our experiments show the inability of standard sequence-to-sequence neural models to learn these transformations, and suggest directions towards document-level simplification.

2018

pdf bib
Text Simplification from Professionally Produced Corpora
Carolina Scarton | Gustavo Paetzold | Lucia Specia
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain
Carolina Scarton | Gustavo Paetzold | Lucia Specia
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Learning Simplifications for Specific Target Audiences
Carolina Scarton | Lucia Specia
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Text simplification (TS) is a monolingual text-to-text transformation task where an original (complex) text is transformed into a target (simpler) text. Most recent work is based on sequence-to-sequence neural models similar to those used for machine translation (MT). Different from MT, TS data comprises more elaborate transformations, such as sentence splitting. It can also contain multiple simplifications of the same original text targeting different audiences, such as school grade levels. We explore these two features of TS to build models tailored for specific grade levels. Our approach uses a standard sequence-to-sequence architecture where the original sequence is annotated with information about the target audience and/or the (predicted) type of simplification operation. We show that it outperforms state-of-the-art TS approaches (up to 3 and 12 BLEU and SARI points, respectively), including when training data for the specific complex-simple combination of grade levels is not available, i.e. zero-shot learning.

pdf bib
Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting
Mikel L. Forcada | Carolina Scarton | Lucia Specia | Barry Haddow | Alexandra Birch
Proceedings of the Third Conference on Machine Translation: Research Papers

A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally-written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap-filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study on the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are: (a) both RCQ and GF clearly identify MT to be useful; (b) global RCQ and GF rankings for the MT systems are mostly in agreement; (c) GF scores vary very widely across informants, making comparisons among MT systems hard, and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.

pdf bib
Sheffield Submissions for WMT18 Multimodal Translation Shared Task
Chiraag Lala | Pranava Swaroop Madhyastha | Carolina Scarton | Lucia Specia
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the University of Sheffield’s submissions to the WMT18 Multimodal Machine Translation shared task. We participated in both tasks 1 and 1b. For task 1, we build on a standard sequence to sequence attention-based neural machine translation system (NMT) and investigate the utility of multimodal re-ranking approaches. More specifically, n-best translation candidates from this system are re-ranked using novel multimodal cross-lingual word sense disambiguation models. For task 1b, we explore three approaches: (i) re-ranking based on cross-lingual word sense disambiguation (as for task 1), (ii) re-ranking based on consensus of NMT n-best lists from German-Czech, French-Czech and English-Czech systems, and (iii) data augmentation by generating English source data through machine translation from French to English and from German to English followed by hypothesis selection using a multimodal-reranker.

pdf bib
Sheffield Submissions for the WMT18 Quality Estimation Shared Task
Julia Ive | Carolina Scarton | Frédéric Blain | Lucia Specia
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

In this paper we present the University of Sheffield submissions for the WMT18 Quality Estimation shared task. We discuss our submissions to all four sub-tasks, where ours is the only team to participate in all language pairs and variations (37 combinations). Our systems show competitive results and outperform the baseline in nearly all cases.

2017

pdf bib
Improving Evaluation of Document-level Machine Translation Quality Estimation
Yvette Graham | Qingsong Ma | Timothy Baldwin | Qun Liu | Carla Parra | Carolina Scarton
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable. In this paper, we explore the validity of human annotations currently employed in the evaluation of document-level quality estimation for machine translation (MT). We demonstrate the degree to which MT system rankings are dependent on weights employed in the construction of the gold standard, before proposing direct human assessment as a valid alternative. Experiments show direct assessment (DA) scores for documents to be highly reliable, achieving a correlation of above 0.9 in a self-replication experiment, in addition to a substantial estimated cost reduction through quality controlled crowd-sourcing. The original gold standard based on post-edits incurs a 10–20 times greater cost than DA.

pdf bib
Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
Fernando Alva-Manchego | Joachim Bingel | Gustavo Paetzold | Carolina Scarton | Lucia Specia
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.

pdf bib
MUSST: A Multilingual Syntactic Simplification Tool
Carolina Scarton | Alessio Palmero Aprosio | Sara Tonelli | Tamara Martín Wanton | Lucia Specia
Proceedings of the IJCNLP 2017, System Demonstrations

We describe MUSST, a multilingual syntactic simplification tool. The tool supports sentence simplifications for English, Italian and Spanish, and can be easily extended to other languages. Our implementation includes a set of general-purpose simplification rules, as well as a sentence selection module (to select sentences to be simplified) and a confidence model (to select only promising simplifications). The tool was implemented in the context of the European project SIMPATICO on text simplification for Public Administration (PA) texts. Our evaluation on sentences in the PA domain shows that we obtain correct simplifications for 76% of the simplified cases in English, 71% of the cases in Spanish. For Italian, the results are lower (38%) but the tool is still under development.

pdf bib
Bilexical Embeddings for Quality Estimation
Frédéric Blain | Carolina Scarton | Lucia Specia
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
A Reading Comprehension Corpus for Machine Translation Evaluation
Carolina Scarton | Lucia Specia
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Effectively assessing Natural Language Processing output tasks is a challenge for research in the area. In the case of Machine Translation (MT), automatic metrics are usually preferred over human evaluation, given time and budget constraints.However, traditional automatic metrics (such as BLEU) are not reliable for absolute quality assessment of documents, often producing similar scores for documents translated by the same MT system.For scenarios where absolute labels are necessary for building models, such as document-level Quality Estimation, these metrics can not be fully trusted. In this paper, we introduce a corpus of reading comprehension tests based on machine translated documents, where we evaluate documents based on answers to questions by fluent speakers of the target language. We describe the process of creating such a resource, the experiment design and agreement between the test takers. Finally, we discuss ways to convert the reading comprehension test into document-level quality scores.

pdf bib
Quality Estimation for Language Output Applications
Carolina Scarton | Gustavo Paetzold | Lucia Specia
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

Quality Estimation (QE) of language output applications is a research area that has been attracting significant attention. The goal of QE is to estimate the quality of language output applications without the need of human references. Instead, machine learning algorithms are used to build supervised models based on a few labelled training instances. Such models are able to generalise over unseen data and thus QE is a robust method applicable to scenarios where human input is not available or possible. One such a scenario where QE is particularly appealing is that of Machine Translation, where a score for predicted quality can help decide whether or not a translation is useful (e.g. for post-editing) or reliable (e.g. for gisting). Other potential applications within Natural Language Processing (NLP) include Text Summarisation and Text Simplification. In this tutorial we present the task of QE and its application in NLP, focusing on Machine Translation. We also introduce QuEst++, a toolkit for QE that encompasses feature extraction and machine learning, and propose a practical activity to extend this toolkit in various ways.

pdf bib
Findings of the 2016 Conference on Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Martin Popel | Matt Post | Raphael Rubino | Carolina Scarton | Lucia Specia | Marco Turchi | Karin Verspoor | Marcos Zampieri
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Word embeddings and discourse information for Quality Estimation
Carolina Scarton | Daniel Beck | Kashif Shah | Karin Sim Smith | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles
Liling Tan | Carolina Scarton | Lucia Specia | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation
Carolina Scarton | Marcos Zampieri | Mihaela Vela | Josef van Genabith | Lucia Specia
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Findings of the 2015 Workshop on Statistical Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Barry Haddow | Matthias Huck | Chris Hokamp | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Matt Post | Carolina Scarton | Lucia Specia | Marco Turchi
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
USHEF and USAAR-USHEF participation in the WMT15 QE shared task
Carolina Scarton | Liling Tan | Lucia Specia
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation
Carolina Scarton | Marcos Zampieri | Mihaela Vela | Josef van Genabith | Lucia Specia
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Discourse and Document-level Information for Evaluating Language Output Tasks
Carolina Scarton
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

pdf bib
Multi-level Translation Quality Prediction with QuEst++
Lucia Specia | Gustavo Paetzold | Carolina Scarton
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf bib
USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics
Liling Tan | Carolina Scarton | Lucia Specia | Josef van Genabith
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
Exploring Consensus in Machine Translation for Quality Estimation
Carolina Scarton | Lucia Specia
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Document-level translation quality estimation: exploring discourse and pseudo-references
Carolina Scarton | Lucia Specia
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2013

pdf bib
Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic ‘se’ in Portuguese
Magali Sanches Duran | Carolina Evaristo Scarton | Sandra Maria Aluísio | Carlos Ramisch
Proceedings of the 9th Workshop on Multiword Expressions

2011

pdf bib
VerbNet.Br: construção semiautomática de um léxico computacional de verbos para o português do Brasil (VerbNet.Br: semiautomatic construction of a computational verb lexicon for Brazilian Portuguese) [in Portuguese]
Carolina Evaristo Scarton
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

pdf bib
Comparando Avaliações de Inteligibilidade Textual entre Originais e Traduções de Textos Literários (Comparing Textual Intelligibility Evaluations among Literary Source Texts and their Translations) [in Portuguese]
Bianca Franco Pasqualini | Carolina Evaristo Scarton | Maria José B. Finatto
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

pdf bib
Características do jornalismo popular: avaliação da inteligibilidade e auxílio à descrição do gênero (Characteristics of Popular News: the Evaluation of Intelligibility and Support to the Genre Description) [in Portuguese]
Maria José B. Finatto | Carolina Evaristo Scarton | Amanda Rocha | Sandra Aluísio
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

2010

pdf bib
SIMPLIFICA: a tool for authoring simplified texts in Brazilian Portuguese guided by readability assessments
Carolina Scarton | Matheus Oliveira | Arnaldo Candido Jr. | Caroline Gasperin | Sandra Aluísio
Proceedings of the NAACL HLT 2010 Demonstration Session

pdf bib
Readability Assessment for Text Simplification
Sandra Aluisio | Lucia Specia | Caroline Gasperin | Carolina Scarton
Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications