Anabela Barreiro


2020

pdf bib
Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning
Lionel Nicolas | Verena Lyding | Claudia Borg | Corina Forascu | Karën Fort | Katerina Zdravkova | Iztok Kosem | Jaka Čibej | Špela Arhar Holdt | Alice Millour | Alexander König | Christos Rodosthenous | Federico Sangati | Umair ul Hassan | Anisia Katinskaia | Anabela Barreiro | Lavinia Aparaschivei | Yaakov HaCohen-Kerner
Proceedings of the 12th Language Resources and Evaluation Conference

We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how much these challenges have been addressed at present. Accordingly, we also report on on-going proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of this generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.

2018

pdf bib
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
Peter Machonis | Anabela Barreiro | Kristina Kocijan | Max Silberztein
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

pdf bib
Contractions: To Align or Not to Align, That Is the Question
Anabela Barreiro | Fernando Batista
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

This paper performs a detailed analysis on the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually in a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed and, a decision was made as to whether the contraction should be maintained or decomposed in the alignment. Decomposition was required in the cases in which the two words that have been concatenated, i.e., the preposition and the determiner or pronoun, go in two separate translation alignment pairs (e.g., [no seio de] [a União Europeia] | [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur in the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (e.g., [no que diz respeito a] | [with regard to] or [além disso] [in addition]. A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.

pdf bib
Paraphrastic Variance between European and Brazilian Portuguese
Anabela Barreiro | Cristina Mota
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds “toda a gente” versus “todo o mundo” ‘everybody’ or the gerundive constructions [estar a + V-Inf] versus [ficar + V-Ger] (e.g., “estive a observar” | “fiquei observando” ‘I was observing’), which are extremely relevant to high quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired in the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases. The construction of a larger dataset of paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question-answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of its kind for Portuguese to become available to the scientific community for research purposes.

2016

pdf bib
Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP
Cristina Mota | Paula Carvalho | Anabela Barreiro
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces Port4NooJ v3.0, the latest version of the Portuguese module for NooJ, highlights its main features, and details its three main new components: (i) a lexicon-grammar based dictionary of 5,177 human intransitive adjectives, and a set of local grammars that use the distributional properties of those adjectives for paraphrasing (ii) a polarity dictionary with 9,031 entries for sentiment analysis, and (iii) a set of priority dictionaries and local grammars for named entity recognition. These new components were derived and/or adapted from publicly available resources. The Port4NooJ v3.0 resource is innovative in terms of the specificity of the linguistic knowledge it incorporates. The dictionary is bilingual Portuguese-English, and the semantico-syntactic information assigned to each entry validates the linguistic relation between the terms in both languages. These characteristics, which cannot be found in any other public resource for Portuguese, make it a valuable resource for translation and paraphrasing. The paper presents the current statistics and describes the different complementary and synergic components and integration efforts.

pdf bib
Machine Translation of Non-Contiguous Multiword Units
Anabela Barreiro | Fernando Batista
Proceedings of the Workshop on Discontinuous Structures in Natural Language Processing

2014

pdf bib
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
Anabela Barreiro | Fernando Batista | Ricardo Ribeiro | Helena Moniz | Isabel Trancoso
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents 3 sets of OpenLogos resources, namely the English-German, the English-French, and the English-Italian bilingual dictionaries. In addition to the usual information on part-of-speech, gender, and number for nouns, offered by most dictionaries currently available, OpenLogos bilingual dictionaries have some distinctive features that make them unique: they contain cross-language morphological information (inflectional and derivational), semantico-syntactic knowledge, indication of the head word in multiword units, information about whether a source word corresponds to an homograph, information about verb auxiliaries, alternate words (i.e., predicate or process nouns), causatives, reflexivity, verb aspect, among others. The focal point of the paper will be the semantico-syntactic knowledge that is important for disambiguation and translation precision. The resources are publicly available at the METANET platform for free use by the research community.

pdf bib
Linguistic Evaluation of Support Verb Constructions by OpenLogos and Google Translate
Anabela Barreiro | Johanna Monti | Brigitte Orliac | Susanne Preuß | Kutz Arrieta | Wang Ling | Fernando Batista | Isabel Trancoso
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a systematic human evaluation of translations of English support verb constructions produced by a rule-based machine translation (RBMT) system (OpenLogos) and a statistical machine translation (SMT) system (Google Translate) for five languages: French, German, Italian, Portuguese and Spanish. We classify support verb constructions by means of their syntactic structure and semantic behavior and present a qualitative analysis of their translation errors. The study aims to verify how machine translation (MT) systems translate fine-grained linguistic phenomena, and how well-equipped they are to produce high-quality translation. Another goal of the linguistically motivated quality analysis of SVC raw output is to reinforce the need for better system hybridization, which leverages the strengths of RBMT to the benefit of SMT, especially in improving the translation of multiword units. Taking multiword units into account, we propose an effective method to achieve MT hybridization based on the integration of semantico-syntactic knowledge into SMT.

2004

pdf bib
On the Problems of Creating a Golden Standard of Inflected Forms in Portuguese
Diana Santos | Anabela Barreiro
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Portuguese Large-scale Language Resources for NLP Applications
Elisabete Ranchhod | Paula Carvalho | Cristina Mota | Anabela Barreiro
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)