Chakaveh Saedi


2020

pdf bib
The MWN.PT WordNet for Portuguese: Projection, Validation, Cross-lingual Alignment and Distribution
António Branco | Sara Grilo | Márcia Bolrinha | Chakaveh Saedi | Ruben Branco | João Silva | Andreia Querido | Rita de Carvalho | Rosa Gaudio | Mariana Avelãs | Clara Pinto
Proceedings of the 12th Language Resources and Evaluation Conference

The objective of the present paper is twofold, to present the MWN.PT WordNet and to report on its construction and on the lessons learned with it. The MWN.PT WordNet for Portuguese includes 41,000 concepts, expressed by 38,000 lexical units. Its synsets were manually validated and are linked to semantically equivalent synsets of the Princeton WordNet of English, and thus transitively to the many wordnets for other languages that are also linked to this English wordnet. To the best of our knowledge, it is the largest high quality, manually validated and cross-lingually integrated, wordnet of Portuguese distributed for reuse. Its construction was initiated more than one decade ago and its description is published for the first time in the present paper. It follows a three step <projection, validation with alignment, completion> methodology consisting on the manual validation and expansion of the outcome of an automatic projection procedure of synsets and their hypernym relations, followed by another automatic procedure that transferred the relations of remaining semantic types across wordnets of different languages.

pdf bib
Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness
António Branco | João António Rodrigues | Malgorzata Salawa | Ruben Branco | Chakaveh Saedi
Proceedings of the 28th International Conference on Computational Linguistics

Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a cooccurrence vector, thus raising the question: is it the case that one of these approaches is superior to the others in representing lexical semantics appropriately? Or in its non antagonistic counterpart: could there be a unified account of lexical semantics where these approaches seamlessly emerge as (partial) renderings of (different) aspects of a core semantic knowledge base? In this paper, we contribute to these research questions with a number of experiments that systematically probe different lexical semantics theories for their levels of cognitive plausibility and of technological usefulness. The empirical findings obtained from these experiments advance our insight on lexical semantics as the feature-based approach emerges as superior to the other ones, and arguably also move us closer to finding answers to the research questions above.

pdf bib
Large Scale Author Obfuscation Using Siamese Variational Auto-Encoder: The SiamAO System
Chakaveh Saedi | Mark Dras
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Author obfuscation is the task of masking the author of a piece of text, with applications in privacy. Recent advances in deep neural networks have boosted author identification performance making author obfuscation more challenging. Existing approaches to author obfuscation are largely heuristic. Obfuscation can, however, be thought of as the construction of adversarial examples to attack author identification, suggesting that the deep learning architectures used for adversarial attacks could have application here. Current architectures are proposed to construct adversarial examples against classification-based models, which in author identification would exclude the high-performing similarity-based models employed when facing large number of authorial classes. In this paper, we propose the first deep learning architecture for constructing adversarial examples against similarity-based learners, and explore its application to author obfuscation. We analyse the output from both success in obfuscation and language acceptability, as well as comparing the performance with some common baselines, and showing promising results in finding a balance between safety and soundness of the perturbed texts.

2019

pdf bib
Whom to Learn From? Graph- vs. Text-based Word Embeddings
Małgorzata Salawa | António Branco | Ruben Branco | João António Rodrigues | Chakaveh Saedi
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Vectorial representations of meaning can be supported by empirical data from diverse sources and obtained with diverse embedding approaches. This paper aims at screening this experimental space and reports on an assessment of word embeddings supported (i) by data in raw texts vs. in lexical graphs, (ii) by lexical information encoded in association- vs. inference-based graphs, and obtained (iii) by edge reconstruction- vs. matrix factorisation vs. random walk-based graph embedding methods. The results observed with these experiments indicate that the best solutions with graph-based word embeddings are very competitive, consistently outperforming mainstream text-based ones.

2018

pdf bib
Semantic Equivalence Detection: Are Interrogatives Harder than Declaratives?
João Rodrigues | Chakaveh Saedi | António Branco | João Silva
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Browsing and Supporting Pluricentric Global Wordnet, or just your Wordnet of Interest
António Branco | Ruben Branco | Chakaveh Saedi | João Silva
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Predicting Brain Activation with WordNet Embeddings
João António Rodrigues | Ruben Branco | João Silva | Chakaveh Saedi | António Branco
Proceedings of the Eight Workshop on Cognitive Aspects of Computational Language Learning and Processing

The task of taking a semantic representation of a noun and predicting the brain activity triggered by it in terms of fMRI spatial patterns was pioneered by Mitchell et al. 2008. That seminal work used word co-occurrence features to represent the meaning of the nouns. Even though the task does not impose any specific type of semantic representation, the vast majority of subsequent approaches resort to feature-based models or to semantic spaces (aka word embeddings). We address this task, with competitive results, by using instead a semantic network to encode lexical semantics, thus providing further evidence for the cognitive plausibility of this approach to model lexical meaning.

pdf bib
WordNet Embeddings
Chakaveh Saedi | António Branco | João António Rodrigues | João Silva
Proceedings of The Third Workshop on Representation Learning for NLP

Semantic networks and semantic spaces have been two prominent approaches to represent lexical semantics. While a unified account of the lexical meaning relies on one being able to convert between these representations, in both directions, the conversion direction from semantic networks into semantic spaces started to attract more attention recently. In this paper we present a methodology for this conversion and assess it with a case study. When it is applied over WordNet, the performance of the resulting embeddings in a mainstream semantic similarity task is very good, substantially superior to the performance of word embeddings based on very large collections of texts like word2vec.

2017

pdf bib
Ways of Asking and Replying in Duplicate Question Detection
João António Rodrigues | Chakaveh Saedi | Vladislav Maraev | João Silva | António Branco
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

This paper presents the results of systematic experimentation on the impact in duplicate question detection of different types of questions across both a number of established approaches and a novel, superior one used to address this language processing task. This study permits to gain a novel insight on the different levels of robustness of the diverse detection methods with respect to different conditions of their application, including the ones that approximate real usage scenarios.

2016

pdf bib
Adding syntactic structure to bilingual terminology for improved domain adaptation
Mikel Artetxe | Gorka Labaka | Chakaveh Saedi | João Rodrigues | João Silva | António Branco | Eneko Agirre
Proceedings of the 2nd Deep Machine Translation Workshop