Dina Wonsever


2020

Supervised Hypernymy Detection in Spanish through Order Embeddings
Gun Woo Lee | Mathias Etcheverry | Daniel Fernandez Sanchez | Dina Wonsever
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

This paper addresses the task of supervised hypernymy detection in Spanish using an order embedding that takes pretrained word vectors as input. Although the task has been widely addressed in English, there is little work in Spanish, and to our knowledge no dataset for supervised hypernymy detection in Spanish is available. We built a supervised hypernymy dataset for Spanish from WordNet and corpus statistics, in two versions that differ in the lexical overlap between their partitions: a random split and a lexical split. We report the results of an order embedding model that consumes pretrained word vectors and is trained on this dataset. The results on the lexical split show that pretrained word vectors allow learning to transfer to lexical units unseen during training. Finally, we study the effect of providing additional information at training time, such as cohyponym links and instances extracted through patterns.

Statistical Deep Parsing for Spanish Using Neural Networks
Luis Chiruzzo | Dina Wonsever
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

This paper presents the development of a deep parser for Spanish that uses an HPSG grammar and returns trees containing both syntactic and semantic information. The parsing process follows a top-down approach implemented with LSTM neural networks. It achieves good results on syntactic constituency and dependency metrics, as well as on SRL. We describe the grammar, the corpus and the implementation of the parser. Our parser outperforms a CKY baseline and other Spanish parsers on global metrics, and also on some specific Spanish phenomena, such as clitic reduplication and relative referents.

2019

Unraveling Antonym’s Word Vectors through a Siamese-like Network
Mathias Etcheverry | Dina Wonsever
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Discriminating antonyms from synonyms is an important NLP task. It is difficult because antonyms and synonyms carry similar distributional information, so pairs of antonyms and pairs of synonyms may have similar word vectors. We present an approach, inspired by siamese networks, that unravels antonymy and synonymy from word vectors. The model trains the same base network in two phases. The first is a pre-training phase with a siamese model supervised by synonyms. The second is a training phase on antonyms with a siamese-like model that supports the antitransitivity of antonymy. The approach relies on the claim that the antonyms of a given word tend to be synonyms of one another. We show that our approach outperforms distributional and pattern-based approaches while relying on a simple feed-forward network as the base network for both training phases.

2018

Spanish HPSG Treebank based on the AnCora Corpus
Luis Chiruzzo | Dina Wonsever
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Factuality Annotation and Learning in Spanish Texts
Dina Wonsever | Aiala Rosá | Marisa Malcuori
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a proposal for annotating the factuality of event mentions in Spanish texts, together with a freely available annotated corpus. Our factuality model aims to capture a pragmatic notion of factuality, reflecting a casual reader's judgements about the realis/irrealis status of the mentioned events. We also report learning experiments with SVM and CRF models, which show encouraging results.

Spanish Word Vectors from Wikipedia
Mathias Etcheverry | Dina Wonsever
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Content analysis of text data requires semantic representations that are difficult to obtain automatically, because they may require large handcrafted knowledge bases or manually annotated examples. Unsupervised methods for generating semantic representations are of great interest, given the huge volumes of text to be exploited in all kinds of applications. In this work we describe the generation and validation of semantic representations for Spanish in the vector space paradigm. The method used is GloVe (Pennington, 2014), one of the best-performing reported methods, and the vectors were trained on Spanish Wikipedia. The learned vectors are evaluated on word analogy and similarity tasks (Pennington, 2014; Baroni, 2014; Mikolov, 2013a). The vector set and Spanish versions of some widely used semantic relatedness tests are made publicly available.

2013

Adaptation of a Rule-Based Translator to Río de la Plata Spanish
Ernesto López | Luis Chiruzzo | Dina Wonsever
Proceedings of the Workshop on Adaptation of Language Resources and Tools for Closely Related Languages and Language Variants

2012

Improving Speculative Language Detection using Linguistic Knowledge
Guillermo Moncecchi | Jean-Luc Minel | Dina Wonsever
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

2010

Opinion Identification in Spanish Texts
Aiala Rosá | Dina Wonsever | Jean-Luc Minel
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas