Haim Dubossarsky


2020

pdf bib
SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
Dominik Schlechtweg | Barbara McGillivray | Simon Hengchen | Haim Dubossarsky | Nina Tahmasebi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders progress. We present the results of the first shared task that addresses this gap by providing researchers with an evaluation framework and manually annotated, high-quality datasets for English, German, Latin, and Swedish. 33 teams submitted 186 systems, which were evaluated on two subtasks.

pdf bib
The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures
Haim Dubossarsky | Ivan Vulić | Roi Reichart | Anna Korhonen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Performance in cross-lingual NLP tasks is impacted by the (dis)similarity of languages at hand: e.g., previous work has suggested there is a connection between the expected success of bilingual lexicon induction (BLI) and the assumption of (approximate) isomorphism between monolingual embedding spaces. In this work we present a large-scale study focused on the correlations between monolingual embedding space similarity and task performance, covering thousands of language pairs and four different tasks: BLI, parsing, POS tagging and MT. We hypothesize that statistics of the spectrum of each monolingual embedding space indicate how well they can be aligned. We then introduce several isomorphism measures between two embedding spaces, based on the relevant statistics of their individual spectra. We empirically show that (1) language similarity scores derived from such spectral isomorphism measures are strongly associated with performance observed in different cross-lingual tasks, and (2) our spectral-based measures consistently outperform previous standard isomorphism measures, while being computationally more tractable and easier to interpret. Finally, our measures capture complementary information to typologically driven language distance measures, and the combination of measures from the two families yields even higher task performance correlations.

pdf bib
Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training
Joe Stacey | Pasquale Minervini | Haim Dubossarsky | Sebastian Riedel | Tim Rocktäschel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Natural Language Inference (NLI) datasets contain annotation artefacts resulting in spurious correlations between the natural language utterances and their respective entailment classes. These artefacts are exploited by neural networks even when only considering the hypothesis and ignoring the premise, leading to unwanted biases. Belinkov et al. (2019b) proposed tackling this problem via adversarial training, but this can lead to learned sentence representations that still suffer from the same biases. We show that the bias can be reduced in the sentence representations by using an ensemble of adversaries, encouraging the model to jointly decrease the accuracy of these different adversaries while fitting the data. This approach produces more robust NLI models, outperforming previous de-biasing efforts when generalised to 12 other NLI datasets (Belinkov et al., 2019a; Mahabadi et al., 2020). In addition, we find that the optimal number of adversarial classifiers depends on the dimensionality of the sentence representations, with larger sentence representations being more difficult to de-bias while benefiting from using a greater number of adversaries.

pdf bib
Proceedings of the Second Workshop on Computational Research in Linguistic Typology
Ekaterina Vylomova | Edoardo M. Ponti | Eitan Grossman | Arya D. McCarthy | Yevgeni Berzak | Haim Dubossarsky | Ivan Vulić | Roi Reichart | Anna Korhonen | Ryan Cotterell
Proceedings of the Second Workshop on Computational Research in Linguistic Typology

2019

pdf bib
Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change
Haim Dubossarsky | Simon Hengchen | Nina Tahmasebi | Dominik Schlechtweg
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We have empirically tested the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as a manual testset. We introduce a principled way to simulate lexical semantic change and systematically control for possible biases.

pdf bib
Proceedings of TyP-NLP: The First Workshop on Typology for Polyglot NLP
Haim Dubossarsky | Arya D. McCarthy | Edoardo Maria Ponti | Ivan Vulić | Ekaterina Vylomova | Yevgeni Berzak | Ryan Cotterell | Manaal Faruqui | Anna Korhonen | Roi Reichart
Proceedings of TyP-NLP: The First Workshop on Typology for Polyglot NLP

2018

pdf bib
Coming to Your Senses: on Controls and Evaluation Sets in Polysemy Research
Haim Dubossarsky | Eitan Grossman | Daphna Weinshall
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The point of departure of this article is the claim that sense-specific vectors provide an advantage over normal vectors due to the polysemy that they presumably represent. This claim is based on performance gains observed in gold standard evaluation tests such as word similarity tasks. We demonstrate that this claim, at least as it is instantiated in prior art, is unfounded in two ways. Furthermore, we provide empirical data and an analytic discussion that may account for the previously reported improved performance. First, we show that ground-truth polysemy degrades performance in word similarity tasks. Therefore word similarity tasks are not suitable as an evaluation test for polysemy representation. Second, random assignment of words to senses is shown to improve performance in the same task. This and additional results point to the conclusion that performance gains as reported in previous work may be an artifact of random sense assignment, which is equivalent to sub-sampling and multiple estimation of word vector representations. Theoretical analysis shows that this may on its own be beneficial for the estimation of word similarity, by reducing the bias in the estimation of the cosine distance.

2017

pdf bib
Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models
Haim Dubossarsky | Daphna Weinshall | Eitan Grossman
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This article evaluates three proposed laws of semantic change. Our claim is that in order to validate a putative law of semantic change, the effect should be observed in the genuine condition but absent or reduced in a suitably matched control condition, in which no change can possibly have taken place. Our analysis shows that the effects reported in recent literature must be substantially revised: (i) the proposed negative correlation between meaning change and word frequency is shown to be largely an artefact of the models of word representation used; (ii) the proposed negative correlation between meaning change and prototypicality is shown to be much weaker than what has been claimed in prior art; and (iii) the proposed positive correlation between meaning change and polysemy is largely an artefact of word frequency. These empirical observations are corroborated by analytical proofs that show that count representations introduce an inherent dependence on word frequency, and thus word frequency cannot be evaluated as an independent factor with these representations.