2020
A Locally Linear Procedure for Word Translation
Soham Dan

Hagai Taitelbaum

Jacob Goldberger
Proceedings of the 28th International Conference on Computational Linguistics
Learning a mapping between word embeddings of two languages given a dictionary is an important problem with several applications. A common mapping approach is using an orthogonal matrix. The Orthogonal Procrustes Analysis (PA) algorithm can be applied to find the optimal orthogonal matrix. This solution restricts the expressiveness of the translation model, which may result in suboptimal translations. We propose a natural extension of the PA algorithm that uses multiple orthogonal translation matrices to model the mapping and derive an algorithm to learn these multiple matrices. We achieve better performance in a bilingual word translation task and a cross-lingual word similarity task compared to the single-matrix baseline. We also show how multiple matrices can model multiple senses of a word.
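The closed-form orthogonal solution the abstract builds on can be sketched in a few lines of NumPy. This is a sketch of the standard bilingual building block (Schönemann's solution), not the paper's multi-matrix extension; the names `X` and `Y` are illustrative:

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal Procrustes: find orthogonal W minimizing ||X @ W - Y||_F.

    X, Y: (n_pairs, dim) matrices of dictionary-aligned word embeddings.
    The closed-form solution comes from the SVD of the cross-covariance.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

The paper's extension replaces the single `W` with several orthogonal matrices; the bilingual solver above remains the basic step.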
Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
Shauli Ravfogel

Yanai Elazar

Jacob Goldberger

Yoav Goldberg
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.
2019
Multilingual word translation using auxiliary languages
Hagai Taitelbaum

Gal Chechik

Jacob Goldberger
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Current multilingual word translation methods are focused on jointly learning mappings from each language to a shared space. The actual translation, however, is still performed as an isolated bilingual task. In this study we propose a multilingual translation procedure that uses all the learned mappings to translate a word from one language to another. For each source word, we first search for the most relevant auxiliary languages. We then use the translations to these languages to form an improved representation of the source word. Finally, this representation is used for the actual translation to the target language. Experiments on a standard multilingual word translation benchmark demonstrate that our model outperforms the state of the art.
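The translation procedure described above can be sketched under simplifying assumptions of our own: all languages are already mapped into a shared space, and the relevant auxiliary languages are given (the paper also selects them per source word). All names are illustrative, not the paper's:

```python
import numpy as np

def nearest(v, E):
    # index of the row of E most cosine-similar to v
    sims = (E @ v) / (np.linalg.norm(E, axis=1) * np.linalg.norm(v) + 1e-12)
    return int(np.argmax(sims))

def translate_via_auxiliaries(x, aux_spaces, target_space):
    """Translate a mapped source vector x with help from auxiliary languages.

    aux_spaces: mapped embedding matrices of the chosen auxiliary languages.
    The source word's auxiliary translations are averaged with x to form an
    improved representation before the final target-language lookup.
    """
    reps = [x] + [E[nearest(x, E)] for E in aux_spaces]
    improved = np.mean(reps, axis=0)
    return nearest(improved, target_space)
```

The averaging step is what distinguishes this from an isolated bilingual lookup: the auxiliary translations pull the query toward the shared-space region where the word's translations cluster.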
A Multi-Pairwise Extension of Procrustes Analysis for Multilingual Word Translation
Hagai Taitelbaum

Gal Chechik

Jacob Goldberger
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
In this paper we present a novel approach to simultaneously representing multiple languages in a common space. Procrustes Analysis (PA) is commonly used to find the optimal orthogonal word mapping in the bilingual case. The proposed Multi-Pairwise Procrustes Analysis (MPPA) is a natural extension of the PA algorithm to multilingual word mapping. Unlike previous PA extensions that require a k-way dictionary, this approach requires only pairwise bilingual dictionaries that are much easier to construct.
Aligning Vector-spaces with Noisy Supervised Lexicon
Noa Yehezkel Lubin

Jacob Goldberger

Yoav Goldberg
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
The problem of learning to translate between two vector spaces given a set of aligned points arises in several application areas of NLP. Current solutions assume that the lexicon which defines the alignment pairs is noise-free. We consider the case where the set of aligned points is allowed to contain an amount of noise, in the form of incorrect lexicon pairs, and show that this arises in practice by analyzing the edited dictionaries after the cleaning process. We demonstrate that such noise substantially degrades the accuracy of the learned translation when using current methods. We propose a model that accounts for noisy pairs. This is achieved by introducing a generative model with a compatible iterative EM algorithm. The algorithm jointly learns the noise level in the lexicon, finds the set of noisy pairs, and learns the mapping between the spaces. We demonstrate the effectiveness of our proposed algorithm on two alignment problems: bilingual word embedding translation, and mapping between diachronic embedding spaces for recovering the semantic shifts of words across time periods.
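An EM scheme of the kind described can be sketched under simplifying assumptions of our own: residuals of clean pairs are isotropic Gaussian and noisy pairs follow a constant outlier density. The paper's actual generative model may differ; all names are illustrative:

```python
import numpy as np

def procrustes(X, Y):
    # closed-form orthogonal mapping minimizing ||X @ W - Y||_F
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def em_noisy_alignment(X, Y, n_iter=20, outlier_density=1e-2):
    """Jointly estimate the mapping W, the noise level, and per-pair
    posteriors r[i] = P(pair i is a correct lexicon entry)."""
    n, d = X.shape
    W = procrustes(X, Y)
    alpha, sigma2 = 0.9, 1.0   # initial clean fraction and residual variance
    for _ in range(n_iter):
        # E-step: posterior that each pair is clean, given its residual
        sq = ((X @ W - Y) ** 2).sum(axis=1)
        clean = alpha * np.exp(-sq / (2 * sigma2)) / (2 * np.pi * sigma2) ** (d / 2)
        noisy = (1 - alpha) * outlier_density
        r = clean / (clean + noisy)
        # M-step: weighted Procrustes refit, then noise-level updates
        w = np.sqrt(r)[:, None]
        W = procrustes(w * X, w * Y)
        alpha = r.mean()
        sigma2 = (r * sq).sum() / (r.sum() * d)
    return W, r
```

The weighted refit works because minimizing the responsibility-weighted residual sum is itself an orthogonal Procrustes problem on the rows scaled by the square roots of the responsibilities.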
2018
Self-Normalization Properties of Language Modeling
Jacob Goldberger

Oren Melamud
Proceedings of the 27th International Conference on Computational Linguistics
Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing as it may significantly reduce runtimes due to large word vocabularies. In this study, we provide a comprehensive investigation of language modeling self-normalization. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which may potentially be used for improving self-normalization algorithms in the future.
2017
Information-Theory Interpretation of the Skip-Gram Negative-Sampling Objective Function
Oren Melamud

Jacob Goldberger
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
In this paper we define a measure of dependency between two random variables, based on the Jensen-Shannon (JS) divergence between their joint distribution and the product of their marginal distributions. Then, we show that word2vec’s skip-gram with negative sampling embedding algorithm finds the optimal low-dimensional approximation of this JS dependency measure between the words and their contexts. The gap between the optimal score and the low-dimensional approximation is demonstrated on a standard text corpus.
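The JS-based dependency measure can be computed directly from a word–context co-occurrence table; a small sketch (variable names are ours, not the paper's):

```python
import numpy as np

def js_dependency(counts):
    """JS divergence between the joint p(w, c) and the product of its
    marginals p(w)p(c), estimated from a co-occurrence count matrix."""
    P = counts / counts.sum()                    # joint distribution
    Q = np.outer(P.sum(axis=1), P.sum(axis=0))   # product of marginals
    M = 0.5 * (P + Q)

    def kl(A, B):
        mask = A > 0
        return float(np.sum(A[mask] * np.log(A[mask] / B[mask])))

    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)
```

The measure is zero exactly when words and contexts are independent, and is bounded above by log 2, unlike mutual information.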
A Simple Language Model based on PMI Matrix Approximations
Oren Melamud

Ido Dagan

Jacob Goldberger
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
In this study, we introduce a new approach for learning language models by training them to estimate word-context pointwise mutual information (PMI), and then deriving the desired conditional probabilities from PMI at test time. Specifically, we show that with minor modifications to word2vec’s algorithm, we get principled language models that are closely related to the well-established Noise Contrastive Estimation (NCE) based language models. A compelling aspect of our approach is that our models are trained with the same simple negative sampling objective function that is commonly used in word2vec to learn word embeddings.
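The test-time step of deriving conditional probabilities from PMI can be sketched as follows, assuming access to unigram probabilities (names are ours; in the paper the PMI values come from the trained model rather than a full joint table):

```python
import numpy as np

def conditional_from_pmi(pmi_col, unigram):
    """Derive p(w | c) from PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ].

    Since p(w) * exp(PMI(w, c)) = p(w, c) / p(c), normalizing these
    scores over the vocabulary recovers the conditional p(w | c).
    """
    scores = unigram * np.exp(pmi_col)
    return scores / scores.sum()
```

The normalization over the vocabulary is what makes the output a proper distribution even when the PMI estimates are only approximate.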
2016
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
Oren Melamud

Jacob Goldberger

Ido Dagan
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning
2015
Learning to Exploit Structured Resources for Lexical Inference
Vered Shwartz

Omer Levy

Ido Dagan

Jacob Goldberger
Proceedings of the Nineteenth Conference on Computational Natural Language Learning
Efficient Global Learning of Entailment Graphs
Jonathan Berant

Noga Alon

Ido Dagan

Jacob Goldberger
Computational Linguistics, Volume 41, Issue 2 – June 2015
Modeling Word Meaning in Context with Substitute Vectors
Oren Melamud

Ido Dagan

Jacob Goldberger
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2014
Focused Entailment Graphs for Open IE Propositions
Omer Levy

Ido Dagan

Jacob Goldberger
Proceedings of the Eighteenth Conference on Computational Natural Language Learning
Probabilistic Modeling of Joint-context in Distributional Similarity
Oren Melamud

Ido Dagan

Jacob Goldberger

Idan Szpektor

Deniz Yuret
Proceedings of the Eighteenth Conference on Computational Natural Language Learning
2013
A Two Level Model for Context Sensitive Inference Rules
Oren Melamud

Jonathan Berant

Ido Dagan

Jacob Goldberger

Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Using Lexical Expansion to Learn Inference Rules from Sparse Data
Oren Melamud

Ido Dagan

Jacob Goldberger

Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
PLIS: a Probabilistic Lexical Inference System
Eyal Shnarch

Erel Segal-haLevi

Jacob Goldberger

Ido Dagan
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations
2012
Learning Entailment Relations by Global Graph Structure Optimization
Jonathan Berant

Ido Dagan

Jacob Goldberger
Computational Linguistics, Volume 38, Issue 1 – March 2012
A Probabilistic Lexical Model for Ranking Textual Inferences
Eyal Shnarch

Ido Dagan

Jacob Goldberger
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)
Efficient Tree-based Approximation for Entailment Graph Learning
Jonathan Berant

Ido Dagan

Meni Adler

Jacob Goldberger
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2011
Towards a Probabilistic Model for Lexical Entailment
Eyal Shnarch

Jacob Goldberger

Ido Dagan
Proceedings of the TextInfer 2011 Workshop on Textual Entailment
Global Learning of Typed Entailment Rules
Jonathan Berant

Ido Dagan

Jacob Goldberger
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
A Probabilistic Modeling Framework for Lexical Entailment
Eyal Shnarch

Jacob Goldberger

Ido Dagan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
2010
Global Learning of Focused Entailment Graphs
Jonathan Berant

Ido Dagan

Jacob Goldberger
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
2008
Contextual Preferences
Idan Szpektor

Ido Dagan

Roy BarHaim

Jacob Goldberger
Proceedings of ACL-08: HLT