Jacob Goldberger


2020

A Locally Linear Procedure for Word Translation
Soham Dan | Hagai Taitelbaum | Jacob Goldberger
Proceedings of the 28th International Conference on Computational Linguistics

Learning a mapping between word embeddings of two languages given a dictionary is an important problem with several applications. A common approach is to model the mapping with an orthogonal matrix. The Orthogonal Procrustes Analysis (PA) algorithm can be applied to find the optimal orthogonal matrix. This solution restricts the expressiveness of the translation model, which may result in sub-optimal translations. We propose a natural extension of the PA algorithm that uses multiple orthogonal translation matrices to model the mapping and derive an algorithm to learn these multiple matrices. We achieve better performance in a bilingual word translation task and a cross-lingual word similarity task compared to the single matrix baseline. We also show how multiple matrices can model multiple senses of a word.
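
For readers unfamiliar with the single-matrix baseline mentioned in this abstract, the following is a minimal sketch of the standard Orthogonal Procrustes solution via SVD. It is not the paper's multi-matrix algorithm; the function name `orthogonal_procrustes`, the row-vector convention, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Return the orthogonal matrix Q minimizing ||X @ Q - Y||_F.

    X and Y are (n, d) arrays whose rows are the source and target
    embeddings of the n dictionary pairs (an assumed layout).
    """
    # The minimizer is U @ Vt, where U, Vt come from the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Synthetic check: recover a known rotation from noise-free pairs.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Q_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal matrix
Y = X @ Q_true
Q = orthogonal_procrustes(X, Y)
```

With real embeddings, a source word vector `x` would then be translated by finding the nearest target vector to `x @ Q`.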

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
Shauli Ravfogel | Yanai Elazar | Jacob Goldberger | Yoav Goldberg
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.

2019

Multilingual word translation using auxiliary languages
Hagai Taitelbaum | Gal Chechik | Jacob Goldberger
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Current multilingual word translation methods focus on jointly learning mappings from each language to a shared space. The actual translation, however, is still performed as an isolated bilingual task. In this study we propose a multilingual translation procedure that uses all the learned mappings to translate a word from one language to another. For each source word, we first search for the most relevant auxiliary languages. We then use the translations to these languages to form an improved representation of the source word. Finally, this representation is used for the actual translation to the target language. Experiments on a standard multilingual word translation benchmark demonstrate that our model outperforms state-of-the-art results.

A Multi-Pairwise Extension of Procrustes Analysis for Multilingual Word Translation
Hagai Taitelbaum | Gal Chechik | Jacob Goldberger
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper we present a novel approach to simultaneously representing multiple languages in a common space. Procrustes Analysis (PA) is commonly used to find the optimal orthogonal word mapping in the bilingual case. The proposed Multi Pairwise Procrustes Analysis (MPPA) is a natural extension of the PA algorithm to multilingual word mapping. Unlike previous PA extensions that require a k-way dictionary, this approach requires only pairwise bilingual dictionaries, which are much easier to construct.

Aligning Vector-spaces with Noisy Supervised Lexicon
Noa Yehezkel Lubin | Jacob Goldberger | Yoav Goldberg
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The problem of learning to translate between two vector spaces given a set of aligned points arises in several application areas of NLP. Current solutions assume that the lexicon which defines the alignment pairs is noise-free. We consider the case where the set of aligned points is allowed to contain some noise in the form of incorrect lexicon pairs, and show that this arises in practice by analyzing the edited dictionaries after the cleaning process. We demonstrate that such noise substantially degrades the accuracy of the learned translation when using current methods. We propose a model that accounts for noisy pairs. This is achieved by introducing a generative model with a compatible iterative EM algorithm. The algorithm jointly learns the noise level in the lexicon, finds the set of noisy pairs, and learns the mapping between the spaces. We demonstrate the effectiveness of our proposed algorithm on two alignment problems: bilingual word embedding translation, and mapping between diachronic embedding spaces for recovering the semantic shifts of words across time periods.

2018

Self-Normalization Properties of Language Modeling
Jacob Goldberger | Oren Melamud
Proceedings of the 27th International Conference on Computational Linguistics

Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing as it may significantly reduce run-times due to large word vocabularies. In this study, we provide a comprehensive investigation of language modeling self-normalization. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which may potentially be used for improving self-normalization algorithms in the future.

2017

Information-Theory Interpretation of the Skip-Gram Negative-Sampling Objective Function
Oren Melamud | Jacob Goldberger
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper we define a measure of dependency between two random variables, based on the Jensen-Shannon (JS) divergence between their joint distribution and the product of their marginal distributions. Then, we show that word2vec’s skip-gram with negative sampling embedding algorithm finds the optimal low-dimensional approximation of this JS dependency measure between the words and their contexts. The gap between the optimal score and the low-dimensional approximation is demonstrated on a standard text corpus.
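
As a rough illustration of the dependency measure described in this abstract, the sketch below computes the standard symmetric Jensen-Shannon divergence between a discrete joint distribution and the product of its marginals. The equal 1/2 weighting, the function name `js_dependency`, and the toy tables are assumptions; the paper may use a differently weighted variant tied to the number of negative samples.

```python
import numpy as np

def js_dependency(joint):
    """JS divergence between p(x, y) and p(x) p(y) for a discrete joint table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                          # normalize to a distribution
    prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # product of marginals
    mix = 0.5 * (joint + prod)                           # mixture used by the JS divergence

    def kl(p, q):
        mask = p > 0                                     # 0 * log(0 / q) is taken as 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    return 0.5 * kl(joint, mix) + 0.5 * kl(prod, mix)

# Toy tables: independent variables versus perfectly dependent ones.
independent = np.outer([0.3, 0.7], [0.4, 0.6])
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])
score_independent = js_dependency(independent)
score_dependent = js_dependency(dependent)
```

The measure is zero when the two variables are independent and is bounded above by log 2 (in nats) under this equal weighting.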

A Simple Language Model based on PMI Matrix Approximations
Oren Melamud | Ido Dagan | Jacob Goldberger
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this study, we introduce a new approach for learning language models by training them to estimate word-context pointwise mutual information (PMI), and then deriving the desired conditional probabilities from PMI at test time. Specifically, we show that with minor modifications to word2vec’s algorithm, we get principled language models that are closely related to the well-established Noise Contrastive Estimation (NCE) based language models. A compelling aspect of our approach is that our models are trained with the same simple negative sampling objective function that is commonly used in word2vec to learn word embeddings.

2016

context2vec: Learning Generic Context Embedding with Bidirectional LSTM
Oren Melamud | Jacob Goldberger | Ido Dagan
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

2015

Learning to Exploit Structured Resources for Lexical Inference
Vered Shwartz | Omer Levy | Ido Dagan | Jacob Goldberger
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Efficient Global Learning of Entailment Graphs
Jonathan Berant | Noga Alon | Ido Dagan | Jacob Goldberger
Computational Linguistics, Volume 41, Issue 2 - June 2015

Modeling Word Meaning in Context with Substitute Vectors
Oren Melamud | Ido Dagan | Jacob Goldberger
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Focused Entailment Graphs for Open IE Propositions
Omer Levy | Ido Dagan | Jacob Goldberger
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

Probabilistic Modeling of Joint-context in Distributional Similarity
Oren Melamud | Ido Dagan | Jacob Goldberger | Idan Szpektor | Deniz Yuret
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

2013

A Two Level Model for Context Sensitive Inference Rules
Oren Melamud | Jonathan Berant | Ido Dagan | Jacob Goldberger | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Using Lexical Expansion to Learn Inference Rules from Sparse Data
Oren Melamud | Ido Dagan | Jacob Goldberger | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

PLIS: a Probabilistic Lexical Inference System
Eyal Shnarch | Erel Segal-haLevi | Jacob Goldberger | Ido Dagan
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

Learning Entailment Relations by Global Graph Structure Optimization
Jonathan Berant | Ido Dagan | Jacob Goldberger
Computational Linguistics, Volume 38, Issue 1 - March 2012

A Probabilistic Lexical Model for Ranking Textual Inferences
Eyal Shnarch | Ido Dagan | Jacob Goldberger
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Efficient Tree-based Approximation for Entailment Graph Learning
Jonathan Berant | Ido Dagan | Meni Adler | Jacob Goldberger
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Towards a Probabilistic Model for Lexical Entailment
Eyal Shnarch | Jacob Goldberger | Ido Dagan
Proceedings of the TextInfer 2011 Workshop on Textual Entailment

Global Learning of Typed Entailment Rules
Jonathan Berant | Ido Dagan | Jacob Goldberger
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

A Probabilistic Modeling Framework for Lexical Entailment
Eyal Shnarch | Jacob Goldberger | Ido Dagan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Global Learning of Focused Entailment Graphs
Jonathan Berant | Ido Dagan | Jacob Goldberger
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2008

Contextual Preferences
Idan Szpektor | Ido Dagan | Roy Bar-Haim | Jacob Goldberger
Proceedings of ACL-08: HLT