Idan Szpektor


2020

Semantically Driven Sentence Fusion: Modeling and Evaluation
Eyal Ben-David | Orgad Keller | Eric Malmi | Idan Szpektor | Roi Reichart
Findings of the Association for Computational Linguistics: EMNLP 2020

Sentence fusion is the task of joining related sentences into coherent text. Current training and evaluation schemes for this task are based on single-reference ground truths and do not account for valid fusion variants. We show that this hinders models from robustly capturing the semantic relationship between input sentences. To alleviate this, we present an approach in which ground-truth solutions are automatically expanded into multiple references via curated equivalence classes of connective phrases. We apply this method to a large-scale dataset and use the augmented dataset for both model training and evaluation. To improve the learning of semantic representations using multiple references, we enrich the model with auxiliary discourse classification tasks under a multi-tasking framework. Our experiments highlight the improvements of our approach over state-of-the-art models.

2019

A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy
Genady Beryozkin | Yoel Drori | Oren Gilon | Tzvika Hartman | Idan Szpektor
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We study a variant of domain adaptation for named-entity recognition where multiple, heterogeneously tagged training sets are available. Furthermore, the test tag-set is not identical to any individual training tag-set. Yet, the relations between all tags are provided in a tag hierarchy, covering the test tags as a combination of training tags. This setting occurs when various datasets are created using different annotation schemes. It also occurs when a tag-set is extended with a new tag by annotating only the new tag in a new dataset. We propose to use the given tag hierarchy to jointly learn a neural network that shares its tagging layer among all tag-sets. We compare this model to combining independent models and to a model based on the multitasking approach. Our experiments show the benefit of the tag-hierarchy model, especially when facing non-trivial consolidation of tag-sets.

DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion
Mor Geva | Eric Malmi | Idan Szpektor | Jonathan Berant
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models. In this paper, we propose a method for automatically generating fusion examples from raw text and present DiscoFuse, a large-scale dataset for discourse-based sentence fusion. We author a set of rules for identifying a diverse set of discourse phenomena in raw text and decomposing the text into two independent sentences. We apply our approach to two document collections, Wikipedia and sports articles, yielding 60 million fusion examples annotated with the discourse information required to reconstruct the fused text. We develop a sequence-to-sequence model on DiscoFuse and thoroughly analyze its strengths and weaknesses with respect to the various discourse phenomena, using both automatic and human evaluation. Finally, we conduct transfer learning experiments with WebSplit, a recent dataset for text simplification. We show that pretraining on DiscoFuse substantially improves performance on WebSplit when viewed as a sentence fusion task.

Audio De-identification - a New Entity Recognition Task
Ido Cohn | Itay Laish | Genady Beryozkin | Gang Li | Izhak Shafran | Idan Szpektor | Tzvika Hartman | Avinatan Hassidim | Yossi Matias
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline’s results on it.

2016

Syntactic Parsing of Web Queries with Question Intent
Yuval Pinter | Roi Reichart | Idan Szpektor
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Probabilistic Modeling of Joint-context in Distributional Similarity
Oren Melamud | Ido Dagan | Jacob Goldberger | Idan Szpektor | Deniz Yuret
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

2013

Generating Synthetic Comparable Questions for News Articles
Oleg Rokhlenko | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Two Level Model for Context Sensitive Inference Rules
Oren Melamud | Jonathan Berant | Ido Dagan | Jacob Goldberger | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Using Lexical Expansion to Learn Inference Rules from Sparse Data
Oren Melamud | Ido Dagan | Jacob Goldberger | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Learning Verb Inference Rules from Linguistically-Motivated Evidence
Hila Weisman | Jonathan Berant | Idan Szpektor | Ido Dagan
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Classification-based Contextual Preferences
Shachar Mirkin | Ido Dagan | Lili Kotlerman | Idan Szpektor
Proceedings of the TextInfer 2011 Workshop on Textual Entailment

2010

Textual Entailment
Mark Sammons | Idan Szpektor | V.G.Vinod Vydiswaran
NAACL HLT 2010 Tutorial Abstracts

Generating Entailment Rules from FrameNet
Roni Ben Aharon | Idan Szpektor | Ido Dagan
Proceedings of the ACL 2010 Conference Short Papers

2009

Augmenting WordNet-based Inference with Argument Mapping
Idan Szpektor | Ido Dagan
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

Source-Language Entailment Modeling for Translating Unknown Terms
Shachar Mirkin | Lucia Specia | Nicola Cancedda | Ido Dagan | Marc Dymetman | Idan Szpektor
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Directional Distributional Similarity for Lexical Expansion
Lili Kotlerman | Ido Dagan | Idan Szpektor | Maayan Zhitomirsky-Geffet
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

Contextual Preferences
Idan Szpektor | Ido Dagan | Roy Bar-Haim | Jacob Goldberger
Proceedings of ACL-08: HLT

Learning Entailment Rules for Unary Templates
Idan Szpektor | Ido Dagan
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Cross Lingual and Semantic Retrieval for Cultural Heritage Appreciation
Idan Szpektor | Ido Dagan | Alon Lavie | Danny Shacham | Shuly Wintner
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition
Roy Bar-Haim | Ido Dagan | Iddo Greental | Idan Szpektor | Moshe Friedman
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

Instance-based Evaluation of Entailment Rule Acquisition
Idan Szpektor | Eyal Shnarch | Ido Dagan
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Investigating a Generic Paraphrase-Based Approach for Relation Extraction
Lorenza Romano | Milen Kouylekov | Idan Szpektor | Ido Dagan | Alberto Lavelli
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

Definition and Analysis of Intermediate Entailment Levels
Roy Bar-Haim | Idan Szpektor | Oren Glickman
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment

2004

Scaling Web-based Acquisition of Entailment Relations
Idan Szpektor | Hristo Tanev | Ido Dagan | Bonaventura Coppola
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing