Lena Dankin


pdf bib
Active Learning for BERT: An Empirical Study
Liat Ein-Dor | Alon Halfon | Ariel Gera | Eyal Shnarch | Lena Dankin | Leshem Choshen | Marina Danilevsky | Ranit Aharonov | Yoav Katz | Noam Slonim
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Real world scenarios present a challenge for text classification, since labels are usually expensive and the data is often characterized by class imbalance. Active Learning (AL) is a ubiquitous paradigm to cope with data scarcity. Recently, pre-trained NLP models, and BERT in particular, are receiving massive attention due to their outstanding performance in various NLP tasks. However, the use of AL with deep pre-trained models has so far received little consideration. Here, we present a large-scale empirical study on active learning techniques for BERT-based classification, addressing a diverse set of AL strategies and datasets. We focus on practical scenarios of binary text classification, where the annotation budget is very small, and the data is often skewed. Our results demonstrate that AL can boost BERT performance, especially in the most realistic scenario in which the initial set of labeled examples is created using keyword-based queries, resulting in a biased sample of the minority class. We release our research framework, aiming to facilitate future research along the lines explored here.


pdf bib
A Dataset of General-Purpose Rebuttal
Matan Orbach | Yonatan Bilu | Ariel Gera | Yoav Kantor | Lena Dankin | Tamar Lavee | Lili Kotlerman | Shachar Mirkin | Michal Jacovi | Ranit Aharonov | Noam Slonim
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In Natural Language Understanding, the task of response generation is usually focused on responses to short texts, such as tweets or a turn in a dialog. Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. We do this in the context of the recently-suggested task of listening comprehension over argumentative content: given a speech on some specified topic, and a list of relevant arguments, the goal is to determine which of the arguments appear in the speech. The general rebuttals we describe here (in English) overcome the need for topic-specific arguments to be provided, by proving to be applicable for a large set of topics. This allows creating responses beyond the scope of topics for which specific arguments are available. All data collected during this work is freely available for research.

pdf bib
Financial Event Extraction Using Wikipedia-Based Weak Supervision
Liat Ein-Dor | Ariel Gera | Orith Toledo-Ronen | Alon Halfon | Benjamin Sznajder | Lena Dankin | Yonatan Bilu | Yoav Katz | Noam Slonim
Proceedings of the Second Workshop on Economics and Natural Language Processing

Extraction of financial and economic events from text has previously been done mostly using rule-based methods, with more recent works employing machine learning techniques. This work is in line with this latter approach, leveraging relevant Wikipedia sections to extract weak labels for sentences describing economic events. Whereas previous weakly supervised approaches required a knowledge-base of such events, or corresponding financial figures, our approach requires no such additional data, and can be employed to extract economic events related to companies which are not even mentioned in the training data.

pdf bib
Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network
Martin Gleize | Eyal Shnarch | Leshem Choshen | Lena Dankin | Guy Moshkowich | Ranit Aharonov | Noam Slonim
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

With the advancement in argument detection, we suggest to pay more attention to the challenging task of identifying the more convincing arguments. Machines capable of responding and interacting with humans in helpful ways have become ubiquitous. We now expect them to discuss with us the more delicate questions in our world, and they should do so armed with effective arguments. But what makes an argument more persuasive? What will convince you? In this paper, we present a new data set, IBM-EviConv, of pairs of evidence labeled for convincingness, designed to be more challenging than existing alternatives. We also propose a Siamese neural network architecture shown to outperform several baselines on both a prior convincingness data set and our own. Finally, we provide insights into our experimental results and the various kinds of argumentative value our method is capable of detecting.

pdf bib
Towards Effective Rebuttal: Listening Comprehension Using Corpus-Wide Claim Mining
Tamar Lavee | Matan Orbach | Lili Kotlerman | Yoav Kantor | Shai Gretz | Lena Dankin | Michal Jacovi | Yonatan Bilu | Ranit Aharonov | Noam Slonim
Proceedings of the 6th Workshop on Argument Mining

Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indeed correspond to those made in spoken speeches. To this end, we collected a large dataset of 400 speeches in English discussing 200 controversial topics, mined claims for each topic, and asked annotators to identify the mined claims mentioned in each speech. Results show that in the vast majority of speeches debaters indeed make use of such claims. In addition, we present several baselines for the automatic detection of mined claims in speeches, forming the basis for future work. All collected data is freely available for research.


pdf bib
Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining
Eyal Shnarch | Carlos Alzate | Lena Dankin | Martin Gleize | Yufang Hou | Leshem Choshen | Ranit Aharonov | Noam Slonim
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The process of obtaining high quality labeled data for natural language understanding tasks is often slow, error-prone, complicated and expensive. With the vast usage of neural networks, this issue becomes more notorious since these networks require a large amount of labeled data to produce satisfactory results. We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks. Experiments in the context of topic-dependent evidence detection with two forms of weak labeled data show the advantages of the blending scheme. In addition, we provide a manually annotated data set for the task of topic-dependent evidence detection. We believe that blending weak and strong labeled data is a general notion that may be applicable to many language understanding tasks, and can especially assist researchers who wish to train a network but have a small amount of high quality labeled data for their task of interest.


pdf bib
Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection
Ruty Rinott | Lena Dankin | Carlos Alzate Perez | Mitesh M. Khapra | Ehud Aharoni | Noam Slonim
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing


pdf bib
Claims on demand – an initial demonstration of a system for automatic detection and polarity identification of context dependent claims in massive corpora
Noam Slonim | Ehud Aharoni | Carlos Alzate | Roy Bar-Haim | Yonatan Bilu | Lena Dankin | Iris Eiron | Daniel Hershcovich | Shay Hummel | Mitesh Khapra | Tamar Lavee | Ran Levy | Paul Matchen | Anatoly Polnarov | Vikas Raykar | Ruty Rinott | Amrita Saha | Naama Zwerdling | David Konopnicki | Dan Gutfreund
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations