Natalia Vanetik


2020

Automated Discovery of Mathematical Definitions in Text
Natalia Vanetik | Marina Litvak | Sergey Shevchuk | Lior Reznik
Proceedings of the 12th Language Resources and Evaluation Conference

Automatic definition extraction from texts is an important task that has numerous applications in several natural language processing fields such as summarization, analysis of scientific texts, automatic taxonomy generation, ontology generation, concept identification, and question answering. For definitions that are contained within a single sentence, this problem can be viewed as a binary classification of sentences into definitions and non-definitions. Definitions in scientific literature can be generic (Wikipedia) or more formal (mathematical articles). In this paper, we focus on automatic detection of one-sentence definitions in mathematical texts, which are difficult to separate from surrounding text. We experiment with several data representations, which include sentence syntactic structure and word embeddings, and apply deep learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in order to identify mathematical definitions. Our experiments demonstrate the superiority of CNN and its combination with RNN, applied to the syntactically enriched input representation. We also present a new dataset for definition extraction from mathematical texts. We demonstrate that using this dataset to train learning models improves the quality of definition extraction when these models are then applied to other definition datasets. Our experiments with different domains confirm that mathematical definitions require special treatment, and that using cross-domain learning is inefficient.

SCE-SUMMARY at the FNS 2020 shared task
Marina Litvak | Natalia Vanetik | Zvi Puchinsky
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

With the constantly growing amount of information, the need arises to summarize written information automatically. One of the challenges of summarization is that it is difficult to generalize. For example, summarizing a news article is very different from summarizing a financial earnings report. This paper reports an approach for summarizing financial texts, which differ from documents in other domains in at least three parameters: length, structure, and format. Our approach takes these parameters into account: it is adapted to the hierarchical structure of sections, document length, and special “language”. The approach builds a hierarchical summary, visualized as a tree with summaries under different discourse topics. The approach was evaluated using extrinsic and intrinsic automated evaluations, which are reported in this paper. Like all participants of the Financial Narrative Summarisation (FNS 2020) shared task, we used the FNS2020 dataset for evaluation.

Hierarchical summarization of financial reports with RUNNER
Marina Litvak | Natalia Vanetik | Zvi Puchinsky
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

With the constantly growing amount of information, the need arises to summarize written information automatically. One of the challenges of summarization is that it is difficult to generalize. For example, summarizing a news article is very different from summarizing a financial earnings report. This paper reports an approach for summarizing financial texts, which differ from documents in other domains in at least three parameters: length, structure, and format. Our approach takes these parameters into account: it is adapted to the hierarchical structure of sections, document length, and special “language”. The approach builds a hierarchical summary, visualized as a tree with summaries under different discourse topics. The approach was evaluated using extrinsic and intrinsic automated evaluations, which are reported in this paper. Like all participants of the Financial Narrative Summarisation (FNS 2020) shared task, we used the FNS2020 dataset for evaluation.

2019

In Conclusion Not Repetition: Comprehensive Abstractive Summarization with Diversified Attention Based on Determinantal Point Processes
Lei Li | Wei Liu | Marina Litvak | Natalia Vanetik | Zuying Huang
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Various Seq2Seq learning models designed for machine translation have recently been applied to the abstractive summarization task. Although these models achieve high ROUGE scores, they are limited in their ability to generate comprehensive summaries with a high level of abstraction because of their degenerated attention distributions. We introduce the Diverse Convolutional Seq2Seq Model (DivCNN Seq2Seq), which uses Determinantal Point Processes methods (Micro DPPs and Macro DPPs) to produce attention distributions that account for both quality and diversity. Without breaking the end-to-end architecture, DivCNN Seq2Seq achieves a higher level of comprehensiveness compared to vanilla models and strong baselines. All the reproducible code and datasets are available online.

HEvAS: Headline Evaluation and Analysis System
Marina Litvak | Natalia Vanetik | Itzhak Eretz Kdosha
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources

Automatic headline generation is a subtask of one-line summarization with many reported applications. Evaluation of systems generating headlines is a very challenging and undeveloped area. We introduce the Headline Evaluation and Analysis System (HEvAS), which automatically evaluates systems in terms of the quality of the generated headlines. HEvAS provides two types of metrics: one that measures the informativeness of a headline, and another that measures its readability. The evaluation results can be compared to the results of baseline methods that are implemented in HEvAS. The system also performs statistical analysis of the evaluation results and provides different visualization charts. This paper describes all the evaluation metrics, baselines, analysis, and architecture used by our system.

2017

Query-based summarization using MDL principle
Marina Litvak | Natalia Vanetik
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

Query-based text summarization aims to extract from the original text the essential information that answers the query. The answer is presented in a minimal, often predefined, number of words. In this paper we introduce a new unsupervised approach for query-based extractive summarization, based on the minimum description length (MDL) principle, that employs the Krimp compression algorithm (Vreeken et al., 2011). The key idea of our approach is to select frequent word sets related to a given query that compress document sentences better and therefore describe the document better. A summary is extracted by selecting sentences that best cover query-related frequent word sets. The approach is evaluated on the DUC 2005 and DUC 2006 datasets, which are specifically designed for query-based summarization (DUC, 2005 2006). Its results are competitive with the best reported results.

2016

What’s up on Twitter? Catch up with TWIST!
Marina Litvak | Natalia Vanetik | Efi Levi | Michael Roistacher
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Event detection and analysis with respect to public opinions and sentiments in social media is a broad and well-addressed research topic. However, the characteristics and sheer volume of noisy Twitter messages make this a difficult task. This demonstration paper describes the TWItter event Summarizer and Trend detector (TWIST) system for event detection, visualization, textual description, and geo-sentiment analysis of real-life events reported on Twitter.

MUSEEC: A Multilingual Text Summarization Tool
Marina Litvak | Natalia Vanetik | Mark Last | Elena Churkin
Proceedings of ACL-2016 System Demonstrations

2015

Krimping texts for better summarization
Marina Litvak | Mark Last | Natalia Vanetik
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Multilingual Summarization with Polytope Model
Natalia Vanetik | Marina Litvak
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2013

Mining the Gaps: Towards Polynomial Summarization
Marina Litvak | Natalia Vanetik
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Multilingual Multi-Document Summarization with POLY2
Marina Litvak | Natalia Vanetik
Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization