Martin Gleize


pdf bib
HBCP Corpus: A New Resource for the Analysis of Behavioural Change Intervention Reports
Francesca Bonin | Martin Gleize | Ailbhe Finnerty | Candice Moore | Charles Jochim | Emma Norris | Yufang Hou | Alison J. Wright | Debasis Ganguly | Emily Hayes | Silje Zink | Alessandra Pascale | Pol Mac Aonghusa | Susan Michie
Proceedings of the 12th Language Resources and Evaluation Conference

Due to the fast pace at which research reports in behaviour change are published, researchers, consultants and policymakers would benefit from more automatic ways to process these reports. Automatic extraction of the reports’ intervention content, population, settings and their results etc. are essential in synthesising and summarising the literature. However, to the best of our knowledge, no unique resource exists at the moment to facilitate this synthesis. In this paper, we describe the construction of a corpus of published behaviour change intervention evaluation reports aimed at smoking cessation. We also describe and release the annotation of 57 entities, that can be used as an off-the-shelf data resource for tasks such as entity recognition, etc. Both the corpus and the annotation dataset are being made available to the community.


pdf bib
A Summarization System for Scientific Documents
Shai Erera | Michal Shmueli-Scheuer | Guy Feigenblat | Ora Peled Nakash | Odellia Boni | Haggai Roitman | Doron Cohen | Bar Weiner | Yosi Mass | Or Rivlin | Guy Lev | Achiya Jerbi | Jonathan Herzig | Yufang Hou | Charles Jochim | Martin Gleize | Francesca Bonin | Francesca Bonin | David Konopnicki
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summarization module aims to generate concise yet detailed summaries. We validated our approach with human experts.

pdf bib
Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network
Martin Gleize | Eyal Shnarch | Leshem Choshen | Lena Dankin | Guy Moshkowich | Ranit Aharonov | Noam Slonim
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

With the advancement in argument detection, we suggest to pay more attention to the challenging task of identifying the more convincing arguments. Machines capable of responding and interacting with humans in helpful ways have become ubiquitous. We now expect them to discuss with us the more delicate questions in our world, and they should do so armed with effective arguments. But what makes an argument more persuasive? What will convince you? In this paper, we present a new data set, IBM-EviConv, of pairs of evidence labeled for convincingness, designed to be more challenging than existing alternatives. We also propose a Siamese neural network architecture shown to outperform several baselines on both a prior convincingness data set and our own. Finally, we provide insights into our experimental results and the various kinds of argumentative value our method is capable of detecting.

pdf bib
Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction
Yufang Hou | Charles Jochim | Martin Gleize | Francesca Bonin | Debasis Ganguly
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While the fast-paced inception of novel tasks and new datasets helps foster active research in a community towards interesting directions, keeping track of the abundance of research activity in different areas on different datasets is likely to become increasingly difficult. The community could greatly benefit from an automatic system able to summarize scientific results, e.g., in the form of a leaderboard. In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting task, dataset, metric and score from NLP papers, towards the automatic construction of leaderboards. Experiments show that our model outperforms several baselines by a large margin. Our model is a first step towards automatic leaderboard construction, e.g., in the NLP domain.


pdf bib
Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining
Eyal Shnarch | Carlos Alzate | Lena Dankin | Martin Gleize | Yufang Hou | Leshem Choshen | Ranit Aharonov | Noam Slonim
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The process of obtaining high quality labeled data for natural language understanding tasks is often slow, error-prone, complicated and expensive. With the vast usage of neural networks, this issue becomes more notorious since these networks require a large amount of labeled data to produce satisfactory results. We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks. Experiments in the context of topic-dependent evidence detection with two forms of weak labeled data show the advantages of the blending scheme. In addition, we provide a manually annotated data set for the task of topic-dependent evidence detection. We believe that blending weak and strong labeled data is a general notion that may be applicable to many language understanding tasks, and can especially assist researchers who wish to train a network but have a small amount of high quality labeled data for their task of interest.


pdf bib
Noyaux de réécriture de phrases munis de types lexico-sémantiques
Martin Gleize | Brigitte Grau
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

De nombreux problèmes en traitement automatique des langues requièrent de déterminer si deux phrases sont des réécritures l’une de l’autre. Une solution efficace consiste à apprendre les réécritures en se fondant sur des méthodes à noyau qui mesurent la similarité entre deux réécritures de paires de phrases. Toutefois, ces méthodes ne permettent généralement pas de prendre en compte des variations sémantiques entre mots, qui permettraient de capturer un plus grand nombre de règles de réécriture. Dans cet article, nous proposons la définition et l’implémentation d’une nouvelle classe de fonction noyau, fondée sur la réécriture de phrases enrichie par un typage pour combler ce manque. Nous l’évaluons sur deux tâches, la reconnaissance de paraphrases et d’implications textuelles.

pdf bib
A Unified Kernel Approach for Learning Typed Sentence Rewritings
Martin Gleize | Brigitte Grau
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)


pdf bib
A hierarchical taxonomy for classifying hardness of inference tasks
Martin Gleize | Brigitte Grau
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Exhibiting inferential capabilities is one of the major goals of many modern Natural Language Processing systems. However, if attempts have been made to define what textual inferences are, few seek to classify inference phenomena by difficulty. In this paper we propose a hierarchical taxonomy for inferences, relatively to their hardness, and with corpus annotation and system design and evaluation in mind. Indeed, a fine-grained assessment of the difficulty of a task allows us to design more appropriate systems and to evaluate them only on what they are designed to handle. Each of seven classes is described and provided with examples from different tasks like question answering, textual entailment and coreference resolution. We then test the classes of our hierarchy on the specific task of question answering. Our annotation process of the testing data at the QA4MRE 2013 evaluation campaign reveals that it is possible to quantify the contrasts in types of difficulty on datasets of the same task.


pdf bib
LIMSIILES: Basic English Substitution for Student Answer Assessment at SemEval 2013
Martin Gleize | Brigitte Grau
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)