Margot Mieskes


2020

pdf bib
Reviewing Natural Language Processing Research
Kevin Cohen | Karën Fort | Margot Mieskes | Aurélie Névéol
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial will cover the theory and practice of reviewing research in natural language processing. Heavy reviewing burdens on natural language processing researchers have made it clear that our community needs to increase the size of our pool of potential reviewers. Simultaneously, notable “false negatives”---rejection by our conferences of work that was later shown to be tremendously important after acceptance by other conferences—have raised awareness of the fact that our reviewing practices leave something to be desired. We do not often talk about “false positives” with respect to conference papers, but leaders in the field have noted that we seem to have a publication bias towards papers that report high performance, with perhaps not much else of interest in them. It need not be this way. Reviewing is a learnable skill, and you will learn it here via lectures and a considerable amount of hands-on practice.

pdf bib
Language Agnostic Automatic Summarization Evaluation
Christopher Tauchmann | Margot Mieskes
Proceedings of the 12th Language Resources and Evaluation Conference

So far work on automatic summarization has dealt primarily with English data. Accordingly, evaluation methods were primarily developed with this language in mind. In our work, we present experiments of adapting available evaluation methods such as ROUGE and PYRAMID to non-English data. We base our experiments on various English and non-English homogeneous benchmark data sets as well as a non-English heterogeneous data set. Our results indicate that ROUGE can indeed be adapted to non-English data – both homogeneous and heterogeneous. Using a recent implementation of performing an automatic PYRAMID evaluation, we also show its adaptability to non-English data.

pdf bib
A Data Set for the Analysis of Text Quality Dimensions in Summarization Evaluation
Margot Mieskes | Eneldo Loza Mencía | Tim Kronsbein
Proceedings of the 12th Language Resources and Evaluation Conference

Automatic evaluation of summarization focuses on developing a metric to represent the quality of the resulting text. However, text qualityis represented in a variety of dimensions ranging from grammaticality to readability and coherence. In our work, we analyze the depen-dencies between a variety of quality dimensions on automatically created multi-document summaries and which dimensions automaticevaluation metrics such as ROUGE, PEAK or JSD are able to capture. Our results indicate that variants of ROUGE are correlated tovarious quality dimensions and that some automatic summarization methods achieve higher quality summaries than others with respectto individual summary quality dimensions. Our results also indicate that differentiating between quality dimensions facilitates inspectionand fine-grained comparison of summarization methods and its characteristics. We make the data from our two summarization qualityevaluation experiments publicly available in order to facilitate the future development of specialized automatic evaluation methods.

2019

bib
OCR Quality and NLP Preprocessing
Margot Mieskes | Stefan Schmunk
Proceedings of the 2019 Workshop on Widening NLP

We present initial experiments to evaluate the performance of tasks such as Part of Speech Tagging on data corrupted by Optical Character Recognition (OCR). Our results, based on English and German data, using artificial experiments as well as initial real OCRed data indicate that already a small drop in OCR quality considerably increases the error rates, which would have a significant impact on subsequent processing steps.

pdf bib
Summarization Evaluation meets Short-Answer Grading
Margot Mieskes | Ulrike Padó
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning

pdf bib
Community Perspective on Replicability in Natural Language Processing
Margot Mieskes | Karën Fort | Aurélie Névéol | Cyril Grouin | Kevin Cohen
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

With recent efforts in drawing attention to the task of replicating and/or reproducing results, for example in the context of COLING 2018 and various LREC workshops, the question arises how the NLP community views the topic of replicability in general. Using a survey, in which we involve members of the NLP community, we investigate how our community perceives this topic, its relevance and options for improvement. Based on over two hundred participants, the survey results confirm earlier observations, that successful reproducibility requires more than having access to code and data. Additionally, the results show that the topic has to be tackled from the authors’, reviewers’ and community’s side.

2018

pdf bib
Preparing Data from Psychotherapy for Natural Language Processing
Margot Mieskes | Andreas Stiegelmayr
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data
Christopher Tauchmann | Thomas Arnold | Andreas Hanselowski | Christian M. Meyer | Margot Mieskes
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Work Smart - Reducing Effort in Short-Answer Grading
Margot Mieskes | Ulrike Padó
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning

2017

pdf bib
A Quantitative Study of Data in the NLP community
Margot Mieskes
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

We present results on a quantitative analysis of publications in the NLP domain on collecting, publishing and availability of research data. We find that a wide range of publications rely on data crawled from the web, but few give details on how potentially sensitive data was treated. Additionally, we find that while links to repositories of data are given, they often do not work even a short time after publication. We put together several suggestions on how to improve this situation based on publications from the NLP domain, but also other research areas.

2016

pdf bib
Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources
Darina Benikova | Margot Mieskes | Christian M. Meyer | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive summaries, which lack coherence and structure. We use a corpus of heterogeneous documents to address the issue that information seekers usually face – a variety of different types of information sources. We directly extract information from these, but minimally redact and meaningfully order it to form a coherent text. Our qualitative and quantitative evaluations show that quantitative results are not sufficient to judge the quality of a summary and that other quality criteria, such as coherence, should also be taken into account. We find that our manually created corpus is of high quality and that it has the potential to bridge the gap between reference corpora of abstracts and automatic methods producing extracts. Our corpus is available to the research community for further development.

pdf bib
MDSWriter: Annotation Tool for Creating High-Quality Multi-Document Summarization Corpora
Christian M. Meyer | Darina Benikova | Margot Mieskes | Iryna Gurevych
Proceedings of ACL-2016 System Demonstrations

pdf bib
EmpiriST: AIPHES - Robust Tokenization and POS-Tagging for Different Genres
Steffen Remus | Gerold Hintz | Chris Biemann | Christian M. Meyer | Darina Benikova | Judith Eckle-Kohler | Margot Mieskes | Thomas Arnold
Proceedings of the 10th Web as Corpus Workshop

2014

pdf bib
DKPro Agreement: An Open-Source Java Library for Measuring Inter-Rater Agreement
Christian M. Meyer | Margot Mieskes | Christian Stab | Iryna Gurevych
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

2008

pdf bib
Knowledge Sources for Bridging Resolution in Multi-Party Dialog
Mark-Christoph Mueller | Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedure’s performance.

pdf bib
A Three-stage Disfluency Classifier for Multi Party Dialogues
Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present work on a three-stage system to detect and classify disfluencies in multi party dialogues. The system consists of a regular expression based module and two machine learning based modules. The results are compared to other work on multi party dialogues and we show that our system outperforms previously reported ones.

pdf bib
Parameters for Topic Boundary Detection in Multi-Party Dialogues
Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a topic boundary detection method that searches for connections between sequences of utterances in multi party dialogues. The connections are established based on word identity. We compare our method to a state-of-the art automatic Topic boundary detection method that was also used on multi party dialogues. We checked various methods of preprocessing of the data, including stemming, lemmatization and stopword filtering with a text-based as well as speech-based stopword lists. Using standard evaluation methods we found that our method outperformed the state-of-the art method.

2006

pdf bib
Part-of-Speech Tagging of Transcribed Speech
Margot Mieskes | Michael Strube
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We used four Part-of-Speech taggers, which are available for research purposes and were originally trained on text to tag a corpus of transcribed multiparty spoken dialogues. The assigned tags were then manually corrected. The correction was first used to evaluate the four taggers, then to retrain them. Despite limited resources in time, money and annotators we reached results comparable to those reported for the taggers on text. Based on our experience we present guidelines to produce reliably POS tagged corpora of new domains.