Michael Völske


pdf bib
Exploiting Personal Characteristics of Debaters for Predicting Persuasiveness
Khalid Al Khatib | Michael Völske | Shahbaz Syed | Nikolay Kolyada | Benno Stein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Predicting the persuasiveness of arguments has applications as diverse as writing assistance, essay scoring, and advertising. While clearly relevant to the task, the personal characteristics of an argument’s source and audience have not yet been fully exploited toward automated persuasiveness prediction. In this paper, we model debaters’ prior beliefs, interests, and personality traits based on their previous activity, without dependence on explicit user profiles or questionnaires. Using a dataset of over 60,000 argumentative discussions, comprising more than three million individual posts collected from the subreddit r/ChangeMyView, we demonstrate that our modeling of debater’s characteristics enhances the prediction of argument persuasiveness as well as of debaters’ resistance to persuasion.


pdf bib
Towards Summarization for Social Media - Results of the TL;DR Challenge
Shahbaz Syed | Michael Völske | Nedim Lipka | Benno Stein | Hinrich Schütze | Martin Potthast
Proceedings of the 12th International Conference on Natural Language Generation

In this paper, we report on the results of the TL;DR challenge, discussing an extensive manual evaluation of the expected properties of a good summary based on analyzing the comments provided by human annotators.


pdf bib
Task Proposal: The TL;DR Challenge
Shahbaz Syed | Michael Völske | Martin Potthast | Nedim Lipka | Benno Stein | Hinrich Schütze
Proceedings of the 11th International Conference on Natural Language Generation

The TL;DR challenge fosters research in abstractive summarization of informal text, the largest and fastest-growing source of textual data on the web, which has been overlooked by summarization research so far. The challenge owes its name to the frequent practice of social media users to supplement long posts with a “TL;DR”—for “too long; didn’t read”—followed by a short summary as a courtesy to those who would otherwise reply with the exact same abbreviation to indicate they did not care to read a post for its apparent length. Posts featuring TL;DR summaries form an excellent ground truth for summarization, and by tapping into this resource for the first time, we have mined millions of training examples from social media, opening the door to all kinds of generative models.


pdf bib
TL;DR: Mining Reddit to Learn Automatic Summarization
Michael Völske | Martin Potthast | Shahbaz Syed | Benno Stein
Proceedings of the Workshop on New Frontiers in Summarization

Recent advances in automatic text summarization have used deep neural networks to generate high-quality abstractive summaries, but the performance of these models strongly depends on large amounts of suitable training data. We propose a new method for mining social media for author-provided summaries, taking advantage of the common practice of appending a “TL;DR” to long posts. A case study using a large Reddit crawl yields the Webis-TLDR-17 dataset, complementing existing corpora primarily from the news genre. Our technique is likely applicable to other social media sites and general web crawls.


pdf bib
Crowdsourcing Interaction Logs to Understand Text Reuse from the Web
Martin Potthast | Matthias Hagen | Michael Völske | Benno Stein
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)