Manfred Stede


2020

Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection
Henny Sluyter-Gäthje | Peter Bourgonje | Manfred Stede
Proceedings of the 12th Language Resources and Evaluation Conference

Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exists only for English - any other language is in this respect an under-resourced one. For those languages where machine translation from English is available with reasonable quality, MT in conjunction with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank into German and experiment with various methods of annotation projection to arrive at the German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of errors encountered during its creation. Then we evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and compare performance to the same components when trained on the gold, original PDTB corpus.

The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing
Peter Bourgonje | Manfred Stede
Proceedings of the 12th Language Resources and Evaluation Conference

We present the Potsdam Commentary Corpus 2.2, a German corpus of news editorials annotated on several different levels. New in the 2.2 version of the corpus are two additional annotation layers for coherence relations following the Penn Discourse TreeBank framework. Specifically, we add relation senses to an already existing layer of discourse connectives and their arguments, and we introduce a new layer with additional coherence relation types, resulting in a German corpus that mirrors the PDTB. The aim of this is to increase usability of the corpus for the task of shallow discourse parsing. In this paper, we provide inter-annotator agreement figures for the new annotations and compare corpus statistics based on the new annotations to the equivalent statistics extracted from the PDTB.

DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives
Debopam Das | Manfred Stede | Soumya Sankar Ghosh | Lahari Chatterjee
Proceedings of the 12th Language Resources and Evaluation Conference

We present DiMLex-Bangla, a newly developed lexicon of discourse connectives in Bangla. The lexicon, upon completion of its first version, contains 123 Bangla connective entries, which are primarily compiled from the linguistic literature and translation of English discourse connectives. The lexicon compilation is later augmented by adding more connectives from a currently developed corpus, called the Bangla RST Discourse Treebank (Das and Stede, 2018). DiMLex-Bangla provides information on syntactic categories of Bangla connectives, their discourse semantics and non-connective uses (if any). It uses the format of the German connective lexicon DiMLex (Stede and Umbach, 1998), which provides a cross-linguistically applicable XML schema. The resource is the first of its kind in Bangla, and is freely available for use in studies on discourse structure and computational applications.

Semi-Supervised Tri-Training for Explicit Discourse Argument Expansion
René Knaebel | Manfred Stede
Proceedings of the 12th Language Resources and Evaluation Conference

This paper describes a novel application of semi-supervision for shallow discourse parsing. We use a neural approach for sequence tagging and focus on the extraction of explicit discourse arguments. First, additional unlabeled data is prepared for semi-supervised learning. From this data, weak annotations are generated in a first setting and later used in another setting to study performance differences. In our studies, we show an increase in the performance of our models that ranges between 2 and 10% F1 score. Further, we give some insights into the generated discourse annotations and compare the developed additional relations with the training relations. We release this new dataset of explicit discourse arguments to enable the training of large statistical models.

Adapting Coreference Resolution to Twitter Conversations
Berfin Aktaş | Veronika Solopova | Annalena Kohnert | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2020

The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the Lee et al. (2018) system, which was originally trained on OntoNotes, by retraining on manually annotated Twitter conversation data. Further experiments combining different portions of OntoNotes with Twitter data show that selecting text genres for the training data can beat the mere maximization of training data amount. In addition, we inspect several phenomena such as the role of deictic pronouns in conversational data, and present additional results for variant settings. Our best configuration improves the performance of the “out of the box” system by 21.6%.

Contextualized Embeddings for Connective Disambiguation in Shallow Discourse Parsing
René Knaebel | Manfred Stede
Proceedings of the First Workshop on Computational Approaches to Discourse

This paper studies a novel model that simplifies the disambiguation of connectives for explicit discourse relations. We use a neural approach that integrates contextualized word embeddings and predicts whether a connective candidate is part of a discourse relation or not. We study the influence of those context-specific embeddings. Further, we show the benefit of training the tasks of connective disambiguation and sense classification together at the same time. The success of our approach is supported by state-of-the-art results.

Exploiting a lexical resource for discourse connective disambiguation in German
Peter Bourgonje | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we focus on connective identification and sense classification for explicit discourse relations in German, as two individual sub-tasks of the overarching Shallow Discourse Parsing task. We successively augment a purely empirical approach based on contextualised embeddings with linguistic knowledge encoded in a connective lexicon. In this way, we improve over published results for connective identification, achieving a final F1-score of 87.93; and we introduce, to the best of our knowledge, first results for German sense classification, achieving an F1-score of 87.13. Our approach demonstrates that a connective lexicon can be a valuable resource for those languages that do not have a large PDTB-style-annotated corpus available.

Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution on both spoken language and social media, we undertake a corpus study involving the various genre sections of OntoNotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that previously have been applied individually to different data sets, we find fairly clear patterns of “behavior” for the different genres/media. Besides their role for psycholinguistic investigation (why do we employ different coreference strategies when we write or speak) and for the placement of Twitter in the spoken–written continuum, we see our results as a contribution to approaching genre-/media-specific coreference resolution.

Annotation and Detection of Arguments in Tweets
Robin Schaefer | Manfred Stede
Proceedings of the 7th Workshop on Argument Mining

Notwithstanding the increasing role Twitter plays in modern political and social discourse, resources built for conducting argument mining on tweets remain limited. In this paper, we present a new corpus of German tweets annotated for argument components. To the best of our knowledge, this is the first corpus containing not only annotated full tweets but also argumentative spans within tweets. We further report first promising results using supervised classification (F1: 0.82) and sequence labeling (F1: 0.72) approaches.

2019

Window-Based Neural Tagging for Shallow Discourse Argument Labeling
René Knaebel | Manfred Stede | Sebastian Stober
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

This paper describes a novel approach for the task of end-to-end argument labeling in shallow discourse parsing. Our method describes a decomposition of the overall labeling task into subtasks and a general distance-based aggregation procedure. For learning these subtasks, we train a recurrent neural network and gradually replace existing components of our baseline by our model. The model is trained and evaluated on the Penn Discourse Treebank 2 corpus. While it is not as good as knowledge-intense approaches, it clearly outperforms other models that are also trained without additional linguistic features.

Automated Cross-language Intelligibility Analysis of Parkinson’s Disease Patients Using Speech Recognition Technologies
Nina Hosseini-Kivanani | Juan Camilo Vásquez-Correa | Manfred Stede | Elmar Nöth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Speech deficits are common symptoms among Parkinson’s Disease (PD) patients. The automatic assessment of speech signals is promising for the evaluation of the neurological state and the speech quality of the patients. Recently, progress has been made in applying machine learning and computational methods to automatically evaluate the speech of PD patients. In the present study, we plan to analyze the speech signals of PD patients and healthy control (HC) subjects in three different languages: German, Spanish, and Czech, with the aim of identifying biomarkers to discriminate between PD patients and HC subjects and to evaluate the neurological state of the patients. Therefore, the main contribution of this study is the automatic classification of PD patients and HC subjects in different languages, with a focus on phonation, articulation, and prosody. We will focus on an intelligibility analysis based on automatic speech recognition systems trained on these three languages. This is one of the first studies that considers the evaluation of the speech of PD patients in different languages. The purpose of this research proposal is to build a model that can discriminate PD and HC subjects even when the languages used for training and testing are different.

Annotating Shallow Discourse Relations in Twitter Conversations
Tatjana Scheffler | Berfin Aktaş | Debopam Das | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and annotation, including an inter-annotator agreement study. We discuss our observations as to how Twitter discourses differ from written news text with respect to discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that in Twitter, connective arguments frequently are not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.

RST-Tace A tool for automatic comparison and evaluation of RST trees
Shujun Wan | Tino Kutschbach | Anke Lüdeling | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

This paper presents RST-Tace, a tool for automatic comparison and evaluation of RST trees. RST-Tace serves as an implementation of Iruskieta’s comparison method, which allows trees to be compared and evaluated without the influence of decisions at lower levels in a tree in terms of four factors: constituent, attachment point, nuclearity as well as relation. RST-Tace can be used regardless of the language or the size of rhetorical trees. This tool aims to measure the agreement between two annotators. The result is reflected by F-measure and inter-annotator agreement. Both the comparison table and the result of the evaluation can be obtained automatically.

Coherence models in schizophrenia
Sandra Just | Erik Haegert | Nora Kořánová | Anna-Lena Bröcker | Ivan Nenchev | Jakob Funcke | Christiane Montag | Manfred Stede
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

Incoherent discourse in schizophrenia has long been recognized as a dominant symptom of the mental disorder (Bleuler, 1911/1950). Recent studies have used modern sentence and word embeddings to compute coherence metrics for spontaneous speech in schizophrenia. While clinical ratings always have a subjective element, computational linguistic methodology allows quantification of speech abnormalities. Clinical and empirical knowledge from psychiatry provide the theoretical and conceptual basis for modelling. Our study is an interdisciplinary attempt at improving coherence models in schizophrenia. Speech samples were obtained from healthy controls and patients with a diagnosis of schizophrenia or schizoaffective disorder and different severity of positive formal thought disorder. Interviews were transcribed and coherence metrics derived from different embeddings. One model found higher coherence metrics for controls than patients. All other models remained non-significant. More detailed analysis of the data motivates different approaches to improving coherence models in schizophrenia, e.g. by assessing referential abnormalities.

The Utility of Discourse Parsing Features for Predicting Argumentation Structure
Freya Hewett | Roshan Prakash Rane | Nina Harlacher | Manfred Stede
Proceedings of the 6th Workshop on Argument Mining

Research on argumentation mining from text has frequently discussed relationships to discourse parsing, but few empirical results are available so far. One corpus that has been annotated in parallel for argumentation structure and for discourse structure (RST, SDRT) is the ‘argumentative microtexts’ corpus (Peldszus and Stede, 2016a). While results on using the gold RST annotations for predicting argumentation have been published (Peldszus and Stede, 2016b), the step to automatic discourse parsing has not yet been taken. In this paper, we run various discourse parsers (RST, PDTB) on the corpus, compare their results to the gold annotations (for RST) and then assess the contribution of automatically derived discourse features for argumentation parsing. After reproducing the state-of-the-art Evidence Graph model from Afantenos et al. (2018) for the microtexts, we find that PDTB features can indeed improve its performance.

Computational Argumentation Synthesis as a Language Modeling Task
Roxanne El Baff | Henning Wachsmuth | Khalid Al Khatib | Manfred Stede | Benno Stein
Proceedings of the 12th International Conference on Natural Language Generation

Synthesis approaches in computational argumentation so far are restricted to generating claim-like argument units or short summaries of debates. Ultimately, however, we expect computers to generate whole new arguments for a given stance towards some topic, backing up claims following argumentative and rhetorical considerations. In this paper, we approach such an argumentation synthesis as a language modeling task. In our language model, argumentative discourse units are the “words”, and arguments represent the “sentences”. Given a pool of units for any unseen topic-stance pair, the model selects a set of unit types according to a basic rhetorical strategy (logos vs. pathos), arranges the structure of the types based on the units’ argumentative roles, and finally “phrases” an argument by instantiating the structure with semantically coherent units from the pool. Our evaluation suggests that the model can, to some extent, mimic the human synthesis of strategy-specific arguments.

2018

A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations
Elena Musi | Manfred Stede | Leonard Kriese | Smaranda Muresan | Andrea Rocci
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Developing the Bangla RST Discourse Treebank
Debopam Das | Manfred Stede
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Lexicon of Discourse Markers for Portuguese – LDM-PT
Amália Mendes | Iria del Rio | Manfred Stede | Felix Dombek
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Argumentation Synthesis following Rhetorical Strategies
Henning Wachsmuth | Manfred Stede | Roxanne El Baff | Khalid Al-Khatib | Maria Skeppstedt | Benno Stein
Proceedings of the 27th International Conference on Computational Linguistics

Persuasion is rarely achieved through a loose set of arguments alone. Rather, an effective delivery of arguments follows a rhetorical strategy, combining logical reasoning with appeals to ethics and emotion. We argue that such a strategy means to select, arrange, and phrase a set of argumentative discourse units. In this paper, we model rhetorical strategies for the computational synthesis of effective argumentation. In a study, we let 26 experts synthesize argumentative texts with different strategies for 10 topics. We find that the experts agree in the selection significantly more when following the same strategy. While the texts notably vary for different strategies, especially their arrangement remains stable. The results suggest that our model enables a strategical synthesis.

Anaphora Resolution for Twitter Conversations: An Exploratory Study
Berfin Aktaş | Tatjana Scheffler | Manfred Stede
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

We present a corpus study of pronominal anaphora on Twitter conversations. After outlining the specific features of this genre, with respect to reference resolution, we explain the construction of our corpus and the annotation steps. From this we derive a list of phenomena that need to be considered when performing anaphora resolution on this type of data. Finally, we test the performance of an off-the-shelf resolution system, and provide some qualitative error analysis.

Identifying Explicit Discourse Connectives in German
Peter Bourgonje | Manfred Stede
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We are working on an end-to-end Shallow Discourse Parsing system for German and in this paper focus on the first subtask: the identification of explicit connectives. Starting with the feature set from an English system and a Random Forest classifier, we evaluate our approach on a (relatively small) German annotated corpus, the Potsdam Commentary Corpus. We introduce new features and experiment with including additional training data obtained through annotation projection and achieve an f-score of 83.89.

Constructing a Lexicon of English Discourse Connectives
Debopam Das | Tatjana Scheffler | Peter Bourgonje | Manfred Stede
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We present a new lexicon of English discourse connectives called DiMLex-Eng, built by merging information from two annotated corpora and an additional list of relation signals from the literature. The format follows the German connective lexicon DiMLex, which provides a cross-linguistically applicable XML schema. DiMLex-Eng contains 149 English connectives, and gives information on syntactic categories, discourse semantics and non-connective uses (if any). We report on the development steps and discuss design decisions encountered in the lexicon expansion phase. The resource is freely available for use in studies of discourse structure and computational applications.

More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing
Maria Skeppstedt | Andreas Peldszus | Manfred Stede
Proceedings of the 5th Workshop on Argument Mining

We present an extension of an annotated corpus of short argumentative texts that had originally been built in a controlled text production experiment. Our extension more than doubles the size of the corpus by means of crowdsourcing. We report on the setup of this experiment and on the consequences that crowdsourcing had for assembling the data, and in particular for annotation. We labeled the argumentative structure by marking claims, premises, and relations between them, following the scheme used in the original corpus, but had to make a few modifications in response to interesting phenomena in the data. Finally, we report on an experiment with the automatic prediction of this argumentation structure: We first replicated the approach of an earlier study on the original corpus, and compare the performance to various settings involving the extension.

Stance-Taking in Topics Extracted from Vaccine-Related Tweets and Discussion Forum Posts
Maria Skeppstedt | Manfred Stede | Andreas Kerren
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

The occurrence of stance-taking towards vaccination was measured in documents extracted by topic modelling from two different corpora, one discussion forum corpus and one tweet corpus. For some of the topics extracted, their most closely associated documents contained a proportion of vaccine stance-taking texts that exceeded the corpus average by a large margin. These extracted document sets would, therefore, form a useful resource in a process for computer-assisted analysis of argumentation on the subject of vaccination.

2017

Multi-source annotation projection of coreference chains: assessing strategies and testing opportunities
Yulia Grishina | Manfred Stede
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

In this paper, we examine the possibility of using annotation projection from multiple sources for automatically obtaining coreference annotations in the target language. We implement a multi-source annotation projection algorithm and apply it on an English-German-Russian parallel corpus in order to transfer coreference chains from two sources to the target side. Operating in two settings – a low-resource and a more linguistically-informed one – we show that automatic coreference transfer could benefit from combining information from multiple languages, and assess the quality of both the extraction and the linking of target coreference mentions.

The Good, the Bad, and the Disagreement: Complex ground truth in rhetorical structure analysis
Debopam Das | Manfred Stede | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Kristiina Jokinen | Manfred Stede | David DeVault | Annie Louis
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Automatic detection of stance towards vaccination in online discussion forums
Maria Skeppstedt | Andreas Kerren | Manfred Stede
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance ‘against’ or ‘for’ vaccination, or as ‘undecided’. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance ‘against’ vaccination from stance ‘for’ vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.

Extracting word lists for domain-specific implicit opinions from corpora
Núria Bertomeu Castelló | Manfred Stede
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

2016

Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
Tatjana Scheffler | Manfred Stede
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

DiMLex is a lexicon of German connectives that can be used for various language understanding purposes. We enhanced the coverage to 275 connectives, which we regard as covering all known German discourse connectives in current use. In this paper, we consider the task of adding the semantic relations that can be expressed by each connective. After discussing different approaches to retrieving semantic information, we settle on annotating each connective with senses from the new PDTB 3.0 sense hierarchy. We describe our new implementation in the extended DiMLex, which will be available for research purposes.

Parallel Discourse Annotations on a Corpus of Short Texts
Manfred Stede | Stergos Afantenos | Andreas Peldszus | Nicholas Asher | Jérémy Perret
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the first corpus of texts annotated with two alternative approaches to discourse structure, Rhetorical Structure Theory (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (Asher and Lascarides, 2003). 112 short argumentative texts have been analyzed according to these two theories. Furthermore, in previous work, the same texts have already been annotated for their argumentation structure, according to the scheme of Peldszus and Stede (2013). This corpus therefore enables studies of correlations between the two accounts of discourse structure, and between discourse and argumentation. We converted the three annotation formats to a common dependency tree format that enables comparison of the structures, and we describe some initial findings.

Information structure in the Potsdam Commentary Corpus: Topics
Manfred Stede | Sara Mamprin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Potsdam Commentary Corpus is a collection of 175 German newspaper commentaries annotated on a variety of different layers. This paper introduces a new layer that covers the linguistic notion of information-structural topic (not to be confused with ‘topic’ as applied to documents in information retrieval). To our knowledge, this is the first larger topic-annotated resource for German (and one of the first for any language). We describe the annotation guidelines and the annotation process, and the results of an inter-annotator agreement study, which compare favourably to the related work. The annotated corpus is freely available for research.

OPT: Oslo–Potsdam–Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing
Stephan Oepen | Jonathon Read | Tatjana Scheffler | Uladzimir Sidarenka | Manfred Stede | Erik Velldal | Lilja Øvrelid
Proceedings of the CoNLL-16 shared task

Towards assessing depth of argumentation
Manfred Stede
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

For analyzing argumentative text, we propose to study the ‘depth’ of argumentation as one important component, which we distinguish from argument quality. In a pilot study with German newspaper commentary texts, we asked students to rate the degree of argumentativeness, and then looked for correlations with features of the annotated argumentation structure and the rhetorical structure (in terms of RST). The results indicate that the human judgements correlate with our operationalization of depth and with certain structural features of RST trees.

Anaphoricity in Connectives: A Case Study on German
Manfred Stede | Yulia Grishina
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

Rhetorical structure and argumentation structure in monologue text
Andreas Peldszus | Manfred Stede
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Generating Sentiment Lexicons for German Twitter
Uladzimir Sidarenka | Manfred Stede
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Despite substantial progress made in developing new sentiment lexicon generation (SLG) methods for English, the task of transferring these approaches to other languages and domains in a sound way still remains open. In this paper, we contribute to the solution of this problem by systematically comparing semi-automatic translations of common English polarity lists with the results of the original automatic SLG algorithms, which were applied directly to German data. We evaluate these lexicons on a corpus of 7,992 manually annotated tweets. In addition, we collate the results of dictionary- and corpus-based SLG methods in order to find out which of these paradigms is better suited for the inherently noisy domain of social media. Our experiments show that semi-automatic translations notably outperform automatic systems (reaching a macro-averaged F1-score of 0.589), and that dictionary-based techniques produce much better polarity lists as compared to corpus-based approaches (whose best F1-scores run up to 0.479 and 0.419 respectively) even for the non-standard Twitter genre.

2015

Joint prediction in MST-style discourse parsing for argumentation mining
Andreas Peldszus | Manfred Stede
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Towards Detecting Counter-considerations in Text
Andreas Peldszus | Manfred Stede
Proceedings of the 2nd Workshop on Argumentation Mining

Knowledge-lean projection of coreference chains across languages
Yulia Grishina | Manfred Stede
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

2014

Conceptual and Practical Steps in Event Coreference Analysis of Large-scale Data
Fatemeh Torabi Asr | Jonathan Sonntag | Yulia Grishina | Manfred Stede
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation

Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop
Lori Levin | Manfred Stede
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

Potsdam Commentary Corpus 2.0: Annotation for Discourse Research
Manfred Stede | Arne Neumann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a revised and extended version of the Potsdam Commentary Corpus, a collection of 175 German newspaper commentaries (op-ed pieces) that has been annotated with syntax trees and three layers of discourse-level information: nominal coreference, connectives and their arguments (similar to the PDTB, Prasad et al. 2008), and trees reflecting discourse structure according to Rhetorical Structure Theory (Mann/Thompson 1988). Connectives have been annotated with the help of a semi-automatic tool, Conano (Stede/Heintze 2004), which identifies most connectives and suggests arguments based on their syntactic category. The other layers have been created manually with dedicated annotation tools. The corpus is made available on the one hand as a set of original XML files produced with the annotation tools, based on identical tokenization. On the other hand, it is distributed together with the open-source linguistic database ANNIS3 (Chiarcos et al. 2008; Zeldes et al. 2009), which provides multi-layer search functionality and layer-specific visualization modules. This allows for comfortable qualitative evaluation of the correlations between annotation layers.

pdf bib
A Model for Processing Illocutionary Structures and Argumentation in Debates
Kasia Budzynska | Mathilde Janier | Chris Reed | Patrick Saint-Dizier | Manfred Stede | Olena Yakorska
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we briefly present the objectives of Inference Anchoring Theory (IAT) and the formal structure which is proposed for dialogues. Then, we introduce our development corpus, and a computational model designed for the identification of discourse minimal units in the context of argumentation and the illocutionary force associated with each unit. We show the categories of resources which are needed and how they can be reused in different contexts.

pdf bib
GraPAT: a Tool for Graph Annotations
Jonathan Sonntag | Manfred Stede
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce GraPAT, a web-based annotation tool for building graph structures over text. Graphs have been demonstrated to be relevant in a variety of quite diverse annotation efforts and in different NLP applications, and they serve to model annotators’ intuitions quite closely. In particular, in this paper we discuss the implementation of graph annotations for sentiment analysis, argumentation structure, and rhetorical text structures. All of these scenarios can create certain problems for existing annotation tools, and we show how GraPAT can help to overcome such difficulties.

2013

pdf bib
From newspaper to microblogging: What does it take to find opinions?
Wladimir Sidorenko | Jonathan Sonntag | Nina Krüger | Stefan Stieglitz | Manfred Stede
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF
Arne Neumann | Nancy Ide | Manfred Stede
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Ranking the annotators: An agreement study on argumentation structure
Andreas Peldszus | Manfred Stede
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Towards a Tool for Interactive Concept Building for Large Scale Analysis in the Humanities
Andre Blessing | Jonathan Sonntag | Fritz Kliche | Ulrich Heid | Jonas Kuhn | Manfred Stede
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

pdf bib
Discourse Processing
Manfred Stede
NAACL HLT 2013 Tutorial Abstracts

2012

pdf bib
SemScribe: Natural Language Generation for Medical Reports
Sebastian Varges | Heike Bieler | Manfred Stede | Lukas C. Faulstich | Kristin Irsig | Malik Atalla
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Natural language generation in the medical domain is heavily influenced by domain knowledge and genre-specific text characteristics. We present SemScribe, an implemented natural language generation system that produces doctor's letters, in particular descriptions of cardiological findings. Texts in this domain are characterized by a high density of information and a relatively telegraphic style. Domain knowledge is encoded in a medical ontology of about 80,000 concepts. The ontology is used in particular for concept generalizations during referring expression generation. Architecturally, the system is a generation pipeline that uses a corpus-informed syntactic frame approach for realizing sentences appropriate to the domain. The system reads XML documents conforming to the HL7 Clinical Document Architecture (CDA) Standard and enhances them with generated text and references to the used data elements. We conducted a first clinical trial evaluation with medical staff and report on the findings.

2011

pdf bib
Lexicon-Based Methods for Sentiment Analysis
Maite Taboada | Julian Brooke | Milan Tofiloski | Kimberly Voll | Manfred Stede
Computational Linguistics, Volume 37, Issue 2 - June 2011

2009

pdf bib
Proceedings of the Third Linguistic Annotation Workshop (LAW III)
Manfred Stede | Chu-Ren Huang | Nancy Ide | Adam Meyers
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

pdf bib
By all these lovely tokens... Merging Conflicting Tokenizations
Christian Chiarcos | Julia Ritz | Manfred Stede
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

pdf bib
Genre-Based Paragraph Classification for Sentiment Analysis
Maite Taboada | Julian Brooke | Manfred Stede
Proceedings of the SIGDIAL 2009 Conference

2008

pdf bib
Connective-based Local Coherence Analysis: A Lexicon for Recognizing Causal Relationships
Manfred Stede
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

pdf bib
Proceedings of the Linguistic Annotation Workshop
Branimir Boguraev | Nancy Ide | Adam Meyers | Shigeko Nariyama | Manfred Stede | Janyce Wiebe | Graham Wilcock
Proceedings of the Linguistic Annotation Workshop

pdf bib
Discourse Annotation Working Group Report
Manfred Stede | Janyce Wiebe | Eva Hajičová | Brian Reese | Simone Teufel | Bonnie Webber | Theresa Wilson
Proceedings of the Linguistic Annotation Workshop

pdf bib
Identifying Formal and Functional Zones in Film Reviews
Heike Bieler | Stefanie Dipper | Manfred Stede
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2004

pdf bib
Machine-Assisted Rhetorical Structure Annotation
Manfred Stede | Silvan Heintze
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
The Potsdam Commentary Corpus
Manfred Stede
Proceedings of the Workshop on Discourse Annotation

pdf bib
Feeding OWL: Extracting and Representing the Content of Pathology Reports
David Schlangen | Manfred Stede | Elena Paslaru Bontas
Proceedings of the Workshop on NLP and XML (NLPXML-2004): RDF/RDFS and OWL in Language Technology

2003

pdf bib
Surfaces and depths in text understanding: The case of newspaper commentary
Manfred Stede
Proceedings of the HLT-NAACL 2003 Workshop on Text Meaning

pdf bib
Step by step: underspecified markup in incremental rhetorical analysis
David Reitter | Manfred Stede
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

pdf bib
Rhetorical Parsing with Underspecification and Forests
Thomas Hanneforth | Silvan Heintze | Manfred Stede
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

pdf bib
XML/XSL in the Dictionary: The Case of Discourse Markers
Daniela Berger | David Reitter | Manfred Stede
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)

pdf bib
Polibox: Generating Descriptions, Comparisons, and Recommendations from a Database
Manfred Stede
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

2000

pdf bib
Book Reviews: Predicative Forms in Natural Language and in Lexical Knowledge Bases
Manfred Stede
Computational Linguistics, Volume 26, Number 2, June 2000

pdf bib
The hyperonym problem revisited: Conceptual and lexical hierarchies in language generation
Manfred Stede
INLG’2000 Proceedings of the First International Conference on Natural Language Generation

1998

pdf bib
DiMLex: A lexicon of discourse markers for text generation and understanding
Manfred Stede | Carla Umbach
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
DiMLex: A Lexicon of Discourse Markers for Text Generation and Understanding
Manfred Stede | Carla Umbach
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
Discourse Marker Choice in Sentence Planning
Brigitte Grote | Manfred Stede
Natural Language Generation

pdf bib
A Generative Perspective on Verb Alternations
Manfred Stede
Computational Linguistics, Volume 24, Number 3, September 1998

1997

pdf bib
Discourse particles and routine formulas in spoken language translation
Manfred Stede | Birte Schmitz
Spoken Language Translation

1996

pdf bib
A generative perspective on verbs and their readings
Manfred Stede
Eighth International Natural Language Generation Workshop

1994

pdf bib
Generating Multilingual Documents from a Knowledge Base: The TECHDOC Project
Dietmar Rosner | Manfred Stede
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

pdf bib
TECHDOC: Multilingual generation of online and offline instructional text
Dietmar Rosner | Manfred Stede
Fourth Conference on Applied Natural Language Processing

1993

pdf bib
Lexical Choice Criteria in Language Generation
Manfred Stede
Sixth Conference of the European Chapter of the Association for Computational Linguistics
