Proceedings of the First Workshop on Computational Approaches to Discourse
With their huge speaking populations in the world, Spanish and Chinese occupy important positions in linguistic studies. Since the two languages come from different language systems, the translation between Spanish and Chinese is complicated. A comparative study for the language pair can discover the discourse differences between Spanish and Chinese, and can benefit the Spanish-Chinese translation. In this work, based on a Spanish-Chinese parallel corpus annotated with discourse information, we compare the annotation results between the language pair and analyze how discourse affects Spanish-Chinese translation. The research results in our study can help human translators who work with the language pair.
This work proposes a framework to predict sequences in dialogues, using turn based syntactic features and dialogue control functions. Syntactic features were extracted using dependency parsing, while dialogue control functions were manually labelled. These features were transformed using tf-idf and word embedding; feature selection was done using Principal Component Analysis (PCA). We ran experiments on six combinations of features to predict sequences with Hierarchical Agglomerative Clustering. An analysis of the clustering results indicate that using word embeddings and syntactic features, significantly improved the results.
Coreference resolution (CR) is an essential part of discourse analysis. Most recently, neural approaches have been proposed to improve over SOTA models from earlier paradigms. So far none of the published neural models leverage external semantic knowledge such as type information. This paper offers the first such model and evaluation, demonstrating modest gains in accuracy by introducing either gold standard or predicted types. In the proposed approach, type information serves both to (1) improve mention representation and (2) create a soft type consistency check between coreference candidate mentions. Our evaluation covers two different grain sizes of types over four different benchmark corpora.
In coreference resolution, span representations play a key role to predict coreference links accurately. We present a thorough examination of the span representation derived by applying BERT on coreference resolution (Joshi et al., 2019) using a probing model. Our results show that the span representation is able to encode a significant amount of coreference information. In addition, we find that the head-finding attention mechanism involved in creating the spans is crucial in encoding coreference knowledge. Last, our analysis shows that the span representation cannot capture non-local coreference as efficiently as local coreference.
Sketch comedy and crosstalk are two popular types of comedy. They can relieve people’s stress and thus benefit their mental health, especially when performances and scripts are high-quality. However, writing a script is time-consuming and its quality is difficult to achieve. In order to minimise the time and effort needed for producing an excellent script, we explore ways of predicting the audience’s response from the comedy scripts. For this task, we present a corpus of annotated scripts from popular television entertainment programmes in recent years. Annotations include a) text classification labels, indicating which actor’s lines made the studio audience laugh; b) information extraction labels, i.e. the text spans that made the audience laughed immediately after the performers said them. The corpus will also be useful for dialogue systems and discourse analysis, since our annotations are based on entire scripts. In addition, we evaluate different baseline algorithms. Experimental results demonstrate that BERT models can achieve the best predictions among all the baseline methods. Furthermore, we conduct an error analysis and investigate predictions across scripts with different styles.
The present paper focuses on variation phenomena in coreference chains. We address the hypothesis that the degree of structural variation between chain elements depends on language-specific constraints and preferences and, even more, on the communicative situation of language production. We define coreference features that also include reference to abstract entities and events. These features are inspired through several sources – cognitive parameters, pragmatic factors and typological status. We pay attention to the distributions of these features in a dataset containing English and German texts of spoken and written discourse mode, which can be classified into seven different registers. We apply text classification and feature selection to find out how these variational dimensions (language, mode and register) impact on coreference features. Knowledge on the variation under analysis is valuable for contrastive linguistics, translation studies and multilingual natural language processing (NLP), e.g. machine translation or cross-lingual coreference resolution.
This paper studies a novel model that simplifies the disambiguation of connectives for explicit discourse relations. We use a neural approach that integrates contextualized word embeddings and predicts whether a connective candidate is part of a discourse relation or not. We study the influence of those context-specific embeddings. Further, we show the benefit of training the tasks of connective disambiguation and sense classification together at the same time. The success of our approach is supported by state-of-the-art results.
In this paper, the utility and advantages of the discourse analysis for text pairs categorization and ranking are investigated. We consider two tasks in which discourse structure seems useful and important: automatic verification of political statements, and ranking in question answering systems. We propose a neural network based approach to learn the match between pairs of discourse tree structures. To this end, the neural TreeLSTM model is modified to effectively encode discourse trees and DSNDM model based on it is suggested to analyze pairs of texts. In addition, the integration of the attention mechanism in the model is proposed. Moreover, different ranking approaches are investigated for the second task. In the paper, the comparison with state-of-the-art methods is given. Experiments illustrate that combination of neural networks and discourse structure in DSNDM is effective since it reaches top results in the assigned tasks. The evaluation also demonstrates that discourse analysis improves quality for the processing of longer texts.
We introduce four tasks designed to determine which sentence encoders best capture discourse properties of sentences from scientific abstracts, namely coherence and cohesion between clauses of a sentence, and discourse relations within sentences. We show that even if contextual encoders such as BERT or SciBERT encodes the coherence in discourse units, they do not help to predict three discourse relations commonly used in scientific abstracts. We discuss what these results underline, namely that these discourse relations are based on particular phrasing that allow non-contextual encoders to perform well.
We recognize the task of event argument linking in documents as similar to that of intent slot resolution in dialogue, providing a Transformer-based model that extends from a recently proposed solution to resolve references to slots. The approach allows for joint consideration of argument candidates given a detected event, which we illustrate leads to state-of-the-art performance in multi-sentence argument linking.
In this work, we systematically investigate how well current models of coherence can capture aspects of text implicated in discourse organisation. We devise two datasets of various linguistic alterations that undermine coherence and test model sensitivity to changes in syntax and semantics. We furthermore probe discourse embedding space and examine the knowledge that is encoded in representations of coherence. We hope this study shall provide further insight into how to frame the task and improve models of coherence assessment further. Finally, we make our datasets publicly available as a resource for researchers to use to test discourse coherence models.
First, we discuss the most common linguistic perspectives on the concept of recency and propose a taxonomy of recency metrics employed in Machine Learning studies for choosing the form of referring expressions in discourse context. We then report on a Multi-Layer Perceptron study and a Sequential Forward Search experiment, followed by Bayes Factor analysis of the outcomes. The results suggest that recency metrics counting paragraphs and sentences contribute to referential choice prediction more than other recency-related metrics. Based on the results of our analysis, we argue that, sensitivity to discourse structure is important for recency metrics used in determining referring expression forms.
The multi-head self-attention of popular transformer models is widely used within Natural Language Processing (NLP), including for the task of extractive summarization. With the goal of analyzing and pruning the parameter-heavy self-attention mechanism, there are multiple approaches proposing more parameter-light self-attention alternatives. In this paper, we present a novel parameter-lean self-attention mechanism using discourse priors. Our new tree self-attention is based on document-level discourse information, extending the recently proposed “Synthesizer” framework with another lightweight alternative. We show empirical results that our tree self-attention approach achieves competitive ROUGE-scores on the task of extractive summarization. When compared to the original single-head transformer model, the tree attention approach reaches similar performance on both, EDU and sentence level, despite the significant reduction of parameters in the attention component. We further significantly outperform the 8-head transformer model on sentence level when applying a more balanced hyper-parameter setting, requiring an order of magnitude less parameters.
The PDTB-3 contains many more Implicit discourse relations than the previous PDTB-2. This is in part because implicit relations have now been annotated within sentences as well as between them. In addition, some now co-occur with explicit discourse relations, instead of standing on their own. Here we show that while this can complicate the problem of identifying the location of implicit discourse relations, it can in turn simplify the problem of identifying their senses. We present data to support this claim, as well as methods that can serve as a non-trivial baseline for future state-of-the-art recognizers for implicit discourse relations.
In this work, we present two new bilingual discourse connective lexicons, namely, for Turkish-English and European Portuguese-English created automatically using the existing discourse relation-aligned TED-MDB corpus. In their current form, the Pt-En lexicon includes 95 entries, whereas the Tr-En lexicon contains 133 entries. The lexicons constitute the first step of a larger project of developing a multilingual discourse connective lexicon.
A substantial overlap of coreferent mentions in the CoNLL dataset magnifies the recent progress on coreference resolution. This is because the CoNLL benchmark fails to evaluate the ability of coreference resolvers that requires linking novel mentions unseen at train time. In this work, we create a new dataset based on CoNLL, which largely decreases mention overlaps in the entire dataset and exposes the limitations of published resolvers on two aspects—lexical inference ability and understanding of low-level orthographic noise. Our findings show (1) the requirements for embeddings, used in resolvers, and for coreference resolutions are, by design, in conflict and (2) adversarial approaches are sometimes not legitimate to mitigate the obstacles, as they may falsely introduce mention overlaps in adversarial training and test sets, thus giving an inflated impression for the improvements.
We present preliminary results on investigating the benefits of coreference resolution features for neural RST discourse parsing by considering different levels of coupling of the discourse parser with the coreference resolver. In particular, starting with a strong baseline neural parser unaware of any coreference information, we compare a parser which utilizes only the output of a neural coreference resolver, with a more sophisticated model, where discourse parsing and coreference resolution are jointly learned in a neural multitask fashion. Results indicate that these initial attempts to incorporate coreference information do not boost the performance of discourse parsing in a statistically significant way.