Chulaka Gunasekara


2020

Implicit Discourse Relation Classification: We Need to Talk about Evaluation
Najoung Kim | Song Feng | Chulaka Gunasekara | Luis Lastras
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Implicit relation classification on Penn Discourse TreeBank (PDTB) 2.0 is a common benchmark task for evaluating the understanding of discourse relations. However, the lack of consistency in preprocessing and evaluation poses challenges to fair comparison of results in the literature. In this work, we highlight these inconsistencies and propose an improved evaluation protocol. Paired with this protocol, we report strong baseline results from pretrained sentence encoders, which set the new state-of-the-art for PDTB 2.0. Furthermore, this work is the first to explore fine-grained relation classification on PDTB 3.0. We expect our work to serve as a point of comparison for future work, and also as an initiative to discuss models of larger context and possible data augmentations for downstream transferability.
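As a rough illustration of the kind of evaluation inconsistency at stake (not the authors' protocol; the label names and toy data below are hypothetical), macro-F1 on PDTB-style sense classification can shift depending on whether it is computed over a fixed label inventory or only over the labels that happen to appear in a sample:

```python
# Hedged illustration, not the paper's code: macro-F1 changes depending on
# whether the label set is fixed in advance or inferred from the data, one of
# the evaluation choices that makes published numbers hard to compare.
from sklearn.metrics import f1_score

# Hypothetical level-2 sense labels and toy gold/predicted sequences.
LABEL_SET = ["Comparison.Contrast", "Contingency.Cause",
             "Expansion.Conjunction", "Temporal.Asynchronous"]

gold = ["Contingency.Cause", "Expansion.Conjunction", "Contingency.Cause"]
pred = ["Contingency.Cause", "Contingency.Cause", "Contingency.Cause"]

# Macro-F1 over only the labels observed in this sample...
f1_observed = f1_score(gold, pred, average="macro")
# ...versus macro-F1 over the full, pre-specified label inventory.
f1_fixed = f1_score(gold, pred, labels=LABEL_SET, average="macro")

print(f"observed-label macro-F1: {f1_observed:.3f}")
print(f"fixed-label macro-F1:    {f1_fixed:.3f}")
```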

Conversational Document Prediction to Assist Customer Care Agents
Jatin Ganhotra | Haggai Roitman | Doron Cohen | Nathaniel Mills | Chulaka Gunasekara | Yosi Mass | Sachindra Joshi | Luis Lastras | David Konopnicki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

A frequent pattern in customer care conversations is agents responding with appropriate webpage URLs that address users’ needs. We study the task of predicting the documents that customer care agents can use to facilitate users’ needs. We also introduce a new public dataset which supports the aforementioned problem. Using this dataset and two others, we investigate state-of-the-art deep learning (DL) and information retrieval (IR) models for the task. Additionally, we analyze the practicality of such systems in terms of inference time complexity. Our results show that a hybrid IR+DL approach provides the best of both worlds.
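A common way to realize such a hybrid, shown in the hedged sketch below (not the paper's implementation; `ir_score`, `dl_score`, and the shortlist size are hypothetical stand-ins), is to let a cheap IR stage shortlist documents and an expensive neural model rescore only that shortlist, trading a small amount of accuracy head-room for near-IR inference cost:

```python
# Hedged sketch of IR+DL score fusion via two-stage ranking.
from typing import Callable, List, Tuple

def hybrid_rank(query: str,
                documents: List[str],
                ir_score: Callable[[str, str], float],
                dl_score: Callable[[str, str], float],
                shortlist_size: int = 10) -> List[Tuple[str, float]]:
    """Shortlist with the IR scorer, then rerank the shortlist with the DL scorer.

    `ir_score` and `dl_score` are hypothetical callables standing in for, e.g.,
    a BM25 scorer and a neural matching model; the deployed system's interfaces differ.
    """
    # Stage 1: cheap IR pass over the full collection.
    shortlist = sorted(documents, key=lambda d: ir_score(query, d), reverse=True)
    shortlist = shortlist[:shortlist_size]
    # Stage 2: expensive neural rescoring of the shortlist only, which keeps
    # inference time close to the IR system's while recovering DL accuracy.
    reranked = sorted(((d, dl_score(query, d)) for d in shortlist),
                      key=lambda pair: pair[1], reverse=True)
    return reranked
```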

doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset
Song Feng | Hui Wan | Chulaka Gunasekara | Siva Patel | Sachindra Joshi | Luis Lastras
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We introduce doc2dial, a new dataset of goal-oriented dialogues that are grounded in the associated documents. Inspired by how authors compose documents for guiding end users, we first construct dialogue flows based on the content elements that correspond to higher-level relations across text sections as well as lower-level relations between discourse units within a section. Then we present these dialogue flows to crowd contributors to create conversational utterances. The dataset includes over 4500 annotated conversations with an average of 14 turns that are grounded in over 450 documents from four domains. Compared to prior document-grounded dialogue datasets, this dataset covers a variety of dialogue scenes in information-seeking conversations. For evaluating the versatility of the dataset, we introduce multiple dialogue modeling tasks and present baseline approaches.
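A minimal sketch of what a document-grounded turn might look like is given below; the field names are illustrative only and are not the dataset's actual schema:

```python
# Hedged sketch of a doc2dial-style grounded dialogue turn (illustrative schema).
from dataclasses import dataclass
from typing import Optional

@dataclass
class GroundingSpan:
    doc_id: str   # document the turn is grounded in
    start: int    # character offset of the grounding span
    end: int

@dataclass
class DialogueTurn:
    role: str                          # "user" or "agent"
    utterance: str
    grounding: Optional[GroundingSpan]  # None for turns with no document grounding

# Hypothetical example turn grounded in a span of a hypothetical document.
turn = DialogueTurn(
    role="agent",
    utterance="You can renew your license online up to six months before it expires.",
    grounding=GroundingSpan(doc_id="dmv_renewal", start=120, end=188),
)
```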

Agent Assist through Conversation Analysis
Kshitij Fadnis | Nathaniel Mills | Jatin Ganhotra | Haggai Roitman | Gaurav Pandey | Doron Cohen | Yosi Mass | Shai Erera | Chulaka Gunasekara | Danish Contractor | Siva Patel | Q. Vera Liao | Sachindra Joshi | Luis Lastras | David Konopnicki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Customer support agents play a crucial role as an interface between an organization and its end-users. We propose CAIRAA: Conversational Approach to Information Retrieval for Agent Assistance, to reduce the cognitive workload of support agents who engage with users through conversation systems. CAIRAA monitors an evolving conversation and recommends both responses and URLs of documents the agent can use in replies to their client. We combine traditional information retrieval (IR) approaches with more recent Deep Learning (DL) models to ensure high accuracy and efficient run-time performance in the deployed system. Here, we describe the CAIRAA system and demonstrate its effectiveness in a pilot study via a short video.

2019

A Large-Scale Corpus for Conversation Disentanglement
Jonathan K. Kummerfeld | Sai R. Gouravajhala | Joseph J. Peper | Vignesh Athreya | Chulaka Gunasekara | Jatin Ganhotra | Siva Sankalp Patel | Lazaros C Polymenakos | Walter Lasecki
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Disentangling conversations mixed together in a single stream of messages is a difficult task, made harder by the lack of large manually annotated datasets. We created a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure. Our data is 16 times larger than all previously released datasets combined, the first to include adjudication of annotation disagreements, and the first to include context. We use our data to re-examine prior work, in particular, finding that 89% of conversations in a widely used dialogue corpus are either missing messages or contain extra messages. Our manually-annotated data presents an opportunity to develop robust data-driven methods for conversation disentanglement, which will help advance dialogue research.
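Given reply-structure annotations of this kind, one simple view of disentanglement is recovering conversations as connected components of the reply graph. The sketch below is a hedged illustration under that assumption, not the paper's released tooling:

```python
# Hedged sketch: recover conversations as connected components of an
# undirected reply graph over message indices.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def disentangle(num_messages: int,
                reply_edges: List[Tuple[int, int]]) -> List[Set[int]]:
    """Group message indices into conversations from (child, parent) reply links."""
    adjacency: Dict[int, List[int]] = defaultdict(list)
    for child, parent in reply_edges:
        adjacency[child].append(parent)
        adjacency[parent].append(child)

    conversations: List[Set[int]] = []
    seen: Set[int] = set()
    for start in range(num_messages):
        if start in seen:
            continue
        # Graph walk collects every message reachable through reply links.
        component: Set[int] = set()
        frontier = [start]
        while frontier:
            msg = frontier.pop()
            if msg in component:
                continue
            component.add(msg)
            frontier.extend(adjacency[msg])
        seen |= component
        conversations.append(component)
    return conversations

# Toy example: messages 0-2 form one thread, 3-4 another.
print(disentangle(5, [(1, 0), (2, 1), (4, 3)]))  # [{0, 1, 2}, {3, 4}]
```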

DSTC7 Task 1: Noetic End-to-End Response Selection
Chulaka Gunasekara | Jonathan K. Kummerfeld | Lazaros Polymenakos | Walter Lasecki
Proceedings of the First Workshop on NLP for Conversational AI

Goal-oriented dialogue in complex domains is an extremely challenging problem and there are relatively few datasets. This task provided two new resources that presented different challenges: one was focused but small, while the other was large but diverse. We also considered several new variations on the next utterance selection problem: (1) increasing the number of candidates, (2) including paraphrases, and (3) not including a correct option in the candidate set. Twenty teams participated, developing a range of neural network models, including some that successfully incorporated external data to boost performance. Both datasets have been publicly released, enabling future work to build on these results, working towards robust goal-oriented dialogue systems.
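As a rough illustration of the next utterance selection setup and its third variant (no correct option present), the sketch below shows candidate scoring with an abstention threshold; the scoring function and threshold are hypothetical placeholders, not a DSTC7 baseline:

```python
# Hedged sketch of next-utterance selection over a candidate set.
from typing import Callable, List, Optional

def select_next_utterance(context: str,
                          candidates: List[str],
                          score: Callable[[str, str], float],
                          none_threshold: Optional[float] = None) -> Optional[str]:
    """Pick the best-scoring candidate, or return None when no candidate clears
    the threshold (variant 3: the correct option may be absent from the set)."""
    best = max(candidates, key=lambda c: score(context, c))
    if none_threshold is not None and score(context, best) < none_threshold:
        return None  # model declines: none of the candidates looks correct
    return best
```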