Marti A. Hearst

Also published as: Marti Hearst


2020

More Diverse Dialogue Datasets via Diversity-Informed Data Collection
Katherine Stasaski | Grace Hui Yang | Marti A. Hearst
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models are known to often produce uninteresting, predictable responses; this is known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. Diversity-Informed Data Collection produces significantly more diverse data than baseline data collection methods, and better results on two downstream tasks: emotion classification and dialogue generation. This method is generalizable and can be used with other corpus-level metrics.

The Summary Loop: Learning to Write Abstractive Summaries Without Examples
Philippe Laban | Andrew Hsi | John Canny | Marti A. Hearst
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This work presents a new approach to unsupervised abstractive summarization based on maximizing a combination of coverage and fluency for a given length constraint. It introduces a novel method that encourages the inclusion of key terms from the original document into the summary: key terms are masked out of the original document and must be filled in by a coverage model using the current generated summary. A novel unsupervised training procedure leverages this coverage model along with a fluency model to generate and score summaries. When tested on popular news summarization datasets, the method outperforms previous unsupervised methods by more than 2 R-1 points, and approaches results of competitive supervised methods. Our model attains higher levels of abstraction with copied passages roughly two times shorter than prior work, and learns to compress and merge sentences without supervision.

What’s The Latest? A Question-driven News Chatbot
Philippe Laban | John Canny | Marti A. Hearst
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This work describes an automatic news chatbot that draws content from a diverse set of news articles and creates conversations with a user about the news. Key components of the system include the automatic organization of news articles into topical chatrooms, integration of automatically generated questions into the conversation, and a novel method for choosing which questions to present that avoids repetitive suggestions. We describe the algorithmic framework and present the results of a usability study showing that news readers using the system successfully engage in multi-turn conversations about specific news stories.

CIMA: A Large Open Access Dialogue Dataset for Tutoring
Katherine Stasaski | Kimberly Kao | Marti A. Hearst
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

One-to-one tutoring is often an effective means to help students learn, and recent experiments with neural conversation systems are promising. However, large open datasets of tutoring conversations are lacking. To remedy this, we propose a novel asynchronous method for collecting tutoring dialogue via crowdworkers that is both amenable to the needs of deep learning algorithms and reflective of pedagogical concerns. In this approach, extended conversations are obtained between crowdworkers role-playing as both students and tutors. The CIMA collection, which we make publicly available, is novel in that students are exposed to overlapping grounded concepts between exercises and multiple relevant tutoring responses are collected for the same input. CIMA contains several compelling properties from an educational perspective: student role-players complete exercises in fewer turns during the course of the conversation and tutor players adopt strategies that conform with some educational conversational norms, such as providing hints versus asking questions in appropriate contexts. The dataset enables a model to be trained to generate the next tutoring utterance in a conversation, conditioned on a provided action strategy.

Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
Dongyeop Kang | Andrew Head | Risham Sidhu | Kyle Lo | Daniel Weld | Marti A. Hearst
Proceedings of the First Workshop on Scholarly Document Processing

The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition detection, current approaches are far from being accurate enough to use in real-world applications. In this paper, we first perform in-depth error analysis of the current best-performing definition detection system and discover major causes of errors. Based on this analysis, we develop a new definition detection system, HEDDEx, that utilizes syntactic features, transformer encoders, and heuristic filters, and evaluate it on a standard sentence-level benchmark. Because current benchmarks evaluate randomly sampled sentences, we propose an alternative evaluation that assesses every sentence within a document. This allows for evaluating recall in addition to precision. HEDDEx outperforms the leading system on both the sentence-level and the document-level tasks, by 12.7 F1 points and 14.4 F1 points, respectively. We note that performance on the high-recall document-level task is much lower than in the standard evaluation approach, due to the need to incorporate document structure as features. We discuss remaining challenges in document-level definition detection, ideas for improvements, and potential issues for the development of reading aid applications.

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search
Tom Hope | Jason Portenoy | Kishore Vasan | Jonathan Borchardt | Eric Horvitz | Daniel Weld | Marti Hearst | Jevin West
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over 15K users with over 42K page views and 13% returns.

2019

Towards augmenting crisis counselor training by improving message retrieval
Orianna Demasi | Marti A. Hearst | Benjamin Recht
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

A fundamental challenge when training counselors is presenting novices with the opportunity to practice counseling distressed individuals without exacerbating a situation. Rather than replacing human empathy with an automated counselor, we propose simulating an individual in crisis so that human counselors in training can practice crisis counseling in a low-risk environment. Towards this end, we collect a dataset of suicide prevention counselor role-play transcripts and make initial steps towards constructing a CRISISbot for humans to counsel while in training. In this data-constrained setting, we evaluate the potential for message retrieval to construct a coherent chat agent in light of recent advances with text embedding methods. Our results show that embeddings can considerably improve retrieval approaches to make them competitive with generative models. By coherently retrieving messages, we can help counselors practice chatting in a low-risk environment.

2017

newsLens: building and visualizing long-ranging news stories
Philippe Laban | Marti Hearst
Proceedings of the Events and Stories in the News Workshop

We propose a method to aggregate and organize a large, multi-source dataset of news articles into a collection of major stories, and automatically name and visualize these stories in a working system. The approach is able to run online, as new articles are added, processing 4 million news articles from 20 news sources, and extracting 80000 major stories, some of which span several years. The visual interface consists of lanes of timelines, each annotated with information that is deemed important for the story, including extracted quotations. The working system allows a user to search and navigate 8 years of story information.

Multiple Choice Question Generation Utilizing An Ontology
Katherine Stasaski | Marti A. Hearst
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Ontologies provide a structured representation of concepts and the relationships which connect them. This work investigates how a pre-existing educational Biology ontology can be used to generate useful practice questions for students by using the connectivity structure in a novel way. It also introduces a novel way to generate multiple-choice distractors from the ontology, and compares this to a baseline of using embedding representations of nodes. An assessment by an experienced science teacher shows a significant advantage over a baseline when using the ontology for distractor generation. A subsequent study with three science teachers on the results of a modified question generation algorithm finds significant improvements. An in-depth analysis of the teachers’ comments yields useful insights for any researcher working on automated question generation for educational applications.

2016

Patterns of Wisdom: Discourse-Level Style in Multi-Sentence Quotations
Kyle Booten | Marti A. Hearst
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Intersecting Word Vectors to Take Figurative Language to New Heights
Andrea Gagliano | Emily Paul | Kyle Booten | Marti A. Hearst
Proceedings of the Fifth Workshop on Computational Linguistics for Literature

Augmenting Course Material with Open Access Textbooks
Smitha Milli | Marti A. Hearst
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

2015

Can Natural Language Processing Become Natural Language Coaching?
Marti A. Hearst
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Improving the Recognizability of Syntactic Relations Using Contextualized Examples
Aditi Muralidharan | Marti A. Hearst
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces
Jason Chuang | Spence Green | Marti Hearst | Jeffrey Heer | Philipp Koehn
Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces

2009

NLP Support for Faceted Navigation in Scholarly Collection
Marti A. Hearst | Emilia Stoica
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

2008

Solving Relational Similarity Problems Using the Web as a Corpus
Preslav Nakov | Marti A. Hearst
Proceedings of ACL-08: HLT

Improving Search Results Quality by Customizing Summary Lengths
Michael Kaisser | Marti A. Hearst | John B. Lowe
Proceedings of ACL-08: HLT

2007

Automating Creation of Hierarchical Faceted Metadata Structures
Emilia Stoica | Marti Hearst | Megan Richardson
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts
Marti Hearst | Gina-Anne Levow | James Allan
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

UCB System Description for the WMT 2007 Shared Task
Preslav Nakov | Marti Hearst
Proceedings of the Second Workshop on Statistical Machine Translation

Exploring the Efficacy of Caption Search for Bioscience Journal Search Interfaces
Marti Hearst | Anna Divoli | Jerry Ye | Michael Wooldridge
Biological, translational, and clinical language processing

UCB: System Description for SemEval Task #4
Preslav Nakov | Marti Hearst
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Multiple Alignment of Citation Sentences with Conditional Random Fields and Posterior Decoding
Ariel Schwartz | Anna Divoli | Marti Hearst
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Summarizing Key Concepts using Citation Sentences
Ariel S. Schwartz | Marti Hearst
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

2005

Multi-way Relation Classification: Application to Protein-Protein Interactions
Barbara Rosario | Marti Hearst
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution
Preslav Nakov | Marti Hearst
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Supporting Annotation Layers for Natural Language Processing
Preslav Nakov | Ariel Schwartz | Brian Wolf | Marti Hearst
Proceedings of the ACL Interactive Poster and Demonstration Sessions

Teaching Applied Natural Language Processing: Triumphs and Tribulations
Marti Hearst
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

Search Engine Statistics Beyond the n-Gram: Application to Noun Compound Bracketing
Preslav Nakov | Marti Hearst
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

Classifying Semantic Relations in Bioscience Texts
Barbara Rosario | Marti Hearst
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Nearly-Automated Metadata Hierarchy Creation
Emilia Stoica | Marti A. Hearst
Proceedings of HLT-NAACL 2004: Short Papers

2003

Category-based Pseudowords
Preslav I. Nakov | Marti A. Hearst
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

A Critique and Improvement of an Evaluation Metric for Text Segmentation
Lev Pevzner | Marti A. Hearst
Computational Linguistics, Volume 28, Number 1, March 2002

The Descent of Hierarchy, and Selection in Relational Semantics
Barbara Rosario | Marti Hearst | Charles Fillmore
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy
Barbara Rosario | Marti Hearst
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing

1999

Untangling Text Data Mining
Marti A. Hearst
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1997

TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
Marti A. Hearst
Computational Linguistics, Volume 23, Number 1, March 1997

Adaptive Multilingual Sentence Boundary Disambiguation
David D. Palmer | Marti A. Hearst
Computational Linguistics, Volume 23, Number 2, June 1997

1994

Multi-Paragraph Segmentation of Expository Text
Marti A. Hearst
32nd Annual Meeting of the Association for Computational Linguistics

Adaptive Sentence Boundary Disambiguation
David D. Palmer | Marti A. Hearst
Fourth Conference on Applied Natural Language Processing

1993

Customizing a Lexicon to Better Suit a Computational Task
Marti Hearst | Hinrich Schuetze
Acquisition of Lexical Knowledge from Text

Structural Ambiguity and Conceptual Relations
Philip Resnik | Marti A. Hearst
Very Large Corpora: Academic and Industrial Perspectives

1992

Automatic Acquisition of Hyponyms from Large Text Corpora
Marti A. Hearst
COLING 1992 Volume 2: The 15th International Conference on Computational Linguistics