Bonan Min


2020

pdf bib
Towards Few-Shot Event Mention Retrieval: An Evaluation Framework and A Siamese Network Approach
Bonan Min | Yee Seng Chan | Lingjun Zhao
Proceedings of the 12th Language Resources and Evaluation Conference

Automatically analyzing events in a large amount of text is crucial for situation awareness and decision making. Previous approaches treat event extraction as “one size fits all” with an ontology defined a priori. The resulted extraction models are built just for extracting those types in the ontology. These approaches cannot be easily adapted to new event types nor new domains of interest. To accommodate personalized event-centric information needs, this paper introduces the few-shot Event Mention Retrieval (EMR) task: given a user-supplied query consisting of a handful of event mentions, return relevant event mentions found in a corpus. This formulation enables “query by example”, which drastically lowers the bar of specifying event-centric information needs. The retrieval setting also enables fuzzy search. We present an evaluation framework leveraging existing event datasets such as ACE. We also develop a Siamese Network approach, and show that it performs better than ad-hoc retrieval models in the few-shot EMR setting.

pdf bib
Concept Wikification for COVID-19
Panagiotis Lymperopoulos | Haoling Qiu | Bonan Min
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

Understanding scientific articles related to COVID-19 requires broad knowledge about concepts such as symptoms, diseases and medicine. Given the very large and ever-growing scientific articles related to COVID-19, it is a daunting task even for experts to recognize the large set of concepts mentioned in these articles. In this paper, we address the problem of concept wikification for COVID-19, which is to automatically recognize mentions of concepts related to COVID-19 in text and resolve them into Wikipedia titles. We develop an approach to curate a COVID-19 concept wikification dataset by mining Wikipedia text and the associated intra-Wikipedia links. We also develop an end-to-end system for concept wikification for COVID-19. Preliminary experiments show very encouraging results. Our dataset, code and pre-trained model are available at github.com/panlybero/Covid19_wikification.

pdf bib
Weakly Supervised Subevent Knowledge Acquisition
Wenlin Yao | Zeyu Dai | Maitreyi Ramaswamy | Bonan Min | Ruihong Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Subevents elaborate an event and widely exist in event descriptions. Subevent knowledge is useful for discourse analysis and event-centric applications. Acknowledging the scarcity of subevent knowledge, we propose a weakly supervised approach to extract subevent relation tuples from text and build the first large scale subevent knowledge base. We first obtain the initial set of event pairs that are likely to have the subevent relation, by exploiting two observations that 1) subevents are temporally contained by the parent event, and 2) the definitions of the parent event can be used to further guide the identification of subevents. Then, we collect rich weak supervision using the initial seed subevent pairs to train a contextual classifier using BERT and apply the classifier to identify new subevent pairs. The evaluation showed that the acquired subevent tuples (239K) are of high quality (90.1% accuracy) and cover a wide range of event types. The acquired subevent knowledge has been shown useful for discourse analysis and identifying a range of event-event relations.

pdf bib
Annotating Temporal Dependency Graphs via Crowdsourcing
Jiarui Yao | Haoling Qiu | Bonan Min | Nianwen Xue
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present the construction of a corpus of 500 Wikinews articles annotated with temporal dependency graphs (TDGs) that can be used to train systems to understand temporal relations in text. We argue that temporal dependency graphs, built on previous research on narrative times and temporal anaphora, provide a representation scheme that achieves a good trade-off between completeness and practicality in temporal annotation. We also provide a crowdsourcing strategy to annotate TDGs, and demonstrate the feasibility of this approach with an evaluation of the quality of the annotation, and the utility of the resulting data set by training a machine learning model on this data set. The data set is publicly available.

pdf bib
Exploring Contextualized Neural Language Models for Temporal Dependency Parsing
Hayley Ross | Jonathon Cai | Bonan Min
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Extracting temporal relations between events and time expressions has many applications such as constructing event timelines and time-related question answering. It is a challenging problem which requires syntactic and semantic information at sentence or discourse levels, which may be captured by deep contextualized language models (LMs) such as BERT (Devlin et al., 2019). In this paper, we develop several variants of BERT-based temporal dependency parser, and show that BERT significantly improves temporal dependency parsing (Zhang and Xue, 2018a). We also present a detailed analysis on why deep contextualized neural LMs help and where they may fall short. Source code and resources are made available at https://github.com/bnmin/tdp_ranking.

2019

pdf bib
Measure Country-Level Socio-Economic Indicators with Streaming News: An Empirical Study
Bonan Min | Xiaoxi Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Socio-economic conditions are difficult to measure. For example, the U.S. Bureau of Labor Statistics needs to conduct large-scale household surveys regularly to track the unemployment rate, an indicator widely used by economists and policymakers. We argue that events reported in streaming news can be used as “micro-sensors” for measuring socio-economic conditions. Similar to collecting surveys and then counting answers, it is possible to measure a socio-economic indicator by counting related events. In this paper, we propose Event-Centric Indicator Measure (ECIM), a novel approach to measure socio-economic indicators with events. We empirically demonstrate strong correlation between ECIM values to several representative indicators in socio-economic research.

pdf bib
Towards Machine Reading for Interventions from Humanitarian-Assistance Program Literature
Bonan Min | Yee Seng Chan | Haoling Qiu | Joshua Fasching
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Solving long-lasting problems such as food insecurity requires a comprehensive understanding of interventions applied by governments and international humanitarian assistance organizations, and their results and consequences. Towards achieving this grand goal, a crucial first step is to extract past interventions and when and where they have been applied, from hundreds of thousands of reports automatically. In this paper, we developed a corpus annotated with interventions to foster research, and developed an information extraction system for extracting interventions and their location and time from text. We demonstrate early, very encouraging results on extracting interventions.

pdf bib
Rapid Customization for Event Extraction
Yee Seng Chan | Joshua Fasching | Haoling Qiu | Bonan Min
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Extracting events in the form of who is involved in what at when and where from text, is one of the core information extraction tasks that has many applications such as web search and question answering. We present a system for rapidly customizing event extraction capability to find new event types (what happened) and their arguments (who, when, and where). To enable extracting events of new types, we develop a novel approach to allow a user to find, expand and filter event triggers by exploring an unannotated development corpus. The system will then generate mention level event annotation automatically and train a neural network model for finding the corresponding events. To enable extracting arguments for new event types, the system makes novel use of the ACE annotation dataset to train a generic argument attachment model for extracting Actor, Place, and Time. We demonstrate that with less than 10 minutes of human effort per event type, the system achieves good performance for 67 novel event types. Experiments also show that the generic argument attachment model performs well on the novel event types. Our system (code, UI, documentation, demonstration video) is released as open source.

2018

pdf bib
When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation
Bonan Min | Marjorie Freedman | Roger Bock | Ralph Weischedel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
A Case Study on Learning a Unified Encoder of Relations
Lisheng Fu | Bonan Min | Thien Huu Nguyen | Ralph Grishman
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

Typical relation extraction models are trained on a single corpus annotated with a pre-defined relation schema. An individual corpus is often small, and the models may often be biased or overfitted to the corpus. We hypothesize that we can learn a better representation by combining multiple relation datasets. We attempt to use a shared encoder to learn the unified feature representation and to augment it with regularization by adversarial training. The additional corpora feeding the encoder can help to learn a better feature representation layer even though the relation schemas are different. We use ACE05 and ERE datasets as our case study for experiments. The multi-task model obtains significant improvement on both datasets.

2017

pdf bib
Probabilistic Inference for Cold Start Knowledge Base Population with Prior World Knowledge
Bonan Min | Marjorie Freedman | Talya Meltzer
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Building knowledge bases (KB) automatically from text corpora is crucial for many applications such as question answering and web search. The problem is very challenging and has been divided into sub-problems such as mention and named entity recognition, entity linking and relation extraction. However, combining these components has shown to be under-constrained and often produces KBs with supersize entities and common-sense errors in relations (a person has multiple birthdates). The errors are difficult to resolve solely with IE tools but become obvious with world knowledge at the corpus level. By analyzing Freebase and a large text collection, we found that per-relation cardinality and the popularity of entities follow the power-law distribution favoring flat long tails with low-frequency instances. We present a probabilistic joint inference algorithm to incorporate this world knowledge during KB construction. Our approach yields state-of-the-art performance on the TAC Cold Start task, and 42% and 19.4% relative improvements in F1 over our baseline on Cold Start hop-1 and all-hop queries respectively.

pdf bib
Learning Transferable Representation for Bilingual Relation Extraction via Convolutional Neural Networks
Bonan Min | Zhuolin Jiang | Marjorie Freedman | Ralph Weischedel
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Typically, relation extraction models are trained to extract instances of a relation ontology using only training data from a single language. However, the concepts represented by the relation ontology (e.g. ResidesIn, EmployeeOf) are language independent. The numbers of annotated examples available for a given ontology vary between languages. For example, there are far fewer annotated examples in Spanish and Japanese than English and Chinese. Furthermore, using only language-specific training data results in the need to manually annotate equivalently large amounts of training for each new language a system encounters. We propose a deep neural network to learn transferable, discriminative bilingual representation. Experiments on the ACE 2005 multilingual training corpus demonstrate that the joint training process results in significant improvement in relation classification performance over the monolingual counterparts. The learnt representation is discriminative and transferable between languages. When using 10% (25K English words, or 30K Chinese characters) of the training data, our approach results in doubling F1 compared to a monolingual baseline. We achieve comparable performance to the monolingual system trained with 250K English words (or 300K Chinese characters) With 50% of training data.

pdf bib
Domain Adaptation for Relation Extraction with Domain Adversarial Neural Network
Lisheng Fu | Thien Huu Nguyen | Bonan Min | Ralph Grishman
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Relations are expressed in many domains such as newswire, weblogs and phone conversations. Trained on a source domain, a relation extractor’s performance degrades when applied to target domains other than the source. A common yet labor-intensive method for domain adaptation is to construct a target-domain-specific labeled dataset for adapting the extractor. In response, we present an unsupervised domain adaptation method which only requires labels from the source domain. Our method is a joint model consisting of a CNN-based relation classifier and a domain-adversarial classifier. The two components are optimized jointly to learn a domain-independent representation for prediction on the target domain. Our model outperforms the state-of-the-art on all three test domains of ACE 2005.

2014

pdf bib
Infusion of Labeled Data into Distant Supervision for Relation Extraction
Maria Pershina | Bonan Min | Wei Xu | Ralph Grishman
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Distant Supervision for Relation Extraction with an Incomplete Knowledge Base
Bonan Min | Ralph Grishman | Li Wan | Chang Wang | David Gondek
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

pdf bib
Challenges in the Knowledge Base Population Slot Filling Task
Bonan Min | Ralph Grishman
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The Knowledge Based Population (KBP) evaluation track of the Text Analysis Conferences (TAC) has been held for the past 3 years. One of the two tasks of KBP is slot filling: finding within a large corpus the values of a set of attributes of given people and organizations. This task has proven very challenging, with top systems rarely exceeding 30% F-measure. In this paper, we present an error analysis and classification for those answers which could be found by a manual corpus search but were not found by any of the systems participating in the 2010 evaluation. The most common sources of failure were limitations on inference, errors in coreference (particularly with nominal anaphors), and errors in named entity recognition. We relate the types of errors to the characteristics of the task and show the wide diversity of problems that must be addressed to improve overall performance.

pdf bib
Compensating for Annotation Errors in Training a Relation Extractor
Bonan Min | Ralph Grishman
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Ensemble Semantics for Large-scale Unsupervised Relation Extraction
Bonan Min | Shuming Shi | Ralph Grishman | Chin-Yew Lin
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Fine-grained Entity Set Refinement with User Feedback
Bonan Min | Ralph Grishman
Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition