Alon Halevy


2019

Open Information Extraction from Question-Answer Pairs
Nikita Bhutani | Yoshihiko Suhara | Wang-Chiew Tan | Alon Halevy | H. V. Jagadish
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Open Information Extraction (OpenIE) extracts meaningful structured tuples from free-form text. Most previous work on OpenIE considers extracting data from one sentence at a time. We describe NeurON, a system for extracting tuples from question-answer pairs. One of the main motivations for NeurON is to be able to extend knowledge bases in a way that considers precisely the information that users care about. NeurON addresses several challenges. First, an answer text is often hard to understand without knowing the question, and second, relevant information can span multiple sentences. To address these, NeurON formulates extraction as a multi-source sequence-to-sequence learning task, wherein it combines distributed representations of a question and an answer to generate knowledge facts. We describe experiments on two real-world datasets that demonstrate that NeurON can find a significant number of new and interesting facts to extend a knowledge base compared to state-of-the-art OpenIE methods.

2018

HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Akari Asai | Sara Evensen | Behzad Golshan | Alon Halevy | Vivian Li | Andrei Lopatenko | Daniela Stepanov | Yoshihiko Suhara | Wang-Chiew Tan | Yinzhan Xu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

FrameIt: Ontology Discovery for Noisy User-Generated Text
Dan Iter | Alon Halevy | Wang-Chiew Tan
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

A common need of NLP applications is to extract structured data from text corpora in order to perform analytics or trigger an appropriate action. The ontology defining the structure is typically application dependent and in many cases it is not known a priori. We describe the FrameIt System that provides a workflow for (1) quickly discovering an ontology to model a text corpus and (2) learning an SRL model that extracts the instances of the ontology from sentences in the corpus. FrameIt exploits data that is obtained in the ontology discovery phase as weak supervision data to bootstrap the SRL model and then enables the user to refine the model with active learning. We present empirical results and qualitative analysis of the performance of FrameIt on three corpora of noisy user-generated text.

2014

ReNoun: Fact Extraction for Nominal Attributes
Mohamed Yahya | Steven Whang | Rahul Gupta | Alon Halevy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)