Oren Etzioni


2020

CORD-19: The COVID-19 Open Research Dataset
Lucy Lu Wang | Kyle Lo | Yoganand Chandrasekhar | Russell Reas | Jiangjiang Yang | Doug Burdick | Darrin Eide | Kathryn Funk | Yannis Katsis | Rodney Michael Kinney | Yunyao Li | Ziyang Liu | William Merrill | Paul Mooney | Dewey A. Murdick | Devvret Rishi | Jerry Sheehan | Zhihong Shen | Brandon Stilson | Alex D. Wade | Kuansan Wang | Nancy Xin Ru Wang | Christopher Wilhelm | Boya Xie | Douglas M. Raymond | Daniel S. Weld | Oren Etzioni | Sebastian Kohlmeier
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for COVID-19.

2018

Construction of the Literature Graph in Semantic Scholar
Waleed Ammar | Dirk Groeneveld | Chandra Bhagavatula | Iz Beltagy | Miles Crawford | Doug Downey | Jason Dunkelberger | Ahmed Elgohary | Sergey Feldman | Vu Ha | Rodney Kinney | Sebastian Kohlmeier | Kyle Lo | Tyler Murray | Hsu-Han Ooi | Matthew Peters | Joanna Power | Sam Skjonsberg | Lucy Wang | Chris Wilhelm | Zheng Yuan | Madeleine van Zuylen | Oren Etzioni
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.

2016

IKE - An Interactive Tool for Knowledge Extraction
Bhavana Dalvi | Sumithra Bhakthavatsalam | Chris Clark | Peter Clark | Oren Etzioni | Anthony Fader | Dirk Groeneveld
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

2015

Exploring Markov Logic Networks for Question Answering
Tushar Khot | Niranjan Balasubramanian | Eric Gribkoff | Ashish Sabharwal | Peter Clark | Oren Etzioni
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Solving Geometry Problems: Combining Text and Diagram Interpretation
Minjoon Seo | Hannaneh Hajishirzi | Ali Farhadi | Oren Etzioni | Clint Malcolm
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Parsing Algebraic Word Problems into Equations
Rik Koncel-Kedziorski | Hannaneh Hajishirzi | Ashish Sabharwal | Oren Etzioni | Siena Dumas Ang
Transactions of the Association for Computational Linguistics, Volume 3

This paper formalizes the problem of solving multi-sentence algebraic word problems as that of generating and scoring equation trees. We use integer linear programming to generate equation trees and score their likelihood by learning local and global discriminative models. These models are trained on a small set of word problems and their answers, without any manual annotation, in order to choose the equation that best matches the problem text. We refer to the overall system as Alges. We compare Alges with previous work and show that it covers the full gamut of arithmetic operations whereas Hosseini et al. (2014) only handle addition and subtraction. In addition, Alges overcomes the brittleness of the Kushman et al. (2014) approach on single-equation problems, yielding a 15% to 50% reduction in error.

2014

Chinese Open Relation Extraction for Knowledge Acquisition
Yuen-Hsien Tseng | Lung-Hao Lee | Shu-Yen Lin | Bo-Shun Liao | Mei-Jun Liu | Hsin-Hsi Chen | Oren Etzioni | Anthony Fader
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Learning to Solve Arithmetic Word Problems with Verb Categorization
Mohammad Javad Hosseini | Hannaneh Hajishirzi | Oren Etzioni | Nate Kushman
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Generating Coherent Event Schemas at Scale
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter | Luke Zettlemoyer | Mausam | Oren Etzioni
Transactions of the Association for Computational Linguistics, Volume 1

Distant supervision algorithms learn information extraction models given only large readily available databases and text collections. Most previous work has used heuristics for generating labeled data, for example assuming that facts not contained in the database are not mentioned in the text, and facts in the database must be mentioned at least once. In this paper, we propose a new latent-variable approach that models missing data. This provides a natural way to incorporate side information, for instance modeling the intuition that text will often mention rare entities which are likely to be missing in the database. Despite the added complexity introduced by reasoning about missing data, we demonstrate that a carefully designed local search approach to inference is very accurate and scales to large datasets. Experiments demonstrate improved performance for binary and unary relation extraction when compared to learning with heuristic labels, including on average a 27% increase in area under the precision recall curve in the binary case.

Paraphrase-Driven Learning for Open Question Answering
Anthony Fader | Luke Zettlemoyer | Oren Etzioni
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Towards Coherent Multi-Document Summarization
Janara Christensen | Mausam | Stephen Soderland | Oren Etzioni
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Constructing a Textual KB from a Biology TextBook
Peter Clark | Phil Harrison | Niranjan Balasubramanian | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Entity Linking at Web Scale
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Rel-grams: A Probabilistic Model of Relations in Text
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Open Language Learning for Information Extraction
Mausam | Michael Schmitz | Stephen Soderland | Robert Bart | Oren Etzioni
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Named Entity Recognition in Tweets: An Experimental Study
Alan Ritter | Sam Clark | Mausam | Oren Etzioni
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Identifying Relations for Open Information Extraction
Anthony Fader | Stephen Soderland | Oren Etzioni
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Semantic Role Labeling for Open Information Extraction
Janara Christensen | Mausam | Stephen Soderland | Oren Etzioni
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

Machine Reading at the University of Washington
Hoifung Poon | Janara Christensen | Pedro Domingos | Oren Etzioni | Raphael Hoffmann | Chloe Kiddon | Thomas Lin | Xiao Ling | Mausam | Alan Ritter | Stefan Schoenmackers | Stephen Soderland | Dan Weld | Fei Wu | Congle Zhang
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

A Latent Dirichlet Allocation Method for Selectional Preferences
Alan Ritter | Mausam | Oren Etzioni
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Extracting Sequences from the Web
Anthony Fader | Stephen Soderland | Oren Etzioni
Proceedings of the ACL 2010 Conference Short Papers

Learning First-Order Horn Clauses from Web Text
Stefan Schoenmackers | Jesse Davis | Oren Etzioni | Daniel Weld
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Identifying Functional Relations in Web Text
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Compiling a Massive, Multilingual Dictionary via Probabilistic Inference
Mausam | Stephen Soderland | Oren Etzioni | Daniel Weld | Michael Skinner | Jeff Bilmes
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

A Rose is a Roos is a Ruusu: Querying Translations for Web Image Search
Janara Christensen | Mausam | Oren Etzioni
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

The Tradeoffs Between Open and Traditional Relation Extraction
Michele Banko | Oren Etzioni
Proceedings of ACL-08: HLT

It’s a Contradiction – no, it’s not: A Case Study using Functional Relations
Alan Ritter | Stephen Soderland | Doug Downey | Oren Etzioni
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Scaling Textual Inference to the Web
Stefan Schoenmackers | Oren Etzioni | Daniel Weld
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Unsupervised Resolution of Objects and Relations on the Web
Alexander Yates | Oren Etzioni
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

TextRunner: Open Information Extraction on the Web
Alexander Yates | Michele Banko | Matthew Broadhead | Michael Cafarella | Oren Etzioni | Stephen Soderland
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

Sparse Information Extraction: Unsupervised Language Models to the Rescue
Doug Downey | Stefan Schoenmackers | Oren Etzioni
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Detecting Parser Errors Using Web-based Semantic Filters
Alexander Yates | Stefan Schoenmackers | Oren Etzioni
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

BE: A search engine for NLP research
Mike Cafarella | Oren Etzioni
Proceedings of the 2nd International Workshop on Web as Corpus

Expanding the Recall of Relation Extraction by Bootstrapping
Junji Tomita | Stephen Soderland | Oren Etzioni
Proceedings of the Workshop on Adaptive Text Extraction and Mining (ATEM 2006)

2005

Extracting Product Features and Opinions from Reviews
Ana-Maria Popescu | Oren Etzioni
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

KnowItNow: Fast, Scalable Information Extraction from the Web
Michael J. Cafarella | Doug Downey | Stephen Soderland | Oren Etzioni
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

OPINE: Extracting Product Features and Opinions from Reviews
Ana-Maria Popescu | Bao Nguyen | Oren Etzioni
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability
Ana-Maria Popescu | Alex Armanasu | Oren Etzioni | David Ko | Alexander Yates
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics
