Dayne Freitag

Also published as: D. Freitag


2020

Proceedings of the First Workshop on Scholarly Document Processing
Muthu Kumar Chandrasekaran | Anita de Waard | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Eduard Hovy | Petr Knoth | David Konopnicki | Philipp Mayr | Robert M. Patton | Michal Shmueli-Scheuer
Proceedings of the First Workshop on Scholarly Document Processing

Overview of the First Workshop on Scholarly Document Processing (SDP)
Muthu Kumar Chandrasekaran | Guy Feigenblat | Dayne Freitag | Tirthankar Ghosal | Eduard Hovy | Philipp Mayr | Michal Shmueli-Scheuer | Anita de Waard
Proceedings of the First Workshop on Scholarly Document Processing

Next to keeping up with the growing literature in their own and related fields, scholars increasingly also need to rebut pseudo-science and disinformation. To address these challenges, computational work on enhancing search, summarization, and analysis of scholarly documents has flourished. However, the various strands of research on scholarly document processing remain fragmented. To reach out to the broader NLP and AI/ML community, pool distributed efforts and enable shared access to published research, we held the 1st Workshop on Scholarly Document Processing at EMNLP 2020 as a virtual event. The SDP workshop consisted of a research track (including a poster session), two invited talks and three Shared Tasks (CL-SciSumm, Lay-Summ and LongSumm), geared towards easier access to scientific methods and results. Website: https://ornlcda.github.io/SDProc

2017

Discourse-Wide Extraction of Assay Frames from the Biological Literature
Dayne Freitag | Paul Kalmar | Eric Yeh
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017

We consider the problem of populating multi-part knowledge frames from textual information distributed over multiple sentences in a document. We present a corpus constructed by aligning papers from the cellular signaling literature to a collection of approximately 50,000 reference frames curated by hand as part of a decade-long project. We present and evaluate two approaches to the challenging problem of reconstructing these frames, which formalize biological assays described in the literature. One approach is based on classifying candidate records nominated by sentence-local entity co-occurrence. In the second approach, we introduce a novel virtual register machine that traverses an article and generates frames, trained on our reference data. Our evaluations show that success in the task ultimately hinges on an integration of evidence spread across the discourse.

2016

An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
Eric Yeh | John Niekrasz | Dayne Freitag | Richard Rohwer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists. Such regions often encode information according to ad hoc schemas and avail themselves of visual cues in place of natural language grammar, presenting problems for standard information extraction algorithms. Unlike previous work in table extraction, which assumes a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of naturally occurring structure types. Our approach has three main parts. First, we collect and annotate a diverse sample of “naturally” occurring structures from several sources. Second, we use probabilistic text segmentation techniques, featurized by skip bigrams over spatial and token category cues, to automatically identify contiguous regions of structured text that share a common schema. Finally, we identify the records and fields within each structured region using a combination of distributional similarity and sequence alignment methods, guided by minimal supervision in the form of a single annotated record. We evaluate the last two components individually, and conclude with a discussion of further work.

Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses
Dayne Freitag | John Niekrasz
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2009

Name Transliteration with Bidirectional Perceptron Edit Models
Dayne Freitag | Zhiqiang Wang
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

Loss-Sensitive Discriminative Training of Machine Transliteration Models
Kedar Bellare | Koby Crammer | Dayne Freitag
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium

2008

Improving NER in Arabic Using a Morphological Tagger
Benjamin Farber | Dayne Freitag | Nizar Habash | Owen Rambow
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We discuss a named entity recognition system for Arabic, and show how we incorporated the information provided by MADA, a full morphological tagger which uses a morphological analyzer. Surprisingly, the relevant features used are the capitalization of the English gloss chosen by the tagger, and the fact that an analysis is returned (that a word is not OOV to the morphological analyzer). The use of the tagger also improves over a third system which just uses a morphological analyzer, yielding a 14% reduction in error over the baseline. We conduct a thorough error analysis to identify sources of success and failure among the variations, and show that by combining the systems in simple ways we can significantly influence the precision-recall trade-off.

2007

A Sequence Alignment Model Based on the Averaged Perceptron
Dayne Freitag | Shahram Khadivi
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2005

New Experiments in Distributional Representations of Synonymy
Dayne Freitag | Matthias Blume | John Byrnes | Edmond Chow | Sadik Kapadia | Richard Rohwer | Zhiqiang Wang
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Morphology Induction from Term Clusters
Dayne Freitag
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

Toward Unsupervised Whole-Corpus Tagging
Dayne Freitag
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

A Critical Survey of the Methodology for IE Evaluation
A. Lavelli | M. E. Califf | F. Ciravegna | D. Freitag | C. Giuliano | N. Kushmerick | L. Romano
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

We survey the evaluation methodology adopted in Information Extraction (IE), as defined in the MUC conferences and in later independent efforts applying machine learning to IE. We point out a number of problematic issues that may hamper the comparison between results obtained by different researchers. Some of them are common to other NLP tasks: e.g., the difficulty of exactly identifying the effects on performance of the data (sample selection and sample size), of the domain theory (features selected), and of algorithm parameter settings. Issues specific to IE evaluation include: how leniently to assess inexact identification of filler boundaries, the possibility of multiple fillers for a slot, and how the counting is performed. We argue that, when specifying an information extraction task, a number of characteristics should be clearly defined. However, in the papers only a few of them are usually explicitly specified. Our aim is to elaborate a clear and detailed experimental methodology and propose it to the IE community. The goal is to reach a widespread agreement on such a proposal so that future IE evaluations will adopt the proposed methodology, making comparisons between algorithms fair and reliable. In order to achieve this goal, we will develop and make available to the community a set of tools and resources that incorporate a standardized IE methodology.

Towards Full Automation of Lexicon Construction
Richard Rohwer | Dayne Freitag
Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004

Trained Named Entity Recognition using Distributional Clusters
Dayne Freitag
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

1998

Toward General-Purpose Learning for Information Extraction
Dayne Freitag
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Toward General-Purpose Learning for Information Extraction
Dayne Freitag
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1