Kathryn Conger


2020

The Russian PropBank
Sarah Moeller | Irina Wagner | Martha Palmer | Kathryn Conger | Skatje Myers
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents a proposition bank for Russian (RuPB), a resource for semantic role labeling (SRL). The motivating goal for this resource is to automatically project semantic role labels from English to Russian. This paper describes frame creation strategies, coverage, and the process of sense disambiguation. It discusses language-specific issues that complicated the process of building the PropBank and how these challenges were exploited as language-internal guidance for consistency and coherence.

2016

Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
Xuansong Li | Martha Palmer | Nianwen Xue | Lance Ramshaw | Mohamed Maamouri | Ann Bies | Kathryn Conger | Stephen Grimes | Stephanie Strassel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

High accuracy for automated translation and information retrieval calls for linguistic annotations at various language levels. The plethora of informal internet content sparked the demand for porting state-of-the-art natural language processing (NLP) applications to new social media, as well as for adapting them to diverse languages. An effort launched by the BOLT (Broad Operational Language Translation) program at DARPA (Defense Advanced Research Projects Agency) successfully addressed this internet content with enhanced NLP systems. BOLT aims for automated translation and linguistic analysis for informal genres of text and speech in online and in-person communication. As a part of this program, the Linguistic Data Consortium (LDC) developed valuable linguistic resources in support of the training and evaluation of such new technologies. This paper focuses on methodologies, infrastructure, and procedures for developing linguistic annotation at various language levels, including Treebank (TB), word alignment (WA), PropBank (PB), and co-reference (CoRef). Inspired by the OntoNotes approach, with adaptations to the tasks to reflect the goals and scope of the BOLT project, this effort has introduced more annotation types for informal and free-style genres in English, Chinese and Egyptian Arabic. The corpus produced is by far the largest multi-lingual, multi-level and multi-genre annotation corpus of informal text and speech.

A Corpus of Preposition Supersenses
Nathan Schneider | Jena D. Hwang | Vivek Srikumar | Meredith Green | Abhijit Suresh | Kathryn Conger | Tim O’Gorman | Martha Palmer
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

2014

PropBank: Semantics of New Predicate Types
Claire Bonial | Julia Bonn | Kathryn Conger | Jena D. Hwang | Martha Palmer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This research focuses on expanding PropBank, a corpus annotated with predicate argument structures, with new predicate types; namely, noun, adjective and complex predicates, such as Light Verb Constructions. This effort is in part inspired by a sister project to PropBank, the Abstract Meaning Representation project, which also attempts to capture “who is doing what to whom” in a sentence, but does so in a way that abstracts away from syntactic structures. For example, alternate realizations of a ‘destroying’ event in the form of either the verb ‘destroy’ or the noun ‘destruction’ would receive the same Abstract Meaning Representation. In order for PropBank to reach the same level of coverage and continue to serve as the bedrock for Abstract Meaning Representation, predicate types other than verbs, which have previously gone without annotation, must be annotated. This research describes the challenges therein, including the development of new annotation practices that walk the line between abstracting away from language-particular syntactic facts to explore deeper semantics, and maintaining the connection between semantics and syntactic structures that has proven to be very valuable for PropBank as a corpus of training data for Natural Language Processing applications.