Chris Welty

Also published as: Christopher Welty


Embedding Semantic Taxonomies
Alyssa Lees | Chris Welty | Shubin Zhao | Jacek Korycki | Sara Mc Carthy
Proceedings of the 28th International Conference on Computational Linguistics

A common step in developing an understanding of a vertical domain, e.g. shopping, dining, movies, medicine, etc., is curating a taxonomy of categories specific to the domain. These human-created artifacts have been the subject of research in embeddings that attempt to encode aspects of the partial ordering property of taxonomies. We compare Box Embeddings, a natural containment representation of category taxonomies, to partial-order embeddings and a baseline Bayes Net, in the context of representing the Medical Subject Headings (MeSH) taxonomy given a set of 300K PubMed articles with subject labels from MeSH. We deeply explore the experimental properties of training box embeddings, including preparation of the training data, sampling ratios and class balance, and initialization strategies, and we propose a fix to the original box objective. We then present first results in using these techniques for representing a bipartite learning problem (i.e. collaborative filtering) in the presence of taxonomic relations within each partition, inferring disease (anatomical) locations from their use as subject labels in journal articles. Our box model substantially outperforms all baselines for taxonomic reconstruction and bipartite relationship experiments. This performance improvement is observed both in overall accuracy and the weighted spread by true taxonomic depth.
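
The containment idea behind box embeddings can be illustrated with a minimal sketch. It assumes hard, axis-aligned boxes and a plain volume ratio; it is not the training objective or the fix proposed in the paper, and the function names and toy coordinates are hypothetical.

```python
import numpy as np

def box_volume(lo, hi):
    """Volume of an axis-aligned box; zero if any side is empty."""
    side = np.clip(hi - lo, 0.0, None)
    return float(np.prod(side))

def intersection(lo_a, hi_a, lo_b, hi_b):
    """Corners of the overlap of two boxes (may describe an empty box)."""
    return np.maximum(lo_a, lo_b), np.minimum(hi_a, hi_b)

def conditional_prob(child, parent):
    """P(parent | child) = vol(child ∩ parent) / vol(child)."""
    lo, hi = intersection(*child, *parent)
    vol_child = box_volume(*child)
    return box_volume(lo, hi) / vol_child if vol_child > 0 else 0.0

# Toy 2-D boxes (assumed values): the child box lies entirely inside the parent.
parent_category = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))
child_category = (np.array([1.0, 1.0]), np.array([2.0, 2.0]))

p_parent_given_child = conditional_prob(child_category, parent_category)
p_child_given_parent = conditional_prob(parent_category, child_category)
```

In this toy setup a fully contained child gives a conditional probability of 1 for its parent, while the reverse direction is small, which is the asymmetry a taxonomy's partial order requires.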


A Crowdsourced Frame Disambiguation Corpus with Ambiguity
Anca Dumitrache | Lora Aroyo | Chris Welty
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present a resource for the task of FrameNet semantic frame disambiguation of over 5,000 word-sentence pairs from the Wikipedia corpus. The annotations were collected using a novel crowdsourcing approach with multiple workers per sentence to capture inter-annotator disagreement. In contrast to the typical approach of attributing the best single frame to each word, we provide a list of frames with disagreement-based scores that express the confidence with which each frame applies to the word. This is based on the idea that inter-annotator disagreement is at least partly caused by ambiguity that is inherent to the text and frames. We have found many examples where the semantics of individual frames overlap sufficiently to make them acceptable alternatives for interpreting a sentence. We have argued that ignoring this ambiguity creates an overly arbitrary target for training and evaluating natural language processing systems: if humans cannot agree, why would we expect the correct answer from a machine to be any different? To process this data we also utilized an expanded lemma-set provided by the Framester system, which merges FrameNet (FN) with WordNet to enhance coverage. Our dataset includes annotations of 1,000 sentence-word pairs whose lemmas are not part of FN. Finally, we present metrics for evaluating frame disambiguation systems that account for ambiguity.


Crowdsourcing Semantic Label Propagation in Relation Classification
Anca Dumitrache | Lora Aroyo | Chris Welty
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

Distant supervision is a popular method for performing relation extraction from text that is known to produce noisy labels. Most progress in relation extraction and classification has been made with crowdsourced corrections to distant-supervised labels, and there is evidence that indicates still more would be better. In this paper, we explore the problem of propagating human annotation signals gathered for open-domain relation classification through the CrowdTruth methodology for crowdsourcing, which captures ambiguity in annotations by measuring inter-annotator disagreement. Our approach propagates annotations to sentences that are similar in a low-dimensional embedding space, expanding the number of labels by two orders of magnitude. Our experiments show significant improvement in a sentence-level multi-class relation classifier.
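
Propagating annotations to similar sentences in an embedding space might be sketched as below. The abstract does not specify the similarity measure or the propagation rule, so the cosine similarity, the nearest-neighbour copy, the `threshold` value, and the function name `propagate_labels` are all assumptions made for illustration.

```python
import numpy as np

def propagate_labels(labeled_vecs, labeled_scores, unlabeled_vecs, threshold=0.9):
    """Give each unlabeled sentence the score of its most similar labeled
    sentence when cosine similarity meets the threshold; otherwise None."""
    def normalize(m):
        return m / np.linalg.norm(m, axis=1, keepdims=True)

    # Rows: unlabeled sentences; columns: labeled sentences.
    sims = normalize(unlabeled_vecs) @ normalize(labeled_vecs).T
    nearest = sims.argmax(axis=1)
    best = sims.max(axis=1)
    return [float(labeled_scores[j]) if s >= threshold else None
            for j, s in zip(nearest, best)]

# Toy 2-D sentence embeddings and crowd scores (assumed values).
labeled_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
labeled_scores = np.array([0.9, 0.2])
unlabeled_vecs = np.array([[2.0, 0.1], [1.0, 1.0], [0.0, 3.0]])

propagated = propagate_labels(labeled_vecs, labeled_scores, unlabeled_vecs)
```

Sentences that are not close enough to any annotated sentence stay unlabeled here, which trades label volume against the risk of propagating a score to an unrelated sentence.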


Long-Distance Time-Event Relation Extraction
Alessandro Moschitti | Siddharth Patwardhan | Chris Welty
Proceedings of the Sixth International Joint Conference on Natural Language Processing


When Did that Happen? — Linking Events and Relations to Timestamps
Dirk Hovy | James Fan | Alfio Gliozzo | Siddharth Patwardhan | Christopher Welty
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics


Large Scale Relation Detection
Chris Welty | James Fan | David Gondek | Andrew Schlaikjer
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading


Learning to Predict Readability using Diverse Linguistic Features
Rohit Kate | Xiaoqiang Luo | Siddharth Patwardhan | Martin Franz | Radu Florian | Raymond Mooney | Salim Roukos | Chris Welty
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)