Louise Guthrie

Also published as: L. Guthrie


2015

A New Dataset and Evaluation for Belief/Factuality
Vinodkumar Prabhakaran | Tomas By | Julia Hirschberg | Owen Rambow | Samira Shaikh | Tomek Strzalkowski | Jennifer Tracey | Michael Arrigo | Rupayan Basu | Micah Clark | Adam Dalton | Mona Diab | Louise Guthrie | Anna Prokofieva | Stephanie Strassel | Gregory Werner | Yorick Wilks | Janyce Wiebe
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

2012

LIE: Leadership, Influence and Expertise
Roberta Catizone | Louise Guthrie | Arthur Thomas | Yorick Wilks
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes our research into methods for inferring social and instrumental roles and relationships from document and discourse corpora. The goal is to identify the roles of initial authors and participants in internet discussions with respect to leadership, influence and expertise. Web documents, forums and blogs provide data from which the relationships between these concepts are empirically derived and compared. Using techniques from Natural Language Processing (NLP), characterizations of authority and expertise are hypothesized and then tested to see whether they pick out the same discourse participants, for any given level of these qualities (i.e. leadership, expertise and influence), as would be chosen by techniques based on social network analysis (Huffaker 2010). Our methods could be applied, in principle, to any subject domain, but this paper describes an initial investigation into two subject areas where a range of differing opinions is available and which differ in the nature of their appeals to authority and truth: 'genetic engineering' and a 'Muslim Forum'. The available online corpora for these topics contain discussions from a variety of users with different levels of expertise, backgrounds and personalities.

2010

Evaluation Metrics for the Lexical Substitution Task
Sanaz Jabbari | Mark Hepple | Louise Guthrie
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Evaluating Lexical Substitution: Analysis and New Measures
Sanaz Jabbari | Mark Hepple | Louise Guthrie
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Lexical substitution is the task of finding a replacement for a target word in a sentence so as to preserve, as closely as possible, the meaning of the original sentence. It has been proposed that lexical substitution be used as a basis for assessing the performance of word sense disambiguation systems, an idea realised in the English Lexical Substitution Task of SemEval-2007. In this paper, we examine the evaluation metrics used for the English Lexical Substitution Task and identify some problems that arise for them. We go on to propose some alternative measures for this purpose that avoid these problems, and which in turn can be seen as redefining the key tasks that lexical substitution systems should be expected to perform. We hope that these new metrics will better serve to guide the development of lexical substitution systems in future work. One of the new metrics addresses how effective systems are in ranking substitution candidates, a key ability for lexical substitution systems, and we report some results comparing the system assessments produced by this measure with those of the relevant measure from SemEval-2007.
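The candidate-ranking ability discussed in the abstract can be illustrated with a toy scoring sketch (this is not the paper's actual measure; `rank_score` and the example weights are invented for illustration): a system's candidate ordering is scored by how much of the ideal cumulative gold weight it recovers at each rank.

```python
def rank_score(system_ranking, gold_weights):
    """Toy ranking measure: fraction of the ideal cumulative gold
    weight recovered by the system's candidate ordering."""
    ideal = sorted(gold_weights.values(), reverse=True)
    achieved = [gold_weights.get(w, 0) for w in system_ranking]
    score = best = cum_a = cum_i = 0.0
    for a, i in zip(achieved, ideal):
        cum_a += a  # gold weight gathered so far by the system
        cum_i += i  # gold weight gathered so far by a perfect ranking
        score += cum_a
        best += cum_i
    return score / best if best else 0.0
```

A perfect ordering scores 1.0; placing rare substitutes ahead of common ones lowers the score.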

2008

An Unsupervised Probabilistic Approach for the Detection of Outliers in Corpora
David Guthrie | Louise Guthrie | Yorick Wilks
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Many applications of computational linguistics are greatly influenced by the quality of the corpora available, and as automatically generated corpora continue to play an increasingly common role, it is essential that we not overlook the importance of well-constructed and homogeneous corpora. This paper describes an automatic approach to improving the homogeneity of corpora, using an unsupervised method of statistical outlier detection to find documents and segments that do not belong in a corpus. We consider collections of corpora that are homogeneous with respect to topic (i.e. about the same subject) or genre (written for the same audience or from the same source), and use a combination of stylistic and lexical features of the texts to automatically identify pieces of text in these collections that break the homogeneity. Pieces of text that are significantly different from the rest of the corpus are likely to be errors that are out of place and should be removed from the corpus before it is used for other tasks. We evaluate our techniques by running extensive experiments over large artificially constructed corpora, each containing single pieces of text from a different topic, author, or genre than the rest of the collection, and measure the accuracy of identifying these pieces of text without the use of training data. We show that when these pieces of text are reasonably large (1,000 words) we can reliably identify them in a corpus.
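As a rough illustration of the general idea (not the authors' model; the feature set and function names here are invented for the sketch), each document can be represented by a few stylistic features, and documents far from the corpus centroid, measured in standard-deviation units, flagged as potential outliers:

```python
import math

def style_features(text):
    """Three toy stylistic features for a document."""
    words = text.split()
    sents = [s for s in text.split('.') if s.strip()]
    return [
        sum(len(w) for w in words) / max(len(words), 1),  # avg word length
        len(words) / max(len(sents), 1),                  # avg sentence length
        len(set(words)) / max(len(words), 1),             # type/token ratio
    ]

def outlier_scores(docs):
    """Distance of each document from the corpus centroid, per-feature
    standardised; higher scores indicate likelier outliers."""
    feats = [style_features(d) for d in docs]
    dims = list(zip(*feats))
    means = [sum(d) / len(d) for d in dims]
    stds = [math.sqrt(sum((x - m) ** 2 for x in d) / len(d)) or 1.0
            for d, m in zip(dims, means)]
    return [math.sqrt(sum(((f[i] - means[i]) / stds[i]) ** 2
                          for i in range(len(means)))) for f in feats]
```

A document written in a markedly different style from the rest of the collection receives the largest score and is a candidate for removal.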

Authorship Attribution of E-Mail: Comparing Classifiers over a New Corpus for Evaluation
Ben Allison | Louise Guthrie
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The release of the Enron corpus provided a unique resource for studying aspects of email use, because it is largely unfiltered, and therefore presents a relatively complete collection of emails for a reasonably large number of correspondents. This paper describes a newly created subcorpus of the Enron emails which we suggest can be used to test techniques for authorship attribution, and further shows the application of three different classification methods to this task to present baseline results. Two of the classifiers used are standard, and have been shown to perform well in the literature; the third is novel and based on concurrent work that proposes a Bayesian hierarchical distribution for word counts in documents. For each of the classifiers, we present results using six text representations, including the use of linguistic structures derived from a parser as well as lexical information.

Unsupervised Learning-based Anomalous Arabic Text Detection
Nasser Abouzakhar | Ben Allison | Louise Guthrie
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Modern society's dependence on the Web as a vital source of information and communication has become inevitable. However, the Web has also become an ideal channel for various terrorist organisations to publish misleading information and to send unintelligible messages to communicate with their clients. The increase in the amount of anomalous, misleading information published on the Web has led to an increase in security threats. Existing Web security mechanisms and protocols are not designed to deal with such recently developed problems. Developing technology to detect anomalous textual information has become one of the major challenges within the NLP community. This paper introduces the problem of anomalous text detection by automatically extracting linguistic features from documents and evaluating those features for patterns of suspicious and/or inconsistent information in Arabic documents. To achieve this, we defined specific linguistic features that characterise various Arabic writing styles. The paper also introduces the main challenges in Arabic processing and describes the proposed unsupervised learning model for detecting anomalous Arabic textual information.

Professor or Screaming Beast? Detecting Anomalous Words in Chinese
Wei Liu | Ben Allison | Louise Guthrie
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The Internet has become the most popular platform for communication. However, because most modern computer keyboards are Latin-based, the characters (Hanzi) of Asian languages such as Chinese cannot be input directly with these keyboards. As a result, methods for representing Chinese characters using the Latin alphabet were introduced. The most popular of these is the Pinyin input system. Pinyin is also called “Romanised” Chinese in that it phonetically resembles a Chinese character. Due to the highly ambiguous mapping from Pinyin to Chinese characters, word misuses can occur with a standard computer keyboard, and more commonly so in internet chat-rooms or instant messengers, where the language used is less formal. In this paper we aim to develop a system that can automatically identify such anomalies, whether they are simple typos or intentional. After identifying them, the system should suggest the correct word to be used.
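The paper's title example, 教授 ("professor") versus its homophone 叫兽 ("screaming beast", both Pinyin jiaoshou), suggests how such a detector might flag a suspect homophone. A minimal sketch (the mini-lexicon, frequencies, and threshold below are illustrative only, not the paper's system):

```python
# Toy homophone lexicon: Pinyin -> candidate words with made-up
# corpus frequencies. Entries are illustrative, not real statistics.
HOMOPHONES = {
    "jiaoshou": {"教授": 0.95, "叫兽": 0.05},  # "professor" vs "screaming beast"
}

def suggest(pinyin, observed, lexicon=HOMOPHONES):
    """Flag `observed` if it is a rare homophone of a much more common
    candidate under the same Pinyin, and suggest the likelier word."""
    candidates = lexicon.get(pinyin, {})
    if observed not in candidates:
        return None
    best = max(candidates, key=candidates.get)
    # flag only when the alternative is far more frequent (10x threshold)
    if best != observed and candidates[best] > 10 * candidates[observed]:
        return best
    return None
```

A real system would of course back the lexicon with corpus statistics and score candidates in context rather than by unigram frequency alone.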

Using a Probabilistic Model of Context to Detect Word Obfuscation
Sanaz Jabbari | Ben Allison | Louise Guthrie
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper proposes a distributional model of word use and word meaning which is derived purely from a body of text, and then applies this model to determine whether certain words are used in or out of context. We suggest that we can view the contexts of words as multinomially distributed random variables. We illustrate how using this basic idea, we can formulate the problem of detecting whether or not a word is used in context as a likelihood ratio test. We also define a measure of semantic relatedness between a word and its context using the same model. We assume that words that typically appear together are related, and thus have similar probability distributions and that words used in an unusual way will have probability distributions which are dissimilar from those of their surrounding context. The relatedness of a word to its context is based on Kullback-Leibler divergence between probability distributions assigned to the constituent words in the given sentence. We employed our methods on a defense-oriented application where certain words are substituted with other words in an intercepted communication.
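A minimal sketch of the model described above (function names and the smoothing constant are assumptions, not the paper's implementation): estimate a smoothed multinomial over the words that co-occur with a target word in the same sentence, and compare distributions with Kullback-Leibler divergence.

```python
import math
from collections import Counter

def context_dist(word, sentences, vocab, alpha=0.5):
    """Smoothed multinomial over the words co-occurring with `word`
    in the same sentence (add-alpha smoothing over `vocab`)."""
    counts = Counter()
    for sent in sentences:
        if word in sent:
            counts.update(w for w in sent if w != word)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)
```

A word whose observed surroundings yield a distribution far, in KL terms, from its typical context distribution is a candidate for having been substituted or obfuscated.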

2006

Towards the Orwellian Nightmare: Separation of Business and Personal Emails
Sanaz Jabbari | Ben Allison | David Guthrie | Louise Guthrie
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

A Closer Look at Skip-gram Modelling
David Guthrie | Ben Allison | Wei Liu | Louise Guthrie | Yorick Wilks
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Data sparsity is a large problem in natural language processing that refers to the fact that language is a system of rare events, so varied and complex that, even using an extremely large corpus, we can never accurately model all possible strings of words. This paper examines the use of skip-grams (a technique whereby n-grams are still stored to model language, but tokens are allowed to be skipped) to overcome the data sparsity problem. We analyze this by computing all possible skip-grams in a training corpus and measuring how many adjacent (standard) n-grams these cover in test documents. We examine skip-gram modelling using one to four skips with various amounts of training data, and test against similar documents as well as documents generated from a machine translation system. In this paper we also determine the amount of extra training data required to achieve skip-gram coverage using standard adjacent tri-grams.
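The notion of a k-skip-n-gram can be sketched as follows (a minimal implementation, not the authors' code): an n-gram is admitted if at most k tokens in total are skipped between its first and last word.

```python
from itertools import combinations

def skip_grams(tokens, n=2, k=2):
    """All k-skip-n-grams of `tokens`: n-grams in which at most k
    tokens in total are skipped between the first and last word
    (k=0 yields the ordinary adjacent n-grams)."""
    grams = set()
    for i in range(len(tokens)):
        # the remaining n-1 positions may reach at most n-1+k tokens past i
        window = range(i + 1, min(i + n + k, len(tokens)))
        for combo in combinations(window, n - 1):
            grams.add((tokens[i],) + tuple(tokens[j] for j in combo))
    return grams
```

For the sentence "insurgents killed in ongoing fighting", 2-skip-bigrams include pairs such as ("insurgents", "in") and ("killed", "fighting") alongside the four adjacent bigrams, which is how skip-grams recover word co-occurrences that sparse adjacent n-gram counts miss.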

2004

Large Scale Experiments for Semantic Labeling of Noun Phrases in Raw Text
Louise Guthrie | Roberto Basili | Fabio Zanzotto | Kalina Bontcheva | Hamish Cunningham | David Guthrie | Jia Cui | Marco Cammisa | Jerry Cheng-Chieh Liu | Cassia Farria Martin | Kristiyan Haralambiev | Martin Holub | Klaus Macherey | Fredrick Jelinek
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2001

Using HLT for Acquiring, Retrieving and Publishing Knowledge in AKT
Kalina Bontcheva | Christopher Brewster | Fabio Ciravegna | Hamish Cunningham | Louise Guthrie | Robert Gaizauskas | Yorick Wilks
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management

1996

A Simple Probabilistic Approach to Classification and Routing
Louise Guthrie | James Leistensnider
TIPSTER TEXT PROGRAM PHASE II: Proceedings of a Workshop held at Vienna, Virginia, May 6-8, 1996

Integration of Document Detection and Information Extraction
Louise Guthrie | Tomek Strzalkowski | Jin Wang | Fang Lin
TIPSTER TEXT PROGRAM PHASE II: Proceedings of a Workshop held at Vienna, Virginia, May 6-8, 1996

1995

Lockheed Martin: LOUELLA PARSING, An NLToolset System for MUC-6
Lois Childs | Deb Brady | Louise Guthrie | Jose Franco | Dan Valdes-Dapena | Bill Reid | John Kielty | Glenn Dierkes | Ira Sider
Sixth Message Understanding Conference (MUC-6): Proceedings of a Conference Held in Columbia, Maryland, November 6-8, 1995

1994

Noun Phrasal Entries in the EDR English Word Dictionary
A. Koizumi | M. Arioka | C. Harada | M. Sugimoto | L. Guthrie | C. Watts | R. Catizone | Y. Wilks
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

Document Classification by Machine: Theory and Practice
Louise Guthrie | Elbert Walker
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

The Consortium for Lexical Research
Louise Guthrie
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

1993

CRL/Brandeis: Description of the Diderot System as Used for MUC-5
Jim Cowie | Louise Guthrie | Jin Wang | Rong Wang | Takahiro Wakao | James Pustejovsky | Scott Waterman
Fifth Message Understanding Conference (MUC-5): Proceedings of a Conference Held in Baltimore, Maryland, August 25-27, 1993

CRL/Brandeis: The Diderot System
Jim Cowie | Louise Guthrie | Jin Wang | William Ogden | James Pustejovsky | Rong Wang | Takahiro Wakao | Scott Waterman | Yorick Wilks
TIPSTER TEXT PROGRAM: PHASE I: Proceedings of a Workshop held at Fredericksburg, Virginia, September 19-23, 1993

1992

CRL/NMSU and Brandeis MucBruce: MUC-4 Test Results and Analysis
Jim Cowie | Louise Guthrie | Yorick Wilks | James Pustejovsky
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

CRL/NMSU and Brandeis: Description of the MucBruce System as Used for MUC-4
Jim Cowie | Louise Guthrie | Yorick Wilks
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

Lexical Disambiguation using Simulated Annealing
Jim Cowie | Joe Guthrie | Louise Guthrie
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

Lexical Disambiguation using Simulated Annealing
Jim Cowie | Joe Guthrie | Louise Guthrie
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

The Automatic Creation of Lexical Entries for a Multilingual MT System
David Farwell | Louise Guthrie | Yorick Wilks
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

Genus Disambiguation: A Study in Weighted Preference
Rebecca Bruce | Louise Guthrie
COLING 1992 Volume 4: The 14th International Conference on Computational Linguistics

1991

Subject-Dependent Co-Occurrence and Word Sense Disambiguation
Joe A. Guthrie | Louise Guthrie | Homa Aidinejad | Yorick Wilks
29th Annual Meeting of the Association for Computational Linguistics

1990

Is there content in empty heads?
Louise Guthrie | Brian M. Slator | Yorick Wilks | Rebecca Bruce
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics

1986

Parsing in Parallel
Xiuming Huang | Louise Guthrie
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics