Ann Copestake


2020

Morphologically Aware Word-Level Translation
Paula Czarnowska | Sebastian Ruder | Ryan Cotterell | Ann Copestake
Proceedings of the 28th International Conference on Computational Linguistics

We propose a novel morphologically aware probability model for bilingual lexicon induction (BLI), which jointly models lexeme translation and inflectional morphology in a structured way. Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning, while inflectional morphology provides additional syntactic information. This approach leads to substantial performance improvements: a 19% average improvement in accuracy across 6 language pairs over the state of the art in the supervised setting, and 16% in the weakly supervised setting. As another contribution, we highlight issues associated with modern BLI that stem from ignoring inflectional morphology, and make three suggestions for improving the task.

2019

Don’t Forget the Long Tail! A Comprehensive Analysis of Morphological Generalization in Bilingual Lexicon Induction
Paula Czarnowska | Sebastian Ruder | Edouard Grave | Ryan Cotterell | Ann Copestake
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Because words in a language follow a Zipfian distribution, human translators routinely have to translate rare inflections of words. When translating from Spanish, a good translator would have no problem identifying the proper translation of a statistically rare inflection such as habláramos. Note that the lexeme itself, hablar, is relatively common. In this work, we investigate whether state-of-the-art bilingual lexicon inducers are capable of learning this kind of generalization. We introduce 40 morphologically complete dictionaries in 10 languages and evaluate three of the best-performing models on the task of translating less frequent morphological forms. We demonstrate that the performance of state-of-the-art models drops considerably when evaluated on infrequent morphological inflections, and then show that adding a simple morphological constraint at training time improves performance, indicating that bilingual lexicon inducers can benefit from better encoding of morphology.

Words are Vectors, Dependencies are Matrices: Learning Word Embeddings from Dependency Graphs
Paula Czarnowska | Guy Emerson | Ann Copestake
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Distributional Semantic Models (DSMs) construct vector representations of word meanings based on their contexts. Typically, the contexts of a word are defined as its closest neighbours, but they can also be retrieved from its syntactic dependency relations. In this work, we propose a new dependency-based DSM. The novelty of our model lies in associating an independent meaning representation, a matrix, with each dependency label. This allows it to capture specifics of the relations between words and contexts, leading to good performance on both intrinsic and extrinsic evaluation tasks. In addition, our model has an inherent ability to represent dependency chains as products of matrices, which provides a straightforward way of handling more distant contexts of a word.
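The idea that a dependency chain can be represented as a product of per-label matrices can be illustrated with a toy sketch. Everything here is an illustrative assumption rather than the paper's actual model: the words, the labels, the dimensionality, the random (untrained) parameters, and the bilinear `score` function used to compare a word with a context reached through a chain of labels.

```python
from functools import reduce

import numpy as np

DIM = 4
rng = np.random.default_rng(0)

# Hypothetical toy parameters (random, not learned): one vector per word
# and one DIM x DIM matrix per dependency label.
word_vecs = {w: rng.normal(size=DIM) for w in ["dog", "barked", "loudly", "brown"]}
label_mats = {lab: rng.normal(size=(DIM, DIM)) for lab in ["nsubj", "amod", "advmod"]}


def path_matrix(labels):
    """Compose a dependency chain into one matrix: the product of its label matrices.

    An empty chain yields the identity, so the context vector is used unchanged."""
    return reduce(np.matmul, (label_mats[lab] for lab in labels), np.eye(DIM))


def score(word, context, labels):
    """Assumed bilinear compatibility of `word` with a `context` reached via `labels`."""
    return float(word_vecs[word] @ path_matrix(labels) @ word_vecs[context])


# "brown" is two hops away from "barked": barked -nsubj-> dog -amod-> brown.
print(score("barked", "dog", ["nsubj"]))
print(score("barked", "brown", ["nsubj", "amod"]))
```

Because composition is just matrix multiplication, a context several hops away needs no extra parameters: the same label matrices are reused and multiplied along the path.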

The Meaning of “Most” for Visual Question Answering Models
Alexander Kuhnle | Ann Copestake
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

The correct interpretation of quantifier statements in the context of a visual scene requires non-trivial inference mechanisms. Taking “most” as an example, we discuss two strategies which rely on fundamentally different cognitive concepts. Our aim is to identify which strategy deep learning models for visual question answering learn when trained on such questions. To this end, we carefully design data to replicate experiments from psycholinguistics in which the same question was investigated for humans. Focusing on the FiLM visual question answering model, our experiments indicate that a form of approximate number system emerges whose performance declines with more difficult scenes, as predicted by Weber’s law. Moreover, we identify confounding factors, such as the spatial arrangement of the scene, which impede the effectiveness of this system.

2018

Deep learning evaluation using deep linguistic processing
Alexander Kuhnle | Ann Copestake
Proceedings of the Workshop on Generalization in the Age of Deep Learning

We discuss problems with the standard approaches to evaluation for tasks like visual question answering, and argue that artificial data can be used to address these as a complement to current practice. We demonstrate that with the help of existing ‘deep’ linguistic processing technology we are able to create challenging abstract datasets, which enable us to investigate the language understanding abilities of multimodal deep learning models in detail, as compared to a single performance value on a static and monolithic dataset.

2017

Realization of long sentences using chunking
Ewa Muszyńska | Ann Copestake
Proceedings of the 10th International Conference on Natural Language Generation

We propose sentence chunking as a way to reduce the time and memory costs of realizing long sentences. During chunking, we divide the semantic representation of a sentence into smaller components which can be processed and recombined without loss of information. Our meaning representation of choice is Dependency Minimal Recursion Semantics (DMRS). We show that realizing chunks of a sentence and combining the results increases coverage for long sentences, significantly reduces the resources required, and does not affect the quality of the realization.

Semantic Composition via Probabilistic Model Theory
Guy Emerson | Ann Copestake
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

2016

Resources for building applications with Dependency Minimal Recursion Semantics
Ann Copestake | Guy Emerson | Michael Wayne Goodman | Matic Horvat | Alexander Kuhnle | Ewa Muszyńska
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe resources aimed at increasing the usability of the semantic representations utilized within the DELPH-IN (Deep Linguistic Processing with HPSG) consortium. We concentrate in particular on the Dependency Minimal Recursion Semantics (DMRS) formalism, a graph-based representation designed for compositional semantic representation with deep grammars. Our main focus is on English, and specifically English Resource Semantics (ERS) as used in the English Resource Grammar. We first give an introduction to ERS and DMRS and a brief overview of some existing resources, and then describe in detail a new repository which has been developed to simplify the use of ERS/DMRS. We explain a number of operations on DMRS graphs which our repository supports, with sketches of the algorithms, and illustrate how these operations can be exploited in application building. We believe that this work will help researchers exploit the rich and effective but complex DELPH-IN resources.

Functional Distributional Semantics
Guy Emerson | Ann Copestake
Proceedings of the 1st Workshop on Representation Learning for NLP

2015

Leveraging a Semantically Annotated Corpus to Disambiguate Prepositional Phrase Attachment
Guy Emerson | Ann Copestake
Proceedings of the 11th International Conference on Computational Semantics

Hierarchical Statistical Semantic Realization for Minimal Recursion Semantics
Matic Horvat | Ann Copestake | Bill Byrne
Proceedings of the 11th International Conference on Computational Semantics

Layers of Interpretation: On Grammar and Compositionality
Emily M. Bender | Dan Flickinger | Stephan Oepen | Woodley Packard | Ann Copestake
Proceedings of the 11th International Conference on Computational Semantics

2014

TagNText: A parallel corpus for the induction of resource-specific non-taxonomical relations from tagged images
Theodosia Togia | Ann Copestake
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

When producing textual descriptions, humans express propositions regarding an object; but what do they express when annotating a document with simple tags? To answer this question, we have studied what users of tagging systems would have said if they were to describe a resource with fully fledged text. In particular, our work attempts to answer the following questions: if users were to use full descriptions, would their current tags be words present in these hypothetical sentences? If yes, what kind of language would connect these words? Such questions, although central to the problem of extracting binary relations between tags, have been sidestepped in the existing literature, which has focused on a small subset of possible inter-tag relations, namely hierarchical ones (e.g. “car” --is-a-- “vehicle”), as opposed to non-taxonomical relations (e.g. “woman” --wears-- “hat”). TagNText is the first attempt to construct a parallel corpus of tags and textual descriptions with respect to particular resources. The corpus provides enough data for the researcher to gain an insight into the nature of underlying relations, as well as the tools and methodology for constructing larger-scale parallel corpora that can aid non-taxonomical relation extraction.

2013

Can distributional approaches improve on Good Old-Fashioned Lexical Semantics?
Ann Copestake
Proceedings of the IWCS 2013 Workshop Towards a Formal Distributional Semantics

2012

Rhetorical Move Detection in English Abstracts: Multi-label Sentence Classifiers and their Annotated Corpora
Carmen Dayrell | Arnaldo Candido Jr. | Gabriel Lima | Danilo Machado Jr. | Ann Copestake | Valéria Feltrim | Stella Tagnin | Sandra Aluisio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The relevance of automatically identifying rhetorical moves in scientific texts has been widely acknowledged in the literature. This study focuses on abstracts of standard research papers written in English and aims to tackle a fundamental limitation of current machine-learning classifiers: they are mono-labeled, that is, a sentence can only be assigned a single label. However, such an approach does not adequately reflect actual language use, since a move can be realized by a clause, a sentence, or even several sentences. Here, we present MAZEA (Multi-label Argumentative Zoning for English Abstracts), a multi-label classifier which automatically identifies rhetorical moves in abstracts but allows a given sentence to be assigned as many labels as appropriate. We have resorted to various other NLP tools and used two large training corpora: (i) one corpus consists of 645 abstracts from physical sciences and engineering (PE) and (ii) the other is made up of 690 abstracts from life and health sciences (LH). This paper presents our preliminary results, discusses the various challenges involved in multi-label tagging, and works towards satisfactory solutions. In addition, we make our two training corpora publicly available so that they may serve as a benchmark for this new task.

2011

Formalising and specifying underquantification
Aurélie Herbelot | Ann Copestake
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

Towards an on-demand Simple Portuguese Wikipedia
Arnaldo Candido Jr | Ann Copestake | Lucia Specia | Sandra Maria Aluísio
Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies

Exciting and interesting: issues in the generation of binomials
Ann Copestake | Aurélie Herbelot
Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop

2010

Annotating Underquantification
Aurélie Herbelot | Ann Copestake
Proceedings of the Fourth Linguistic Annotation Workshop

2009

Investigating Content Selection for Language Generation using Machine Learning
Colin Kelly | Ann Copestake | Nikiforos Karamanis
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

Invited Talk: Slacker Semantics: Why Superficiality, Dependency and Avoidance of Commitment can be the Right Way to Go
Ann Copestake
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Using Lexical and Relational Similarity to Classify Semantic Relations
Diarmuid Ó Séaghdha | Ann Copestake
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Generating Research Websites Using Summarisation Techniques
Advaith Siddharthan | Ann Copestake
Proceedings of the ACL-08: HLT Demo Session

Language Resources and Chemical Informatics
C.J. Rupp | Ann Copestake | Peter Corbett | Peter Murray-Rust | Advaith Siddharthan | Simone Teufel | Benjamin Waldron
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Chemistry research papers are a primary source of information about chemistry, as in any scientific field. The presentation of the data is, predominantly, unstructured information, and so not immediately susceptible to processes developed within chemical informatics for carrying out chemistry research by information processing techniques. At one level, extracting the relevant information from research papers is a text mining task, requiring both extensive language resources and specialised knowledge of the subject domain. However, the papers also encode information about the way the research is conducted and the structure of the field itself. Applying language technology to research papers in chemistry can facilitate eScience on several different levels. The SciBorg project sets out to provide an extensive, analysed corpus of published chemistry research. This relies on the cooperation of several journal publishers to provide papers in an appropriate form. The work is carried out as a collaboration involving the Computer Laboratory, Chemistry Department and eScience Centre at Cambridge University, and is funded under the UK eScience programme.

Cascaded Classifiers for Confidence-Based Chemical Named Entity Recognition
Peter Corbett | Ann Copestake
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation
Johan Bos | Edward Briscoe | Aoife Cahill | John Carroll | Stephen Clark | Ann Copestake | Dan Flickinger | Josef van Genabith | Julia Hockenmaier | Aravind Joshi | Ronald Kaplan | Tracy Holloway King | Sandra Kuebler | Dekang Lin | Jan Tore Lønning | Christopher Manning | Yusuke Miyao | Joakim Nivre | Stephan Oepen | Kenji Sagae | Nianwen Xue | Yi Zhang
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

Semantic Classification with Distributional Kernels
Diarmuid Ó Séaghdha | Ann Copestake
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Co-occurrence Contexts for Noun Compound Interpretation
Diarmuid Ó Séaghdha | Ann Copestake
Proceedings of the Workshop on A Broader Perspective on Multiword Expressions

Semantic Composition with (Robust) Minimal Recursion Semantics
Ann Copestake
ACL 2007 Workshop on Deep Linguistic Processing

2006

Preprocessing and Tokenisation Standards in DELPH-IN Tools
Benjamin Waldron | Ann Copestake | Ulrich Schäfer | Bernd Kiefer
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We discuss preprocessing and tokenisation standards within DELPH-IN, a large-scale open-source collaboration providing multiple independent multilingual shallow and deep processors. We discuss (i) a component-specific XML interface format which has been used for some time to interface preprocessor results to the PET parser, and (ii) our implementation of a more generic XML interface format influenced heavily by the (ISO working draft) Morphosyntactic Annotation Framework (MAF). Our generic format encapsulates the information which may be passed from the preprocessing stage to a parser: it uses standoff annotation, a lattice for the representation of structural ambiguity, and intra-annotation dependencies, and it allows for highly structured annotation content. This work builds on the existing Heart of Gold middleware system, and on previous work on Robust Minimal Recursion Semantics (RMRS) as part of an inter-component interface. We give examples of usage with a number of the DELPH-IN processing components and deep grammars.

A Standoff Annotation Interface between DELPH-IN Components
Benjamin Waldron | Ann Copestake
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing

Errors in wikis
Ann Copestake
Proceedings of the Workshop on NEW TEXT Wikis and blogs and other dynamic text sources

2004

Generating Referring Expressions in Open Domains
Advaith Siddharthan | Ann Copestake
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

A Lexicon Module for a Grammar Development Environment
Ann Copestake | Fabre Lambeau | Benjamin Waldron | Francis Bond | Dan Flickinger | Stephan Oepen
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Lexical Encoding of MWEs
Aline Villavicencio | Ann Copestake | Benjamin Waldron | Fabre Lambeau
Proceedings of the Workshop on Multiword Expressions: Integrating Processing

2003

10th Conference of the European Chapter of the Association for Computational Linguistics
Ann Copestake | Jan Hajič
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

Multiword expressions: linguistic precision and reusability
Ann Copestake | Fabre Lambeau | Aline Villavicencio | Francis Bond | Timothy Baldwin | Ivan A. Sag | Dan Flickinger
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

An Algebra for Semantic Construction in Constraint-based Grammars
Ann Copestake | Alex Lascarides | Dan Flickinger
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

An Open Source Grammar Development Environment and Broad-coverage English Grammar Using HPSG
Ann Copestake | Dan Flickinger
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Memory-Based Learning for Article Generation
Guido Minnen | Francis Bond | Ann Copestake
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop

1999

Default Representation in Constraint-based Frameworks
Alex Lascarides | Ann Copestake
Computational Linguistics, Volume 25, Number 1, March 1999

Lexical rules in constraint based grammars
Ted Briscoe | Ann Copestake
Computational Linguistics, Volume 25, Number 4, December 1999

1997

Augmented and alternative NLP techniques for augmentative and alternative communication
Ann Copestake
Natural Language Processing for Communication Aids

Integrating Symbolic and Statistical Representations: The Lexicon Pragmatics Interface
Ann Copestake | Alex Lascarides
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

Controlling the Application of Lexical Rules
Ted Briscoe | Ann Copestake
Breadth and Depth of Semantic Lexicons

1992

The ACQUILEX LKB: representation issues in semi-automatic acquisition of large lexicons
Ann Copestake
Third Conference on Applied Natural Language Processing

1991

Lexical Operations in a Unification-based Framework
Ann Copestake | Ted Briscoe
Lexical Semantics and Knowledge Representation

1990

Enjoy the Paper: Lexicology
Ted Briscoe | Ann Copestake | Bran Boguraev
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics