Nancy Underwood

Also published as: Nancy L. Underwood


2014

Evaluating the effects of interactivity in a post-editing workbench
Nancy Underwood | Bartolomé Mesa-Lao | Mercedes García Martínez | Michael Carl | Vicent Alabau | Jesús González-Rubio | Luis A. Leiva | Germán Sanchis-Trilles | Daniel Ortíz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the field trial and subsequent evaluation of a post-editing workbench currently under development in the EU-funded CasMaCat project. Based on user evaluations of the initial prototype, this second prototype of the workbench includes a number of interactive features designed to improve productivity and user satisfaction. Using CasMaCat's own facilities for logging keystrokes and eye tracking, data were collected from nine post-editors in a professional setting. These data were then used to investigate the effects of the interactive features on productivity, quality, user satisfaction and cognitive load as reflected in the post-editors' gaze activity. These quantitative results are combined with qualitative results derived from user questionnaires and interviews conducted with all the participants.
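As an illustration only (the log format, field names and metric below are our assumptions, not CasMaCat's actual logging schema), productivity from keystroke logs of this kind might be summarised as words processed per minute of logged activity:

```python
def words_per_minute(events, segment_word_counts):
    """Rough post-editing speed from a keystroke log.

    `events` is a time-sorted list of (timestamp_seconds, segment_id)
    pairs and `segment_word_counts` maps segment_id -> source word count;
    both structures are hypothetical stand-ins for the real log schema.
    """
    if len(events) < 2:
        return 0.0
    elapsed_min = (events[-1][0] - events[0][0]) / 60.0
    touched = {seg for _, seg in events}
    words = sum(segment_word_counts[seg] for seg in touched)
    return words / elapsed_min if elapsed_min > 0 else 0.0

# Example: 120 source words post-edited over 4 minutes of logged activity.
print(words_per_minute([(0.0, 1), (120.0, 1), (240.0, 2)],
                       {1: 70, 2: 50}))  # 30.0
```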

2006

A Model for Context-Based Evaluation of Language Processing Systems and its Application to Machine Translation Evaluation
Andrei Popescu-Belis | Paula Estrella | Margaret King | Nancy Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we propose a formal framework that takes into account the influence of an NLP system's intended context of use on the procedure and the metrics used to evaluate the system. In particular, we introduce the notion of a context-dependent quality model and explain how it can be adapted to a given context of use. More specifically, we define vector-space representations of contexts of use and of quality models, which are connected by a generic contextual quality model (GCQM). For each domain, experts in evaluation are needed to build a GCQM based on analytic knowledge and on previous evaluations, using the mechanism proposed here. The main source of inspiration for this work is the FEMTI framework for the evaluation of machine translation, which partly implements the present model and which is described briefly along with insights from other domains.
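A minimal sketch of the vector-space idea, under our own assumptions rather than the paper's formalism: a context of use is a vector over context characteristics, a quality model is a weight vector over quality characteristics, and the GCQM is a matrix linking the two. All names and numbers below are hypothetical.

```python
import numpy as np

# Hypothetical context characteristics and quality characteristics.
CONTEXT = ["dissemination", "assimilation", "time-critical"]
QUALITY = ["fidelity", "fluency", "speed", "terminology"]

# The GCQM as a |QUALITY| x |CONTEXT| matrix: entry (q, c) encodes how
# relevant quality characteristic q is in context c (values assumed).
GCQM = np.array([
    [0.9, 0.4, 0.3],   # fidelity
    [0.8, 0.3, 0.2],   # fluency
    [0.1, 0.5, 0.9],   # speed
    [0.7, 0.2, 0.1],   # terminology
])

def quality_model(context_vector):
    """Derive a context-dependent quality model by projecting a context
    of use through the GCQM and normalising to relative weights."""
    raw = GCQM @ context_vector
    weights = raw / raw.sum()
    return dict(zip(QUALITY, weights.round(3)))

# Example: a time-critical assimilation context weights speed heavily.
print(quality_model(np.array([0.0, 1.0, 1.0])))
```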

ROTE: A Tool to Support Users in Defining the Relative Importance of Quality Characteristics
Agnes Lisowska | Nancy L. Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the Relative Ordering Tool for Evaluation (ROTE), which is designed to support the process of building a parameterised quality model for evaluation. It is a simple tool that enables users to specify the relative importance of quality characteristics (and associated metrics) to reflect their particular requirements. The tool allows users to order any number of quality characteristics by comparing them in a pairwise fashion. It was developed in the context of a collaborative project developing a text mining system. A full-scale evaluation of the text mining system was designed and executed for three different users, and the ROTE tool was successfully applied by those users during that process. The tool will be made available for general use by the evaluation community.
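A minimal sketch of the core idea, not ROTE's actual interface: ordering quality characteristics from a user's pairwise judgements, here using binary insertion so each new characteristic needs only a handful of comparisons. The characteristic names and the fixed preference standing in for interactive input are assumptions.

```python
def order_by_pairwise_choice(characteristics, prefers):
    """Order quality characteristics from most to least important.

    `prefers(a, b)` returns True if the user judges `a` more important
    than `b`; in an interactive tool this would be a question to the user.
    """
    ordered = []
    for c in characteristics:
        # Binary search for the insertion point using pairwise judgements.
        lo, hi = 0, len(ordered)
        while lo < hi:
            mid = (lo + hi) // 2
            if prefers(c, ordered[mid]):
                hi = mid
            else:
                lo = mid + 1
        ordered.insert(lo, c)
    return ordered

# Example with a fixed preference in place of interactive user input.
rank = {"accuracy": 0, "coverage": 1, "speed": 2}  # assumed priorities
print(order_by_pairwise_choice(
    ["speed", "accuracy", "coverage"],
    lambda a, b: rank[a] < rank[b],
))  # ['accuracy', 'coverage', 'speed']
```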

Evaluating Symbiotic Systems: the challenge
Margaret King | Nancy Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper looks at a class of systems which pose severe problems in evaluation design for current conventional approaches to evaluation. After describing the two conventional evaluation paradigms (the "functionality paradigm" typified by evaluation campaigns, and the ISO-inspired "user-centred" paradigm typified by the work of the EAGLES and ISLE projects), it goes on to outline the problems posed by the evaluation of systems which are designed to work in critical interaction with a human expert user and over vast amounts of data. These systems pose problems for both paradigms, although for different reasons. The primary aim of this paper is to provoke discussion and the search for solutions; we have no proven solutions at present. However, we describe a programme of exploratory research on which we have already embarked, involving ground-clearing work that we expect to result in a deep understanding of the systems and users, a prerequisite for developing a general framework for evaluation in this field.

The Evolution of an Evaluation Framework for a Text Mining System
Nancy L. Underwood | Agnes Lisowska
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The Parmenides project developed a text mining application applied in three different domains, exemplified by case studies for the three user partners in the project. During the lifetime of the project (and in parallel with the development of the system itself) an evaluation framework was developed by the authors in conjunction with the users, and was eventually applied to the system. The object of the exercise was twofold: first, to develop and perform a complete user-centred evaluation of the system to assess how well it answered the users' requirements and, second, to develop a general framework which could be applied in the context of other users' requirements and (with some modification) to similar systems. In this paper we describe not only the framework but also the process of building and parameterising the quality model for each case study and, perhaps most interestingly, the way in which the quality model and the users' requirements and expectations evolved over time.
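A minimal sketch of what a parameterised quality model might look like in code; the structure, characteristic names and weights below are our assumptions, not the paper's. Each case study assigns its own weights to shared quality characteristics, and a system score is the weighted aggregate of per-characteristic metric scores.

```python
# Hypothetical quality characteristics shared across case studies; each
# case study parameterises the model with its own weights (summing to 1).
CASE_STUDY_WEIGHTS = {
    "biomedical": {"accuracy": 0.5, "coverage": 0.3, "usability": 0.2},
    "finance":    {"accuracy": 0.3, "coverage": 0.2, "usability": 0.5},
}

def overall_score(case_study, metric_scores):
    """Aggregate per-characteristic metric scores (each in [0, 1]) into
    a single score under the case study's parameterisation."""
    weights = CASE_STUDY_WEIGHTS[case_study]
    return sum(w * metric_scores[c] for c, w in weights.items())

# Example: the same measured scores rank differently per case study.
scores = {"accuracy": 0.9, "coverage": 0.6, "usability": 0.4}
print(overall_score("biomedical", scores))  # 0.71
print(overall_score("finance", scores))     # 0.59
```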