Henrik Danielsson


2012

Eye Tracking as a Tool for Machine Translation Error Analysis
Sara Stymne | Henrik Danielsson | Sofia Bremin | Hongzhan Hu | Johanna Karlsson | Anna Prytz Lillkull | Martin Wester
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a preliminary study where we use eye tracking as a complement to machine translation (MT) error analysis, the task of identifying and classifying MT errors. We performed a user study where subjects read short texts translated by three MT systems and one human translation, while we gathered eye tracking data. The subjects were also asked comprehension questions about the text, and were asked to estimate the text quality. We found a longer gaze time and a higher number of fixations on MT errors than on correct parts. There are also differences in gaze time between error types, with word order errors having the longest gaze time. We also found correlations between eye tracking data and human estimates of text quality. Overall, our study shows that eye tracking can give complementary information to error analysis, such as aiding in ranking error types by seriousness.

A good space: Lexical predictors in word space evaluation
Christian Smith | Henrik Danielsson | Arne Jönsson
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Vector space models benefit from using an outside corpus to train the model. It is, however, unclear what constitutes a good training corpus. We have investigated the effect on summary quality when using various language resources to train a vector-space-based extraction summarizer. This is done by evaluating the performance of the summarizer utilizing vector spaces built from corpora from different genres, partitioned from the Swedish SUC-corpus. The corpora are also characterized using a variety of lexical measures commonly used in readability studies. The performance of the summarizer is measured by comparing automatically produced summaries to human-created gold standard summaries using the ROUGE F-score. Our results show that the genre of the training corpus does not have a significant effect on summary quality. However, a linear regression model of the variance in F-score between the genres, with the lexical measures as independent variables, shows that vector spaces created from texts with high syntactic complexity, high word variation, short sentences and few long words produce better summaries.

This also affects the context - Errors in extraction based summaries
Thomas Kaspersson | Christian Smith | Henrik Danielsson | Arne Jönsson
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Although previous studies have shown that errors occur in texts summarized by extraction-based summarizers, no study has investigated how common different types of errors are and how that changes with the degree of summarization. We have conducted studies of errors in extraction-based single-document summaries using 30 texts, summarized to 5 different degrees and tagged for errors by human judges. The results show that the most common errors are absent cohesion or context and various types of broken or missing anaphoric references. The number of errors depends on the degree of summarization: some error types have a linear relation to the degree of summarization, while others have U-shaped or cut-off linear relations. These results show that the degree of summarization has to be taken into account to minimize the number of errors made by extraction-based summarizers.

A More Cohesive Summarizer
Christian Smith | Henrik Danielsson | Arne Jönsson
Proceedings of COLING 2012: Posters