Arne Jönsson

Also published as: Arne Jonsson


2020

Visualizing Facets of Text Complexity across Registers
Marina Santini | Arne Jonsson | Evelina Rennes
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

In this paper, we propose visualizing the results of a corpus-based study on text complexity using radar charts. We argue that the added value of this type of visualization is the polygonal shape, which provides an intuitive grasp of text complexity similarities across the registers of a corpus. The results that we visualize come from a study in which we explored whether it is possible to automatically single out different facets of text complexity across the registers of a Swedish corpus. To this end, we used factor analysis as applied in Biber’s Multi-Dimensional Analysis framework. The visualization of text complexity facets with radar charts indicates a correspondence between linguistic similarity and similarity of shape across registers.

2019

Comparing the Performance of Feature Representations for the Categorization of the Easy-to-Read Variety vs Standard Language
Marina Santini | Benjamin Danielsson | Arne Jönsson
Proceedings of the 22nd Nordic Conference on Computational Linguistics

We explore the effectiveness of four feature representations (bag-of-words, word embeddings, principal components and autoencoders) for the binary categorization of the easy-to-read variety vs standard language. Standard language refers to the ordinary language variety used by a population as a whole or by a community, while the “easy-to-read” variety is a simpler (or simplified) version of the standard language. We test the efficiency of these feature representations on three corpora, which differ in size, class balance, unit of analysis, language and topic. We rely on both supervised and unsupervised machine learning algorithms. Results show that bag-of-words is a robust and straightforward feature representation for this task and performs well in many experimental settings. Its performance is comparable or equal to that achieved with principal components and autoencoders, whose preprocessing is, however, more time-consuming. Word embeddings are less accurate than the other feature representations for this classification task.
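A bag-of-words representation turns each text into a vector of token counts over a shared vocabulary. The following is a minimal sketch, assuming simple lowercased whitespace tokenization; the paper's actual preprocessing is not described in this abstract, and the function names are hypothetical. The resulting vectors could be passed to any supervised or unsupervised learner.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every lowercased whitespace token in the training texts to a column index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Return a count vector over vocab; tokens not in vocab are ignored."""
    counts = Counter(text.lower().split())
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector


vocab = build_vocabulary(["The cat sat", "the cat sat on the mat"])
print(bag_of_words("the mat the", vocab))
```

With the vocabulary `the, cat, sat, on, mat`, the text "the mat the" maps to `[2, 0, 0, 0, 1]`. Word order is discarded, which is what makes the representation cheap to compute compared with principal components or autoencoders.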

2018

Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)
Arne Jönsson | Evelina Rennes | Horacio Saggion | Sanja Stajner | Victoria Yaneva
Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)

2017

Services for text simplification and analysis
Johan Falkenjack | Evelina Rennes | Daniel Fahlborg | Vida Johansson | Arne Jönsson
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

Implicit readability ranking using the latent variable of a Bayesian Probit model
Johan Falkenjack | Arne Jönsson
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, so that only binary approaches, such as classification, are applicable. We propose a Bayesian latent variable approach to get the most out of these kinds of corpora. In this paper, we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts, with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.
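In a probit model, the probability of the positive class is the standard normal CDF of a latent linear score, P(y = 1 | x) = Φ(w · x + b), so the latent score itself can be used to order texts even when the training data only has binary labels. The sketch below only shows this ranking step with fixed, hypothetical weights and feature values; it does not perform the Bayesian inference over the weights that the paper's model involves, and it assumes that class 1 means easy-to-read, so a higher score means an easier text.

```python
import math


def probit(z):
    """Standard normal CDF Phi(z), the probit link function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def latent_score(features, weights, bias=0.0):
    """Linear predictor z = w . x + b, i.e. the latent variable of the probit model."""
    return bias + sum(w * x for w, x in zip(weights, features))


def rank_texts(texts, weights, bias=0.0):
    """Order texts by latent score, highest (assumed easiest) first."""
    scored = [(name, latent_score(x, weights, bias)) for name, x in texts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical feature vectors and weights, for illustration only.
texts = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
weights = [-1.0, 2.0]
for name, z in rank_texts(texts, weights):
    print(name, round(z, 2), round(probit(z), 3))
```

Ranking on z rather than on the binary prediction keeps the ordering information that a plain classifier would throw away at the decision threshold.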

Similarity-Based Alignment of Monolingual Corpora for Text Simplification Purposes
Sarah Albertsson | Evelina Rennes | Arne Jönsson
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Comparable or parallel corpora are beneficial for many NLP tasks. The automatic collection of corpora enables large-scale resources, even for less-resourced languages, which in turn can be useful for deducing rules and patterns for text rewriting algorithms, a subtask of automatic text simplification. We present two methods for the alignment of Swedish easy-to-read text segments to text segments from a reference corpus. The first method (M1) was originally developed for the task of text reuse detection and measures sentence similarity with a modified version of a TF-IDF vector space model. A second method (M2), which also accounts for part-of-speech tags, was developed, and the two methods were compared. For evaluation, a crowdsourcing platform was built to collect human judgement data, and preliminary results showed that cosine similarity relates better to human ranks than the Dice coefficient. We also saw a tendency that including syntactic context in the TF-IDF vector space model is beneficial for this kind of paraphrase alignment task.
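The two similarity measures compared in the evaluation can be sketched as follows: cosine similarity between TF-IDF vectors, and the Dice coefficient over token sets. This is plain TF-IDF with idf = log(N / df), not the modified variant used in M1, and the function names are hypothetical. Something in the spirit of M2 could be approximated by appending part-of-speech tags to tokens (e.g. `cat/NN`) before vectorizing, which is an assumption rather than the paper's method.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF dict per tokenized sentence, with idf = log(N / df)."""
    n = len(sentences)
    df = Counter(token for sent in sentences for token in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dice(a, b):
    """Dice coefficient over token sets: 2|A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


sentences = [["the", "cat", "sat"], ["a", "cat", "sat", "down"], ["dogs", "bark"]]
vecs = tfidf_vectors(sentences)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]), dice(sentences[0], sentences[1]))
```

An easy-to-read segment would then be aligned to the reference segment with the highest score. Note that Dice treats every shared token equally, whereas the TF-IDF weighting lets rare shared words count more.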

2015

A multivariate model for classifying texts’ readability
Katarina Heimann Mühlenbock | Sofie Johansson Kokkinakis | Caroline Liberg | Åsa af Geijerstam | Jenny Wiksten Folkeryd | Arne Jönsson | Erik Kanebrant | Johan Falkenjack
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

A Tool for Automatic Simplification of Swedish Texts
Evelina Rennes | Arne Jönsson
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

Classifying easy-to-read texts without parsing
Johan Falkenjack | Arne Jönsson
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

The Impact of Cohesion Errors in Extraction Based Summaries
Evelina Rennes | Arne Jönsson
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present results from an eye-tracking study of automatic text summarization. Automatic text summarization is a growing field due to the modern world’s Internet-based society, but automatically creating perfect summaries is challenging. One problem is that extraction-based summaries often have cohesion errors. Using an eye-tracking camera, we have studied the nature of four different types of cohesion errors occurring in extraction-based summaries. A total of 23 participants read and rated four different texts and marked the most difficult areas of each text. Statistical analysis of the data revealed that absent cohesion or context and broken anaphoric references (pronouns) caused some disturbance in reading, but that the impact is restricted to the effort to read rather than the comprehension of the text. However, erroneous anaphoric references (pronouns) were not always detected by the participants, which poses a problem for automatic text summarizers. The study also revealed other potentially disturbing factors.

2013

Iterative Development and Evaluation of a Social Conversational Agent
Annika Silvervarg | Arne Jönsson
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Features Indicating Readability in Swedish Text
Johan Falkenjack | Katarina Heimann Mühlenbock | Arne Jönsson
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

A good space: Lexical predictors in word space evaluation
Christian Smith | Henrik Danielsson | Arne Jönsson
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Vector space models benefit from using an outside corpus to train the model. It is, however, unclear what constitutes a good training corpus. We have investigated the effect on summary quality of using various language resources to train a vector space based extraction summarizer. This is done by evaluating the performance of the summarizer using vector spaces built from corpora of different genres, partitioned from the Swedish SUC corpus. The corpora are also characterized using a variety of lexical measures commonly used in readability studies. The performance of the summarizer is measured by comparing automatically produced summaries to human-created gold standard summaries using the ROUGE F-score. Our results show that the genre of the training corpus does not have a significant effect on summary quality. However, a linear regression model that uses the lexical measures as independent variables to explain the variance in F-score between genres shows that vector spaces created from texts with high syntactic complexity, high word variation, short sentences and few long words produce better summaries.
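The abstract does not state which ROUGE variant was used, so the following is only a simplified ROUGE-1 sketch: clipped unigram overlap between a candidate summary and a single reference, combined into an F-score. It uses lowercased whitespace tokens, no stemming or stopword removal, and is not the official ROUGE toolkit.

```python
from collections import Counter


def rouge_1_f(candidate, reference, beta=1.0):
    """Unigram ROUGE F-score of a candidate summary against one reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    # F_beta = (1 + beta^2) P R / (beta^2 P + R); beta = 1 gives the harmonic mean.
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


print(rouge_1_f("the cat sat", "the cat sat on the mat"))
```

Here precision is 3/3 and recall is 3/6, so the F-score is 2/3. With several gold standard summaries per text, scores would typically be averaged or maximized over references.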

This also affects the context - Errors in extraction based summaries
Thomas Kaspersson | Christian Smith | Henrik Danielsson | Arne Jönsson
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Although previous studies have shown that errors occur in texts summarized by extraction-based summarizers, no study has investigated how common different types of errors are and how that changes with the degree of summarization. We have conducted studies of errors in extraction-based single-document summaries using 30 texts, summarized to 5 different degrees and tagged for errors by human judges. The results show that the most common errors are absent cohesion or context and various types of broken or missing anaphoric references. The number of errors depends on the degree of summarization: some error types have a linear relation to the degree of summarization, while others have U-shaped or cut-off linear relations. These results show that the degree of summarization has to be taken into account to minimize the number of errors made by extraction-based summarizers.

A More Cohesive Summarizer
Christian Smith | Henrik Danielsson | Arne Jönsson
Proceedings of COLING 2012: Posters

2011

Automatic summarization as means of simplifying texts, an evaluation for Swedish
Christian Smith | Arne Jönsson
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

Enhancing extraction based summarization with outside word space
Christian Smith | Arne Jönsson
Proceedings of 5th International Joint Conference on Natural Language Processing

2008

Using Random Indexing to improve Singular Value Decomposition for Latent Semantic Analysis
Linus Sellberg | Arne Jönsson
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we present results from using Random Indexing for Latent Semantic Analysis to handle Singular Value Decomposition tractability issues. We compare Latent Semantic Analysis, Random Indexing, and Latent Semantic Analysis on Random Indexing-reduced matrices. Our results show that Latent Semantic Analysis on Random Indexing-reduced matrices provides better precision and recall than Random Indexing alone. Furthermore, computation time for Singular Value Decomposition on a Random Indexing-reduced matrix is almost halved compared to Latent Semantic Analysis.
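Random Indexing avoids building the full words-by-documents matrix: each document gets a sparse random ternary "index vector" of fixed dimension, and each word vector is the sum of the index vectors of the documents it occurs in. The result approximates a random projection of the co-occurrence matrix down to `dim` columns, which is the smaller matrix that SVD would then be run on. The sketch assumes document-level contexts and illustrative values for `dim` and `nonzeros`; the SVD step itself is not shown, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` random positions set to +1 or -1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1, -1))
    return vec


def random_indexing(documents, dim=100, nonzeros=4, seed=0):
    """Build reduced word vectors by summing the index vectors of the documents each word occurs in."""
    rng = random.Random(seed)
    doc_vectors = [index_vector(dim, nonzeros, rng) for _ in documents]
    word_vectors = defaultdict(lambda: [0] * dim)
    for doc, doc_vec in zip(documents, doc_vectors):
        for word in doc.lower().split():
            acc = word_vectors[word]
            for i, value in enumerate(doc_vec):
                acc[i] += value
    return dict(word_vectors)


vectors = random_indexing(["a b", "b c"], dim=20, nonzeros=2, seed=1)
print(len(vectors), len(vectors["b"]))
```

Because `b` occurs in both documents, its vector is exactly the sum of the vectors for `a` and `c`. The reduced matrix has `dim` columns regardless of how many documents the corpus contains, which is what keeps the subsequent SVD tractable.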

2007

Interview and Delivery: Dialogue Strategies for Conversational Recommender Systems
Pontus Wärnestål | Lars Degerstedt | Arne Jönsson
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

Emergent Conversational Recommendations: A Dialogue Behavior Approach
Pontus Wärnestål | Lars Degerstedt | Arne Jönsson
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2004

Open Resources for Language Technology
Lars Degerstedt | Arne Jönsson
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Some empirical findings on dialogue management and domain ontologies in dialogue systems - Implications from an evaluation of BirdQuest
Annika Flycht-Eriksson | Arne Jönsson
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

2001

Towards multimodal public information systems
Magnus Merkel | Arne Jönsson
Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001)

2000

Dialogue and Domain Knowledge Management in Dialogue Systems
Annika Flycht-Eriksson | Arne Jonsson
1st SIGdial Workshop on Discourse and Dialogue

Distilling dialogues - A method using natural dialogue corpora for dialogue systems development
Arne Jonsson | Nils Dahlback
Sixth Applied Natural Language Processing Conference

1998

Robust Interaction through Partial Interpretation and Dialogue Management
Arne Jonsson | Lena Stromback
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Robust Interaction through Partial Interpretation and Dialogue Management
Arne Jönsson | Lena Strömbäck
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1991

A Dialogue Manager Using Initiative-Response Units and Distributed Control
Arne Jonsson
Fifth Conference of the European Chapter of the Association for Computational Linguistics

1990

Application-Dependent Discourse Management for Natural Language Interfaces: An Empirical Investigation
Arne Jönsson
Proceedings of the 7th Nordic Conference of Computational Linguistics (NODALIDA 1989)

1989

Empirical Studies of Discourse Representations for Natural Language Interfaces
Nils Dahlbäck | Arne Jonsson
Fourth Conference of the European Chapter of the Association for Computational Linguistics