Fredrik Olsson


Text Categorization for Conflict Event Annotation
Fredrik Olsson | Magnus Sahlgren | Fehmi ben Abdesslem | Ariel Ekgren | Kristine Eck
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020

We cast the problem of event annotation as one of text categorization, and compare state-of-the-art text categorization techniques on event data produced within the Uppsala Conflict Data Program (UCDP). Annotating a single text involves assigning the labels pertaining to at least 17 distinct categorization tasks, e.g., who the attacking organization was, who was attacked, and where the event took place. The text categorization techniques under scrutiny are a classical Bag-of-Words approach; character-based contextualized embeddings produced by ELMo; embeddings produced by the BERT base model, and a version of BERT base fine-tuned on UCDP data; and a pre-trained and fine-tuned classifier based on ULMFiT. The categorization tasks are very diverse in terms of the number of classes to predict as well as the skewness of the distribution of classes. The categorization results exhibit a large variability across tasks, ranging from 30.3% to 99.8% F-score.


Gender Bias in Pretrained Swedish Embeddings
Magnus Sahlgren | Fredrik Olsson
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper investigates the presence of gender bias in pretrained Swedish embeddings. We focus on a scenario where names are matched with occupations, and we demonstrate how a number of standard pretrained embeddings handle this task. Our experiments show some significant differences between the pretrained embeddings, with word-based methods showing the most bias and contextualized language models showing the least. We also demonstrate that a previously proposed debiasing method does not affect the performance of the various embeddings in this scenario.


Learning Representations for Detecting Abusive Language
Magnus Sahlgren | Tim Isbister | Fredrik Olsson
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

This paper discusses the question of whether it is possible to learn a generic representation that is useful for detecting various types of abusive language. The approach is inspired by recent advances in transfer learning and word embeddings, and we learn representations from two different datasets containing various degrees of abusive language. We compare the learned representation with two standard approaches: one based on lexica, and one based on data-specific n-grams. Our experiments show that learned representations do contain useful information that can be used to improve detection performance when training data is limited.


The Gavagai Living Lexicon
Magnus Sahlgren | Amaru Cuba Gyllensten | Fredrik Espinoza | Ola Hamfors | Jussi Karlgren | Fredrik Olsson | Per Persson | Akshay Viswanathan | Anders Holst
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the Gavagai Living Lexicon, which is an online distributional semantic model currently available in 20 different languages. We describe the underlying distributional semantic model, and how we have solved some of the challenges in applying such a model to large amounts of streaming data. We also describe the architecture of our implementation, and discuss how we deal with continuous quality assurance of the lexicon.


Methods for Amharic Part-of-Speech Tagging
Björn Gambäck | Fredrik Olsson | Atelach Alemu Argaw | Lars Asker
Proceedings of the First Workshop on Language Technologies for African Languages

An Intrinsic Stopping Criterion for Committee-Based Active Learning
Fredrik Olsson | Katrin Tomanek
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

A Web Survey on the Use of Active Learning to Support Annotation of Text Data
Katrin Tomanek | Fredrik Olsson
Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing


Notions of Correctness when Evaluating Protein Name Taggers
Fredrik Olsson | Gunnar Eriksson | Kristofer Franzén | Lars Asker | Per Lidén
COLING 2002: The 19th International Conference on Computational Linguistics


Experiences of Language Engineering Algorithm Reuse
Björn Gambäck | Fredrik Olsson
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Composing a General-Purpose Toolbox for Swedish
Fredrik Olsson | Björn Gambäck
Proceedings of the COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems