David Alfter


2020

Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Ildikó Pilán | Herbert Lange | Lars Borin

Using Multilingual Resources to Evaluate CEFRLex for Learner Applications
Johannes Graën | David Alfter | Gerold Schneider
Proceedings of the 12th Language Resources and Evaluation Conference

The Common European Framework of Reference for Languages (CEFR) defines six levels of learner proficiency and links them to particular communicative abilities. The CEFRLex project aims at compiling lexical resources that link single words and multi-word expressions to particular CEFR levels. The resources are thought to reflect second language learner needs, as they are compiled from CEFR-graded textbooks and other learner-directed texts. In this work, we investigate the applicability of CEFRLex resources for building language learning applications. Our main concerns were that vocabulary in language learning materials might be sparse, i.e. that not all vocabulary items that belong to a particular level would also occur in materials for that level, and, on the other hand, that vocabulary items might be used in lower-level materials if required by the topic (e.g. with a simpler paraphrasing or translation). Our results indicate that the English CEFRLex resource is in accordance with external resources that we jointly employ as a gold standard. Together with other values obtained from monolingual and parallel corpora, we can indicate which entries need to be adjusted to obtain values that are even more in line with this gold standard. We expect that this finding also holds for the other languages.

2019

Interconnecting lexical resources and word alignment: How do learners get on with particle verbs?
David Alfter | Johannes Graën
Proceedings of the 22nd Nordic Conference on Computational Linguistics

In this paper, we present a prototype for an online exercise aimed at learners of English and Swedish that serves multiple purposes. The exercise allows learners of these languages to practice particle verbs while receiving clues from the exercise application. Users decide for themselves which clues to receive and pay for each in virtual currency, which provides us with valuable information about the utility of the clues that we provide as well as about the learners' willingness to trade virtual currency for accuracy of their choice. As resources, we use lists with annotated levels from the proficiency scale defined by the Common European Framework of Reference (CEFR) and a multilingual corpus with syntactic dependency relations and word annotation for all language pairs. From the latter resource, we extract translation equivalents for particle verb constructions together with a list of parallel corpus examples that can be used as clues in the exercise.

LEGATO: A flexible lexicographic annotation tool
David Alfter | Therese Lindström Tiedemann | Elena Volodina
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This article is a report from an ongoing project that aims to analyze lexical and grammatical competences in Swedish as a second language (L2). To facilitate lexical analysis, we need access to metalinguistic information about relevant vocabulary that L2 learners can use and understand. The focus of the current article is on the lexical annotation of the vocabulary scope for a range of lexicographical aspects, such as morphological analysis, valency, types of multi-word units, etc. We perform parts of the analysis automatically and other parts manually. The rationale behind this is that where information cannot be added automatically, manual effort is needed. To facilitate the latter, a tool, LEGATO, has been designed and implemented and is currently being actively tested.

Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Lars Borin | Ildikó Pilán | Herbert Lange

2018

Towards Single Word Lexical Complexity Prediction
David Alfter | Elena Volodina
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we present work in progress in which we investigate the usefulness of previously created word lists for the task of single-word lexical complexity analysis and prediction of the complexity level for learners of Swedish as a second language. The word lists used map each word to a single CEFR level, and the task consists of predicting CEFR levels for unseen words. In contrast to previous work on word-level lexical complexity, we experiment with topics as additional features and show that linking words to topics significantly increases classification accuracy.

SB@GU at the Complex Word Identification 2018 Shared Task
David Alfter | Ildikó Pilán
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we describe our experiments for the Shared Task on Complex Word Identification (CWI) 2018 (Yimam et al., 2018), hosted by the 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at NAACL 2018. Our system for English builds on previous work for Swedish concerning the classification of words into proficiency levels. We investigate different features for English and compare their usefulness using feature selection methods. For the German, Spanish and French data, we use simple systems based on character n-gram models and show that simple models sometimes achieve results comparable to those of fully feature-engineered systems.

Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning
Ildikó Pilán | Elena Volodina | David Alfter | Lars Borin

2016

Coursebook Texts as a Helping Hand for Classifying Linguistic Complexity in Language Learners’ Writings
Ildikó Pilán | David Alfter | Elena Volodina
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

We bring together knowledge from two different types of language learning data, texts learners read and texts they write, to improve linguistic complexity classification in the latter. Linguistic complexity in the foreign and second language learning context can be expressed in terms of proficiency levels. We show that incorporating features capturing lexical complexity information from reading passages can significantly boost the machine-learning-based classification of learner-written texts into proficiency levels. With an F1 score of 0.8, our system rivals state-of-the-art results reported for other languages for this task. Finally, we present a freely available web-based tool for proficiency level classification and lexical complexity visualization for both learner writings and reading texts.

From distributions to labels: A lexical proficiency analysis using learner corpora
David Alfter | Yuri Bizzoni | Anders Agebjörn | Elena Volodina | Ildikó Pilán
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

2014

A Dictionary Data Processing Environment and Its Application in Algorithmic Processing of Pali Dictionary Data for Future NLP Tasks
Jürgen Knauth | David Alfter
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing