Elma Kerz


2020

Understanding the Dynamics of Second Language Writing through Keystroke Logging and Complexity Contours
Elma Kerz | Fabio Pruneri | Daniel Wiechmann | Yu Qiao | Marcus Ströbel
Proceedings of the 12th Language Resources and Evaluation Conference

The purpose of this paper is twofold: [1] to introduce what is, to our knowledge, the largest available resource of keystroke logging (KSL) data generated by Etherpad (https://etherpad.org/), an open-source, web-based collaborative real-time editor, that captures the dynamics of second language (L2) production and [2] to relate the behavioral data from KSL to indices of syntactic and lexical complexity of the texts produced, obtained from a tool that implements a sliding-window approach capturing the progression of complexity within a text. We present the procedures and measures developed to analyze a sample of 14,913,009 keystrokes in 3,454 texts produced by 512 university students (upper-intermediate to advanced L2 learners of English; 95,354 sentences and 1,832,027 words), aiming to achieve a better alignment between keystroke-logging measures and underlying cognitive processes, on the one hand, and L2 writing performance measures, on the other. The resource introduced in this paper reflects the increasing recognition of the urgent need to obtain ecologically valid data that have the potential to transform our current understanding of the mechanisms underlying the development of literacy (reading and writing) skills.

Becoming Linguistically Mature: Modeling English and German Children’s Writing Development Across School Grades
Elma Kerz | Yu Qiao | Daniel Wiechmann | Marcus Ströbel
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we employ a novel approach to advancing our understanding of the development of writing in English and German children across school grades using classification tasks. The data used come from two recently compiled corpora: the English data come from the GiC corpus (983 school children in second, sixth, ninth and eleventh grade) and the German data from the FD-LEX corpus (930 school children in fifth and ninth grade). The key to this paper is the combined use of what we refer to as ‘complexity contours’, i.e. series of measurements that capture the progression of linguistic complexity within a text, and Recurrent Neural Network (RNN) classifiers that adequately capture the sequential information in those contours. Our experiments demonstrate that RNN classifiers trained on complexity contours achieve higher classification accuracy than those trained on text-average complexity scores. In a second step, we determine the relative importance of the features from four distinct categories through a Sensitivity-Based Pruning approach.

A Language-Based Approach to Fake News Detection Through Interpretable Features and BRNN
Yu Qiao | Daniel Wiechmann | Elma Kerz
Proceedings of the 3rd International Workshop on Rumours and Deception in Social Media (RDSM)

‘Fake news’ – succinctly defined as false or misleading information masquerading as legitimate news – is a ubiquitous phenomenon and its dissemination weakens the fact-based reporting of the established news industry, making it harder for political actors, authorities, media and citizens to obtain a reliable picture. State-of-the-art language-based approaches to fake news detection that reach high classification accuracy typically rely on black-box models based on word embeddings. At the same time, there are increasing calls for moving away from black-box models towards white-box (explainable) models in critical sectors such as healthcare, finance, the military and the news industry. In this paper we perform a series of experiments in which bi-directional recurrent neural network classification models are trained on interpretable features derived from multi-disciplinary integrated approaches to language. We apply our approach to two benchmark datasets. We demonstrate that our approach is promising, as it achieves results on these two datasets similar to those of the best-performing black-box models reported in the literature. In a second step we report on ablation experiments geared towards assessing the relative importance of the human-interpretable features in distinguishing fake news from real news.

2019

L2 Processing Advantages of Multiword Sequences: Evidence from Eye-Tracking
Elma Kerz | Arndt Heilmann | Stella Neumann
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

A substantial body of research has demonstrated that native speakers are sensitive to the frequencies of multiword sequences (MWS). Here, we ask whether and to what extent intermediate-advanced L2 speakers of English can also develop sensitivity to the statistics of MWS. To this end, we aimed to replicate the MWS frequency effects found for adult native language speakers, based on evidence from self-paced reading and sentence recall tasks, in an ecologically more valid eye-tracking study. L2 speakers’ sensitivity to MWS frequency was evaluated using generalized linear mixed-effects regression with separate models fitted for each of the four dependent measures. Mixed-effects modeling revealed significantly faster processing of sentences containing MWS compared to sentences containing equivalent control items across all eye-tracking measures. Taken together, these findings suggest that, in line with emergentist approaches, MWS are important building blocks of language and that similar mechanisms underlie both native and non-native language processing.

Understanding Vocabulary Growth Through An Adaptive Language Learning System
Elma Kerz | Andreas Burgdorf | Daniel Wiechmann | Stefan Meeger | Yu Qiao | Christian Kohlschein | Tobias Meisen
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning

2016

CoCoGen - Complexity Contour Generator: Automatic Assessment of Linguistic Complexity Using a Sliding-Window Technique
Marcus Ströbel | Elma Kerz | Daniel Wiechmann | Stella Neumann
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

We present a novel approach to the automatic assessment of text complexity based on a sliding-window technique that tracks the distribution of complexity within a text. This distribution is captured by what we term “complexity contours”, derived from a series of measurements for a given linguistic complexity measure. This approach is implemented in an automatic computational tool, CoCoGen – Complexity Contour Generator, which in its current version supports 32 indices of linguistic complexity. The goal of the paper is twofold: (1) to introduce the design of our computational tool based on a sliding-window technique and (2) to showcase this approach in the area of second language (L2) learning, more specifically in the area of L2 writing.
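
The sliding-window idea described in this abstract can be illustrated with a minimal Python sketch. It is not the CoCoGen implementation: the abstract does not specify the window unit, the window size, or how scores are aggregated, so the sketch assumes a window of consecutive sentences whose per-sentence scores are averaged. The function names `complexity_contour` and `mean_word_length` are hypothetical, and mean word length merely stands in for one of the tool's complexity indices.

```python
from statistics import mean


def mean_word_length(sentence):
    """Toy complexity measure (an assumption): average characters per word."""
    words = sentence.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def complexity_contour(sentences, measure, window_size=3):
    """Return one averaged score per window of consecutive sentences.

    Assumes sentence-level windows that advance one sentence at a time;
    the actual tool may define windows differently.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    scores = [measure(s) for s in sentences]
    # A text shorter than the window yields an empty contour.
    return [
        mean(scores[i:i + window_size])
        for i in range(len(scores) - window_size + 1)
    ]


if __name__ == "__main__":
    text = ["a bb", "ccc dddd", "ee"]
    print(complexity_contour(text, mean_word_length, window_size=2))
```

A text-average score would collapse the same per-sentence values into a single number, whereas the contour keeps their order, which is what makes it usable as sequential input.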

2014

Missing Generalizations: A Supervised Machine Learning Approach to L2 Written Production
Daniel Wiechmann | Elma Kerz
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)