Menno van Zaanen

Also published as: Menno van Zannen


2020

A Process-oriented Dataset of Revisions during Writing
Rianne Conijn | Emily Dux Speltz | Menno van Zaanen | Luuk Van Waes | Evgeny Chukharev-Hudilainen
Proceedings of the 12th Language Resources and Evaluation Conference

Revision plays a major role in writing and in the analysis of writing processes. Revisions can be analyzed using a product-oriented approach (focusing on a finished product, the text that has been produced) or a process-oriented approach (focusing on the process that the writer followed to generate this product). Although several language resources exist for the product-oriented approach to revisions, hardly any resources are available yet for an in-depth analysis of the process of revisions. Therefore, we provide an extensive dataset on revisions made during writing (accessible via https://hdl.handle.net/10411/VBDYGX). This dataset is based on keystroke data and eye tracking data collected from 65 students from a variety of backgrounds (undergraduate and graduate students, with English as a first or a second language) across a variety of tasks (argumentative text and academic abstract). In total, 7,120 revisions were identified in the dataset. For each revision, 18 features have been manually annotated and 31 features have been automatically extracted. As a case study, we show two potential use cases of the dataset. In addition, future uses of the dataset are described.

Proceedings of the first workshop on Resources for African Indigenous Languages
Rooweither Mabuya | Phathutshedzo Ramukhadi | Mmasibidi Setaka | Valencia Wagner | Menno van Zaanen
Proceedings of the first workshop on Resources for African Indigenous Languages

2018

A Multilingual Wikified Data Set of Educational Material
Iris Hendrickx | Eirini Takoulidou | Thanasis Naskos | Katia Lida Kermanidis | Vilelmini Sosoni | Hugo de Vos | Maria Stasimioti | Menno van Zaanen | Panayota Georgakopoulou | Valia Kordoni | Maja Popovic | Markus Egg | Antal van den Bosch
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
Vilelmini Sosoni | Katia Lida Kermanidis | Maria Stasimioti | Thanasis Naskos | Eirini Takoulidou | Menno van Zaanen | Sheila Castilho | Panayota Georgakopoulou | Valia Kordoni | Markus Egg
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Improving Machine Translation of Educational Content via Crowdsourcing
Maximiliana Behnke | Antonio Valerio Miceli Barone | Rico Sennrich | Vilelmini Sosoni | Thanasis Naskos | Eirini Takoulidou | Maria Stasimioti | Menno van Zaanen | Sheila Castilho | Federico Gaspari | Panayota Georgakopoulou | Valia Kordoni | Markus Egg | Katia Lida Kermanidis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The Influence of Context on the Learning of Metrical Stress Systems Using Finite-State Machines
Cesko Voeten | Menno van Zaanen
Computational Linguistics, Volume 44, Issue 2 - June 2018

Languages vary in the way stress is assigned to syllables within words. This article investigates the learnability of stress systems in a wide range of languages. The stress systems can be described using finite-state automata with symbols indicating levels of stress (primary, secondary, or no stress). Finite-state automata have been the focus of research in the area of grammatical inference for some time now. It has been shown that finite-state machines are learnable from examples using state-merging. One such approach, which aims to learn k-testable languages, has been applied to stress systems with some success. The family of k-testable languages has been shown to be efficiently learnable (in polynomial time). Here, we extend this approach to k, l-local languages by taking not only left context, but also right context, into account. We consider empirical results testing the performance of our learner using various amounts of context (corresponding to varying definitions of phonological locality). Our results show that our approach of learning stress patterns using state-merging is more reliant on left context than on right context. Additionally, some stress systems fail to be learned by our learner using either the left-context k-testable or the left-and-right-context k, l-local learning system. A more complex merging strategy, and hence grammar representation, is required for these stress systems.
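The idea of learning a k-testable language from examples can be illustrated with a minimal sketch. It is not the authors' implementation: instead of state-merging it simply collects the k-symbol windows seen in positive examples, which is one common way to characterize a k-testable (strictly k-local) language. The stress encoding ("0" for no stress, "2" for primary stress), the boundary markers "<" and ">", and the function names are assumptions made for this illustration.

```python
from typing import Iterable, Set


def kgrams(word: str, k: int) -> Set[str]:
    """Return all windows of k symbols in a word padded with boundary markers.

    '<' and '>' are hypothetical boundary symbols; they must not occur in the data.
    """
    padded = "<" * (k - 1) + word + ">" * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def learn(words: Iterable[str], k: int) -> Set[str]:
    """Learn a grammar as the set of all k-grams observed in positive examples."""
    grammar: Set[str] = set()
    for word in words:
        grammar |= kgrams(word, k)
    return grammar


def accepts(grammar: Set[str], word: str, k: int) -> bool:
    """A word is accepted if every one of its k-grams was observed in training."""
    return kgrams(word, k) <= grammar


# Toy stress system (an assumption for illustration): primary stress on the
# penultimate syllable, encoded as one digit per syllable.
training = ["20", "020", "0020", "00020"]
grammar = learn(training, k=3)

print(accepts(grammar, "000020", 3))  # unseen, longer word with penultimate stress
print(accepts(grammar, "0200", 3))    # stress on the antepenultimate syllable
```

With this toy data the learner generalizes to the longer unseen word and rejects the word with misplaced stress, because the window "200" never occurs in the training examples. The window here is a single contiguous span; it does not model the separate left and right context lengths of the k, l-local learner described in the abstract.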

2016

TraMOOC (Translation for Massive Open Online Courses): providing reliable MT for MOOCs
Valia Kordoni | Lexi Birch | Ioana Buliga | Kostadin Cholakov | Markus Egg | Federico Gaspari | Yota Georgakopolou | Maria Gialama | Iris Hendrickx | Mitja Jermol | Katia Kermanidis | Joss Moorkens | Davor Orlic | Michael Papadopoulos | Maja Popović | Rico Sennrich | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Menno van Zaanen | Andy Way
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

2015

TraMOOC: Translation for Massive Open Online Courses
Valia Kordoni | Kostadin Cholakov | Markus Egg | Andy Way | Lexi Birch | Katia Kermanidis | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Iris Hendrickx | Michael Papadopoulos | Panayota Georgakopoulou | Maria Gialama | Menno van Zaanen | Ioana Buliga | Mitja Jermol | Davor Orlic
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)
Ben Verhoeven | Walter Daelemans | Menno van Zaanen | Gerhard van Huyssteen
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)

Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch
Ben Verhoeven | Menno van Zaanen | Walter Daelemans | Gerhard van Huyssteen
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)

OpenSoNaR: user-driven development of the SoNaR corpus interfaces
Martin Reynaert | Matje van de Camp | Menno van Zaanen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis.
Menno van Zaanen | Gerhard van Huyssteen | Suzanne Aussems | Chris Emmery | Roald Eiselen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Compounding is very productive and leads to practical problems in developing machine translators and spelling checkers, as newly formed compounds cannot be found in existing lexicons. The Automatic Compound Processing (AuCoPro) project deals with the analysis of compounds in two closely related languages, Afrikaans and Dutch. In this paper, we present the development and evaluation of two datasets, one for each language, that contain compound words with annotated compound boundaries. Such datasets can be used to train classifiers to identify the compound components in novel compounds. We describe the process of annotation and provide an overview of the annotation guidelines as well as global properties of the datasets. The inter-rater agreement between the annotators is considered highly reliable. Furthermore, we show the usability of these datasets by building an initial automatic compound boundary detection system, which assigns compound boundaries with approximately 90% accuracy.

2011

Formal and Empirical Grammatical Inference
Jeffrey Heinz | Colin de la Higuera | Menno van Zannen
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2009

Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
Menno van Zaanen | Colin de la Higuera
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

Grammatical Inference and Computational Linguistics
Menno van Zaanen | Colin de la Higuera
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

Language Models for Contextual Error Detection and Correction
Herman Stehouwer | Menno van Zaanen
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

2007

Named Entity Recognition in Question Answering of Speech Data
Diego Mollá | Menno van Zaanen | Steve Cassidy
Proceedings of the Australasian Language Technology Workshop 2007

2006

Named Entity Recognition for Question Answering
Diego Mollá | Menno van Zaanen | Daniel Smith
Proceedings of the Australasian Language Technology Workshop 2006

2005

Proceedings of the Australasian Language Technology Workshop 2005
Timothy Baldwin | James Curran | Menno van Zaanen
Proceedings of the Australasian Language Technology Workshop 2005

Learning of Graph Rules for Question Answering
Diego Molla | Menno van Zaanen
Proceedings of the Australasian Language Technology Workshop 2005

2000

ABL: Alignment-Based Learning
Menno van Zaanen
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics