A 2nd Longitudinal Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper describes the collection of the H1 Corpus of children’s weekly writing over the course of 3 months in 2nd and 3rd grades, aged 7-11. The texts were collected within the normal classroom setting by the teacher. Texts of children whose parents signed the permission to donate the texts to science were collected and transcribed. The corpus consists of the elicitation techniques, an overview of the data collected and the transcriptions of the texts both with and without spelling errors, aligned on a word by word basis, as well as the scanned in texts. The corpus is available for research via Linguistic Data Consortium (LDC). Researchers are strongly encouraged to make additional annotations and improvements and return it to the public domain via LDC.
A Database of Freely Written Texts of German School Students for the Purpose of Automatic Spelling Error Classification
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The spelling competence of school students is best measured on freely written texts, instead of pre-determined, dictated texts. Since the analysis of the error categories in these kinds of texts is very labor intensive and costly, we are working on an automatic systems to perform this task. The modules of the systems are derived from techniques from the area of natural language processing, and are learning systems that need large amounts of training data. To obtain the data necessary for training and evaluating the resulting system, we conducted data collection of freely written, German texts by school children. 1,730 students from grade 1 through 8 participated in this data collection. The data was transcribed electronically and annotated with their corrected version. This resulted in a total of 14,563 sentences that can now be used for research regarding spelling diagnostics. Additional meta-data was collected regarding writers’ language biography, teaching methodology, age, gender, and school year. In order to do a detailed manual annotation of the categories of the spelling errors committed by the students we developed a tool specifically tailored to the task.