Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)

Kay Berkling


Abstract
This paper describes the collection of the H1 Corpus of children’s weekly writing over the course of 3 months in 2nd and 3rd grades, aged 7-11. The texts were collected within the normal classroom setting by the teacher. Texts of children whose parents signed the permission to donate the texts to science were collected and transcribed. The corpus consists of the elicitation techniques, an overview of the data collected and the transcriptions of the texts both with and without spelling errors, aligned on a word by word basis, as well as the scanned in texts. The corpus is available for research via Linguistic Data Consortium (LDC). Researchers are strongly encouraged to make additional annotations and improvements and return it to the public domain via LDC.
Anthology ID:
L16-1510
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
3200–3206
Language:
URL:
https://www.aclweb.org/anthology/L16-1510
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/L16-1510.pdf