L1-L2 Parallel Dependency Treebank as Learner Corpus
John Lee, Keying Li, Herman Leung
Abstract
This opinion paper proposes the use of a parallel treebank as a learner corpus. We show how an L1-L2 parallel treebank (i.e., parse trees of non-native sentences, aligned to the parse trees of their target hypotheses) can facilitate the retrieval of sentences containing specific learner errors. We argue for its benefits in terms of corpus reuse and interoperability over a conventional learner corpus annotated with error tags. As a proof of concept, we conduct a case study on word-order errors made by learners of Chinese as a foreign language. We report precision and recall in retrieving a range of word-order error categories from L1-L2 tree pairs annotated in the Universal Dependencies framework.
- Anthology ID: W17-6306
- Volume: Proceedings of the 15th International Conference on Parsing Technologies
- Month: September
- Year: 2017
- Address: Pisa, Italy
- Venues: IWPT | WS
- SIG: SIGPARSE
- Publisher: Association for Computational Linguistics
- Pages: 44–49
- URL: https://www.aclweb.org/anthology/W17-6306
- PDF: http://aclanthology.lst.uni-saarland.de/W17-6306.pdf
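The abstract does not spell out how errors are retrieved from the aligned tree pairs. The following is a minimal sketch of one possible word-order check, assuming a hypothetical in-memory format: the `Token` and `TreePair` classes, the learner-to-target token alignment, and the helpers `word_order_errors`, `retrieve` and `precision_recall` are all illustrative and are not the authors' tools or data format. It flags an aligned token that sits on opposite sides of its head in the learner tree and in the target tree, treats "error category" as the relation label of the moved token, and scores a retrieval against a gold set. The toy sentence pair (learner 我去北京明天 vs. target 我明天去北京) and its relation labels are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    form: str
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # dependency relation label (illustrative UD-style labels)


@dataclass
class TreePair:
    sent_id: str
    learner: list[Token]        # learner (L2) sentence; token i has index i + 1
    target: list[Token]         # target hypothesis for the same sentence
    alignment: dict[int, int]   # hypothetical learner index -> target index map


def head_direction(tokens: list[Token], idx: int) -> int:
    """Return -1 if the token at 1-based idx precedes its head, +1 if it follows it."""
    return -1 if idx < tokens[idx - 1].head else 1


def word_order_errors(pair: TreePair) -> list[tuple[str, str]]:
    """Flag aligned tokens that sit on opposite sides of their (aligned) head
    in the learner tree and in the target tree. Returns (form, target deprel)."""
    errors = []
    for l_idx, t_idx in pair.alignment.items():
        l_tok, t_tok = pair.learner[l_idx - 1], pair.target[t_idx - 1]
        if l_tok.head == 0 or t_tok.head == 0:
            continue  # roots have no head to be ordered against
        if pair.alignment.get(l_tok.head) != t_tok.head:
            continue  # heads do not correspond, so the comparison is undefined
        if head_direction(pair.learner, l_idx) != head_direction(pair.target, t_idx):
            errors.append((l_tok.form, t_tok.deprel))
    return errors


def retrieve(pairs: list[TreePair], deprel: str) -> set[str]:
    """IDs of sentences with a word-order error on a token labeled deprel."""
    return {p.sent_id for p in pairs
            if any(rel == deprel for _, rel in word_order_errors(p))}


def precision_recall(retrieved: set[str], gold: set[str]) -> tuple[float, float]:
    """Standard set-based precision and recall (0.0 when a set is empty)."""
    hits = len(retrieved & gold)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy data: learner "我 去 北京 明天" vs. target "我 明天 去 北京" ("I go to Beijing tomorrow").
s1 = TreePair(
    "s1",
    learner=[Token("我", 2, "nsubj"), Token("去", 0, "root"),
             Token("北京", 2, "obj"), Token("明天", 2, "obl:tmod")],
    target=[Token("我", 3, "nsubj"), Token("明天", 3, "obl:tmod"),
            Token("去", 0, "root"), Token("北京", 3, "obj")],
    alignment={1: 1, 2: 3, 3: 4, 4: 2},
)
# A well-formed sentence whose learner and target trees are identical.
correct = [Token("我", 3, "nsubj"), Token("明天", 3, "obl:tmod"),
           Token("去", 0, "root"), Token("北京", 3, "obj")]
s2 = TreePair("s2", learner=correct, target=correct,
              alignment={1: 1, 2: 2, 3: 3, 4: 4})

pairs = [s1, s2]
print(word_order_errors(s1))  # [('明天', 'obl:tmod')]
print(retrieve(pairs, "obl:tmod"))  # {'s1'}
print(precision_recall(retrieve(pairs, "obl:tmod"), {"s1"}))  # (1.0, 1.0)
```

Comparing which side of its head each dependent falls on, rather than absolute token positions, keeps the check local to a single arc, so a displaced modifier is reported once under its own relation label. A real error taxonomy would likely need a richer definition than filtering on one label.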