Line-a-line: A Tool for Annotating Word-Alignments

Maria Skeppstedt, Magnus Ahltorp, Gunnar Eriksson, Rickard Domeij


Abstract
We describe line-a-line, a web-based tool for the manual annotation of word-alignments in sentence-aligned parallel corpora. The graphical user interface, which builds on a design template from the Jigsaw system for investigative analysis, displays the words of each sentence pair to be annotated as elements in two vertical lists. An alignment between two words is annotated by drag-and-drop, i.e. by dragging an element from the left-hand list and dropping it on an element in the right-hand list. The tool shows that two words are aligned by drawing lines that connect them, and it highlights the words associated with a word when the mouse hovers over it. Line-a-line uses the efmaral library to produce pre-annotated alignments, on which the user can base the manual annotation. The tool is primarily intended for moderately under-resourced languages, for which parallel corpora are scarce. The automatic word-alignment functionality therefore also incorporates information derived from non-parallel resources, in the form of pre-trained multilingual word embeddings from the MUSE library.
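The interaction described above (dragging a left-hand word onto a right-hand word, highlighting associated words on hover, starting from automatic pre-annotations) can be illustrated with a small data model. The following is a minimal Python sketch under stated assumptions, not line-a-line's own code: the class and method names are hypothetical, and it assumes the pre-annotations arrive in the Pharaoh "i-j" index format (zero-based word positions, as produced by fast_align-style aligners).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SentencePairAlignment:
    """Hypothetical model of one sentence pair under annotation.

    Each link is a (left_index, right_index) pair referring to positions
    in the left-hand and right-hand word lists.
    """
    left: list[str]
    right: list[str]
    links: set[tuple[int, int]] = field(default_factory=set)

    @classmethod
    def from_pharaoh(cls, left: list[str], right: list[str], line: str) -> SentencePairAlignment:
        """Load pre-annotated links from an assumed 'i-j i-j ...' string."""
        pair = cls(left, right)
        for token in line.split():
            i, j = token.split("-")
            pair.drop(int(i), int(j))
        return pair

    def drop(self, i: int, j: int) -> None:
        """Left word i was dragged onto right word j: record the link."""
        if not (0 <= i < len(self.left) and 0 <= j < len(self.right)):
            raise IndexError(f"link {i}-{j} is out of range")
        self.links.add((i, j))

    def remove(self, i: int, j: int) -> None:
        """Delete a link, e.g. a wrong pre-annotation (no error if absent)."""
        self.links.discard((i, j))

    def hover_left(self, i: int) -> list[str]:
        """Right-hand words to highlight while hovering over left word i."""
        return [self.right[j] for j in sorted(j for a, j in self.links if a == i)]

    def hover_right(self, j: int) -> list[str]:
        """Left-hand words to highlight while hovering over right word j."""
        return [self.left[i] for i in sorted(i for i, b in self.links if b == j)]

    def to_pharaoh(self) -> str:
        """Serialise the current links back to the assumed 'i-j' format."""
        return " ".join(f"{i}-{j}" for i, j in sorted(self.links))


if __name__ == "__main__":
    pair = SentencePairAlignment.from_pharaoh(
        ["I", "do", "not", "know"], ["jag", "vet", "inte"], "0-0 2-2 3-1"
    )
    pair.drop(1, 1)               # the annotator drags "do" onto "vet"
    print(pair.hover_right(1))    # ['do', 'know']
    print(pair.to_pharaoh())      # 0-0 1-1 2-2 3-1
```

Storing links as a set of index pairs keeps many-to-one alignments (here "do" and "know" both linked to "vet") as simple as one-to-one ones, and makes hover highlighting a filter over the set in either direction.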
Anthology ID: 2020.bucc-1.1
Volume: Proceedings of the 13th Workshop on Building and Using Comparable Corpora
Month: May
Year: 2020
Address: Marseille, France
Venues: BUCC | LREC | WS
Publisher: European Language Resources Association
Pages: 1–5
Language: English
URL: https://www.aclweb.org/anthology/2020.bucc-1.1
PDF: http://aclanthology.lst.uni-saarland.de/2020.bucc-1.1.pdf