Jenna Kanerva


2020

pdf bib
The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research
Jörg Tiedemann | Tommi Nieminen | Mikko Aulamo | Jenna Kanerva | Akseli Leino | Filip Ginter | Niko Papula
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish. The goal of the project is the compilation of a massive parallel corpus out of translated material collected from web sources, public and private organisations and language service providers in Finland with its two official languages. The project also aims at the development of open and freely accessible translation services for those two languages for the general purpose and for domain-specific use. We have released new data sets with over 3 million translation units, a benchmark test set for MT development, pre-trained neural MT models with high coverage and competitive performance and a self-contained MT plugin for a popular CAT tool. The latter enables offline translation without dependencies on external services making it possible to work with highly sensitive data without compromising security concerns.

pdf bib
Turku Enhanced Parser Pipeline: From Raw Text to Enhanced Graphs in the IWPT 2020 Shared Task
Jenna Kanerva | Filip Ginter | Sampo Pyysalo
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We present the approach of the TurkuNLP group to the IWPT 2020 shared task on Multilingual Parsing into Enhanced Universal Dependencies. The task involves 28 treebanks in 17 different languages and requires parsers to generate graph structures extending on the basic dependency trees. Our approach combines language-specific BERT models, the UDify parser, neural sequence-to-sequence lemmatization and a graph transformation approach encoding the enhanced structure into a dependency tree. Our submission averaged 84.5% ELAS, ranking first in the shared task. We make all methods and resources developed for this study freely available under open licenses from https://turkunlp.org.

2019

pdf bib
Neural Dependency Parsing of Biomedical Text: TurkuNLP entry in the CRAFT Structural Annotation Task
Thang Minh Ngo | Jenna Kanerva | Filip Ginter | Sampo Pyysalo
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

We present the approach taken by the TurkuNLP group in the CRAFT Structural Annotation task, a shared task on dependency parsing. Our approach builds primarily on the Turku neural parser, a native dependency parser that ranked among the best in the recent CoNLL tasks on parsing Universal Dependencies. To adapt the parser to the biomedical domain, we considered and evaluated a number of approaches, including the generation of custom word embeddings, combination with other in-domain resources, and the incorporation of information from named entity recognition. We achieved a labeled attachment score of 89.7%, the best result among task participants.

pdf bib
Template-free Data-to-Text Generation of Finnish Sports News
Jenna Kanerva | Samuel Rönnqvist | Riina Kekki | Tapio Salakoski | Filip Ginter
Proceedings of the 22nd Nordic Conference on Computational Linguistics

News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics. This poses a challenge for automated data-to-text news generation with real-world news corpora as training data. We report on the development of a corpus of Finnish ice hockey news, edited to be suitable for training of end-to-end news generation methods, as well as demonstrate generation of text, which was judged by journalists to be relatively close to a viable product. The new dataset and system source code are available for research purposes.

pdf bib
Is Multilingual BERT Fluent in Language Generation?
Samuel Rönnqvist | Jenna Kanerva | Tapio Salakoski | Filip Ginter
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic classification probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and a natural language generation task testing for the ability to produce coherent text fitting a given context. We find that the currently available multilingual BERT model is clearly inferior to the monolingual counterparts, and cannot in many cases serve as a substitute for a well-trained monolingual model. We find that the English and German models perform well at generation, whereas the multilingual model is lacking, in particular, for Nordic languages. The code of the experiments in the paper is available at: https://github.com/TurkuNLP/bert-eval

2018

pdf bib
Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions
Kira Droganova | Daniel Zeman | Jenna Kanerva | Filip Ginter
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Turku Neural Parser Pipeline: An End-to-End System for the CoNLL 2018 Shared Task
Jenna Kanerva | Filip Ginter | Niko Miekka | Akseli Leino | Tapio Salakoski
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

In this paper we describe the TurkuNLP entry at the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. Compared to the last year, this year the shared task includes two new main metrics to measure the morphological tagging and lemmatization accuracies in addition to syntactic trees. Basing our motivation into these new metrics, we developed an end-to-end parsing pipeline especially focusing on developing a novel and state-of-the-art component for lemmatization. Our system reached the highest aggregate ranking on three main metrics out of 26 teams by achieving 1st place on metric involving lemmatization, and 2nd on both morphological tagging and parsing.

pdf bib
Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions
Kira Droganova | Filip Ginter | Jenna Kanerva | Daniel Zeman
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis. We report on several experiments in enrichment of training data for this specific construction, evaluated on five languages: Czech, English, Finnish, Russian and Slovak. These data enrichment methods draw upon self-training and tri-training, combined with a stratified sampling method mimicking the structural complexity of the original treebank. In addition, using these same methods, we also demonstrate small improvements over the CoNLL-17 parsing shared task winning system for four of the five languages, not only restricted to the elliptical constructions.

pdf bib
Enhancing Universal Dependency Treebanks: A Case Study
Joakim Nivre | Paola Marongiu | Filip Ginter | Jenna Kanerva | Simonetta Montemagni | Sebastian Schuster | Maria Simi
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies. We apply a rule-based system developed for English and a data-driven system trained on Finnish to Swedish and Italian. We find that both systems are accurate enough to bootstrap enhanced dependencies in existing UD treebanks. In the case of Italian, results are even on par with those of a prototype language-specific system.

2017

pdf bib
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

pdf bib
TurkuNLP: Delexicalized Pre-training of Word Embeddings for Dependency Parsing
Jenna Kanerva | Juhani Luotolahti | Filip Ginter
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We present the TurkuNLP entry in the CoNLL 2017 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. The system is based on the UDPipe parser with our focus being in exploring various techniques to pre-train the word embeddings used by the parser in order to improve its performance especially on languages with small training sets. The system ranked 11th among the 33 participants overall, being 8th on the small treebanks, 10th on the large treebanks, 12th on the parallel test sets, and 26th on the surprise languages.

pdf bib
Dep_search: Efficient Search Tool for Large Dependency Parsebanks
Juhani Luotolahti | Jenna Kanerva | Filip Ginter
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf bib
Cross-Lingual Pronoun Prediction with Deep Recurrent Neural Networks v2.0
Juhani Luotolahti | Jenna Kanerva | Filip Ginter
Proceedings of the Third Workshop on Discourse in Machine Translation

In this paper we present our system in the DiscoMT 2017 Shared Task on Crosslingual Pronoun Prediction. Our entry builds on our last year’s success, our system based on deep recurrent neural networks outperformed all the other systems with a clear margin. This year we investigate whether different pre-trained word embeddings can be used to improve the neural systems, and whether the recently published Gated Convolutions outperform the Gated Recurrent Units used last year.

pdf bib
Fully Delexicalized Contexts for Syntax-Based Word Embeddings
Jenna Kanerva | Sampo Pyysalo | Filip Ginter
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

pdf bib
Assessing the Annotation Consistency of the Universal Dependencies Corpora
Marie-Catherine de Marneffe | Matias Grioni | Jenna Kanerva | Filip Ginter
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

pdf bib
Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools
Jörg Tiedemann | Fabienne Cap | Jenna Kanerva | Filip Ginter | Sara Stymne | Robert Östling | Marion Weller-Di Marco
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Cross-Lingual Pronoun Prediction with Deep Recurrent Neural Networks
Juhani Luotolahti | Jenna Kanerva | Filip Ginter
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

pdf bib
Towards the Classification of the Finnish Internet Parsebank: Detecting Translations and Informality
Veronika Laippala | Jenna Kanerva | Anna Missilä | Sampo Pyysalo | Tapio Salakoski | Filip Ginter
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Universal Dependencies for Finnish
Sampo Pyysalo | Jenna Kanerva | Anna Missilä | Veronika Laippala | Filip Ginter
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Towards Universal Web Parsebanks
Juhani Luotolahti | Jenna Kanerva | Veronika Laippala | Sampo Pyysalo | Filip Ginter
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

pdf bib
Morphological Segmentation and OPUS for Finnish-English Machine Translation
Jörg Tiedemann | Filip Ginter | Jenna Kanerva
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
SETS: Scalable and Efficient Tree Search in Dependency Graphs
Juhani Luotolahti | Jenna Kanerva | Sampo Pyysalo | Filip Ginter
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Turku: Semantic Dependency Parsing as a Sequence Classification
Jenna Kanerva | Juhani Luotolahti | Filip Ginter
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
Turku: Broad-Coverage Semantic Parsing with Rich Features
Jenna Kanerva | Juhani Luotolahti | Filip Ginter
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Post-hoc Manipulations of Vector Space Models with Application to Semantic Role Labeling
Jenna Kanerva | Filip Ginter
Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC)