Filip Ginter


2020

The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research
Jörg Tiedemann | Tommi Nieminen | Mikko Aulamo | Jenna Kanerva | Akseli Leino | Filip Ginter | Niko Papula
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish. The goal of the project is the compilation of a massive parallel corpus out of translated material collected from web sources, public and private organisations and language service providers in Finland with its two official languages. The project also aims at the development of open and freely accessible translation services for those two languages, for both general-purpose and domain-specific use. We have released new data sets with over 3 million translation units, a benchmark test set for MT development, pre-trained neural MT models with high coverage and competitive performance, and a self-contained MT plugin for a popular CAT tool. The latter enables offline translation without dependencies on external services, making it possible to work with highly sensitive data without compromising security.

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
Joakim Nivre | Marie-Catherine de Marneffe | Filip Ginter | Jan Hajič | Christopher D. Manning | Sampo Pyysalo | Sebastian Schuster | Francis Tyers | Daniel Zeman
Proceedings of the 12th Language Resources and Evaluation Conference

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers. In this paper, we describe version 2 of the universal guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.

Turku Enhanced Parser Pipeline: From Raw Text to Enhanced Graphs in the IWPT 2020 Shared Task
Jenna Kanerva | Filip Ginter | Sampo Pyysalo
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We present the approach of the TurkuNLP group to the IWPT 2020 shared task on Multilingual Parsing into Enhanced Universal Dependencies. The task involves 28 treebanks in 17 different languages and requires parsers to generate graph structures extending on the basic dependency trees. Our approach combines language-specific BERT models, the UDify parser, neural sequence-to-sequence lemmatization and a graph transformation approach encoding the enhanced structure into a dependency tree. Our submission averaged 84.5% ELAS, ranking first in the shared task. We make all methods and resources developed for this study freely available under open licenses from https://turkunlp.org.

2019

Neural Dependency Parsing of Biomedical Text: TurkuNLP entry in the CRAFT Structural Annotation Task
Thang Minh Ngo | Jenna Kanerva | Filip Ginter | Sampo Pyysalo
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

We present the approach taken by the TurkuNLP group in the CRAFT Structural Annotation task, a shared task on dependency parsing. Our approach builds primarily on the Turku neural parser, a native dependency parser that ranked among the best in the recent CoNLL tasks on parsing Universal Dependencies. To adapt the parser to the biomedical domain, we considered and evaluated a number of approaches, including the generation of custom word embeddings, combination with other in-domain resources, and the incorporation of information from named entity recognition. We achieved a labeled attachment score of 89.7%, the best result among task participants.

Template-free Data-to-Text Generation of Finnish Sports News
Jenna Kanerva | Samuel Rönnqvist | Riina Kekki | Tapio Salakoski | Filip Ginter
Proceedings of the 22nd Nordic Conference on Computational Linguistics

News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics. This poses a challenge for automated data-to-text news generation with real-world news corpora as training data. We report on the development of a corpus of Finnish ice hockey news, edited to be suitable for training of end-to-end news generation methods, as well as demonstrate generation of text, which was judged by journalists to be relatively close to a viable product. The new dataset and system source code are available for research purposes.

Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing
Joakim Nivre | Leon Derczynski | Filip Ginter | Bjørn Lindi | Stephan Oepen | Anders Søgaard | Jörg Tiedemann
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

Is Multilingual BERT Fluent in Language Generation?
Samuel Rönnqvist | Jenna Kanerva | Tapio Salakoski | Filip Ginter
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic classification probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and a natural language generation task testing for the ability to produce coherent text fitting a given context. We find that the currently available multilingual BERT model is clearly inferior to the monolingual counterparts, and cannot in many cases serve as a substitute for a well-trained monolingual model. We find that the English and German models perform well at generation, whereas the multilingual model is lacking, in particular, for Nordic languages. The code of the experiments in the paper is available at: https://github.com/TurkuNLP/bert-eval

2018

Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions
Kira Droganova | Daniel Zeman | Jenna Kanerva | Filip Ginter
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Jan Hajič | Martin Popel | Martin Potthast | Milan Straka | Filip Ginter | Joakim Nivre | Slav Petrov
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2018, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on test input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. This shared task constitutes a 2nd edition; the first one took place in 2017 (Zeman et al., 2017). The main metric from 2017 has been kept in 2018, allowing for easy comparison, and two new main metrics have been added. New datasets added to the Universal Dependencies collection between mid-2017 and the spring of 2018 have contributed to increased difficulty of the task this year. In this overview paper, we define the task and the updated evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

Turku Neural Parser Pipeline: An End-to-End System for the CoNLL 2018 Shared Task
Jenna Kanerva | Filip Ginter | Niko Miekka | Akseli Leino | Tapio Salakoski
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

In this paper we describe the TurkuNLP entry at the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. Compared to last year, this year's shared task includes two new main metrics that measure morphological tagging and lemmatization accuracy in addition to syntactic trees. Motivated by these new metrics, we developed an end-to-end parsing pipeline with a particular focus on a novel, state-of-the-art lemmatization component. Our system reached the highest aggregate ranking on the three main metrics out of 26 teams, achieving 1st place on the metric involving lemmatization and 2nd on both morphological tagging and parsing.

Evaluation of a Prototype System that Automatically Assigns Subject Headings to Nursing Narratives Using Recurrent Neural Network
Hans Moen | Kai Hakala | Laura-Maria Peltonen | Henry Suhonen | Petri Loukasmäki | Tapio Salakoski | Filip Ginter | Sanna Salanterä
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

We present our initial evaluation of a prototype system designed to assist nurses in assigning subject headings to nursing narratives written in the context of documenting patient care in hospitals. Currently nurses may need to memorize several hundred subject headings from standardized nursing terminologies when structuring and assigning the right section/subject headings to their text. Our aim is to allow nurses to write in a narrative manner without having to plan and structure the text with respect to sections and subject headings; instead, the system should assist with the assignment of subject headings and restructuring afterwards. We hypothesize that this could reduce the time and effort needed for nursing documentation in hospitals. A central component of the system is a text classification model based on a long short-term memory (LSTM) recurrent neural network architecture, trained on a large data set of nursing notes. A simple Web-based interface has been implemented for user interaction. To evaluate the system, three nurses write a set of artificial nursing shift notes in a fully unstructured narrative manner, without planning for or considering the use of sections and subject headings. These are then fed to the system, which assigns subject headings to each sentence and then groups them into paragraphs. Manual evaluation is conducted by a group of nurses. The results show that about 70% of the sentences are assigned to correct subject headings. The nurses believe that such a system can be of great help in making nursing documentation in hospitals easier and less time consuming. Finally, various measures and approaches for improving the system are discussed.

Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions
Kira Droganova | Filip Ginter | Jenna Kanerva | Daniel Zeman
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis. We report on several experiments in enrichment of training data for this specific construction, evaluated on five languages: Czech, English, Finnish, Russian and Slovak. These data enrichment methods draw upon self-training and tri-training, combined with a stratified sampling method mimicking the structural complexity of the original treebank. In addition, using these same methods, we also demonstrate small improvements over the CoNLL-17 parsing shared task winning system for four of the five languages, not only restricted to the elliptical constructions.

Enhancing Universal Dependency Treebanks: A Case Study
Joakim Nivre | Paola Marongiu | Filip Ginter | Jenna Kanerva | Simonetta Montemagni | Sebastian Schuster | Maria Simi
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies. We apply a rule-based system developed for English and a data-driven system trained on Finnish to Swedish and Italian. We find that both systems are accurate enough to bootstrap enhanced dependencies in existing UD treebanks. In the case of Italian, results are even on par with those of a prototype language-specific system.

2017

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

TurkuNLP: Delexicalized Pre-training of Word Embeddings for Dependency Parsing
Jenna Kanerva | Juhani Luotolahti | Filip Ginter
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We present the TurkuNLP entry in the CoNLL 2017 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. The system is based on the UDPipe parser, and our focus is on exploring various techniques to pre-train the word embeddings used by the parser in order to improve its performance, especially on languages with small training sets. The system ranked 11th among the 33 participants overall, being 8th on the small treebanks, 10th on the large treebanks, 12th on the parallel test sets, and 26th on the surprise languages.

Universal Dependencies
Joakim Nivre | Daniel Zeman | Filip Ginter | Francis Tyers
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Universal Dependencies (UD) is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages. This tutorial gives an introduction to the UD framework and resources, from basic design principles to annotation guidelines and existing treebanks. We also discuss tools for developing and exploiting UD treebanks and survey applications of UD in NLP and linguistics.

Creating register sub-corpora for the Finnish Internet Parsebank
Veronika Laippala | Juhani Luotolahti | Aki-Juhani Kyröläinen | Tapio Salakoski | Filip Ginter
Proceedings of the 21st Nordic Conference on Computational Linguistics

Dep_search: Efficient Search Tool for Large Dependency Parsebanks
Juhani Luotolahti | Jenna Kanerva | Filip Ginter
Proceedings of the 21st Nordic Conference on Computational Linguistics

A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora
Aleksi Vesanto | Filip Ginter | Hannu Salmi | Asko Nivala | Tapio Salakoski
Proceedings of the 21st Nordic Conference on Computational Linguistics

Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771-1910
Aleksi Vesanto | Asko Nivala | Heli Rantala | Tapio Salakoski | Hannu Salmi | Filip Ginter
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language

End-to-End System for Bacteria Habitat Extraction
Farrokh Mehryary | Kai Hakala | Suwisa Kaewphan | Jari Björne | Tapio Salakoski | Filip Ginter
BioNLP 2017

We introduce an end-to-end system capable of named-entity detection, normalization and relation extraction for extracting information about bacteria and their habitats from biomedical literature. Our system is based on deep learning, CRF classifiers and vector space models. We train and evaluate the system on the BioNLP 2016 Shared Task Bacteria Biotope data. The official evaluation shows that the joint performance of our entity detection and relation extraction models outperforms the winning team of the Shared Task by 19pp on F1-score, establishing a new top score for the task. We also achieve state-of-the-art results in the normalization task. Our system is open source and freely available at https://github.com/TurkuNLP/BHE.

Detecting mentions of pain and acute confusion in Finnish clinical text
Hans Moen | Kai Hakala | Farrokh Mehryary | Laura-Maria Peltonen | Tapio Salakoski | Filip Ginter | Sanna Salanterä
BioNLP 2017

We study and compare two different approaches to the task of automatic assignment of predefined classes to clinical free-text narratives. In the first approach this is treated as a traditional mention-level named-entity recognition task, while the second approach treats it as a sentence-level multi-label classification task. Performance comparison across these two approaches is conducted in the form of sentence-level evaluation and state-of-the-art methods for both approaches are evaluated. The experiments are done on two data sets consisting of Finnish clinical text, manually annotated with respect to the topics pain and acute confusion. Our results suggest that the mention-level named-entity recognition approach outperforms sentence-level classification overall, but the latter approach still manages to achieve the best prediction scores on several annotation classes.

Cross-Lingual Pronoun Prediction with Deep Recurrent Neural Networks v2.0
Juhani Luotolahti | Jenna Kanerva | Filip Ginter
Proceedings of the Third Workshop on Discourse in Machine Translation

In this paper we present our system in the DiscoMT 2017 Shared Task on Crosslingual Pronoun Prediction. Our entry builds on last year’s success, when our system based on deep recurrent neural networks outperformed all the other systems by a clear margin. This year we investigate whether different pre-trained word embeddings can be used to improve the neural systems, and whether the recently published Gated Convolutions outperform the Gated Recurrent Units used last year.

Fully Delexicalized Contexts for Syntax-Based Word Embeddings
Jenna Kanerva | Sampo Pyysalo | Filip Ginter
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Assessing the Annotation Consistency of the Universal Dependencies Corpora
Marie-Catherine de Marneffe | Matias Grioni | Jenna Kanerva | Filip Ginter
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

Universal Dependencies v1: A Multilingual Treebank Collection
Joakim Nivre | Marie-Catherine de Marneffe | Filip Ginter | Yoav Goldberg | Jan Hajič | Christopher D. Manning | Ryan McDonald | Slav Petrov | Sampo Pyysalo | Natalia Silveira | Reut Tsarfaty | Daniel Zeman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments. It is also useful for multilingual system development and comparative linguistic studies. Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. In this paper, we describe v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages.

Universal Dependencies for Persian
Mojgan Seraji | Filip Ginter | Joakim Nivre
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Persian Universal Dependency Treebank (Persian UD) is a recent effort of treebanking Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations. The Persian UD is the converted version of the Uppsala Persian Dependency Treebank (UPDT) to the universal dependencies framework and consists of nearly 6,000 sentences and 152,871 word tokens with an average sentence length of 25 words. In addition to the universal dependencies syntactic annotation guidelines, the two treebanks differ in tokenization. All words containing unsegmented clitics (pronominal and copula clitics) annotated with complex labels in the UPDT have been separated from the clitics and appear with distinct labels in the Persian UD. The treebank has its original syntactic annotation scheme based on Stanford Typed Dependencies. In this paper, we present the approaches taken in the development of the Persian UD.

Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools
Jörg Tiedemann | Fabienne Cap | Jenna Kanerva | Filip Ginter | Sara Stymne | Robert Östling | Marion Weller-Di Marco
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Cross-Lingual Pronoun Prediction with Deep Recurrent Neural Networks
Juhani Luotolahti | Jenna Kanerva | Filip Ginter
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Syntactic analyses and named entity recognition for PubMed and PubMed Central — up-to-the-minute
Kai Hakala | Suwisa Kaewphan | Tapio Salakoski | Filip Ginter
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Deep Learning with Minimal Training Data: TurkuNLP Entry in the BioNLP Shared Task 2016
Farrokh Mehryary | Jari Björne | Sampo Pyysalo | Tapio Salakoski | Filip Ginter
Proceedings of the 4th BioNLP Shared Task Workshop

2015

Towards the Classification of the Finnish Internet Parsebank: Detecting Translations and Informality
Veronika Laippala | Jenna Kanerva | Anna Missilä | Sampo Pyysalo | Tapio Salakoski | Filip Ginter
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Sentence Compression For Automatic Subtitling
Juhani Luotolahti | Filip Ginter
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Universal Dependencies for Finnish
Sampo Pyysalo | Jenna Kanerva | Anna Missilä | Veronika Laippala | Filip Ginter
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Towards Universal Web Parsebanks
Juhani Luotolahti | Jenna Kanerva | Veronika Laippala | Sampo Pyysalo | Filip Ginter
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Morphological Segmentation and OPUS for Finnish-English Machine Translation
Jörg Tiedemann | Filip Ginter | Jenna Kanerva
Proceedings of the Tenth Workshop on Statistical Machine Translation

SETS: Scalable and Efficient Tree Search in Dependency Graphs
Juhani Luotolahti | Jenna Kanerva | Sampo Pyysalo | Filip Ginter
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Sharing annotations better: RESTful Open Annotation
Sampo Pyysalo | Jorge Campos | Juan Miguel Cejuela | Filip Ginter | Kai Hakala | Chen Li | Pontus Stenetorp | Lars Juhl Jensen
Proceedings of ACL-IJCNLP 2015 System Demonstrations

Turku: Semantic Dependency Parsing as a Sequence Classification
Jenna Kanerva | Juhani Luotolahti | Filip Ginter
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Turku: Broad-Coverage Semantic Parsing with Rich Features
Jenna Kanerva | Juhani Luotolahti | Filip Ginter
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

UTU: Disease Mention Recognition and Normalization with CRFs and Vector Space Representations
Suwisa Kaewphan | Kai Hakala | Filip Ginter
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Care Episode Retrieval
Hans Moen | Erwin Marsi | Filip Ginter | Laura-Maria Murtola | Tapio Salakoski | Sanna Salanterä
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

Post-hoc Manipulations of Vector Space Models with Application to Semantic Role Labeling
Jenna Kanerva | Filip Ginter
Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC)

Universal Stanford dependencies: A cross-linguistic typology
Marie-Catherine de Marneffe | Timothy Dozat | Natalia Silveira | Katri Haverinen | Filip Ginter | Joakim Nivre | Christopher D. Manning
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Revisiting the now de facto standard Stanford dependency representation, we propose an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones. We suggest a two-layered taxonomy: a set of broadly attested universal grammatical relations, to which language-specific relations can be added. We emphasize the lexicalist stance of the Stanford Dependencies, which leads to a particular, partially new treatment of compounding, prepositions, and morphology. We show how existing dependency schemes for several languages map onto the universal taxonomy proposed here and close with consideration of practical implications of dependency representation choices for NLP applications, in particular parsing.

2013

Evaluating Large-scale Text Mining Applications Beyond the Traditional Numeric Performance Measures
Sofie Van Landeghem | Suwisa Kaewphan | Filip Ginter | Yves Van de Peer
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

EVEX in ST’13: Application of a large-scale text mining resource to event extraction and network construction
Kai Hakala | Sofie Van Landeghem | Tapio Salakoski | Yves Van de Peer | Filip Ginter
Proceedings of the BioNLP Shared Task 2013 Workshop

Predicting Conjunct Propagation and Other Extended Stanford Dependencies
Jenna Nyblom | Samuel Kohonen | Katri Haverinen | Tapio Salakoski | Filip Ginter
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

Towards a Dependency-Based PropBank of General Finnish
Katri Haverinen | Veronika Laippala | Samuel Kohonen | Anna Missilä | Jenna Nyblom | Stina Ojala | Timo Viljanen | Tapio Salakoski | Filip Ginter
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

Building a Large Automatically Parsed Corpus of Finnish
Filip Ginter | Jenna Nyblom | Veronika Laippala | Samuel Kohonen | Katri Haverinen | Simo Vihjanen | Tapio Salakoski
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

Joint Morphological and Syntactic Analysis for Richly Inflected Languages
Bernd Bohnet | Joakim Nivre | Igor Boguslavsky | Richárd Farkas | Filip Ginter | Jan Hajič
Transactions of the Association for Computational Linguistics, Volume 1

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.

2012

PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations
Jari Björne | Sofie Van Landeghem | Sampo Pyysalo | Tomoko Ohta | Filip Ginter | Yves Van de Peer | Sophia Ananiadou | Tapio Salakoski
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2011

EVEX: A PubMed-Scale Resource for Homology-Based Generalization of Text Mining Predictions
Sofie Van Landeghem | Filip Ginter | Yves Van de Peer | Tapio Salakoski
Proceedings of BioNLP 2011 Workshop

2010

Dependency-Based PropBanking of Clinical Finnish
Katri Haverinen | Filip Ginter | Timo Viljanen | Veronika Laippala | Tapio Salakoski
Proceedings of the Fourth Linguistic Annotation Workshop

Scaling up Biomedical Event Extraction to the Entire PubMed
Jari Björne | Filip Ginter | Sampo Pyysalo | Jun’ichi Tsujii | Tapio Salakoski
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

2009

Extracting Complex Biological Events with Rich Graph-Based Feature Sets
Jari Björne | Juho Heimonen | Filip Ginter | Antti Airola | Tapio Pahikkala | Tapio Salakoski
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

Learning to Extract Biological Event and Relation Graphs
Jari Björne | Filip Ginter | Juho Heimonen | Sampo Pyysalo | Tapio Salakoski
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

Parsing Clinical Finnish: Experiments with Rule-Based and Statistical Dependency Parsers
Katri Haverinen | Filip Ginter | Veronika Laippala | Tapio Salakoski
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

2008

A Graph Kernel for Protein-Protein Interaction Extraction
Antti Airola | Sampo Pyysalo | Jari Björne | Tapio Pahikkala | Filip Ginter | Tapio Salakoski
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

2007

On the unification of syntactic annotations under the Stanford dependency scheme: A case study on BioInfer and GENIA
Sampo Pyysalo | Filip Ginter | Veronika Laippala | Katri Haverinen | Juho Heimonen | Tapio Salakoski
Biological, translational, and clinical language processing

2006

A Probabilistic Search for the Best Solution Among Partially Completed Candidates
Filip Ginter | Aleksandr Mylläri | Tapio Salakoski
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

2004

Analysis of Link Grammar on Biomedical Dependency Corpus Targeted at Protein-Protein Interactions
Sampo Pyysalo | Filip Ginter | Tapio Pahikkala | Jorma Boberg | Jouni Järvinen | Tapio Salakoski | Jeppe Koivula
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)
