Marcin Woliński


2018

pdf bib
A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary
Marcin Woliński | Elżbieta Hajnicz | Tomasz Bartosiak
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Manually Annotated Corpus of Polish Texts Published between 1830 and 1918
Witold Kieraś | Marcin Woliński
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib
The on-line version of Grammatical Dictionary of Polish
Marcin Woliński | Witold Kieraś
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the new online edition of a dictionary of Polish inflection ― the Grammatical Dictionary of Polish (http://sgjp.pl). The dictionary is interesting for several reasons: it is comprehensive (over 330,000 lexemes corresponding to almost 4,300,000 different textual words; 1116 handcrafted inflectional patterns), the inflection is presented in an explicit manner in the form of carefully designed tables, the user interface facilitates advanced queries by several features (lemmas, forms, applicable grammatical categories, types of inflection). Moreover, the data of the dictionary is used in morphological analysers, including our product Morfeusz (http://sgjp. pl/morfeusz). From the start, the dictionary was meant to be comfortable for the human reader as well as to be ready for use in NLP applications. In the paper we briefly discuss both aspects of the resource.

2014

pdf bib
Extended phraseological information in a valence dictionary for NLP applications
Adam Przepiórkowski | Elżbieta Hajnicz | Agnieszka Patejuk | Marcin Woliński
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

pdf bib
Walenty: Towards a comprehensive valence dictionary of Polish
Adam Przepiórkowski | Elżbieta Hajnicz | Agnieszka Patejuk | Marcin Woliński | Filip Skwarski | Marek Świdziński
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents Walenty, a comprehensive valence dictionary of Polish, with a number of novel features, as compared to other such dictionaries. The notion of argument is based on the coordination test and takes into consideration the possibility of diverse morphosyntactic realisations. Some aspects of the internal structure of phraseological (idiomatic) arguments are handled explicitly. While the current version of the dictionary concentrates on syntax, it already contains some semantic features, including semantically defined arguments, such as locative, temporal or manner, as well as control and raising, and work on extending it with semantic roles and selectional preferences is in progress. Although Walenty is still being intensively developed, it is already by far the largest Polish valence dictionary, with around 8600 verbal lemmata and almost 39 000 valence schemata. The dictionary is publicly available on the Creative Commons BY SA licence and may be downloaded from http://zil.ipipan.waw.pl/Walenty.

pdf bib
Morfeusz Reloaded
Marcin Woliński
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The paper presents recent developments in Morfeusz ― a morphological analyser for Polish. The program, being already a fundamental resource for processing Polish, has been reimplemented with some important changes in the tagset, some new options, added information on proper names, and ability to perform simple prefix derivation. The present version of Morfeusz (including its dictionaries) is made available under the very liberal 2-clause BSD license. The program can be downloaded from http://sgjp.pl/morfeusz/.

2013

pdf bib
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Djamé Seddah | Reut Tsarfaty | Sandra Kübler | Marie Candito | Jinho D. Choi | Richárd Farkas | Jennifer Foster | Iakes Goenaga | Koldo Gojenola Galletebeitia | Yoav Goldberg | Spence Green | Nizar Habash | Marco Kuhlmann | Wolfgang Maier | Joakim Nivre | Adam Przepiórkowski | Ryan Roth | Wolfgang Seeker | Yannick Versley | Veronika Vincze | Marcin Woliński | Alina Wróblewska | Eric Villemonte de la Clergerie
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

pdf bib
PoliMorf: a (not so) new open morphological dictionary for Polish
Marcin Woliński | Marcin Miłkowski | Maciej Ogrodniczuk | Adam Przepiórkowski
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents preliminary results of an effort aiming at the creation of a morphological dictionary of Polish, PoliMorf, available under a very liberal BSD-style license. The dictionary is a result of a merger of two existing resources, SGJP and Morfologik and was prepared within the CESAR/META-NET initiative. The work completed so far includes re-licensing of the two dictionaries and filling the new resource with the morphological data semi-automatically unified from both sources. The merging process is controlled by the collaborative dictionary development web application Kuźnia, also implemented within the project. The tool involves several advanced features such as using SGJP inflectional patterns for form generation, possibility of attaching dictionary labels and classification schemes to lexemes, dictionary source record and change tracking. Since SGJP and Morfologik are already used in a significant number of Natural Language Processing projects in Poland, we expect PoliMorf to become the Polish morphological dictionary of choice for many years to come.

2004

pdf bib
A Search Tool for Corpora with Positional Tagsets and Ambiguities
Adam Przepiórkowski | Zygmunt Krynicki | Łukasz Dębowski | Marcin Woliński | Daniel Janus | Piotr Bański
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
The Unberable Lightness of Tagging* A Case Study in Morphosyntactic Tagging of Polish
Adam Przepiórkowski | Marcin Woliński
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

pdf bib
A Flexemic Tagset for Polish
Adam Przepiórkowski | Marcin Woliński
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages