Maciej Piasecki


2020

Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations
Arkadiusz Janz | Łukasz Kopociński | Maciej Piasecki | Agnieszka Pluwak
Proceedings of the 12th Language Resources and Evaluation Conference

Relation Extraction is a fundamental NLP task. In this paper we investigate the impact of the underlying text representation on the performance of neural classification models in the task of Brand-Product relation extraction. We also present a methodology for preparing annotated textual corpora for this task and provide insight into the properties of Brand-Product relations found in textual corpora. The problem is approached from the practical angle of applying Relation Extraction to facilitate commercial Internet monitoring.

2019

Sparse Coding in Authorship Attribution for Polish Tweets
Piotr Grzybowski | Ewa Juralewicz | Maciej Piasecki
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The study explores the application of a simple Convolutional Neural Network to the problem of authorship attribution of tweets written in Polish. In our solution we use two-step compression of tweets, using the Byte Pair Encoding algorithm and vectorisation, as an input to the distributional model generated for a large corpus of Polish tweets by the word2vec algorithm. Our method achieves results comparable to state-of-the-art approaches for a similar task on English tweets and shows very good performance in the classification of Polish tweets. We tested the proposed method with respect to the number of authors and the number of tweets per author. We also compared results for authors with different topic backgrounds.
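
For readers unfamiliar with Byte Pair Encoding, mentioned in the abstract above, here is a minimal Python sketch of how BPE merge rules can be learned from a word list. It is a generic illustration, not the authors' pipeline; the function name `learn_bpe` and the toy input are assumptions.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merge rules from a list of words (toy sketch)."""
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# Hypothetical toy input; real use would learn merges from a tweet corpus.
merges = learn_bpe(["abab", "abab", "cd"], num_merges=2)
```

The learned merges can then be replayed in order on new text to segment it into subword units.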

Word Sense Disambiguation based on Constrained Random Walks in Linked Semantic Networks
Arkadiusz Janz | Maciej Piasecki
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Word Sense Disambiguation remains a challenging NLP task. Due to the lack of annotated training data, especially for rare senses, the supervised approaches are usually designed for specific subdomains limited to a narrow subset of identified senses. Recent advances in this area have shown that knowledge-based approaches are more scalable and obtain more promising results in all-words WSD scenarios. In this work we present a faster WSD algorithm based on the Monte Carlo approximation of sense probabilities given a context using constrained random walks over linked semantic networks. We show that the local semantic relatedness is mostly sufficient to successfully identify correct senses when an extensive knowledge base and a proper weighting scheme are used. The proposed methods are evaluated on English (SenseEval, SemEval) and Polish (Składnica, KPWr) datasets.
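
As a rough illustration of the idea in the abstract above, the following Python sketch estimates sense scores by Monte Carlo random walks with restart over a tiny hand-made graph. It is not the authors' algorithm: the graph, node names, restart probability and walk counts are all hypothetical, and the paper's constraints and weighting scheme are not reproduced.

```python
import random
from collections import Counter

# Hypothetical toy semantic network: node -> neighbours (undirected edges).
GRAPH = {
    "bank#finance": ["money", "loan"],
    "bank#river": ["water", "shore"],
    "money": ["bank#finance", "loan", "deposit"],
    "loan": ["bank#finance", "money"],
    "deposit": ["money", "shore"],
    "water": ["bank#river", "shore"],
    "shore": ["bank#river", "water", "deposit"],
}

def monte_carlo_sense_scores(graph, context_nodes, candidate_senses,
                             walks=2000, max_steps=10, restart=0.3, seed=0):
    """Estimate the relative frequency with which short random walks
    started from the context visit each candidate sense."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(walks):
        node = rng.choice(context_nodes)
        for _ in range(max_steps):
            if rng.random() < restart:
                node = rng.choice(context_nodes)  # jump back to the context
            else:
                node = rng.choice(graph[node])    # follow a random edge
            if node in candidate_senses:
                visits[node] += 1
    total = sum(visits.values()) or 1
    return {sense: visits[sense] / total for sense in candidate_senses}

# Disambiguate "bank" in a context that mentions money and a deposit.
scores = monte_carlo_sense_scores(
    GRAPH,
    context_nodes=["money", "deposit"],
    candidate_senses=["bank#finance", "bank#river"],
)
best_sense = max(scores, key=scores.get)
```

Because the walks restart at the context nodes, senses that are locally close to the context collect most of the visits, which loosely mirrors the abstract's point about local semantic relatedness.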

Tagger for Polish Computer Mediated Communication Texts
Wiktor Walentynowicz | Maciej Piasecki | Marcin Oleksy
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper we present a morpho-syntactic tagger dedicated to Computer-mediated Communication texts in Polish. Its construction is based on an expanded RNN-based neural network adapted to work on noisy texts. Among several techniques, the tagger utilises fastText embedding vectors, sequential character embedding vectors, and Brown clustering for the coarse-grained representation of sentence structures. In addition, a set of manually written rules was proposed for post-processing. The system was trained to disambiguate descriptions of words in relation to Parts of Speech tags together with the full morphological information in terms of values for the different grammatical categories. We also present an evaluation of several model variants on gold-standard annotated CMC data, a comparison with state-of-the-art taggers for Polish, and an error analysis. The proposed tagger shows significantly better results in this domain and demonstrates the viability of adaptation.

2018

Classifier-based Polarity Propagation in a WordNet
Jan Kocoń | Arkadiusz Janz | Maciej Piasecki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Graph-Based Approach to Recognizing CST Relations in Polish Texts
Paweł Kędzia | Maciej Piasecki | Arkadiusz Janz
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This paper presents a supervised approach to the recognition of Cross-document Structure Theory (CST) relations in Polish texts. In the proposed approach, a graph-based representation is constructed for sentences. Graphs are built on the basis of lexicalised syntactic-semantic relations extracted from text. Similarity between sentences is calculated from the graphs, and the similarity values are input to classifiers trained with Logistic Model Tree. Several different graph configurations, as well as graph similarity methods, were analysed for this task. The approach was evaluated on a large open corpus annotated manually with 17 types of selected CST relations. The configuration of experiments was similar to those known from SEMEVAL and we obtained very promising results.

Recognition of Genuine Polish Suicide Notes
Maciej Piasecki | Ksenia Młynarczyk | Jan Kocoń
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this article we present the results of recent research on the recognition of genuine Polish suicide notes (SNs). We provide a useful method to distinguish between SNs and other types of discourse, including counterfeited SNs. The method uses a wide range of word-based and semantic features and was evaluated using the Polish Corpus of Suicide Notes, which contains 1244 genuine SNs, expanded with a manually prepared set of 334 counterfeited SNs and 2200 letter-like texts from the Internet. We used the algorithm to create class-related sense dictionaries to improve the results of SN classification. The obtained results show that there are fundamental differences between genuine SNs and counterfeited SNs. The applied method of sense dictionary construction proved to be the best way of improving the model.

2016

plWordNet 3.0 – a Comprehensive Lexical-Semantic Resource
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz | Paweł Kędzia
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We have released plWordNet 3.0, a very large wordnet for Polish. In addition to what is expected in wordnets – richly interrelated synsets – it contains sentiment and emotion annotations, a large set of multi-word expressions, and a mapping onto WordNet 3.1. Part of the release is enWordNet 1.0, a substantially enlarged copy of WordNet 3.1, with material added to allow for a more complete mapping. The paper discusses the design principles of plWordNet, its content, its statistical portrait, a comparison with similar resources, and a partial list of applications.

2015

A Procedural Definition of Multi-word Lexical Units
Marek Maziarz | Stan Szpakowicz | Maciej Piasecki
Proceedings of the International Conference Recent Advances in Natural Language Processing

Extraction of the Multi-word Lexical Units in the Perspective of the Wordnet Expansion
Maciej Piasecki | Michał Wendelberger | Marek Maziarz
Proceedings of the International Conference Recent Advances in Natural Language Processing

A Large Wordnet-based Sentiment Lexicon for Polish
Monika Zaśko-Zielińska | Maciej Piasecki | Stan Szpakowicz
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

plWordNet as the Cornerstone of a Toolkit of Lexico-semantic Resources
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz
Proceedings of the Seventh Global Wordnet Conference

Registers in the System of Semantic Relations in plWordNet
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz
Proceedings of the Seventh Global Wordnet Conference

Ruled-based, Interlingual Motivated Mapping of plWordNet onto SUMO Ontology
Paweł Kędzia | Maciej Piasecki
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we study a rule-based approach to mapping plWordNet onto the SUMO Upper Ontology on the basis of the already existing mappings: plWordNet to the Princeton WordNet, and the Princeton WordNet to SUMO. Information acquired from the inter-lingual relations between plWordNet and Princeton WordNet and from the relations between Princeton WordNet and the SUMO ontology is used in the proposed rules. Several mapping rules together with matching examples are presented. The automated mapping results were evaluated in two steps: (i) we automatically checked the formal correctness of the mappings for the pairs of plWordNet synset and SUMO concept, and (ii) a subset of 160 mapping examples was manually checked by two+one linguists. We analyzed the types of mapping errors and their causes. The proposed rules showed very high precision, especially when the errors in the resources are taken into account. Because both wordnets were constructed independently, the obtained rules are not trivial, and they reveal the differences between the two wordnets and the two languages.

2013

Evaluation of baseline information retrieval for Polish open-domain Question Answering system
Michał Marcińczuk | Adam Radziszewski | Maciej Piasecki | Dominik Piasecki | Marcin Ptak
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Beyond the Transfer-and-Merge Wordnet Construction: plWordNet and a Comparison with WordNet
Marek Maziarz | Maciej Piasecki | Ewa Rudnicka | Stan Szpakowicz
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Information Spreading in Expanding Wordnet Hypernymy Structure
Maciej Piasecki | Radosław Ramocki | Michał Kaliński
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

Tools for plWordNet Development. Presentation and Perspectives
Bartosz Broda | Marek Maziarz | Maciej Piasecki
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Building a wordnet is a serious undertaking. Fortunately, Language Technology (LT) can improve the process of wordnet construction both in terms of quality and cost. In this paper we present the LT tools used during the construction of plWordNet and their influence on the lexicographer's work-flow. LT is employed in plWordNet development at every possible step: from data gathering through data analysis to data presentation. Every decision still requires input from the lexicographer, but the quality of the supporting tools is an important factor. Thus a limited evaluation of the usefulness of the employed tools is carried out on the basis of questionnaires.

Recognition of Polish Derivational Relations Based on Supervised Learning Scheme
Maciej Piasecki | Radoslaw Ramocki | Marek Maziarz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper presents the construction of Derywator, a language tool for the recognition of Polish derivational relations. It was built on the basis of machine learning, following the bootstrapping approach: a limited set of derivational pairs described manually by linguists in plWordNet is used to train Derywator. The tool is intended to be applied in the semi-automated expansion of plWordNet with new instances of derivational relations. The training process is based on the construction of two transducers working in opposite directions: one for prefixes and one for suffixes. Internal stem alternations are recognised, recorded in the form of mapping sequences and stored together with the transducers. Raw results produced by Derywator subsequently undergo corpus-based and morphological filtering. A set of derivational relations defined in plWordNet is presented. Results of tests for different derivational relations are discussed. The problem of the necessary corpus-based semantic filtering is analysed. The presented tool depends only to a very small extent on hand-crafted knowledge for a particular language: only a table of possible alternations and the morphological filtering rules must be exchanged, which should not take longer than a couple of working days.

Constraint Based Description of Polish Multiword Expressions
Roman Kurc | Maciej Piasecki | Bartosz Broda
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present an approach to the description of Polish Multi-word Expressions (MWEs) which is based on expressions in the WCCL language of morpho-syntactic constraints instead of grammar rules or transducers. For each MWE, its basic morphological form and the base forms of its constituents are specified, and each MWE is also assigned to a class on the basis of its syntactic structure. For each class a WCCL constraint is defined which is parametrised by string variables referring to MWE constituent base forms or inflected forms. The constraint specifies a minimal set of conditions that must be fulfilled in order to recognise an occurrence of the given MWE in text with high accuracy. Our formalism is focused on the efficient description of large MWE lexicons for use in text processing. The formalism allows for the relatively easy representation of flexible word order and discontinuous constructions. Moreover, there is no need for a full specification of the MWE grammatical structure: only some aspects of the particular MWE structure can be selected, in a way that facilitates the target accuracy of recognition. On the basis of a set of simple heuristics, a WCCL-based representation of MWEs can be automatically generated from a list of MWE base forms. The proposed representation was applied on a practical scale to the description of a large set of Polish MWEs included in plWordNet.

A Strategy of Mapping Polish WordNet onto Princeton WordNet
Ewa Rudnicka | Marek Maziarz | Maciej Piasecki | Stan Szpakowicz
Proceedings of COLING 2012: Posters

2010

Resource and Service Centres as the Backbone for a Sustainable Service Infrastructure
Peter Wittenburg | Nuria Bel | Lars Borin | Gerhard Budin | Nicoletta Calzolari | Eva Hajicova | Kimmo Koskenniemi | Lothar Lemnitzer | Bente Maegaard | Maciej Piasecki | Jean-Marie Pierrel | Stelios Piperidis | Inguna Skadina | Dan Tufis | Remco van Veenendaal | Tamas Váradi | Martin Wynne
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, research infrastructures are being designed and established in many disciplines, since they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools, the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects carried out in recent years and wants to build stable and robust services that can be used by researchers. Service centres that have the potential to be persistent and that adhere to the criteria established by CLARIN will play an important role here. In the last year of the so-called preparatory phase, these centres are developing four use cases that can demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criterion of being cross-national.

Building a Node of the Accessible Language Technology Infrastructure
Bartosz Broda | Michał Marcińczuk | Maciej Piasecki
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A limited prototype of the CLARIN Language Technology Infrastructure (LTI) node is presented. The node prototype provides several types of web services for Polish. The functionality encompasses morpho-syntactic processing, shallow semantic processing of a corpus on the basis of the SuperMatrix system, and plWordNet browsing. We take the prototype as the starting point for a discussion of the requirements that must be fulfilled by the LTI. Some possible solutions are proposed for less frequently discussed problems, e.g. streaming processing of language data on the remote processing node. We experimentally investigate how to tackle several of the many requirements discussed. Aspects such as processing large volumes of data, the asynchronous mode of processing and the scalability of the architecture to a large number of users received special attention in the constructed prototype of the Web Service for morpho-syntactic processing of Polish called TaKIPI-WS (http://plwordnet.pwr.wroc.pl/clarin/ws/takipi/). TaKIPI-WS is a distributed system with a three-layer architecture, an asynchronous model of request handling and multi-agent-based processing. TaKIPI-WS consists of three layers: WS Interface, Database and Daemons. The role of the Database is to store and exchange data between the Interface and the Daemons. The Daemons (i.e. taggers) are responsible for executing the requests queued in the database. Results of the performance tests are also presented in the paper.
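
The three-layer, asynchronous pattern described in the abstract above (interface, database queue, daemons) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not TaKIPI-WS itself: the table layout, the function names and the `fake_tagger` placeholder are all hypothetical, and everything runs in a single process with an in-memory SQLite database.

```python
import sqlite3

def make_db():
    """Database layer: a single table acting as the request queue."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE requests (id INTEGER PRIMARY KEY, text TEXT, "
        "status TEXT DEFAULT 'queued', result TEXT)"
    )
    return db

def submit(db, text):
    """Interface layer: queue a request and return a token immediately."""
    cur = db.execute("INSERT INTO requests (text) VALUES (?)", (text,))
    db.commit()
    return cur.lastrowid

def daemon_step(db, process):
    """Daemon layer: take the oldest queued request and store its result."""
    row = db.execute(
        "SELECT id, text FROM requests WHERE status = 'queued' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False  # nothing to do
    req_id, text = row
    db.execute(
        "UPDATE requests SET status = 'done', result = ? WHERE id = ?",
        (process(text), req_id),
    )
    db.commit()
    return True

def fetch(db, req_id):
    """Interface layer: poll for the status and result of a request."""
    return db.execute(
        "SELECT status, result FROM requests WHERE id = ?", (req_id,)
    ).fetchone()

def fake_tagger(text):
    # Placeholder standing in for a real tagger: upper-cases every token.
    return " ".join(tok.upper() for tok in text.split())

db = make_db()
token = submit(db, "ala ma kota")
before = fetch(db, token)        # still queued, no result yet
daemon_step(db, fake_tagger)     # a daemon picks the request up later
after = fetch(db, token)
```

The caller never waits for the tagger: it receives a token at submission time and polls for the result, while any number of daemons could drain the queue independently.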

2008

Corpus-based Semantic Relatedness for the Construction of Polish WordNet
Bartosz Broda | Magdalena Derwojedowa | Maciej Piasecki | Stanislaw Szpakowicz
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The construction of a wordnet, a labour-intensive enterprise, can be significantly assisted by automatic grouping of lexical material and discovery of lexical semantic relations. The objective is to ensure high quality of automatically acquired results before they are presented for lexicographers’ approval. We discuss a software tool that suggests synset members using a measure of semantic relatedness with a given verb or adjective; this extends previous work on nominal synsets in Polish WordNet. Syntactically-motivated constraints are deployed on a large morphologically annotated corpus of Polish. Evaluation has been performed via the WordNet-Based Similarity Test and additionally supported by human raters. A lexicographer also manually assessed a suitable sample of suggestions. The results compare favourably with other known methods of acquiring semantic relations.