Elżbieta Hajnicz


2020

Interannotator Agreement for Lexico-Semantic Annotation of a Corpus
Elżbieta Hajnicz
Proceedings of the 12th Language Resources and Evaluation Conference

This paper examines the procedure for lexico-semantic annotation of the Basic Corpus of Polish Metaphors, which is the first step towards annotating the metaphoric expressions occurring in it. The procedure involves correcting the morphosyntactic annotation of the part of the corpus that is annotated automatically on the morphosyntactic level. The main procedure concerns the annotation of adjectives, adverbs, nouns and verbs (including gerunds and participles), including abbreviations of words that belong to these classes. It is composed of three steps: deciding whether a particular occurrence of a word is asemantic (e.g. anaphoric or strictly grammatical), whether we are dealing with a multi-word expression, reciprocal usages of the się marker and pluralia tantum, which may involve annotation with two lexical units (having two different lemmas) for a single token. We propose an interannotator agreement statistic adequate for this procedure. Finally, we discuss the preliminary results of annotating a fragment of the corpus.

2018

A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary
Marcin Woliński | Elżbieta Hajnicz | Tomasz Bartosiak
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Accessing and Elaborating Walenty - a Valence Dictionary of Polish - via Internet Browser
Bartłomiej Nitoń | Tomasz Bartosiak | Elżbieta Hajnicz
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This article presents Walenty, a new valence dictionary of Polish predicates, concentrating on its creation process and access via an Internet browser. The dictionary contains two layers, syntactic and semantic. The syntactic layer describes the syntactic and morphosyntactic constraints that predicates put on their dependants. The semantic layer shows how predicates and their arguments are involved in a situation described in an utterance. These two layers are connected, representing how semantic arguments can be realised on the surface. Walenty also contains a powerful phraseological (idiomatic) component. Walenty has been created, and can be accessed remotely, with a dedicated tool called Slowal. In this article, we focus on the most important functionalities of this system. First, we depict how to access the dictionary and how the built-in filtering system (covering both syntactic and semantic phenomena) works. Later, we describe the process of creating the dictionary with the Slowal tool, which both supports and controls the work of lexicographers.

Semantic Layer of the Valence Dictionary of Polish Walenty
Elżbieta Hajnicz | Anna Andrzejczuk | Tomasz Bartosiak
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This article presents the semantic layer of Walenty, a new valence dictionary of Polish predicates with a number of novel features compared to other such dictionaries. The dictionary contains two layers, syntactic and semantic. The syntactic layer describes the syntactic and morphosyntactic constraints that predicates put on their dependants. In particular, it includes a comprehensive and powerful phraseological component. The semantic layer shows how predicates and their arguments are involved in a situation described in an utterance. These two layers are connected, representing how semantic arguments can be realised on the surface. Each syntactic schema and each semantic frame is illustrated by at least one example sentence attested in linguistic reality. The semantic layer consists of semantic frames represented as lists of pairs and connected with PlWordNet lexical units. Semantic roles have a two-level representation (basic roles are provided with an attribute), enabling a flexible representation of arguments. Selectional preferences are based on the PlWordNet structure as well.
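The frame structure outlined in this abstract can be pictured with a minimal sketch. It is not Walenty's actual data model or file format: the class names, field names, and the reading of each pair as (semantic role, selectional preferences) are assumptions made for illustration, and the role and identifier strings in the usage are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SemanticRole:
    """Two-level role: a basic role plus an optional attribute (hypothetical names)."""
    basic: str
    attribute: Optional[str] = None


@dataclass(frozen=True)
class Argument:
    """One pair in a frame: a role and its selectional preferences.

    Preferences are stored as opaque wordnet identifiers; the identifier
    format is an assumption, not PlWordNet's real scheme.
    """
    role: SemanticRole
    preferences: Tuple[str, ...] = ()


@dataclass
class SemanticFrame:
    """A frame linked to one wordnet lexical unit, holding a list of arguments."""
    lexical_unit: str
    arguments: List[Argument] = field(default_factory=list)

    def roles(self) -> List[Tuple[str, Optional[str]]]:
        # Return (basic role, attribute) for every argument, in order.
        return [(a.role.basic, a.role.attribute) for a in self.arguments]


# Placeholder data, purely illustrative.
frame = SemanticFrame(
    lexical_unit="example-verb-1",
    arguments=[
        Argument(SemanticRole("RoleA"), ("unit-1",)),
        Argument(SemanticRole("RoleB", "AttributeX"), ("unit-2", "unit-3")),
    ],
)
print(frame.roles())
```

Keeping the attribute optional lets a basic role stand alone or be refined, which is one way to read the "two-level representation" mentioned above.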

2014

Lexico-Semantic Annotation of Składnica Treebank by means of PLWN Lexical Units
Elżbieta Hajnicz
Proceedings of the Seventh Global Wordnet Conference

Extended phraseological information in a valence dictionary for NLP applications
Adam Przepiórkowski | Elżbieta Hajnicz | Agnieszka Patejuk | Marcin Woliński
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

Walenty: Towards a comprehensive valence dictionary of Polish
Adam Przepiórkowski | Elżbieta Hajnicz | Agnieszka Patejuk | Marcin Woliński | Filip Skwarski | Marek Świdziński
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents Walenty, a comprehensive valence dictionary of Polish with a number of novel features compared to other such dictionaries. The notion of argument is based on the coordination test and takes into consideration the possibility of diverse morphosyntactic realisations. Some aspects of the internal structure of phraseological (idiomatic) arguments are handled explicitly. While the current version of the dictionary concentrates on syntax, it already contains some semantic features, including semantically defined arguments, such as locative, temporal or manner, as well as control and raising; work on extending it with semantic roles and selectional preferences is in progress. Although Walenty is still being intensively developed, it is already by far the largest Polish valence dictionary, with around 8600 verbal lemmata and almost 39 000 valence schemata. The dictionary is publicly available under the Creative Commons BY-SA licence and may be downloaded from http://zil.ipipan.waw.pl/Walenty.

The Procedure of Lexico-Semantic Annotation of Składnica Treebank
Elżbieta Hajnicz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, the procedure of lexico-semantic annotation of Składnica Treebank using Polish WordNet is presented. Other semantically annotated corpora, in particular treebanks, are outlined first. The resources involved in the annotation, as well as the tool used for it, called Semantikon, are described. The main part of the paper is an analysis of the applied procedure, which consists of a basic phase and a correction phase. During the basic phase, all nouns, verbs and adjectives are annotated with wordnet senses. The annotation is performed independently by two linguists. During the correction phase, conflicts are resolved by the linguist supervising the process. Multi-word units obtain special tags, and synonyms and hypernyms are used for senses absent from Polish WordNet. Additionally, each sentence receives a general assessment. Finally, some statistics of the annotation results are given, including inter-annotator agreement. The final resource is represented in XML files preserving the structure of Składnica.
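The two-annotator setup described in this abstract can be sketched as follows. The abstract does not name the agreement statistic reported in the paper, so Cohen's kappa is used here only as a common stand-in; the sense labels, function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def observed_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of tokens to which both annotators gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def conflicts(a: Sequence[str], b: Sequence[str]) -> List[int]:
    """Token indices where the annotators disagree (input to a correction phase)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical sense labels for four tokens; "MWE" stands for a special tag.
annotator_1 = ["zamek-1", "zamek-2", "pies-1", "MWE"]
annotator_2 = ["zamek-1", "zamek-1", "pies-1", "MWE"]

print(observed_agreement(annotator_1, annotator_2))
print(round(cohen_kappa(annotator_1, annotator_2), 3))
print(conflicts(annotator_1, annotator_2))
```

The list returned by `conflicts` corresponds to the tokens a supervising linguist would review in the correction phase.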