Borbála Novák


2020

CBOW-tag: a Modified CBOW Algorithm for Generating Embedding Models from Annotated Corpora
Attila Novák | László Laki | Borbála Novák
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we present a modified version of the CBOW algorithm implemented in the fastText framework. Our modified algorithm, CBOW-tag, builds a vector space model that represents the original word forms and their annotations at the same time. We illustrate the results by presenting a model built from a corpus that includes morphological and syntactic annotations. Because unannotated elements and different kinds of annotation are present in the model simultaneously, nearest neighbour queries can be constrained to specific types of elements. The model can thus efficiently answer questions such as "What do we eat?", "What can we do with a skeleton?", "What else do we do with what we eat?", etc. Error analysis reveals that the model can highlight errors introduced into the annotation by the tagger and parser we used to generate the annotations, as well as lexical peculiarities in the corpus itself, especially if we do not limit the vocabulary of the model to frequent items.
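
The idea of constraining a nearest neighbour query to one type of element can be sketched as follows. This is only an illustration, not the CBOW-tag implementation: the `obj:` and `subj:` key prefixes, the function names, and the three-dimensional toy vectors are all assumptions invented here to stand in for annotated and unannotated entries sharing one vector space.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def constrained_neighbours(vectors, query, keep, k=3):
    """Return the k keys most similar to `query`, considering only keys accepted by `keep`."""
    q = vectors[query]
    scored = [
        (cosine(q, vec), key)
        for key, vec in vectors.items()
        if key != query and keep(key)
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Toy vectors invented for illustration; the "obj:"/"subj:" prefixes are a
# hypothetical convention for annotated entries, not the paper's format.
toy = {
    "eat": [0.9, 0.1, 0.0],
    "drink": [0.85, 0.15, 0.05],
    "obj:bread": [0.8, 0.2, 0.1],
    "obj:soup": [0.7, 0.3, 0.0],
    "obj:skeleton": [0.0, 0.1, 0.9],
    "subj:child": [0.6, 0.6, 0.1],
}

# Unconstrained query: the closest entry is another plain word form.
anything = constrained_neighbours(toy, "eat", lambda key: True, k=1)

# Constrained query ("What do we eat?"): only object-annotated entries qualify.
food = constrained_neighbours(toy, "eat", lambda key: key.startswith("obj:"), k=2)
```

Filtering candidates before ranking, rather than ranking everything and discarding unwanted types afterwards, guarantees that a query returns up to `k` results of the requested type even when entries of other types lie closer to the query.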

2019

Creation of a corpus with semantic role labels for Hungarian
Attila Novák | László Laki | Borbála Novák | Andrea Dömötör | Noémi Ligeti-Nagy | Ágnes Kalivoda
Proceedings of the 13th Linguistic Annotation Workshop

In this article, we present ongoing research whose immediate goal is to create a corpus annotated with semantic role labels for Hungarian that can be used to train a parser-based system capable of formulating relevant questions about the text it processes. We briefly describe the objectives of our research and our efforts at eliminating errors in the Hungarian Universal Dependencies corpus, which we use as the base of our annotation effort; at creating a Hungarian verbal argument database annotated with thematic roles; at classifying adjuncts; and at matching verbal argument frames to specific occurrences of verbs and participles in the corpus.

2018

Cross-Lingual Generation and Evaluation of a Wide-Coverage Lexical Semantic Resource
Attila Novák | Borbála Novák
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)