Arturs Znotins

Also published as: Artūrs Znotiņš


2018

Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
Normunds Gruzitis | Lauma Pretkalnina | Baiba Saulite | Laura Rituma | Gunta Nespore-Berzkalne | Arturs Znotins | Peteris Paikens
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Multilingual Clustering of Streaming News
Sebastião Miranda | Artūrs Znotiņš | Shay B. Cohen | Guntis Barzdins
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual clusters. Unlike typical clustering approaches, which report results on datasets with a small and known number of labels, we tackle the problem of discovering an ever-growing number of cluster labels in an online fashion, using real news datasets in multiple languages. In our formulation, monolingual clusters group together documents, while crosslingual clusters group together monolingual clusters, at most one per language that appears in the stream. Our method is simple to implement, computationally efficient and produces state-of-the-art results on datasets in German, English and Spanish.
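The two-level structure described above (documents into monolingual clusters, monolingual clusters into crosslingual clusters with at most one member per language) can be illustrated with a minimal sketch. This is not the paper's method: it assumes every document already has a vector in a shared crosslingual space, and it replaces the paper's similarity model with plain cosine similarity against running centroids and two fixed thresholds. `StreamClusterer`, `MonoCluster`, `CrossCluster` and both thresholds are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MonoCluster:
    lang: str
    centroid: List[float]
    docs: List[str] = field(default_factory=list)
    cross_id: int = -1  # index of the crosslingual cluster, -1 = not linked yet

    def add(self, doc_id: str, vec: Sequence[float]) -> None:
        # Incremental mean: the centroid stays the average of member vectors.
        n = len(self.docs)
        self.centroid = [(c * n + v) / (n + 1) for c, v in zip(self.centroid, vec)]
        self.docs.append(doc_id)


@dataclass
class CrossCluster:
    # At most one monolingual cluster per language: lang -> MonoCluster index.
    members: Dict[str, int] = field(default_factory=dict)


class StreamClusterer:
    """Hypothetical two-level online clusterer (illustration only)."""

    def __init__(self, mono_threshold: float = 0.5, cross_threshold: float = 0.5):
        self.mono: List[MonoCluster] = []
        self.cross: List[CrossCluster] = []
        self.mono_threshold = mono_threshold
        self.cross_threshold = cross_threshold

    def _cross_centroid(self, cc: CrossCluster) -> List[float]:
        # Average of the centroids of the member monolingual clusters.
        vecs = [self.mono[i].centroid for i in cc.members.values()]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def add(self, doc_id: str, lang: str, vec: Sequence[float]) -> Tuple[int, int]:
        """Assign one incoming document; returns (mono index, cross index)."""
        # Step 1: most similar monolingual cluster in the same language.
        best, best_sim = None, self.mono_threshold
        for i, mc in enumerate(self.mono):
            if mc.lang == lang:
                sim = cosine(mc.centroid, vec)
                if sim >= best_sim:
                    best, best_sim = i, sim
        if best is None:  # no match: open a new story in this language
            self.mono.append(MonoCluster(lang, list(vec)))
            best = len(self.mono) - 1
        mc = self.mono[best]
        mc.add(doc_id, vec)

        # Step 2: link a newly created monolingual cluster to a crosslingual one.
        if mc.cross_id == -1:
            target, target_sim = None, self.cross_threshold
            for j, cc in enumerate(self.cross):
                if lang in cc.members:  # this language is already represented
                    continue
                sim = cosine(self._cross_centroid(cc), mc.centroid)
                if sim >= target_sim:
                    target, target_sim = j, sim
            if target is None:
                self.cross.append(CrossCluster())
                target = len(self.cross) - 1
            self.cross[target].members[lang] = best
            mc.cross_id = target
        return best, mc.cross_id
```

For example, with both thresholds at 0.8, feeding the vectors `[1.0, 0.0]` and `[0.9, 0.1]` as English and then `[1.0, 0.05]` as Spanish yields one English and one Spanish monolingual cluster that share crosslingual cluster 0, while a German `[0.0, 1.0]` opens a new crosslingual cluster. Linking a monolingual cluster only when it is created is a deliberate simplification; a fuller system would revisit links as centroids drift.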

2014

Coreference Resolution for Latvian
Artūrs Znotiņš | Pēteris Paikens
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Coreference resolution (CR) is an active problem in natural language processing (NLP) research and a key task in applications such as question answering, text summarization and information extraction, for which text understanding is crucial. We describe an implementation of coreference resolution tools for the Latvian language, developed as part of a tool chain for newswire text analysis but also usable as a separate, publicly available module. LVCoref is a rule-based CR system that uses an entity-centric model, which encourages the sharing of information across all mentions that point to the same real-world entity. The system is intended to provide a starting point for further experiments and a reference baseline against which more advanced rule-based and machine-learning-based coreference resolvers can be compared in the future. It currently reaches a 66.6% F-score using predicted mentions and a 78.1% F-score using gold mentions. This paper describes current efforts to create a CR system and to improve NER performance for Latvian. The task also includes the creation of a corpus of manually annotated coreference relations.
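To make the idea of an entity-centric model concrete, here is a minimal sketch in which attributes (head lemmas, gender, number) are pooled over all mentions already assigned to an entity, and each new mention is checked against that pooled information rather than against a single antecedent. It is not LVCoref's rule set: the two rules used here (exact head-lemma match for nominal mentions, attachment of pronouns to the most recently created compatible entity) and the names `Mention`, `Entity`, `compatible` and `resolve` are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Mention:
    text: str
    head: str              # lemma of the head word
    gender: Optional[str]  # e.g. "m" or "f"; None if unknown
    number: Optional[str]  # e.g. "sg" or "pl"; None if unknown
    pronoun: bool = False


@dataclass
class Entity:
    mentions: List[Mention] = field(default_factory=list)
    heads: Set[str] = field(default_factory=set)    # pooled nominal heads
    genders: Set[str] = field(default_factory=set)  # pooled gender values
    numbers: Set[str] = field(default_factory=set)  # pooled number values

    def add(self, m: Mention) -> None:
        # Every mention contributes its known attributes to the whole entity.
        self.mentions.append(m)
        if not m.pronoun:
            self.heads.add(m.head)
        if m.gender:
            self.genders.add(m.gender)
        if m.number:
            self.numbers.add(m.number)


def compatible(e: Entity, m: Mention) -> bool:
    """A mention may join an entity only if it contradicts none of its pooled attributes."""
    if m.gender and e.genders and m.gender not in e.genders:
        return False
    if m.number and e.numbers and m.number not in e.numbers:
        return False
    return True


def resolve(mentions: List[Mention]) -> List[Entity]:
    """Process mentions left to right, merging each into an entity or starting a new one."""
    entities: List[Entity] = []
    for m in mentions:
        target = None
        for e in reversed(entities):  # most recently created entity first
            if not compatible(e, m):
                continue
            if m.pronoun or m.head in e.heads:
                target = e
                break
        if target is None:
            target = Entity()
            entities.append(target)
        target.add(m)
    return entities
```

With the mentions "Jānis Bērziņš", "Anna", "viņš" ("he") and "Bērziņš" in that order, the pooled feminine gender of the "Anna" entity blocks the masculine pronoun, so both "viņš" and the repeated "Bērziņš" join the first entity, giving two entities in total.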

Dependency parsing representation effects on the accuracy of semantic applications — an example of an inflective language
Lauma Pretkalniņa | Artūrs Znotiņš | Laura Rituma | Didzis Goško
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we investigate how different dependency representations of a treebank influence the accuracy of a dependency parser trained on that treebank, and the impact on several parser applications: named entity recognition, coreference resolution and limited semantic role labeling. For these experiments we use the Latvian Treebank, whose native annotation format is a dependency-based hybrid augmented with phrase-like elements. We explore different representations of coordination, complex predicates and punctuation mark attachment. Our experiments show that parsers trained on the variously transformed treebanks vary significantly in their accuracy, but the best-performing parser as measured by attachment score does not always lead to the best accuracy for an end application.
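Since the parsers above are compared by attachment score, a minimal sketch of the standard unlabeled and labeled attachment scores (UAS and LAS) may help: UAS is the share of scored tokens whose predicted head is correct, and LAS additionally requires the correct dependency label. Because punctuation attachment is one of the varied representations, the sketch has an optional mask for excluding punctuation tokens; whether the paper scores punctuation is not stated here, and the function name `attachment_scores` and the labels in the example are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# One arc per token: (head index, dependency label); head 0 denotes the root.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc],
                      is_punct: Optional[Sequence[bool]] = None) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens, or over non-punctuation tokens if a mask is given."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted trees must have the same number of tokens")
    if is_punct is None:
        is_punct = [False] * len(gold)
    total = head_hits = label_hits = 0
    for (g_head, g_label), (p_head, p_label), punct in zip(gold, pred, is_punct):
        if punct:
            continue  # excluded from scoring
        total += 1
        if g_head == p_head:
            head_hits += 1          # counts toward UAS
            if g_label == p_label:
                label_hits += 1     # counts toward LAS
    if total == 0:
        return 0.0, 0.0
    return head_hits / total, label_hits / total
```

For a four-token sentence where the parser gets every head right except the final punctuation mark and mislabels one object, the scores are UAS 0.75 and LAS 0.5 with punctuation counted, but UAS 1.0 and LAS 2/3 with punctuation excluded. This shows how the same parse can rank differently depending on scoring conventions.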