Kairit Sirts


2018

Modeling Composite Labels for Neural Morphological Tagging
Alexander Tkachenko | Kairit Sirts
Proceedings of the 22nd Conference on Computational Natural Language Learning

Neural morphological tagging has been regarded as an extension of the POS tagging task, treating each morphological tag as a monolithic label and ignoring its internal structure. We propose to view morphological tags as composite labels and to explicitly model their internal structure in a neural sequence tagger. To this end, we explore three different neural architectures and compare their performance with both CRF and simple neural multiclass baselines. We evaluate our models on 49 languages and show that the neural architecture that models morphological labels as sequences of morphological category values performs significantly better than both baselines, establishing state-of-the-art results in morphological tagging for most languages.
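
As an illustration of the composite-label view described above, the sketch below (not the authors' implementation) decomposes a morphological tag into a sequence of category=value symbols and scores that sequence with a small LSTM decoder conditioned on a word encoding; the tag format, dimensions, and the toy vocabulary are illustrative assumptions.

# A minimal sketch (not the paper's code) of treating a morphological tag as a
# sequence of category=value symbols predicted one at a time by a decoder.
import torch
import torch.nn as nn

def split_tag(tag):
    """Decompose a composite tag such as 'POS=NOUN|Case=Nom|Number=Sing'
    into its ordered category=value components."""
    return tag.split("|")

class TagSequenceDecoder(nn.Module):
    """Scores a composite label as a sequence of category-value symbols."""
    def __init__(self, n_symbols, word_dim=64, sym_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, sym_dim)
        self.rnn = nn.LSTM(sym_dim + word_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_symbols)

    def forward(self, word_vec, symbol_ids):
        # word_vec: (batch, word_dim) token encoding from a sentence encoder
        # symbol_ids: (batch, seq) previously generated category-value symbols
        sym = self.embed(symbol_ids)                             # (batch, seq, sym_dim)
        ctx = word_vec.unsqueeze(1).expand(-1, sym.size(1), -1)  # repeat word context
        h, _ = self.rnn(torch.cat([sym, ctx], dim=-1))
        return self.out(h)                                       # logits over next symbol

# Toy usage with a hypothetical symbol vocabulary.
vocab = {"POS=NOUN": 0, "Case=Nom": 1, "Number=Sing": 2}
symbols = torch.tensor([[vocab[s] for s in split_tag("POS=NOUN|Case=Nom|Number=Sing")]])
decoder = TagSequenceDecoder(n_symbols=len(vocab))
logits = decoder(torch.zeros(1, 64), symbols)
print(logits.shape)  # torch.Size([1, 3, 3])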

2017

Idea density for predicting Alzheimer’s disease from transcribed speech
Kairit Sirts | Olivier Piguet | Mark Johnson
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Idea Density (ID) measures the rate at which ideas or elementary predications are expressed in an utterance or in a text. Lower ID has been found to be associated with an increased risk of developing Alzheimer’s disease (AD) (Snowdon et al., 1996; Engelman et al., 2010). ID has been used in two different versions: propositional idea density (PID) counts the expressed ideas and can be applied to any text, while semantic idea density (SID) counts pre-defined information content units and is naturally more applicable to normative domains, such as picture description tasks. In this paper, we develop DEPID, a novel dependency-based method for computing PID, and its variant DEPID-R, which allows excluding repeated ideas, a feature characteristic of AD speech. We conduct the first comparison of automatically extracted PID and SID on the diagnostic classification task on two different AD datasets covering both closed-topic and free-recall domains. While SID performs better on the normative dataset, adding PID leads to a small but significant improvement (+1.7 F-score). On the free-topic dataset, PID performs better than SID as expected (77.6 vs 72.3 F-score), but adding the features derived from the word embedding clustering underlying the automatic SID improves the results considerably, leading to an F-score of 84.8.
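
The dependency-based computation of PID can be pictured roughly as below. The snippet is a simplified sketch, not the released DEPID tool: the dependency relation set it counts is an illustrative subset rather than the paper's inventory, and it assumes spaCy with the en_core_web_sm model installed.

# A rough sketch of dependency-based propositional idea density: parse the text,
# count dependency relations taken to express elementary predications, and
# divide by the number of (non-punctuation) words.
import spacy

# Illustrative subset of relations, not the paper's relation inventory.
PROPOSITION_DEPS = {"nsubj", "dobj", "amod", "advmod", "prep", "poss", "neg"}

def depid(text, nlp):
    doc = nlp(text)
    words = [t for t in doc if not t.is_punct]
    props = [t for t in words if t.dep_ in PROPOSITION_DEPS]
    return len(props) / max(len(words), 1)

nlp = spacy.load("en_core_web_sm")
print(round(depid("The quick brown fox jumps over the lazy dog.", nlp), 2))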

Linear Ensembles of Word Embedding Models
Avo Muromägi | Kairit Sirts | Sven Laur
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

STransE: a novel embedding model of entities and relationships in knowledge bases
Dat Quoc Nguyen | Kairit Sirts | Lizhen Qu | Mark Johnson
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Neighborhood Mixture Model for Knowledge Base Completion
Dat Quoc Nguyen | Kairit Sirts | Lizhen Qu | Mark Johnson
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

A Comparative Study of Minimally Supervised Morphological Segmentation
Teemu Ruokolainen | Oskar Kohonen | Kairit Sirts | Stig-Arne Grönroos | Mikko Kurimo | Sami Virpioja
Computational Linguistics, Volume 42, Issue 1 - March 2016

2015

Query-Based Single Document Summarization Using an Ensemble Noisy Auto-Encoder
Mahmood Yousefi Azar | Kairit Sirts | Diego Mollá Aliod | Len Hamey
Proceedings of the Australasian Language Technology Association Workshop 2015

Do POS Tags Help to Learn Better Morphological Segmentations?
Kairit Sirts | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2015

Improving Topic Coherence with Latent Feature Word Representations in MAP Estimation for Topic Modeling
Dat Quoc Nguyen | Kairit Sirts | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2015

2014

POS induction with distributional and morphological information using a distance-dependent Chinese restaurant process
Kairit Sirts | Jacob Eisenstein | Micha Elsner | Sharon Goldwater
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Minimally-Supervised Morphological Segmentation using Adaptor Grammars
Kairit Sirts | Sharon Goldwater
Transactions of the Association for Computational Linguistics, Volume 1

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.
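
As a toy illustration of the model selection idea (not the paper's implementation), the sketch below groups candidate morph boundaries by the metagrammar nonterminal that proposed them and uses a small labelled set to pick the subset of nonterminals whose boundaries score best; the nonterminal names, example words, and boundary-level F-score are illustrative assumptions.

# A toy sketch of selecting which boundary-producing nonterminals to keep,
# using a small labelled set of gold segmentations.
from itertools import combinations

def f_score(predicted, gold):
    """Boundary-level F1: boundaries are sets of split positions per word."""
    tp = sum(len(predicted[w] & gold[w]) for w in gold)
    p_total = sum(len(predicted[w]) for w in gold)
    g_total = sum(len(gold[w]) for w in gold)
    if tp == 0:
        return 0.0
    prec, rec = tp / p_total, tp / g_total
    return 2 * prec * rec / (prec + rec)

def select_nonterminals(candidates, gold):
    """candidates: {nonterminal: {word: set of proposed split positions}}."""
    best, best_f = None, -1.0
    names = list(candidates)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            merged = {w: set().union(*(candidates[n].get(w, set()) for n in subset))
                      for w in gold}
            f = f_score(merged, gold)
            if f > best_f:
                best, best_f = subset, f
    return best, best_f

# Hypothetical example: two nonterminals propose different splits of 'walking'.
gold = {"walking": {4}}  # walk|ing
candidates = {"Stem+Suffix": {"walking": {4}}, "Prefix+Stem": {"walking": {2}}}
print(select_nonterminals(candidates, gold))  # (('Stem+Suffix',), 1.0)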

2012

A Hierarchical Dirichlet Process Model for Joint Part-of-Speech and Morphology Induction
Kairit Sirts | Tanel Alumäe
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies