Manex Agirrezabal


2020

Automatic Detection and Classification of Head Movements in Face-to-Face Conversations
Patrizia Paggio | Manex Agirrezabal | Bart Jongejan | Costanza Navarretta
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

This paper presents an approach to automatic head movement detection and classification in data from a corpus of video-recorded face-to-face conversations in Danish involving 12 different speakers. A number of classifiers were trained with different combinations of visual, acoustic and word features and tested in a leave-one-out cross-validation scenario. The visual movement features were extracted from the raw video data using OpenPose, and the acoustic ones using Praat. The best results were obtained by a Multilayer Perceptron classifier, which reached an average F1 score of 0.68 across the 12 speakers for head movement detection, and 0.40 for head movement classification given four different classes. In both cases, the classifier outperformed a simple most-frequent-class baseline as well as a more advanced baseline relying only on velocity features.

KU-CST at the SIGMORPHON 2020 Task 2 on Unsupervised Morphological Paradigm Completion
Manex Agirrezabal | Jürgen Wedekind
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We present a model for the unsupervised discovery of morphological paradigms. The goal of this model is to induce morphological paradigms from the Bible (raw text) and a list of lemmas. We have created a model that splits each lemma into a stem and a suffix, and then we try to create a plausible suffix list by considering lemma pairs. Our model was not able to outperform the official baseline, and there is still room for improvement, but we believe that the ideas presented here are worth considering.

2019

Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?
Sidsel Boldsen | Manex Agirrezabal | Patrizia Paggio
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give an insight into the differences in language between time periods, which are at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.

The Seemingly (Un)systematic Linking Element in Danish
Sidsel Boldsen | Manex Agirrezabal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

The use of a linking element between compound members is a common phenomenon in Germanic languages. Still, the exact use and conditioning of such elements is a disputed topic in linguistics. In this paper we address the issue of predicting the use of linking elements in Danish. Following previous research that shows how the choice of linking element might be conditioned by phonology, we frame the problem as a language modeling task: considering the linking elements -s/-∅, the problem becomes predicting what is most likely to come next, a syllable boundary or the joining element ‘s’. We show that training a language model on this task reaches an accuracy of 94%, and in the case of an unsupervised model, the accuracy reaches 80%.

2018

KU-CST at CoNLL–SIGMORPHON 2018 Shared Task: a Tridirectional Model
Manex Agirrezabal
Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

2017

A Comparison of Feature-Based and Neural Scansion of Poetry
Manex Agirrezabal | Iñaki Alegria | Mans Hulden
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Automatic analysis of poetic rhythm is a challenging task that involves linguistics, literature, and computer science. When the language to be analyzed is known, rule-based systems or data-driven methods can be used. In this paper, we analyze poetic rhythm in English and Spanish. We show that the representations of data learned from character-based neural models are more informative than the ones from hand-crafted features, and that a Bi-LSTM+CRF model produces state-of-the-art accuracy on scansion of poetry in two languages. Results also show that information about whole-word structure, and not just independent syllables, is highly informative for performing scansion.

2016

Machine Learning for Metrical Analysis of English Poetry
Manex Agirrezabal | Iñaki Alegria | Mans Hulden
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this work we tackle the challenge of identifying rhythmic patterns in poetry written in English. Although poetry is a literary form that makes use of standard meters usually repeated among different authors, we will see in this paper how performing such analyses is a difficult task in machine learning due to the unexpected deviations from such standard patterns. After breaking down some examples of classical poetry, we apply a number of NLP techniques for the scansion of poetry, training and testing our systems against a human-annotated corpus. With these experiments, our purpose is to establish a baseline of automatic scansion of poetry using NLP tools in a straightforward manner and to raise awareness of the difficulties of this task.

Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties
Pablo Gamallo | Iñaki Alegria | José Ramom Pichel | Manex Agirrezabal
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This article describes the systems submitted by the Citius_Ixa_Imaxin team to the Discriminating Similar Languages Shared Task 2016. The systems are based on two different strategies: classification with ranked dictionaries and Naive Bayes classifiers. The results of the evaluation show that ranking dictionaries are more sound and stable across different domains, while basic Bayesian models perform reasonably well on in-domain datasets but their performance drops when they are applied to out-of-domain texts.

2013

ZeuScansion: a tool for scansion of English poetry
Manex Agirrezabal | Bertol Arrieta | Aitzol Astigarraga | Mans Hulden
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing

A Finite-State Approach to Translate SNOMED CT Terms into Basque Using Medical Prefixes and Suffixes
Olatz Perez-de-Viñaspre | Maite Oronoz | Manex Agirrezabal | Mikel Lersundi
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing

POS-Tag Based Poetry Generation with WordNet
Manex Agirrezabal | Bertol Arrieta | Aitzol Astigarraga | Mans Hulden
Proceedings of the 14th European Workshop on Natural Language Generation

Towards Basque Oral Poetry Analysis: A Machine Learning Approach
Mikel Osinalde | Aitzol Astigarraga | Igor Rodriguez | Manex Agirrezabal
Proceedings of the Student Research Workshop associated with RANLP 2013

2012

BAD: An Assistant tool for making verses in Basque
Manex Agirrezabal | Iñaki Alegria | Bertol Arrieta | Mans Hulden
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Finite-State Technology in a Verse-Making Tool
Manex Agirrezabal | Iñaki Alegria | Bertol Arrieta | Mans Hulden
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing