Barbara Plank

Also published as: B. Plank


2020

Buhscitu at SemEval-2020 Task 7: Assessing Humour in Edited News Headlines Using Hand-Crafted Features and Online Knowledge Bases
Kristian Nørgaard Jensen | Nicolaj Filrup Rasmussen | Thai Wang | Marco Placenti | Barbara Plank
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes a system that aims at assessing humour intensity in edited news headlines as part of the 7th task of SemEval-2020 on “Humor, Emphasis and Sentiment”. Various factors need to be accounted for in order to assess the funniness of an edited headline. We propose an architecture that uses hand-crafted features, knowledge bases and a language model to understand humour, and combines them in a regression model. Our system outperforms two baselines. In general, automatic humour assessment remains a difficult task.

Team DiSaster at SemEval-2020 Task 11: Combining BERT and Hand-crafted Features for Identifying Propaganda Techniques in News
Anders Kaas | Viktor Torp Thomsen | Barbara Plank
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The identification of communication techniques in news articles such as propaganda is important, as such techniques can influence the opinions of large numbers of people. Most work so far has focused on identification at the news article level. Recently, a new dataset and shared task have been proposed for the identification of propaganda techniques at the finer-grained span level. This paper describes our system submission to the subtask of technique classification (TC) for the SemEval 2020 shared task on detection of propaganda techniques in news articles. We propose a method of combining neural BERT representations with hand-crafted features via stacked generalization. Our model has the added advantage that it combines the power of contextual representations from BERT with simple span-based and article-based global features. We present an ablation study which shows that even though BERT representations are very powerful also for this task, BERT still benefits from being combined with carefully designed task-specific features.

Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction
Alan Ramponi | Barbara Plank | Rosario Lombardo
Proceedings of the 12th Language Resources and Evaluation Conference

Biomedical event extraction is a crucial task in order to automatically extract information from the increasingly growing body of biomedical literature. Despite advances in the methods in recent years, most event extraction systems are still evaluated in-domain and on complete event structures only. This makes it hard to determine the performance of intermediate stages of the task, such as edge detection, across different corpora. Motivated by these limitations, we present the first cross-domain study of edge detection for biomedical event extraction. We analyze differences between five existing gold standard corpora, create a standardized benchmark corpus, and provide a strong baseline model for edge detection. Experiments show a large drop in performance when the baseline is applied on out-of-domain data, confirming the need for domain adaptation methods for the task. To encourage research efforts in this direction, we make both the data and the baseline available to the research community: https://www.cosbi.eu/cfx/9985.

NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets
Anders Giovanni Møller | Rob van der Goot | Barbara Plank
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

With the COVID-19 pandemic raging world-wide since the beginning of the 2020 decade, monitoring systems that track relevant information on social media are vitally important. This paper describes our submission to the WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. We investigate the effectiveness of a variety of classification models and find that domain-specific pre-trained BERT models lead to the best performance. On top of this, we attempt a variety of ensembling strategies, but these attempts do not lead to further improvements. Our final best model, the standalone CT-BERT model, proved to be highly competitive, leading to a shared first place in the shared task. Our results emphasize the importance of domain and task-related pre-training.

DaN+: Danish Nested Named Entities and Lexical Normalization
Barbara Plank | Kristian Nørgaard Jensen | Rob van der Goot
Proceedings of the 28th International Conference on Computational Linguistics

This paper introduces DaN+, a new multi-domain corpus and annotation guidelines for Danish nested named entities (NEs) and lexical normalization to support research on cross-lingual cross-domain learning for a less-resourced language. We empirically assess three strategies to model the two-layer Named Entity Recognition (NER) task. We compare transfer capabilities from German versus in-language annotation from scratch. We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER. Our results show that 1) the most robust strategy is multi-task learning which is rivaled by multi-label decoding, 2) BERT-based NER models are sensitive to domain shifts, and 3) in-language BERT and lexical normalization are the most beneficial on the least canonical data. Our results also show that an out-of-domain setup remains challenging, while performance on news plateaus quickly. This highlights the importance of cross-domain evaluation of cross-lingual transfer.

Neural Unsupervised Domain Adaptation in NLP - A Survey
Alan Ramponi | Barbara Plank
Proceedings of the 28th International Conference on Computational Linguistics

Deep neural networks excel at learning from labeled data and achieve state-of-the-art results on a wide array of Natural Language Processing tasks. In contrast, learning from unlabeled data, especially under domain shift, remains a challenge. Motivated by the latest advances, in this survey we review neural unsupervised domain adaptation techniques which do not require labeled target domain data. This is a more challenging yet a more widely applicable setup. We outline methods, from early traditional non-neural methods to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which received most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization of future NLP.

Biomedical Event Extraction as Sequence Labeling
Alan Ramponi | Rob van der Goot | Rosario Lombardo | Barbara Plank
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model. BeeSL recasts the task as sequence labeling, taking advantage of a multi-label aware encoding strategy and jointly modeling the intermediate tasks via multi-task learning. BeeSL is fast, accurate, end-to-end, and unlike current methods does not require any external knowledge base or preprocessing tools. BeeSL outperforms the current best system (Li et al., 2019) on the Genia 2011 benchmark by 1.57% absolute F1 score reaching 60.22% F1, establishing a new state of the art for the task. Importantly, we also provide first results on biomedical event extraction without gold entity information. Empirical results show that BeeSL’s speed and accuracy make it a viable approach for large-scale real-world scenarios.

Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media
Malvina Nissim | Viviana Patti | Barbara Plank | Esin Durmus
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media

2019

At a Glance: The Impact of Gaze Aggregation Views on Syntactic Tagging
Sigrid Klerke | Barbara Plank
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Readers’ eye movements used as part of the training signal have been shown to improve performance in a wide range of Natural Language Processing (NLP) tasks. Previous work uses gaze data either at the type level or at the token level and mostly from a single eye-tracking corpus. In this paper, we analyze type vs token-level integration options with eye tracking data from two corpora to inform two syntactic sequence labeling problems: binary phrase chunking and part-of-speech tagging. We show that using globally-aggregated measures that capture the central tendency or variability of gaze data is more beneficial than proposed local views which retain individual participant information. While gaze data is informative for supervised POS tagging, which complements previous findings on unsupervised POS induction, almost no improvement is obtained for binary phrase chunking, except for a single specific setup. Hence, caution is warranted when using gaze data as signal for NLP, as no single view is robust over tasks, modeling choice and gaze corpus.

Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering
Claudio Greco | Barbara Plank | Raquel Fernández | Raffaella Bernardi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We study the issue of catastrophic forgetting in the context of neural multimodal approaches to Visual Question Answering (VQA). Motivated by evidence from psycholinguistics, we devise a set of linguistically-informed VQA tasks, which differ by the types of questions involved (Wh-questions and polar questions). We test what impact task difficulty has on continual learning, and whether the order in which a child acquires question types facilitates computational models. Our results show that dramatic forgetting is at play and that task difficulty and order matter. Two well-known current continual learning methods mitigate the problem only to a limited degree.

MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding
Nils Rethmeier | Barbara Plank
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Word embeddings have undoubtedly revolutionized NLP. However, pretrained embeddings do not always work for a specific task (or set of tasks), particularly in limited resource setups. We introduce a simple yet effective, self-supervised post-processing method that constructs task-specialized word representations by picking from a menu of reconstructing transformations to yield improved end-task performance (MORTY). The method is complementary to recent state-of-the-art approaches to inductive transfer via fine-tuning, and forgoes costly model architectures and annotation. We evaluate MORTY on a broad range of setups, including different word embedding methods, corpus sizes and end-task semantics. Finally, we provide a surprisingly simple recipe to obtain specialized embeddings that better fit end-tasks.

Proceedings of the 22nd Nordic Conference on Computational Linguistics
Mareike Hartmann | Barbara Plank
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Lexical Resources for Low-Resource PoS Tagging in Neural Times
Barbara Plank | Sigrid Klerke
Proceedings of the 22nd Nordic Conference on Computational Linguistics

More and more evidence is appearing that integrating symbolic lexical knowledge into neural models aids learning. This contrasts with the widely-held belief that neural networks largely learn their own feature representations. For example, recent work has shown benefits of integrating lexicons to aid cross-lingual part-of-speech (PoS) tagging. However, little is known about how complementary such additional information is, and to what extent improvements depend on the coverage and quality of these external resources. This paper seeks to fill this gap by providing a thorough analysis of the contributions of lexical resources for cross-lingual PoS tagging in neural times.

The Lacunae of Danish Natural Language Processing
Andreas Kirkedal | Barbara Plank | Leon Derczynski | Natalie Schluter
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation. However, the language has received relatively little attention from a technological perspective. In this paper, we review Natural Language Processing (NLP) research, digital resources and tools which have been developed for Danish. We find that availability of models and tools is limited, which calls for work that lifts Danish NLP a step closer to the privileged languages. Danish abstract (translated): Danish is a North Germanic language, spoken primarily in the Kingdom of Denmark, a country with a strong tradition of technological and scientific innovation. The Danish language has, however, received relatively limited attention from a technological point of view. In this article we review language technology research, resources and tools developed for Danish. We conclude that only a small number of models and tools exist, which invites research that lifts Danish language technology to the level of more privileged languages.

Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish
Barbara Plank
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Named Entity Recognition (NER) has been greatly advanced by the introduction of deep neural architectures. However, the success of these methods depends on large amounts of training data. The scarcity of publicly-available human-labeled datasets has resulted in limited evaluation of existing NER systems, as is the case for Danish. This paper studies the effectiveness of cross-lingual transfer for Danish, evaluates its complementarity to limited gold data, and sheds light on the performance of Danish NER.

SyntaxFest 2019 Invited talk - Transferring NLP models across languages and domains
Barbara Plank
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat
Ravi Shekhar | Aashish Venkatesh | Tim Baumgärtner | Elia Bruni | Barbara Plank | Raffaella Bernardi | Raquel Fernández
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a grounded dialogue state encoder which addresses a foundational issue on how to integrate visual grounding with dialogue system components. As a test-bed, we focus on the GuessWhat?! game, a two-player game where the goal is to identify an object in a complex visual scene by asking a sequence of yes/no questions. Our visually-grounded encoder leverages synergies between guessing and asking questions, as it is trained jointly using multi-task learning. We further enrich our model via a cooperative learning regime. We show that the introduction of both the joint architecture and cooperative learning leads to accuracy improvements over the baseline system. We compare our approach to an alternative system which extends the baseline with reinforcement learning. Our in-depth analysis shows that the linguistic skills of the two models differ dramatically, despite approaching comparable performance levels. This points at the importance of analyzing the linguistic output of competing systems beyond numeric comparison solely based on task success.

2018

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
Barbara Plank | Željko Agić
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data.

Strong Baselines for Neural Semi-Supervised Learning under Domain Shift
Sebastian Ruder | Barbara Plank
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Novel neural models have been proposed in recent years for learning under domain shift. Most models, however, only evaluate on a single task, on proprietary datasets, or compare to weak baselines, which makes comparison of models difficult. In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training. Extensive experiments on two benchmarks for part-of-speech tagging and sentiment analysis are negative: while our novel method establishes a new state-of-the-art for sentiment analysis, it does not fare consistently the best. More importantly, we arrive at the somewhat surprising conclusion that classic tri-training, with some additions, outperforms the state-of-the-art for NLP. Hence classic approaches constitute an important and strong baseline.

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction
Rob van der Goot | Nikola Ljubešić | Ian Matroos | Malvina Nissim | Barbara Plank
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.

Grotoco@SLAM: Second Language Acquisition Modeling with Simple Features, Learners and Task-wise Models
Sigrid Klerke | Héctor Martínez Alonso | Barbara Plank
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present our submission to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM). We focus on evaluating a range of features for the task, including user-derived measures, while examining how far we can get with a simple linear classifier. Our analysis reveals that errors differ per exercise format, which motivates our final and best-performing system: a task-wise (per exercise-format) model.

Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media
Malvina Nissim | Viviana Patti | Barbara Plank | Claudia Wagner
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

Predicting Authorship and Author Traits from Keystroke Dynamics
Barbara Plank
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

Written text transmits a good deal of nonverbal information related to the author’s identity and social factors, such as age, gender and personality. However, it is less known to what extent behavioral biometric traces transmit such information. We use typist data to study the predictiveness of authorship, and present first experiments on predicting both age and gender from keystroke dynamics. Our results show that the model based on keystroke features, while being two orders of magnitude smaller, leads to significantly higher accuracies for authorship than the text-based system. For user attribute prediction, the best approach is to combine the two, suggesting that extralinguistic factors are disclosed to a larger degree in written text, while author identity is better transmitted in typing behavior.

Character-level Supervision for Low-resource POS Tagging
Katharina Kann | Johannes Bjerva | Isabelle Augenstein | Barbara Plank | Anders Søgaard
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, we present an architecture for learning more robust neural POS taggers by jointly training a hierarchical, recurrent model and a recurrent character-based sequence-to-sequence network supervised using an auxiliary objective. This way, we introduce stronger character-level supervision into the model, which enables better generalization to unseen words and provides regularization, making our encoding less prone to overfitting. We experiment with three auxiliary tasks: lemmatization, character-based word autoencoding, and character-based random string autoencoding. Experiments with minimal amounts of labeled data on 34 languages show that our new architecture outperforms a single-task baseline and, surprisingly, that, on average, raw text autoencoding can be as beneficial for low-resource POS tagging as using lemma information. Our neural POS tagger closes the gap to a state-of-the-art POS tagger (MarMoT) for low-resource scenarios by 43%, even outperforming it on languages with templatic morphology, e.g., Arabic, Hebrew, and Turkish, by some margin.

When Simple n-gram Models Outperform Syntactic Approaches: Discriminating between Dutch and Flemish
Martin Kroon | Masha Medvedeva | Barbara Plank
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

In this paper we present the results of our participation in the Discriminating between Dutch and Flemish in Subtitles VarDial 2018 shared task. We try techniques proven to work well for discriminating between language varieties as well as explore the potential of using syntactic features, i.e. hierarchical syntactic subtrees. We experiment with different combinations of features. Discriminating between these two languages turned out to be a very hard task, not only for a machine: human performance is only around 0.51 F1 score; our best system is still a simple Naive Bayes model with word unigrams and bigrams. The system achieved an F1 score (macro) of 0.62, which ranked us 4th in the shared task.

2017

When is multitask learning effective? Semantic sequence prediction under varying data conditions
Héctor Martínez Alonso | Barbara Plank
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Multitask learning (MTL) has been applied successfully to a range of tasks, mostly morphosyntactic. However, little is known about when MTL works and whether there are data characteristics that help to determine the success of MTL. In this paper we evaluate a range of semantic sequence labeling tasks in an MTL setup. We examine different auxiliary task configurations, amongst which a novel setup, and correlate their impact to data-dependent conditions. Our results show that MTL is not always effective, because significant improvements are obtained only for 1 out of 5 tasks. When successful, auxiliary tasks with compact and more uniform label distributions are preferable.

Parsing Universal Dependencies without training
Héctor Martínez Alonso | Željko Agić | Barbara Plank | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We present UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of specific dependency head rules. UDP features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD. The parser has very few parameters and is distinctly robust to domain change across languages.

Cross-lingual tagger evaluation without test data
Željko Agić | Barbara Plank | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We address the challenge of cross-lingual POS tagger evaluation in absence of manually annotated test data. We put forth and evaluate two dictionary-based metrics. On the tasks of accuracy prediction and system ranking, we reveal that these metrics are reliable enough to approximate test set-based evaluation, and at the same time lean enough to support assessment for truly low-resource languages.

All-In-1 at IJCNLP-2017 Task 4: Short Text Classification with One Model for All Languages
Barbara Plank
Proceedings of the IJCNLP 2017, Shared Tasks

We present All-In-1, a simple model for multilingual text classification that does not require any parallel data. It is based on a traditional Support Vector Machine classifier exploiting multilingual word embeddings and character n-grams. Our model is simple, easily extendable yet very effective, overall ranking 1st (out of 12 teams) in the IJCNLP 2017 shared task on customer feedback analysis in four languages: English, French, Japanese and Spanish.

Last Words: Sharing Is Caring: The Future of Shared Tasks
Malvina Nissim | Lasha Abzianidze | Kilian Evang | Rob van der Goot | Hessel Haagsma | Barbara Plank | Martijn Wieling
Computational Linguistics, Volume 43, Issue 4 - December 2017

When Sparse Traditional Models Outperform Dense Neural Networks: the Curious Case of Discriminating between Similar Languages
Maria Medvedeva | Martin Kroon | Barbara Plank
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)

We present the results of our participation in the VarDial 4 shared task on discriminating closely related languages. Our submission includes simple traditional models using linear support vector machines (SVMs) and a neural network (NN). The main idea was to leverage language group information. We did so with a two-layer approach in the traditional model and a multi-task objective in the neural network case. Our results confirm earlier findings: simple traditional models outperform neural networks consistently for this task, at least given the number of systems we could examine in the available time. Our two-layer linear SVM ranked 2nd in the shared task.

To normalize, or not to normalize: The impact of normalization on Part-of-Speech tagging
Rob van der Goot | Barbara Plank | Malvina Nissim
Proceedings of the 3rd Workshop on Noisy User-generated Text

Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data? To the best of our knowledge, little is known on the actual impact of normalization in a real-world scenario, where gold error detection is not available. We investigate the effect of automatic normalization on POS tagging of tweets. We also compare normalization to strategies that leverage large amounts of unlabeled data kept in its raw form. Our results show that normalization helps, but does not add consistently beyond just word embedding layer initialization. The latter approach yields a tagging model that is competitive with a Twitter state-of-the-art tagger.

Neural Networks and Spelling Features for Native Language Identification
Johannes Bjerva | Gintarė Grigonytė | Robert Östling | Barbara Plank
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We present the RUG-SU team’s submission at the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble, based on spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reaching an F1 score of 0.8323. Although our system is not the highest ranking one, we do outperform the baseline by far.

The Power of Character N-grams in Native Language Identification
Artur Kulmizev | Bo Blankers | Johannes Bjerva | Malvina Nissim | Gertjan van Noord | Barbara Plank | Martijn Wieling
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we explore the performance of a linear SVM trained on language independent character features for the NLI Shared Task 2017. Our basic system (GRONINGEN) achieves the best performance (87.56 F1-score) on the evaluation set using only 1-9 character n-grams as features. We compare this against several ensemble and meta-classifiers in order to examine how the linear system fares when combined with other, especially non-linear classifiers. Special emphasis is placed on the topic bias that exists by virtue of the assessment essay prompt distribution.

Learning to select data for transfer learning with Bayesian Optimization
Sebastian Ruder | Barbara Plank
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks. Inspired by work on curriculum learning, we propose to learn data selection measures using Bayesian Optimization and evaluate them across models, domains and tasks. Our learned measures outperform existing domain similarity measures significantly on three tasks: sentiment analysis, part-of-speech tagging, and parsing. We show the importance of complementing similarity with diversity, and that learned measures are, to some degree, transferable across models, domains, and even tasks.

2016

TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
Ben Verhoeven | Walter Daelemans | Barbara Plank
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Personality profiling is the task of detecting personality traits of authors based on writing style. Several personality typologies exist; however, the Myers-Briggs Type Indicator (MBTI) is particularly popular in the non-scientific community, and many people use it to analyse their own personality and talk about the results online. Therefore, large amounts of self-assessed data on MBTI are readily available on social-media platforms such as Twitter. We present a novel corpus of tweets annotated with the MBTI personality type and gender of their author for six Western European languages (Dutch, German, French, Italian, Portuguese and Spanish). We outline the corpus creation and annotation, show statistics of the obtained data distributions and present first baselines on Myers-Briggs personality profiling and gender prediction for all six languages.

Keystroke dynamics as signal for shallow syntactic parsing
Barbara Plank
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Keystroke dynamics have been extensively used in psycholinguistic and writing research to gain insights into cognitive processing. But do keystroke logs contain actual signal that can be used to learn better natural language processing models? We postulate that keystroke dynamics contain information about syntactic structure that can inform shallow syntactic parsing. To test this hypothesis, we explore labels derived from keystroke logs as an auxiliary task in a multi-task bidirectional Long Short-Term Memory (bi-LSTM). Our experiments show promising results on two shallow syntactic parsing tasks, chunking and CCG supertagging. Our model is simple, has the advantage that data can come from distinct sources, and produces models that are significantly better than models trained on the text annotations alone.

pdf bib
Multi-view and multi-task training of RST discourse parsers
Chloé Braud | Barbara Plank | Anders Søgaard
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We experiment with different ways of training LSTM networks to predict RST discourse trees. The main challenge for RST discourse parsing is the limited amounts of training data. We combat this by regularizing our models using task supervision from related tasks as well as alternative views on discourse structures. We show that a simple LSTM sequential discourse parser takes advantage of this multi-view and multi-task framework with 12-15% error reductions over our baseline (depending on the metric) and results that rival more complex state-of-the-art parsers.

pdf bib
Semantic Tagging with Deep Residual Networks
Johannes Bjerva | Barbara Plank | Johan Bos
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a novel semantic tagging task, semtagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets). Our tagger uses both word and character representations, and includes a novel residual bypass architecture. We evaluate the tagset both intrinsically, on the new task of semantic tagging, and on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an auxiliary loss function predicting our semantic tags, significantly outperforms prior results on English Universal Dependencies POS tagging (95.71% accuracy on UD v1.2 and 95.67% accuracy on UD v1.3).

pdf bib
Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
Barbara Plank | Anders Søgaard | Yoav Goldberg
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
LiMoSINe Pipeline: Multilingual UIMA-based NLP Platform
Olga Uryupina | Barbara Plank | Gianni Barlacchi | Francisco J. Valverde Albacete | Manos Tsagkias | Antonio Uva | Alessandro Moschitti
Proceedings of ACL-2016 System Demonstrations

pdf bib
Supersense tagging with inter-annotator disagreement
Héctor Martínez Alonso | Anders Johannsen | Barbara Plank
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

pdf bib
Processing non-canonical or noisy text: fortuitous data to the rescue
Barbara Plank
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

Real-world data differs radically from the benchmark corpora we use in NLP, resulting in large performance drops. The reason for this problem is obvious: NLP models are trained on limited samples from canonical varieties considered standard. However, there are many dimensions (e.g., sociodemographics, language, genre, and sentence type) on which texts can differ from the standard. The solution is not obvious: we cannot control for all factors, and it is not clear how to best go beyond the current practice of training on homogeneous data from a single domain and language. In this talk, I review the notion of canonicity, and how it shapes our community’s approach to language. I argue for the use of fortuitous data. Fortuitous data is data out there that just waits to be harvested. It includes data which is in plain sight, but is often neglected, and more distant sources like behavioral data, which first need to be refined. They provide additional contexts and a myriad of opportunities to build more adaptive language technology, some of which I will explore in this talk.

pdf bib
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)
Malvina Nissim | Viviana Patti | Barbara Plank
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

pdf bib
Multilingual Projection for Parsing Truly Low-Resource Languages
Željko Agić | Anders Johannsen | Barbara Plank | Héctor Martínez Alonso | Natalie Schluter | Anders Søgaard
Transactions of the Association for Computational Linguistics, Volume 4

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.

2015

pdf bib
Do dependency parsing metrics correlate with human judgments?
Barbara Plank | Héctor Martínez Alonso | Željko Agić | Danijela Merkler | Anders Søgaard
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Non-canonical language is not harder to annotate than canonical language
Barbara Plank | Héctor Martínez Alonso | Anders Søgaard
Proceedings of The 9th Linguistic Annotation Workshop

pdf bib
Active learning for sense annotation
Héctor Martínez Alonso | Barbara Plank | Anders Johannsen | Anders Søgaard
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Personality Traits on Twitter—or—How to Get 1,500 Personality Tests in a Week
Barbara Plank | Dirk Hovy
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Mining for unambiguous instances to adapt part-of-speech taggers to new domains
Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Learning to parse with IAA-weighted loss
Héctor Martínez Alonso | Barbara Plank | Arne Skjærholt | Anders Søgaard
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Semantic Representations for Domain Adaptation: A Case Study on the Tree Kernel-based Method for Relation Extraction
Thien Huu Nguyen | Barbara Plank | Ralph Grishman
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Inverted indexing for cross-lingual NLP
Anders Søgaard | Željko Agić | Héctor Martínez Alonso | Barbara Plank | Bernd Bohnet | Anders Johannsen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
CPH: Sentiment analysis of Figurative Language on Twitter #easypeasy #not
Sarah McGillion | Héctor Martínez Alonso | Barbara Plank
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
More or less supervised supersense tagging of Twitter
Anders Johannsen | Dirk Hovy | Héctor Martínez Alonso | Barbara Plank | Anders Søgaard
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

pdf bib
Copenhagen-Malmö: Tree Approximations of Semantic Parsing Problems
Natalie Schluter | Anders Søgaard | Jakob Elming | Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Johanssen | Sigrid Klerke
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Opinion Mining on YouTube
Aliaksei Severyn | Alessandro Moschitti | Olga Uryupina | Barbara Plank | Katja Filippova
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Experiments with crowdsourced re-annotation of a POS tagging data set
Dirk Hovy | Barbara Plank | Anders Søgaard
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Linguistically debatable or just plain wrong?
Barbara Plank | Dirk Hovy | Anders Søgaard
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Learning part-of-speech taggers with inter-annotator agreement loss
Barbara Plank | Dirk Hovy | Anders Søgaard
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
What’s in a p-value in NLP?
Anders Søgaard | Anders Johannsen | Barbara Plank | Dirk Hovy | Hector Martínez Alonso
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf bib
Robust Cross-Domain Sentiment Analysis for Low-Resource Languages
Jakob Elming | Barbara Plank | Dirk Hovy
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Adapting taggers to Twitter with not-so-distant supervision
Barbara Plank | Dirk Hovy | Ryan McDonald | Anders Søgaard
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Selection Bias, Label Bias, and Bias in Ground Truth
Anders Søgaard | Barbara Plank | Dirk Hovy
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts

pdf bib
Importance weighting and unsupervised domain adaptation of POS taggers: a negative result
Barbara Plank | Anders Johannsen | Anders Søgaard
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
SenTube: A Corpus for Sentiment Analysis on YouTube Social Media
Olga Uryupina | Barbara Plank | Aliaksei Severyn | Agata Rotondi | Alessandro Moschitti
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present SenTube -- a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. It contains annotations that allow the development of classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus favors the development of research on indexing and searching YouTube videos exploiting information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for the later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics and annotator agreement details.

pdf bib
When POS data sets don’t add up: Combatting sample bias
Dirk Hovy | Barbara Plank | Anders Søgaard
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Several works in Natural Language Processing have recently looked into part-of-speech annotation of Twitter data and typically used their own data sets. Since conventions on Twitter change rapidly, models often show sample bias. Training on a combination of the existing data sets should help overcome this bias and produce more robust models than any trained on the individual corpora. Unfortunately, combining the existing corpora proves difficult: many of the corpora use proprietary tag sets that have little or no overlap. Even when mapped to a common tag set, the different corpora systematically differ in their treatment of various tags and tokens. This includes both pre-processing decisions and default labels for frequent tokens, thus exhibiting data bias and label bias, respectively. Only if we address these biases can we combine the existing data sets to also overcome sample bias. We present a systematic study of several Twitter POS data sets, the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving relative error reduction of up to 21%.

2013

pdf bib
Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
Barbara Plank | Alessandro Moschitti
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Effective Measures of Domain Similarity for Parsing
Barbara Plank | Gertjan van Noord
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Reversible Stochastic Attribute-Value Grammars
Daniël de Kok | Barbara Plank | Gertjan van Noord
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Grammar-Driven versus Data-Driven: Which Parsing System Is More Affected by Domain Shifts?
Barbara Plank | Gertjan van Noord
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

pdf bib
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
Hal Daumé III | Tejaswini Deoskar | David McClosky | Barbara Plank | Jörg Tiedemann
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

pdf bib
Improved Statistical Measures to Assess Natural Language Parser Performance across Domains
Barbara Plank
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We examine the performance of three dependency parsing systems, in particular, their performance variation across Wikipedia domains. We assess the performance variation of (i) Alpino, a deep grammar-based system coupled with a statistical disambiguation component, versus (ii) MST and Malt, two purely data-driven statistical dependency parsing systems. The question is how the performance of each parser correlates with simple statistical measures of the text (e.g. sentence length and unknown word rate). This would give us an idea of how sensitive the different systems are to domain shifts, i.e. which system is more in need of domain adaptation techniques. To this end, we extend the statistical measures used by Zhang and Wang (2009) for English and evaluate the systems on several Wikipedia domains by focusing on a freer word-order language, Dutch. The results confirm the general findings of Zhang and Wang (2009), i.e. different parsing systems have different sensitivities to various statistical measures of the text, where the highest correlation to parsing accuracy was found for the measure we added, sentence perplexity.

2009

pdf bib
A Comparison of Structural Correspondence Learning and Self-training for Discriminative Parse Selection
Barbara Plank
Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for Natural Language Processing

pdf bib
Structural Correspondence Learning for Parse Disambiguation
Barbara Plank
Proceedings of the Student Research Workshop at EACL 2009

2008

pdf bib
Subdomain Sensitive Statistical Parsing using Raw Corpora
Barbara Plank | Khalil Sima’an
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Modern statistical parsers are trained on large annotated corpora (treebanks). These treebanks usually consist of sentences addressing different subdomains (e.g. sports, politics, music), which implies that the statistics gathered by current statistical parsers are mixtures of subdomains of language use. In this paper we present a method that exploits raw subdomain corpora gathered from the web to introduce subdomain sensitivity into a given parser. We employ statistical techniques for creating an ensemble of domain sensitive parsers, and explore methods for amalgamating their predictions. Our experiments show that introducing domain sensitivity by exploiting raw corpora can improve over a tough, state-of-the-art baseline.

pdf bib
Exploring an Auxiliary Distribution Based Approach to Domain Adaptation of a Syntactic Disambiguation Model
Barbara Plank | Gertjan van Noord
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

2006

pdf bib
Multilingual Search in Libraries. The case-study of the Free University of Bozen-Bolzano
R. Bernardi | D. Calvanese | L. Dini | V. Di Tomaso | E. Frasnelli | U. Kugler | B. Plank
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents an on-going project aiming at enhancing the OPAC (Online Public Access Catalog) search system of the Library of the Free University of Bozen-Bolzano with multilingual access. The multilingual search system (MUSIL) we have developed integrates advanced linguistic technologies in a user-friendly interface and bridges the gap between the world of free text search and the world of conceptual librarian search. In this paper we present the architecture of the system, its interface and preliminary evaluations of the precision of the search results.