Daniel Dakota


Investigating Multilingual Abusive Language Detection: A Cautionary Tale
Kenneth Steimel | Daniel Dakota | Yue Chen | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Abusive language detection has received much attention in recent years, and recent approaches perform the task in a number of different languages. We investigate which factors have an effect on multilingual settings, focusing on the compatibility of data and annotations. In the current paper, we focus on English and German. Our findings show large differences in performance between the two languages. We find that the best performance is achieved by different classification algorithms. Sampling to address class imbalance issues is detrimental for German and beneficial for English. The only similarity that we find is that neither data set shows clear topics when we compare the results of topic modeling to the gold standard. Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.


Practical Parsing for Downstream Applications
Daniel Dakota | Sandra Kübler
Proceedings of the 27th International Conference on Computational Linguistics: Tutorial Abstracts


Towards Replicability in Parsing
Daniel Dakota | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We investigate parsing replicability across 7 languages (and 8 treebanks), showing that choices concerning the use of grammatical functions in parsing or evaluation, the influence of the rare word threshold, as well as choices in test sentences and evaluation script options have considerable and often unexpected effects on parsing accuracies. All of those choices need to be carefully documented if we want to ensure replicability.

Non-Deterministic Segmentation for Chinese Lattice Parsing
Hai Hu | Daniel Dakota | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Parsing Chinese critically depends on correct word segmentation, since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing using word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correct segmentation from thousands of options, thus drastically reducing the number of unparsed sentences. Lexicon-based parsing models have better coverage than the CRF-based approach, but the many options are more difficult to handle. We reach our best result by using a lexicon from the n-best CRF analyses, combined with highly probable words.


IUCL at SemEval-2016 Task 6: An Ensemble Model for Stance Detection in Twitter
Can Liu | Wen Li | Bradford Demarest | Yue Chen | Sara Couture | Daniel Dakota | Nikita Haduong | Noah Kaufman | Andrew Lamont | Manan Pancholi | Kenneth Steimel | Sandra Kübler
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


“My Curiosity was Satisfied, but not in a Good Way”: Predicting User Ratings for Online Recipes
Can Liu | Chun Guo | Daniel Dakota | Sridhar Rajagopalan | Wen Li | Sandra Kübler | Ning Yu
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

Parsing German: How Much Morphology Do We Need?
Wolfgang Maier | Sandra Kübler | Daniel Dakota | Daniel Whyatt
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages