Mark Anderson


2020

pdf bib
On the Frailty of Universal POS Tags for Neural UD Parsers
Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 24th Conference on Computational Natural Language Learning

We present an analysis on the effect UPOS accuracy has on parsing performance. Results suggest that leveraging UPOS tags as fea-tures for neural parsers requires a prohibitively high tagging accuracy and that the use of gold tags offers a non-linear increase in performance, suggesting some sort of exceptionality. We also investigate what aspects of predicted UPOS tags impact parsing accuracy the most, highlighting some potentially meaningful linguistic facets of the problem.

pdf bib
Inherent Dependency Displacement Bias of Transition-Based Algorithms
Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 12th Language Resources and Evaluation Conference

A wide variety of transition-based algorithms are currently used for dependency parsers. Empirical studies have shown that performance varies across different treebanks in such a way that one algorithm outperforms another on one treebank and the reverse is true for a different treebank. There is often no discernible reason for what causes one algorithm to be more suitable for a certain treebank and less so for another. In this paper we shed some light on this by introducing the concept of an algorithm’s inherent dependency displacement distribution. This characterises the bias of the algorithm in terms of dependency displacement, which quantify both distance and direction of syntactic relations. We show that the similarity of an algorithm’s inherent distribution to a treebank’s displacement distribution is clearly correlated to the algorithm’s parsing performance on that treebank, specificially with highly significant and substantial correlations for the predominant sentence lengths in Universal Dependency treebanks. We also obtain results which show a more discrete analysis of dependency displacement does not result in any meaningful correlations.

pdf bib
Distilling Neural Networks for Greener and Faster Dependency Parsing
Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

The carbon footprint of natural language processing research has been increasing in recent years due to its reliance on large and inefficient neural network implementations. Distillation is a network compression technique which attempts to impart knowledge from a large model to a smaller one. We use teacher-student distillation to improve the efficiency of the Biaffine dependency parser which obtains state-of-the-art performance with respect to accuracy and parsing speed (Dozat and Manning, 2017). When distilling to 20% of the original model’s trainable parameters, we only observe an average decrease of ∼1 point for both UAS and LAS across a number of diverse Universal Dependency treebanks while being 2.30x (1.19x) faster than the baseline model on CPU (GPU) at inference time. We also observe a small increase in performance when compressing to 80% for some treebanks. Finally, through distillation we attain a parser which is not only faster but also more accurate than the fastest modern parser on the Penn Treebank.

pdf bib
Efficient EUD Parsing
Mathieu Dehouck | Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We present the system submission from the FASTPARSE team for the EUD Shared Task at IWPT 2020. We engaged with the task by focusing on efficiency. For this we considered training costs and inference efficiency. Our models are a combination of distilled neural dependency parsers and a rule-based system that projects UD trees into EUD graphs. We obtained an average ELAS of 74.04 for our official submission, ranking 4th overall.

2019

pdf bib
Artificially Evolved Chunks for Morphosyntactic Analysis
Mark Anderson | David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)