Joachim Wagner


2020

Treebank Embedding Vectors for Out-of-Domain Dependency Parsing
Joachim Wagner | James Barry | Jennifer Foster
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A recent advance in monolingual dependency parsing is the idea of a treebank embedding vector, which allows all treebanks for a particular language to be used as training data while at the same time allowing the model to prefer training data from one treebank over others and to select the preferred treebank at test time. We build on this idea by 1) introducing a method to predict a treebank vector for sentences that do not come from a treebank used in training, and 2) exploring what happens when we move away from predefined treebank embedding vectors during test time and instead devise tailored interpolations. We show that 1) there are interpolated vectors that are superior to the predefined ones, and 2) treebank vectors can be predicted with sufficient accuracy, for nine out of ten test languages, to match the performance of an oracle approach that knows the most suitable predefined treebank embedding for the test set.

The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task
James Barry | Joachim Wagner | Jennifer Foster
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We describe the ADAPT system for the 2020 IWPT Shared Task on parsing enhanced Universal Dependencies in 17 languages. We implement a pipeline approach using UDPipe and UDPipe-future to provide initial levels of annotation. The enhanced dependency graph is either produced by a graph-based semantic dependency parser or is built from the basic tree using a small set of heuristics. Our results show that, for the majority of languages, a semantic dependency parser can be successfully applied to the task of parsing enhanced dependencies. Unfortunately, we did not ensure a connected graph as part of our pipeline approach, and our competition submission relied on a last-minute fix to pass the validation script, which harmed our official evaluation scores significantly. Our submission ranked eighth in the official evaluation with a macro-averaged coarse ELAS F1 of 67.23 and a treebank average of 67.49. We later implemented our own graph-connecting fix, which resulted in a score of 79.53 (language average) or 79.76 (treebank average), which would have placed fourth in the competition evaluation.

2019

Cross-lingual Parsing with Polyglot Training and Multi-treebank Learning: A Faroese Case Study
James Barry | Joachim Wagner | Jennifer Foster
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Cross-lingual dependency parsing involves transferring syntactic knowledge from one language to another. It is a crucial component for inducing dependency parsers in low-resource scenarios where no training data for a language exists. Using Faroese as the target language, we compare two approaches using annotation projection: first, projecting from multiple monolingual source models; second, projecting from a single polyglot model which is trained on the combination of all source languages. Furthermore, we reproduce multi-source projection (Tyers et al., 2018), in which dependency trees of multiple sources are combined. Finally, we apply multi-treebank modelling to the projected treebanks, in addition to or alternatively to polyglot modelling on the source side. We find that polyglot training on the source languages produces an overall trend of better results on the target language, but the single best result for the target language is obtained by projecting from monolingual source parsing models and then training multi-treebank POS tagging and parsing models on the target side.

APE through Neural and Statistical MT with Augmented Data. ADAPT/DCU Submission to the WMT 2019 APE Shared Task
Dimitar Shterionov | Joachim Wagner | Félix do Carmo
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

Automatic post-editing (APE) can be reduced to a machine translation (MT) task, where the source is the output of a specific MT system and the target is its post-edited variant. However, this approach does not consider context information that can be found in the original source of the MT system. Thus, a better approach is to employ multi-source MT, where two input sequences are considered: one being the original source and the other being the MT output. Extra context information can be introduced in the form of extra tokens that identify a certain global property of a group of segments, added as a prefix or a suffix to each segment. Successfully applied in domain adaptation of MT as well as in APE, this technique deserves further attention. In this work we investigate multi-source neural APE (or NPE) systems with training data which has been augmented with two types of extra context tokens. We experiment with authentic and synthetic data provided by WMT 2019 and submit our results to the APE shared task. We also experiment with using statistical machine translation (SMT) methods for APE. While our systems score below the baseline, we consider this work a step towards understanding the added value of extra context in the case of APE.

2016

Part-of-speech Tagging of Code-mixed Social Media Content: Pipeline, Stacking and Joint Modelling
Utsab Barman | Joachim Wagner | Jennifer Foster
Proceedings of the Second Workshop on Computational Approaches to Code Switching

2015

DCU-ADAPT: Learning Edit Operations for Microblog Normalisation with the Generalised Perceptron
Joachim Wagner | Jennifer Foster
Proceedings of the Workshop on Noisy User-generated Text

2014

DCU: Aspect-based Polarity Classification for SemEval Task 4
Joachim Wagner | Piyush Arora | Santiago Cortes | Utsab Barman | Dasha Bogdanova | Jennifer Foster | Lamia Tounsi
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Target-Centric Features for Translation Quality Estimation
Chris Hokamp | Iacer Calixto | Joachim Wagner | Jian Zhang
Proceedings of the Ninth Workshop on Statistical Machine Translation

Code Mixing: A Challenge for Language Identification in the Language of Social Media
Utsab Barman | Amitava Das | Joachim Wagner | Jennifer Foster
Proceedings of the First Workshop on Computational Approaches to Code Switching

DCU-UVT: Word-Level Language Classification with Code-Mixed Data
Utsab Barman | Joachim Wagner | Grzegorz Chrupała | Jennifer Foster
Proceedings of the First Workshop on Computational Approaches to Code Switching

2013

DCU-Symantec at the WMT 2013 Quality Estimation Shared Task
Raphael Rubino | Joachim Wagner | Jennifer Foster | Johann Roturier | Rasoul Samad Zadeh Kaljahi | Fred Hollowood
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

DCU-Symantec Submission for the WMT 2012 Quality Estimation Task
Raphael Rubino | Jennifer Foster | Joachim Wagner | Johann Roturier | Rasul Samad Zadeh Kaljahi | Fred Hollowood
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

Comparing the Use of Edited and Unedited Text in Parser Self-Training
Jennifer Foster | Özlem Çetinoğlu | Joachim Wagner | Josef van Genabith
Proceedings of the 12th International Conference on Parsing Technologies

From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0
Jennifer Foster | Özlem Çetinoğlu | Joachim Wagner | Joseph Le Roux | Joakim Nivre | Deirdre Hogan | Josef van Genabith
Proceedings of 5th International Joint Conference on Natural Language Processing

2009

The effect of correcting grammatical errors on parse probabilities
Joachim Wagner | Jennifer Foster
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2008

Adapting a WSJ-Trained Parser to Grammatically Noisy Text
Jennifer Foster | Joachim Wagner | Josef van Genabith
Proceedings of ACL-08: HLT, Short Papers

Parser-Based Retraining for Domain Adaptation of Probabilistic Generators
Deirdre Hogan | Jennifer Foster | Joachim Wagner | Josef van Genabith
Proceedings of the Fifth International Natural Language Generation Conference

2007

Adapting WSJ-Trained Parsers to the British National Corpus using In-Domain Self-Training
Jennifer Foster | Joachim Wagner | Djamé Seddah | Josef van Genabith
Proceedings of the Tenth International Conference on Parsing Technologies

A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors
Joachim Wagner | Jennifer Foster | Josef van Genabith
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)