Andy Way


2020

Multiple Segmentations of Thai Sentences for Neural Machine Translation
Alberto Poncelas | Wichaya Pidchamook | Chao-Hong Liu | James Hadley | Andy Way
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Thai is a low-resource language, so it is often the case that data is not available in sufficient quantities to train a Neural Machine Translation (NMT) model that performs to a high level of quality. In addition, the Thai script does not use white spaces to delimit the boundaries between words, which adds more complexity when building sequence-to-sequence models. In this work, we explore how to augment a set of English–Thai parallel data by replicating sentence-pairs with different word segmentation methods on Thai, as training data for NMT model training. Using different merge operations of Byte Pair Encoding, different segmentations of Thai sentences can be obtained. The experiments show that, by combining these datasets, performance is improved for NMT models trained with a dataset that has been split using a supervised splitting tool.
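The augmentation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy BPE learner, the function names (`learn_bpe`, `segment`, `augment`) and the placeholder strings standing in for unsegmented Thai text are all assumptions made for illustration. Each sentence pair is replicated once per merge-operation setting, with the target side segmented differently each time.

```python
from collections import Counter

def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(sentences, num_merges):
    """Greedily learn up to `num_merges` merges over unsegmented sentences."""
    vocab = Counter(tuple(s) for s in sentences)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties broken by the pair itself for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges

def segment(sentence, merges):
    """Segment one unsegmented sentence by replaying the learned merges."""
    symbols = tuple(sentence)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)

def augment(pairs, merge_sizes):
    """Replicate each (source, target) pair once per merge-size setting."""
    targets = [tgt for _, tgt in pairs]
    augmented = []
    for n in merge_sizes:
        merges = learn_bpe(targets, n)
        for src, tgt in pairs:
            augmented.append((src, " ".join(segment(tgt, merges))))
    return augmented

# Placeholder data: Latin strings stand in for unsegmented Thai sentences.
pairs = [("placeholder source 1", "abababcd"),
         ("placeholder source 2", "ababcdcd")]
augmented = augment(pairs, merge_sizes=[1, 4])
```

A larger number of merge operations yields longer, fewer units, so the same sentence appears in the training data under several segmentations.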

Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation
Xabier Soto | Dimitar Shterionov | Alberto Poncelas | Andy Way
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Machine translation (MT) has benefited from using synthetic training data originating from translating monolingual corpora, a technique known as backtranslation. Combining backtranslated data from different sources has led to better results than when using such data in isolation. In this work we analyse the impact that data translated with rule-based, phrase-based statistical and neural MT systems has on new MT systems. We use a real-world low-resource use-case (Basque-to-Spanish in the clinical domain) as well as a high-resource language pair (German-to-English) to test different scenarios with backtranslation and employ data selection to optimise the synthetic corpora. We exploit different data selection strategies in order to reduce the amount of data used, while at the same time maintaining high-quality MT systems. We further tune the data selection method by taking into account the quality of the MT systems used for backtranslation and lexical diversity of the resulting corpora. Our experiments show that incorporating backtranslated data from different sources can be beneficial, and that availing of data selection can yield improved performance.

Effectively Aligning and Filtering Parallel Corpora under Sparse Data Conditions
Steinþór Steingrímsson | Hrafn Loftsson | Andy Way
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Parallel corpora are key to developing good machine translation systems. However, abundant parallel data are hard to come by, especially for languages with a low number of speakers. When rich morphology exacerbates the data sparsity problem, it is imperative to have accurate alignment and filtering methods that can help make the most of what is available by maximising the number of correctly translated segments in a corpus and minimising noise by removing incorrect translations and segments containing extraneous data. This paper sets out a research plan for improving alignment and filtering methods for parallel texts in low-resource settings. We propose an effective unsupervised alignment method to tackle the alignment problem. Moreover, we propose a strategy to supplement state-of-the-art models with automatically extracted information using basic NLP tools to effectively handle rich morphology.

The ADAPT System Description for the STAPLE 2020 English-to-Portuguese Translation Task
Rejwanul Haque | Yasmin Moslem | Andy Way
Proceedings of the Fourth Workshop on Neural Generation and Translation

This paper describes the ADAPT Centre’s submission to STAPLE (Simultaneous Translation and Paraphrase for Language Education) 2020, a shared task of the 4th Workshop on Neural Generation and Translation (WNGT), for the English-to-Portuguese translation task. In this shared task, the participants were asked to produce high-coverage sets of plausible translations given English prompts (input source sentences). We present our English-to-Portuguese machine translation (MT) models that were built applying various strategies, e.g. data and sentence selection, monolingual MT for generating alternative translations, and combining multiple n-best translations. Our experiments show that adding the aforementioned techniques to the baseline yields an excellent performance in the English-to-Portuguese translation task.

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe
Georg Rehm | Katrin Marheinecke | Stefanie Hegele | Stelios Piperidis | Kalina Bontcheva | Jan Hajič | Khalid Choukri | Andrejs Vasiļjevs | Gerhard Backfried | Christoph Prinz | José Manuel Gómez-Pérez | Luc Meertens | Paul Lukowicz | Josef van Genabith | Andrea Lösch | Philipp Slusallek | Morten Irgens | Patrick Gatellier | Joachim Köhler | Laure Le Bars | Dimitra Anastasiou | Albina Auksoriūtė | Núria Bel | António Branco | Gerhard Budin | Walter Daelemans | Koenraad De Smedt | Radovan Garabík | Maria Gavriilidou | Dagmar Gromann | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Jan Odijk | Maciej Ogrodniczuk | Eiríkur Rögnvaldsson | Mike Rosner | Bolette Pedersen | Inguna Skadiņa | Marko Tadić | Dan Tufiș | Tamás Váradi | Kadri Vider | Andy Way | François Yvon
Proceedings of the 12th Language Resources and Evaluation Conference

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

On Context Span Needed for Machine Translation Evaluation
Sheila Castilho | Maja Popović | Andy Way
Proceedings of the 12th Language Resources and Evaluation Conference

Despite increasing efforts to improve evaluation of machine translation (MT) by going beyond the sentence level to the document level, the definition of what exactly constitutes a “document level” is still not clear. This work deals with the context span necessary for a more reliable MT evaluation. We report results from a series of surveys involving three domains and 18 target languages designed to identify the necessary context span as well as issues related to it. Our findings indicate that, despite the fact that some issues and spans are strongly dependent on domain and on the target language, a number of common patterns can be observed so that general guidelines for context-aware MT evaluation can be drawn.

Constraining the Transformer NMT Model with Heuristic Grid Beam Search
Guodong Xie | Andy Way
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

The Impact of Indirect Machine Translation on Sentiment Classification
Alberto Poncelas | Pintu Lohar | James Hadley | Andy Way
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

A Case Study of Natural Gender Phenomena in Translation: A Comparison of Google Translate, Bing Microsoft Translator and DeepL for English to Italian, French and Spanish
Argentina Anna Rescigno | Johanna Monti | Andy Way | Eva Vanmassenhove
Workshop on the Impact of Machine Translation (iMpacT 2020)

Neural Machine Translation for translating into Croatian and Serbian
Maja Popović | Alberto Poncelas | Marija Brkic | Andy Way
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects

In this work, we systematically investigate different set-ups for training of neural machine translation (NMT) systems for translation into Croatian and Serbian, two closely related South Slavic languages. We explore English and German as source languages, different sizes and types of training corpora, as well as bilingual and multilingual systems. We also explore translation of English IMDb user movie reviews, a domain/genre where only monolingual data are available. First, our results confirm that multilingual systems with joint target languages perform better. Furthermore, translation performance from English is much better than from German, partly because German is morphologically more complex and partly because the corpus consists mostly of parallel human translations instead of original text and its human translation. The translation from German should be further investigated systematically. For translating user reviews, creating synthetic in-domain parallel data through back- and forward-translation and adding them to a small out-of-domain parallel corpus can yield performance comparable with a system trained on a full out-of-domain corpus. However, it is still not clear what is the optimal size of synthetic in-domain data, especially for forward-translated data where the target language is machine translated. More detailed research including manual evaluation and analysis is needed in this direction.

ELRI: A Decentralised Network of National Relay Stations to Collect, Prepare and Share Language Resources
Thierry Etchegoyhen | Borja Anza Porras | Andoni Azpeitia | Eva Martínez Garcia | José Luis Fonseca | Patricia Fonseca | Paulo Vale | Jane Dunne | Federico Gaspari | Teresa Lynn | Helen McHugh | Andy Way | Victoria Arranz | Khalid Choukri | Hervé Pusset | Alexandre Sicard | Rui Neto | Maite Melero | David Perez | António Branco | Ruben Branco | Luís Gomes
Proceedings of the 1st International Workshop on Language Technology Platforms

We describe the European Language Resource Infrastructure (ELRI), a decentralised network to help collect, prepare and share language resources. The infrastructure was developed within a project co-funded by the Connecting Europe Facility Programme of the European Union, and has been deployed in the four Member States participating in the project, namely France, Ireland, Portugal and Spain. ELRI provides sustainable and flexible means to collect and share language resources via National Relay Stations, to which members of public institutions can freely subscribe. The infrastructure includes fully automated data processing engines to facilitate the preparation, sharing and wider reuse of useful language resources that can help optimise human and automated translation services in the European Union.

Modelling Source- and Target- Language Syntactic Information as Conditional Context in Interactive Neural Machine Translation
Kamal Kumar Gupta | Rejwanul Haque | Asif Ekbal | Pushpak Bhattacharyya | Andy Way
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In interactive machine translation (MT), human translators correct errors in automatic translations in collaboration with the MT systems, which is seen as an effective way to improve the productivity gain in translation. In this study, we model source-language syntactic constituency parse and target-language syntactic descriptions in the form of supertags as conditional context for interactive prediction in neural MT (NMT). We found that the supertags significantly improve productivity gain in translation in interactive-predictive NMT (INMT), while syntactic parsing was found to be somewhat effective in reducing human effort in translation. Furthermore, when we model this source- and target-language syntactic information together as the conditional context, both types complement each other and our fully syntax-informed INMT model statistically significantly reduces human effort in a French-to-English translation task, achieving 4.30 points absolute (corresponding to 9.18% relative) improvement in terms of word prediction accuracy (WPA) and 4.84 points absolute (corresponding to 9.01% relative) reduction in terms of word stroke ratio (WSR) over the baseline.

MT syntactic priming effects on L2 English speakers
Natália Resende | Benjamin Cowan | Andy Way
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In this paper, we tested 20 Brazilian Portuguese speakers at intermediate and advanced English proficiency levels to investigate the influence of Google Translate’s MT system on the mental processing of English as a second language. To this end, we employed a syntactic priming experimental paradigm using a pretest-priming design which allowed us to compare participants’ linguistic behaviour before and after a translation task using Google Translate. Results show that, after performing a translation task with Google Translate, participants more frequently described images in English using the syntactic alternative previously seen in the output of Google Translate, compared to the translation task with no prior influence of the MT output. Results also show that this syntactic priming effect is modulated by English proficiency levels.

A human evaluation of English-Irish statistical and neural machine translation
Meghan Dowling | Sheila Castilho | Joss Moorkens | Teresa Lynn | Andy Way
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

With official status in both Ireland and the EU, there is a need for high-quality English-Irish (EN-GA) machine translation (MT) systems which are suitable for use in a professional translation environment. While we have seen recent research on improving both statistical MT and neural MT for the EN-GA pair, the results of such systems have always been reported using automatic evaluation metrics. This paper provides the first human evaluation study of EN-GA MT using professional translators and in-domain (public administration) data for a more accurate depiction of the translation quality available via MT.

Progress of the PRINCIPLE Project: Promoting MT for Croatian, Icelandic, Irish and Norwegian
Andy Way | Petra Bago | Jane Dunne | Federico Gaspari | Andre Kåsen | Gauti Kristmannsson | Helen McHugh | Jon Arild Olsen | Dana Davis Sheridan | Páraic Sheridan | John Tinsley
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper updates the progress made on the PRINCIPLE project, a 2-year action funded by the European Commission under the Connecting Europe Facility (CEF) programme. PRINCIPLE focuses on collecting high-quality language resources for Croatian, Icelandic, Irish and Norwegian, which have been identified as low-resource languages, especially for building effective machine translation (MT) systems. We report initial achievements of the project and ongoing activities aimed at promoting the uptake of neural MT for the low-resource languages of the project.

MTrill project: Machine Translation impact on language learning
Natália Resende | Andy Way
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Over the last decades, massive research investments have been made in the development of machine translation (MT) systems (Gupta and Dhawan, 2019). This has brought about a paradigm shift in the performance of these language tools, leading to widespread use of popular MT systems (Gaspari and Hutchins, 2007). Although the first MT engines were used for gisting purposes, in recent years, there has been an increasing interest in using MT tools, especially the freely available online MT tools, for language teaching and learning (Clifford et al., 2013). The literature on MT and Computer Assisted Language Learning (CALL) shows that, over the years, MT systems have been facilitating language teaching and also language learning (Niño, 2006). It has been shown that MT tools can increase awareness of grammatical linguistic features of a foreign language. Research also shows the positive role of MT systems in the development of writing skills in English as well as in improving communication skills in English (Garcia and Pena, 2011). However, to date, the cognitive impact of MT on language acquisition and on the syntactic aspects of language processing has not yet been investigated and deserves further scrutiny. The MTrill project aims at filling this gap in the literature by examining whether MT is contributing to a central aspect of language acquisition: the so-called language binding, i.e., the ability to combine single words properly in a grammatical sentence (Heyselaar et al., 2017; Ferreira and Bock, 2006). The project focuses on the initial stages (pre-intermediate and intermediate) of the acquisition of English syntax by Brazilian Portuguese native speakers using MT systems as a support for language learning.

A Tool for Facilitating OCR Postediting in Historical Documents
Alberto Poncelas | Mohammad Aboomar | Jan Buts | James Hadley | Andy Way
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages

Optical character recognition (OCR) for historical documents is a complex procedure subject to a unique set of material issues, including inconsistencies in typefaces and low quality scanning. Consequently, even the most sophisticated OCR engines produce errors. This paper reports on a tool built for postediting the output of Tesseract, more specifically for correcting common errors in digitized historical documents. The proposed tool suggests alternatives for word forms not found in a specified vocabulary. The assumed error is replaced by a presumably correct alternative in the post-edition based on the scores of a Language Model (LM). The tool is tested on a chapter of the book An Essay Towards Regulating the Trade and Employing the Poor of this Kingdom (Cary, 1719). As demonstrated below, the tool is successful in correcting a number of common errors. If sometimes unreliable, it is also transparent and subject to human intervention.
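The correction step described in this abstract (flag word forms missing from a vocabulary, propose alternatives, pick one by language-model score) can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool itself: candidate generation via `difflib.get_close_matches`, the toy count-based `lm_score` function, and the sample vocabulary and tokens are all hypothetical stand-ins.

```python
import difflib

def postedit(tokens, vocabulary, lm_score, n_candidates=3):
    """Replace out-of-vocabulary tokens with the best-scoring close match."""
    vocab_list = sorted(vocabulary)
    corrected = []
    for tok in tokens:
        if tok in vocabulary:
            corrected.append(tok)
            continue
        # Assumed candidate generator: string similarity against the vocabulary.
        candidates = difflib.get_close_matches(tok, vocab_list, n=n_candidates, cutoff=0.6)
        if not candidates:
            corrected.append(tok)  # nothing plausible; leave for a human editor
            continue
        previous = corrected[-1] if corrected else None
        corrected.append(max(candidates, key=lambda c: lm_score(previous, c)))
    return corrected

# Toy stand-in for a language model: bigram counts with a unigram fallback.
unigrams = {"the": 5, "trade": 2, "of": 4, "this": 3, "kingdom": 1}
bigrams = {("this", "kingdom"): 1, ("the", "trade"): 2}

def lm_score(previous, word):
    return bigrams.get((previous, word), 0) + 0.1 * unigrams.get(word, 0)

ocr_tokens = ["tbe", "trade", "of", "this", "kingdorn"]
result = postedit(ocr_tokens, set(unigrams), lm_score)
```

Tokens with no sufficiently similar vocabulary entry are left untouched, which keeps the process transparent and open to human intervention.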

2019

Building English-to-Serbian Machine Translation System for IMDb Movie Reviews
Pintu Lohar | Maja Popović | Andy Way
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translation of English IMDb user movie reviews into Serbian, in a low-resource scenario. We explore potentials and limits of (i) phrase-based and neural machine translation systems trained on out-of-domain clean parallel data from news articles and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are better handled by the neural approach than by the phrase-based approach even in this low-resource mismatched domain scenario; however, the situation is different for the lexical aspect, especially for person names. This finding also indicates that in general, machine translation of person names into Slavic languages (especially those which require/allow transcription) should be investigated more systematically.

Proceedings of Machine Translation Summit XVII Volume 1: Research Track
Mikel Forcada | Andy Way | Barry Haddow | Rico Sennrich
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation
Eva Vanmassenhove | Dimitar Shterionov | Andy Way
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks
Mikel Forcada | Andy Way | John Tinsley | Dimitar Shterionov | Celia Rico | Federico Gaspari
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

PRINCIPLE: Providing Resources in Irish, Norwegian, Croatian and Icelandic for the Purposes of Language Engineering
Andy Way | Federico Gaspari
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

Pivot Machine Translation in INTERACT Project
Chao-Hong Liu | Andy Way | Catarina Silva | André Martins
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

Large-scale Machine Translation Evaluation of the iADAATPA Project
Sheila Castilho | Natália Resende | Federico Gaspari | Andy Way | Tony O’Dowd | Marek Mazur | Manuel Herranz | Alex Helle | Gema Ramírez-Sánchez | Víctor Sánchez-Cartagena | Mārcis Pinnis | Valters Šics
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

When less is more in Neural Quality Estimation of Machine Translation. An industry case study
Dimitar Shterionov | Félix Do Carmo | Joss Moorkens | Eric Paquin | Dag Schmidtke | Declan Groves | Andy Way
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

Leveraging backtranslation to improve machine translation for Gaelic languages
Meghan Dowling | Teresa Lynn | Andy Way
Proceedings of the Celtic Language Technology Workshop

Transductive Data-Selection Algorithms for Fine-Tuning Neural Machine Translation
Alberto Poncelas | Gideon Maillette de Buy Wenniger | Andy Way
Proceedings of The 8th Workshop on Patent and Scientific Literature Translation

Proceedings of the Qualities of Literary Machine Translation
James Hadley | Maja Popović | Haithem Afli | Andy Way
Proceedings of the Qualities of Literary Machine Translation

Selecting Artificially-Generated Sentences for Fine-Tuning Neural Machine Translation
Alberto Poncelas | Andy Way
Proceedings of the 12th International Conference on Natural Language Generation

Neural Machine Translation (NMT) models tend to achieve the best performance when larger sets of parallel sentences are provided for training. For this reason, augmenting the training set with artificially-generated sentence pairs can boost the performance. Nonetheless, the performance can also be improved with a small number of sentences if they are in the same domain as the test set. Accordingly, we want to explore the use of artificially-generated sentences along with data-selection algorithms to improve NMT models trained solely with authentic data. In this work, we show how artificially-generated sentences can be more beneficial than authentic pairs and what their advantages are when used in combination with data-selection algorithms.

Investigating Terminology Translation in Statistical and Neural Machine Translation: A Case Study on English-to-Hindi and Hindi-to-English
Rejwanul Haque | Md Hasanuzzaman | Andy Way
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Terminology translation plays a critical role in domain-specific machine translation (MT). In this paper, we conduct a comparative qualitative evaluation on terminology translation in phrase-based statistical MT (PB-SMT) and neural MT (NMT) in two translation directions: English-to-Hindi and Hindi-to-English. For this, we select a test set from a legal domain corpus and create a gold standard for evaluating terminology translation in MT. We also propose an error typology taking the terminology translation errors into consideration. We evaluate the MT systems’ performance on terminology translation, and demonstrate our findings, unraveling strengths, weaknesses, and similarities of PB-SMT and NMT in the area of term translation.

Combining PBSMT and NMT Back-translated Data for Efficient NMT
Alberto Poncelas | Maja Popović | Dimitar Shterionov | Gideon Maillette de Buy Wenniger | Andy Way
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation, which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data using different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models and combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.

2018

FooTweets: A Bilingual Parallel Corpus of World Cup Tweets
Henny Sluyter-Gäthje | Pintu Lohar | Haithem Afli | Andy Way
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation
Peyman Passban | Qun Liu | Andy Way
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Recently, neural machine translation (NMT) has emerged as a powerful alternative to conventional statistical approaches. However, its performance drops considerably in the presence of morphologically rich languages (MRLs). Neural engines usually fail to tackle the large vocabulary and high out-of-vocabulary (OOV) word rate of MRLs. Therefore, it is not suitable to exploit existing word-based models to translate this set of languages. In this paper, we propose an extension to the state-of-the-art model of Chung et al. (2016), which works at the character level and boosts the decoder with target-side morphological information. In our architecture, an additional morphology table is plugged into the model. Each time the decoder samples from a target vocabulary, the table sends auxiliary signals from the most relevant affixes in order to enrich the decoder’s current state and constrain it to provide better predictions. We evaluated our model to translate English into German, Russian, and Turkish as three MRLs and observed significant improvements.

Fine-Grained Temporal Orientation and its Relationship with Psycho-Demographic Correlates
Sabyasachi Kamila | Mohammed Hasanuzzaman | Asif Ekbal | Pushpak Bhattacharyya | Andy Way
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Temporal orientation refers to an individual’s tendency to connect to the psychological concepts of past, present or future, and it affects personality, motivation, emotion, decision making and stress coping processes. The study of social media users’ psycho-demographic attributes from the perspective of human temporal orientation can be of utmost interest and importance to business and administrative decision makers, as it can provide extra valuable information for them to make informed decisions. In this paper, we propose the very first study to demonstrate the association between the sentiment view of the temporal orientation of the users and their different psycho-demographic attributes by analyzing their tweets. We first create a temporal orientation classifier in a minimally supervised way which classifies each tweet of the users into one of three temporal categories, namely past, present, and future. A deep Bi-directional Long Short Term Memory (BLSTM) is used for the tweet classification task. Our tweet classifier achieves an accuracy of 78.27% when tested on a manually created test set. We then determine the users’ overall temporal orientation based on their tweets on social media. The sentiment is added to the tweets at the fine-grained level, where each temporal tweet is given a sentiment that is either positive, negative or neutral. Our experiment reveals that depending upon the sentiment view of temporal orientation, a user’s attributes vary. We finally measure the correlation between the users’ sentiment view of temporal orientation and their different psycho-demographic factors using regression.

Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction
Jinhua Du | Jingguang Han | Andy Way | Dadong Wan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Attention mechanisms are often used in deep neural networks for distantly supervised relation extraction (DS-RE) to distinguish valid from noisy instances. However, the traditional 1-D vector attention model is insufficient for learning different contexts in the selection of valid instances to predict the relationship for an entity pair. To alleviate this issue, we propose a novel multi-level structured (2-D matrix) self-attention mechanism for DS-RE in a multi-instance learning (MIL) framework using bidirectional recurrent neural networks (BiRNN). In the proposed method, a structured word-level self-attention learns a 2-D matrix where each row vector represents a weight distribution for different aspects of an instance regarding two entities. Targeting the MIL issue, the structured sentence-level attention learns a 2-D matrix where each row vector represents a weight distribution on selection of different valid instances. Experiments conducted on two publicly available DS-RE datasets show that the proposed framework with multi-level structured self-attention mechanism significantly outperforms baselines in terms of PR curves, P@N and F1 measures.

Learning to Jointly Translate and Predict Dropped Pronouns with a Shared Reconstruction Mechanism
Longyue Wang | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. Recently, Wang et al. (2018) proposed a novel reconstruction-based approach to alleviating dropped pronoun (DP) translation problems for neural machine translation models. In this work, we improve the original model from two perspectives. First, we employ a shared reconstructor to better exploit encoder and decoder representations. Second, we jointly learn to translate and predict DPs in an end-to-end manner, to avoid the errors propagated from an external DP prediction model. Experimental results show that our approach significantly improves both translation performance and DP prediction accuracy.

pdf bib
Getting Gender Right in Neural Machine Translation
Eva Vanmassenhove | Christian Hardmeier | Andy Way
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Speakers of different languages must attend to and encode strikingly different aspects of the world in order to use their language correctly (Sapir, 1921; Slobin, 1996). One such difference is related to the way gender is expressed in a language. Saying “I am happy” in English does not encode any additional knowledge of the speaker that uttered the sentence. However, many other languages do have grammatical gender systems and so such knowledge would be encoded. In order to correctly translate such a sentence into, say, French, the inherent gender information needs to be retained/recovered. The same sentence would become either “Je suis heureux” for a male speaker or “Je suis heureuse” for a female one. Apart from morphological agreement, demographic factors (gender, age, etc.) also influence our use of language in terms of word choices or syntactic constructions (Tannen, 1991; Pennebaker et al., 2003). We integrate gender information into NMT systems. Our contribution is two-fold: (1) the compilation of large datasets with speaker information for 20 language pairs, and (2) a simple set of experiments that incorporate gender information into NMT for multiple language pairs. Our experiments show that adding a gender feature to an NMT system significantly improves the translation quality for some language pairs.

pdf bib
Tailoring Neural Architectures for Translating from Morphologically Rich Languages
Peyman Passban | Andy Way | Qun Liu
Proceedings of the 27th International Conference on Computational Linguistics

A morphologically complex word (MCW) is a hierarchical constituent with meaning-preserving subunits, so word-based models which rely on surface forms might not be powerful enough to translate such structures. When translating from morphologically rich languages (MRLs), a source word could be mapped to several words or even a full sentence on the target side, which means an MCW should not be treated as an atomic unit. In order to provide better translations for MRLs, we boost the existing neural machine translation (NMT) architecture with a double-channel encoder and a double-attentive decoder. The main goal targeted in this research is to provide richer information on the encoder side and redesign the decoder accordingly to benefit from such information. Our experimental results demonstrate that we could achieve our goal as the proposed model outperforms existing subword- and character-based architectures and shows significant improvements in translating from German, Russian, and Turkish into English.

pdf bib
Incorporating Deep Visual Features into Multiobjective based Multi-view Search Results Clustering
Sayantan Mitra | Mohammed Hasanuzzaman | Sriparna Saha | Andy Way
Proceedings of the 27th International Conference on Computational Linguistics

The current paper explores the use of multi-view learning for search result clustering. A web-snippet can be represented using multiple views. Apart from the textual view cued by both the semantic and syntactic information, a complementary view extracted from images contained in the web-snippets is also utilized in the current framework. A single consensus partitioning is finally obtained after consulting these two individual views by the deployment of a multiobjective based clustering technique. Several objective functions, including the values of a cluster quality measure measuring the goodness of partitionings obtained using different views and an agreement-disagreement index quantifying the amount of oneness among multiple views in generating partitionings, are optimized simultaneously using AMOSA. In order to detect the number of clusters automatically, concepts of variable length solutions and a vast range of permutation operators are introduced in the clustering process. Finally, a set of alternative partitionings is obtained on the final Pareto front by the proposed multi-view based multiobjective technique. Experimental results by the proposed approach on several benchmark test datasets of SRC with respect to different performance metrics evidently establish the power of visual and text-based views in achieving better search result clustering.

pdf bib
SuperNMT: Neural Machine Translation with Semantic Supersenses and Syntactic Supertags
Eva Vanmassenhove | Andy Way
Proceedings of ACL 2018, Student Research Workshop

In this paper we incorporate semantic supersense tags and syntactic supertag features into EN–FR and EN–DE factored NMT systems. In experiments on various test sets, we observe that such features (particularly when combined) help the NMT model training to converge faster and improve the model quality according to the BLEU scores.

pdf bib
Balancing Translation Quality and Sentiment Preservation (Non-archival Extended Abstract)
Pintu Lohar | Haithem Afli | Andy Way
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

pdf bib
SMT versus NMT: Preliminary comparisons for Irish
Meghan Dowling | Teresa Lynn | Alberto Poncelas | Andy Way
Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018)

pdf bib
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
Antonio Toral | Sheila Castilho | Ke Hu | Andy Way
Proceedings of the Third Conference on Machine Translation: Research Papers

We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT.

pdf bib
Extracting In-domain Training Corpora for Neural Machine Translation Using Data Selection Methods
Catarina Cruz Silva | Chao-Hong Liu | Alberto Poncelas | Andy Way
Proceedings of the Third Conference on Machine Translation: Research Papers

Data selection is a process used in selecting a subset of parallel data for the training of machine translation (MT) systems, so that 1) resources for training might be reduced, 2) trained models could perform better than those trained with the whole corpus, and/or 3) trained models are more tailored to specific domains. It has been shown that for statistical MT (SMT), the use of data selection helps improve the MT performance significantly. In this study, we reviewed three data selection approaches for MT, namely Term Frequency–Inverse Document Frequency, Cross-Entropy Difference and Feature Decay Algorithm, and conducted experiments on Neural Machine Translation (NMT) with the selected data using the three approaches. The results showed that for NMT systems, using data selection also improved the performance, though the gain is not as much as for SMT systems.

2017

pdf bib
Context-Aware Graph Segmentation for Graph-Based Translation
Liangyou Li | Andy Way | Qun Liu
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper, we present an improved graph-based translation model which segments an input graph into node-induced subgraphs by taking source context into consideration. Translations are generated by combining subgraph translations left-to-right using beam search. Experiments on Chinese–English and German–English demonstrate that the context-aware segmentation significantly improves the baseline graph-based model.

pdf bib
Using Images to Improve Machine-Translating E-Commerce Product Listings.
Iacer Calixto | Daniel Stein | Evgeny Matusov | Pintu Lohar | Sheila Castilho | Andy Way
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we study the impact of using images to machine-translate user-generated e-commerce product listings. We study how a multi-modal Neural Machine Translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attentional NMT and a Statistical Machine Translation (SMT) model. User-generated product listings often do not constitute grammatical or well-formed sentences. More often than not, they consist of the juxtaposition of short phrases or keywords. We train our models end-to-end as well as use text-only and multi-modal NMT models for re-ranking n-best lists generated by an SMT model. We qualitatively evaluate our user-generated training data and also analyse how adding synthetic data impacts the results. We evaluate our models quantitatively using BLEU and TER and find that (i) additional synthetic data has a general positive impact on text-only and multi-modal NMT models, and that (ii) using a multi-modal NMT model for re-ranking n-best lists improves TER significantly across different n-best list sizes.

pdf bib
Demographic Word Embeddings for Racism Detection on Twitter
Mohammed Hasanuzzaman | Gaël Dias | Andy Way
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is such an example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (Age, Gender, and Location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models.

pdf bib
Semantics-Enhanced Task-Oriented Dialogue Translation: A Case Study on Hotel Booking
Longyue Wang | Jinhua Du | Liangyou Li | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the IJCNLP 2017, System Demonstrations

We showcase TODAY, a semantics-enhanced task-oriented dialogue translation system, whose novelties are: (i) task-oriented named entity (NE) definition and a hybrid strategy for NE recognition and translation; and (ii) a novel grounded semantic method for dialogue understanding and task-order management. TODAY is a case-study demo which can efficiently and accurately assist customers and agents in different languages to reach an agreement in a dialogue for hotel booking.

pdf bib
ADAPT at IJCNLP-2017 Task 4: A Multinomial Naive Bayes Classification Approach for Customer Feedback Analysis task
Pintu Lohar | Koel Dutta Chowdhury | Haithem Afli | Mohammed Hasanuzzaman | Andy Way
Proceedings of the IJCNLP 2017, Shared Tasks

In this age of the digital economy, promoting organisations attempt their best to engage the customers in the feedback provisioning process. With the assistance of customer insights, an organisation can develop a better product and provide a better service to its customers. In this paper, we analyse the real world samples of customer feedback from Microsoft Office customers in four languages, i.e., English, French, Spanish and Japanese, and conclude a five-plus-one-classes categorisation (comment, request, bug, complaint, meaningless and undetermined) for meaning classification. The task is to access multilingual corpora annotated by the proposed meaning categorization scheme and develop a system to determine what class(es) the customer feedback sentences should be annotated as in four languages. We propose the following approaches to accomplish this task: (i) a multinomial naive Bayes (MNB) approach for multi-label classification, (ii) MNB with a one-vs-rest classifier approach, and (iii) the combination of the multilabel classification-based and the sentiment classification-based approach. Our best system produces F-scores of 0.67, 0.83, 0.72 and 0.7 for English, Spanish, French and Japanese, respectively. The results are competitive with the best ones for all languages and secure 3rd and 5th position for Japanese and French, respectively, among all submitted systems.

pdf bib
Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search
Ahmad Khwileh | Haithem Afli | Gareth Jones | Andy Way
Proceedings of the Third Arabic Natural Language Processing Workshop

Cross Language Information Retrieval (CLIR) systems are a valuable tool to enable speakers of one language to search for content of interest expressed in a different language. A group for whom this is of particular interest is bilingual Arabic speakers who wish to search for English language content using information needs expressed in Arabic queries. A key challenge in CLIR is crossing the language barrier between the query and the documents. The most common approach to bridging this gap is automated query translation, which can be unreliable for vague or short queries. In this work, we examine the potential for improving CLIR effectiveness by predicting the translation effectiveness using Query Performance Prediction (QPP) techniques. We propose a novel QPP method to estimate the quality of translation for an Arabic-English Cross-lingual User-generated Speech Search (CLUGS) task. We present an empirical evaluation that demonstrates the quality of our method on alternative translation outputs extracted from an Arabic-to-English Machine Translation system developed for this task. Finally, we show how this framework can be integrated in CLUGS to find relevant translations for improved retrieval performance.

pdf bib
Ethical Considerations in NLP Shared Tasks
Carla Parra Escartín | Wessel Reijers | Teresa Lynn | Joss Moorkens | Andy Way | Chao-Hong Liu
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Shared tasks are increasingly common in our field, and new challenges are suggested at almost every conference and workshop. However, as this has become an established way of pushing research forward, it is important to discuss how we researchers organise and participate in shared tasks, and make that information available to the community to allow further research improvements. In this paper, we present a number of ethical issues along with other areas of concern that are related to the competitive nature of shared tasks. As such issues could potentially impact on research ethics in the Natural Language Processing community, we also propose the development of a framework for the organisation of and participation in shared tasks that can help mitigate against these issues arising.

pdf bib
Human Evaluation of Multi-modal Neural Machine Translation: A Case-Study on E-Commerce Listing Titles
Iacer Calixto | Daniel Stein | Evgeny Matusov | Sheila Castilho | Andy Way
Proceedings of the Sixth Workshop on Vision and Language

In this paper, we study how humans perceive the use of images as an additional knowledge source to machine-translate user-generated product listings in an e-commerce company. We conduct a human evaluation where we assess how a multi-modal neural machine translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attention-based NMT and a phrase-based statistical machine translation (PBSMT) model. We evaluate translations obtained with different systems and also discuss the data set of user-generated product listings, which in our case comprises both product listings and associated images. We found that humans preferred translations obtained with a PBSMT system to both text-only and multi-modal NMT over 56% of the time. Nonetheless, human evaluators ranked translations from a multi-modal NMT model as better than those of a text-only NMT over 88% of the time, which suggests that images do help NMT in this use-case.

pdf bib
MultiNews: A Web collection of an Aligned Multimodal and Multilingual Corpus
Haithem Afli | Pintu Lohar | Andy Way
Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora

Integrating Natural Language Processing (NLP) and computer vision is a promising effort. However, the applicability of these methods directly depends on the availability of specific multimodal data that includes images and texts. In this paper, we present a multimodal corpus of comparable texts and their images in 9 languages, collected from the web news articles of the Euronews website. This corpus has found widespread use in the NLP community in multilingual and multimodal tasks. Here, we focus on the acquisition of its image and text data and their multilingual alignment.

pdf bib
Exploiting Cross-Sentence Context for Neural Machine Translation
Longyue Wang | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT in two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results on a large Chinese-English translation task show that our approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points.

2016

pdf bib
Using BabelNet to Improve OOV Coverage in SMT
Jinhua Du | Andy Way | Andrzej Zydron
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Out-of-vocabulary words (OOVs) are a ubiquitous and difficult problem in statistical machine translation (SMT). This paper studies different strategies of using BabelNet to alleviate the negative impact brought about by OOVs. BabelNet is a multilingual encyclopedic dictionary and a semantic network, which not only includes lexicographic and encyclopedic terms, but connects concepts and named entities in a very large network of semantic relations. By taking advantage of the knowledge in BabelNet, three different methods (using direct training data, domain-adaptation techniques and the BabelNet API) are proposed in this paper to obtain translations for OOVs to improve system performance. Experimental results on English–Polish and English–Chinese language pairs show that domain adaptation can better utilize BabelNet knowledge and performs better than other methods. The results also demonstrate that BabelNet is a really useful tool for improving translation performance of SMT systems.

pdf bib
Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
Valia Kordoni | Antal van den Bosch | Katia Lida Kermanidis | Vilelmini Sosoni | Kostadin Cholakov | Iris Hendrickx | Matthias Huck | Andy Way
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The present work is an overview of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, a machine translation approach for online educational content. More specifically, videolectures, assignments, and MOOC forum text are automatically translated from English into eleven European and BRIC languages. Unlike previous approaches to machine translation, the output quality in TraMOOC relies on a multimodal evaluation schema that involves crowdsourcing, error type markup, an error taxonomy for translation model comparison, and implicit evaluation via text mining, i.e. entity recognition and its performance comparison between the source and the translated text, and sentiment analysis on the students’ forum posts. Finally, the evaluation output will result in more and better quality in-domain parallel data that will be fed back to the translation engine for higher quality output. The translation service will be incorporated into the Iversity MOOC platform and into the VideoLectures.net digital library portal.

pdf bib
Using SMT for OCR Error Correction of Historical Texts
Haithem Afli | Zhengwei Qiu | Andy Way | Páraic Sheridan
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

A trend to digitize historical paper-based archives has emerged in recent years, with the advent of digital optical scanners. A lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into electronic versions that can be manipulated by a computer. For this purpose, Optical Character Recognition (OCR) systems have been developed to transform scanned digital text into editable computer text. However, the OCR system output text can contain different kinds of errors, and Automatic Error Correction tools can help improve the quality of electronic texts by cleaning them and removing noise. In this paper, we perform a qualitative and quantitative comparison of several error-correction techniques for historical French documents. Experimentation shows that our Machine Translation for Error Correction method is superior to other Language Modelling correction techniques, with nearly 13% relative improvement compared to the initial baseline.

pdf bib
ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
Xiaofeng Wu | Jinhua Du | Qun Liu | Andy Way
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users. Accordingly, one might say that users are writing in a Controlled Language that is understood by the computer. ProphetMT also allows users to easily attach structural information as they compose content. When a specific rule is selected, a partial translation is promptly generated on-the-fly with the help of the structural information. Our experiments conducted on English-to-Chinese show that our proposed ProphetMT system can not only better regularise an author’s writing behaviour, but also significantly improve translation fluency which is vital to reduce the post-editing time. Additionally, when the writing and translation process is over, ProphetMT can provide an effective colour scheme to further improve the productivity of post-editors by explicitly featuring the relations between the source and target rules.

pdf bib
Automatic Construction of Discourse Corpora for Dialogue Translation
Longyue Wang | Xiaojun Zhang | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. Firstly, the parallel subtitle data and its corresponding monolingual movie script data are crawled and collected from the Internet. Then tags such as speaker and discourse boundary from the script data are projected to its subtitle data via an information retrieval approach in order to map monolingual discourse to bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show our proposed method can achieve 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, and speaker-based language model adaptation can obtain around 0.5 BLEU points improvement in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation.

pdf bib
A Novel Approach to Dropped Pronoun Translation
Longyue Wang | Zhaopeng Tu | Xiaojun Zhang | Hang Li | Andy Way | Qun Liu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Identifying Temporal Orientation of Word Senses
Mohammed Hasanuzzaman | Gaël Dias | Stéphane Ferrari | Yann Mathet | Andy Way
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

pdf bib
Fast Gated Neural Domain Adaptation: Language Model as a Case Study
Jian Zhang | Xiaofeng Wu | Andy Way | Qun Liu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Neural network training has been shown to be advantageous in many natural language processing applications, such as language modelling or machine translation. In this paper, we describe in detail a novel domain adaptation mechanism in neural network training. Instead of learning and adapting the neural network on millions of training sentences – which can be very time-consuming or even infeasible in some cases – we design a domain adaptation gating mechanism which can be used in recurrent neural networks and quickly learn the out-of-domain knowledge directly from the word vector representations with little speed overhead. In our experiments, we use the recurrent neural network language model (LM) as a case study. We show that the neural LM perplexity can be reduced by 7.395 and 12.011 using the proposed domain adaptation mechanism on the Penn Treebank and News data, respectively. Furthermore, we show that using the domain-adapted neural LM to re-rank the statistical machine translation n-best list on the French-to-English language pair can significantly improve translation quality.

pdf bib
Topic-Informed Neural Machine Translation
Jian Zhang | Liangyou Li | Andy Way | Qun Liu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In recent years, neural machine translation (NMT) has demonstrated state-of-the-art machine translation (MT) performance. It is a new approach to MT, which tries to learn a set of parameters to maximize the conditional probability of target sentences given source sentences. In this paper, we present a novel approach to improve the translation performance in NMT by conveying topic knowledge during translation. The proposed topic-informed NMT can increase the likelihood of selecting words from the same topic and domain for translation. Experimentally, we demonstrate that topic-informed NMT can achieve a 1.15 (3.3% relative) and 1.67 (5.4% relative) absolute improvement in BLEU score on the Chinese-to-English language pair using NIST 2004 and 2005 test sets, respectively, compared to NMT without topic information.

pdf bib
Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings
Peyman Passban | Qun Liu | Andy Way
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match of each source phrase is selected among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. PBSMT engines by default provide four probability scores in phrase tables which are considered as the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline.

pdf bib
TraMOOC (Translation for Massive Open Online Courses): providing reliable MT for MOOCs
Valia Kordoni | Lexi Birch | Ioana Buliga | Kostadin Cholakov | Markus Egg | Federico Gaspari | Yota Georgakopolou | Maria Gialama | Iris Hendrickx | Mitja Jermol | Katia Kermanidis | Joss Moorkens | Davor Orlic | Michael Papadopoulos | Maja Popović | Rico Sennrich | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Menno van Zaanen | Andy Way
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

pdf bib
Graph-Based Translation Via Graph Segmentation
Liangyou Li | Andy Way | Qun Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Phrase-Level Combination of SMT and TM Using Constrained Word Lattice
Liangyou Li | Andy Way | Qun Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Extending Phrase-Based Translation with Dependencies by Using Graphs
Liangyou Li | Andy Way | Qun Liu
Proceedings of the 2nd Workshop on Semantics-Driven Machine Translation (SedMT 2016)

pdf bib
The ADAPT Bilingual Document Alignment system at WMT16
Pintu Lohar | Haithem Afli | Chao-Hong Liu | Andy Way
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Improving Phrase-Based SMT Using Cross-Granularity Embedding Similarity
Peyman Passban | Chris Hokamp | Andy Way | Qun Liu
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Comparing Translator Acceptability of TM and SMT Outputs
Joss Moorkens | Andy Way
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Integrating Optical Character Recognition and Machine Translation of Historical Documents
Haithem Afli | Andy Way
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in formats that are not accessible for machine processing (e.g., historical or legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how good the OCR is, this process introduces recognition errors, which often render MT ineffective. In this paper, we propose a new OCR to MT framework based on adding a new OCR error correction module to enhance the overall quality of translation. Experimentation shows that our new system correction based on the combination of Language Modeling and Translation methods outperforms the baseline system by nearly 30% relative improvement.

2015

pdf bib
Dependency Graph-to-String Translation
Liangyou Li | Andy Way | Qun Liu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Translating Literary Text between Related Languages using SMT
Antonio Toral | Andy Way
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

pdf bib
ParFDA for Fast Deployment of Accurate Statistical Machine Translation Systems, Benchmarks, and Statistics
Ergun Biçici | Qun Liu | Andy Way
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Referential Translation Machines for Predicting Translation Quality and Related Statistics
Ergun Biçici | Qun Liu | Andy Way
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
İlknur Durgar El-Kahlout | Mehmed Özkan | Felipe Sánchez-Martínez | Gema Ramírez-Sánchez | Fred Hollowood | Andy Way
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Dependency-based Reordering Model for Constituent Pairs in Hierarchical SMT
Arefeh Kazemi | Antonio Toral | Andy Way | Amirhassan Monadjemi | Mohammadali Nematbakhsh
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Benchmarking SMT Performance for Farsi Using the TEP++ Corpus
Peyman Passban | Andy Way | Qun Liu
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
TraMOOC: Translation for Massive Open Online Courses
Valia Kordoni | Kostadin Cholakov | Markus Egg | Andy Way | Lexi Birch | Katia Kermanidis | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Iris Hendrickx | Michael Papadopoulos | Panayota Georgakopoulou | Maria Gialama | Menno van Zaanen | Ioana Buliga | Mitja Jermol | Davor Orlic
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A. Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel L. Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

pdf bib
RTM-DCU: Referential Translation Machines for Semantic Similarity
Ergun Biçici | Andy Way
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems
Ergun Biçici | Qun Liu | Andy Way
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
The DCU-ICTCAS MT system at WMT 2014 on German-English Translation Task
Liangyou Li | Xiaofeng Wu | Santiago Cortés Vaíllo | Jun Xie | Andy Way | Qun Liu
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules
Raphael Rubino | Antonio Toral | Victor M. Sánchez-Cartagena | Jorge Ferrández-Tordera | Sergio Ortiz-Rojas | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Andy Way
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
DCU-Lingo24 Participation in WMT 2014 Hindi-English Translation task
Xiaofeng Wu | Rejwanul Haque | Tsuyoshi Okita | Piyush Arora | Andy Way | Qun Liu
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
DCU Terminology Translation System for Medical Query Subtask at WMT14
Tsuyoshi Okita | Ali Vahid | Andy Way | Qun Liu
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Referential Translation Machines for Predicting Translation Quality
Ergun Biçici | Andy Way
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Transformation and Decomposition for Efficiently Implementing and Improving Dependency-to-String Model In Moses
Liangyou Li | Jun Xie | Andy Way | Qun Liu
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation
Rejwanul Haque | Sergio Penkale | Andy Way
Proceedings of the 4th International Workshop on Computational Terminology (Computerm)

pdf bib
Proceedings of the 17th Annual conference of the European Association for Machine Translation
Mauro Cettolo | Marcello Federico | Lucia Specia | Andy Way
Proceedings of the 17th Annual conference of the European Association for Machine Translation

pdf bib
Standard language variety conversion for content localisation via SMT
Federico Fancellu | Andy Way | Morgan O’Brien
Proceedings of the 17th Annual conference of the European Association for Machine Translation

pdf bib
Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain
Antonio Toral | Raphael Rubino | Miquel Esplà-Gomis | Tommi Pirinen | Andy Way | Gema Ramírez-Sánchez
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2012

pdf bib
Proceedings of the 16th Annual conference of the European Association for Machine Translation
Mauro Cettolo | Marcello Federico | Lucia Specia | Andy Way
Proceedings of the 16th Annual conference of the European Association for Machine Translation

pdf bib
From Subtitles to Parallel Corpora
Mark Fishel | Yota Georgakopoulou | Sergio Penkale | Volha Petukhova | Matej Rojc | Martin Volk | Andy Way
Proceedings of the 16th Annual conference of the European Association for Machine Translation

pdf bib
Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word Reduction: Normalization and/or Supplementary Data
Pratyush Banerjee | Sudip Kumar Naskar | Johann Roturier | Andy Way | Josef van Genabith
Proceedings of the 16th Annual conference of the European Association for Machine Translation

pdf bib
Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT
Hala Almaghout | Jie Jiang | Andy Way
Proceedings of the 16th Annual conference of the European Association for Machine Translation

pdf bib
SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles
Volha Petukhova | Rodrigo Agerri | Mark Fishel | Sergio Penkale | Arantza del Pozo | Mirjam Sepesy Maučec | Andy Way | Panayota Georgakopoulou | Martin Volk
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Subtitling and audiovisual translation have been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing, in order to increase the efficiency of the subtitle production process. The FP7 European project SUMAT (An Online Service for SUbtitling by MAchine Translation: http://www.sumat-project.eu) aims to develop an online subtitle translation service for nine European languages, combined into 14 different language pairs, in order to semi-automate the subtitle translation processes of both freelance translators and subtitling companies on a large scale. In this paper we discuss the data collection and parallel corpus compilation for training SMT systems, which includes several procedures such as data partition, conversion, formatting, normalization and alignment. We discuss each data pre-processing step in detail, using various approaches. Apart from its size (around 1 million subtitles per language pair), the SUMAT corpus has a number of very important characteristics. First, high quality has been achieved both in terms of translation and in terms of high-precision alignment of parallel documents and their contents. Secondly, the contents are provided in one consistent format and encoding. Finally, additional information such as the type of content in terms of genre and domain is available.

pdf bib
Combining EBMT, SMT, TM and IR Technologies for Quality and Scale
Sandipan Dandapat | Sara Morrissey | Andy Way | Josef van Genabith
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

pdf bib
Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models
Pratyush Banerjee | Sudip Kumar Naskar | Johann Roturier | Andy Way | Josef van Genabith
Proceedings of COLING 2012

2011

pdf bib
Incorporating Source-Language Paraphrases into Phrase-Based SMT with Confusion Networks
Jie Jiang | Jinhua Du | Andy Way
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
A Comparative Evaluation of Research vs. Online MT Systems
Antonio Toral | Federico Gaspari | Sudip Kumar Naskar | Andy Way
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Experiments on Domain Adaptation for Patent Machine Translation in the PLuTO project
Alexandru Ceauşu | John Tinsley | Jian Zhang | Andy Way
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Towards a User-Friendly Webservice Architecture for Statistical Machine Translation in the PANACEA project
Antonio Toral | Pavel Pecina | Marc Poch | Andy Way
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Oracle-based Training for Phrase-based Statistical Machine Translation
Ankit Srivastava | Yanjun Ma | Andy Way
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting
Sandipan Dandapat | Sara Morrissey | Andy Way | Mikel L. Forcada
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Combining Semantic and Syntactic Generalization in Example-Based Machine Translation
Sarah Ebling | Andy Way | Martin Volk | Sudip Kumar Naskar
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
CCG Contextual labels in Hierarchical Phrase-Based SMT
Hala Almaghout | Jie Jiang | Andy Way
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation
Pavel Pecina | Antonio Toral | Andy Way | Vassilis Papavassiliou | Prokopis Prokopidis | Maria Giagkou
Proceedings of the 15th Annual conference of the European Association for Machine Translation

pdf bib
Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
Yanjun Ma | Yifan He | Andy Way | Josef van Genabith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Andy Way | Patrick Pantel
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2010

pdf bib
Statistical Analysis of Alignment Characteristics for Phrase-based Machine Translation
Patrik Lambert | Simon Petitrenaud | Yanjun Ma | Andy Way
Proceedings of the 14th Annual conference of the European Association for Machine Translation

pdf bib
TMX Markup: A Challenge When Adapting SMT to the Localisation Environment
Jinhua Du | Johann Roturier | Andy Way
Proceedings of the 14th Annual conference of the European Association for Machine Translation

pdf bib
Lattice Score Based Data Cleaning for Phrase-Based Statistical Machine Translation
Jie Jiang | Julie Carson-Berndsen | Andy Way
Proceedings of the 14th Annual conference of the European Association for Machine Translation

pdf bib
The Impact of Source–Side Syntactic Reordering on Hierarchical Phrase-based SMT
Jinhua Du | Andy Way
Proceedings of the 14th Annual conference of the European Association for Machine Translation

pdf bib
MATREX: The DCU MT System for WMT 2010
Sergio Penkale | Rejwanul Haque | Sandipan Dandapat | Pratyush Banerjee | Ankit K. Srivastava | Jinhua Du | Pavel Pecina | Sudip Kumar Naskar | Mikel L. Forcada | Andy Way
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
An Augmented Three-Pass System Combination Framework: DCU Combination System for WMT 2010
Jinhua Du | Pavel Pecina | Andy Way
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
The DCU Dependency-Based Metric in WMT-MetricsMATR 2010
Yifan He | Jinhua Du | Andy Way | Josef van Genabith
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Handling Named Entities and Compound Verbs in Phrase-Based Statistical Machine Translation
Santanu Pal | Sudip Kumar Naskar | Pavel Pecina | Sivaji Bandyopadhyay | Andy Way
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Source-side Syntactic Reordering Patterns with Functional Words for Improved Phrase-based SMT
Jie Jiang | Jinhua Du | Andy Way
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

pdf bib
HMM Word-to-Phrase Alignment with Dependency Constraints
Yanjun Ma | Andy Way
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

pdf bib
Multi-Word Expression-Sensitive Word Alignment
Tsuyoshi Okita | Alfredo Maldonado Guerra | Yvette Graham | Andy Way
Proceedings of the 4th Workshop on Cross Lingual Information Access

pdf bib
A Discriminative Latent Variable-Based “DE” Classifier for Chinese-English SMT
Jinhua Du | Andy Way
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Integrating N-best SMT Outputs into a TM System
Yifan He | Yanjun Ma | Andy Way | Josef van Genabith
Coling 2010: Posters

pdf bib
Bridging SMT and TM with Translation Recommendation
Yifan He | Yanjun Ma | Josef van Genabith | Andy Way
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Facilitating Translation Using Source Language Paraphrase Lattices
Jinhua Du | Jie Jiang | Andy Way
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Accuracy-Based Scoring for DOT: Towards Direct Error Minimization for Data-Oriented Translation
Daniel Galron | Sergio Penkale | Andy Way | I. Dan Melamed
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Syntactified Direct Translation Model with Linear-time Decoding
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Lexicalized Semi-incremental Dependency Parsing
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the International Conference RANLP-2009

pdf bib
Capturing Lexical Variation in MT Evaluation Using Automatically Built Sense-Cluster Inventories
Marianna Apidianaki | Yifan He | Andy Way
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

pdf bib
Dependency Relations as Source Context in Phrase-Based SMT
Rejwanul Haque | Sudip Kumar Naskar | Antal van den Bosch | Andy Way
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

pdf bib
Experiments on Domain Adaptation for English–Hindi SMT
Rejwanul Haque | Sudip Kumar Naskar | Josef van Genabith | Andy Way
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf bib
MATREX: The DCU MT System for WMT 2009
Jinhua Du | Yifan He | Sergio Penkale | Andy Way
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Web Service Integration for Next Generation Localisation
David Lewis | Stephen Curran | Kevin Feeney | Zohar Etzioni | John Keeney | Andy Way | Reinhard Schäler
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

pdf bib
English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009
Rejwanul Haque | Sandipan Dandapat | Ankit Kumar Srivastava | Sudip Kumar Naskar | Andy Way
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
Learning Labelled Dependencies in Machine Translation Evaluation
Yifan He | Andy Way
Proceedings of the 13th Annual conference of the European Association for Machine Translation

pdf bib
Optimal Bilingual Data for French-English PB-SMT
Sylwia Ozdowska | Andy Way
Proceedings of the 13th Annual conference of the European Association for Machine Translation

pdf bib
Marker-Based Filtering of Bilingual Phrase Pairs for SMT
Felipe Sánchez-Martínez | Andy Way
Proceedings of the 13th Annual conference of the European Association for Machine Translation

pdf bib
Using Supertags as Source Language Context in SMT
Rejwanul Haque | Sudip Kumar Naskar | Yanjun Ma | Andy Way
Proceedings of the 13th Annual conference of the European Association for Machine Translation

pdf bib
Tuning Syntactically Enhanced Word Alignment for Statistical Machine Translation
Yanjun Ma | Patrik Lambert | Andy Way
Proceedings of the 13th Annual conference of the European Association for Machine Translation

pdf bib
Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation
Yanjun Ma | Andy Way
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

pdf bib
The ATIS Sign Language Corpus
Jan Bungeroth | Daniel Stein | Philippe Dreuw | Hermann Ney | Sara Morrissey | Andy Way | Lynette van Zijl
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Systems that automatically process sign language rely on appropriate data. We therefore present the ATIS sign language corpus, which is based on the domain of air travel information. It is available for five languages: English, German, Irish sign language, German sign language and South African sign language. The corpus can be used for different tasks, such as automatic statistical translation and automatic sign language recognition, and it allows the specific modeling of spatial references in signing space.

pdf bib
Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation
Aoife Cahill | Michael Burke | Ruth O’Donovan | Stefan Riezler | Josef van Genabith | Andy Way
Computational Linguistics, Volume 34, Number 1, March 2008

pdf bib
MaTrEx: The DCU MT System for WMT 2008
John Tinsley | Yanjun Ma | Sylwia Ozdowska | Andy Way
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Improving Word Alignment Using Syntactic Dependencies
Yanjun Ma | Sylwia Ozdowska | Yanli Sun | Andy Way
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

pdf bib
Automatic Generation of Parallel Treebanks
Ventsislav Zhechev | Andy Way
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
Dependency-Based Automatic Evaluation for Machine Translation
Karolina Owczarzak | Josef van Genabith | Andy Way
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

pdf bib
Labelled Dependencies in Machine Translation Evaluation
Karolina Owczarzak | Josef van Genabith | Andy Way
Proceedings of the Second Workshop on Statistical Machine Translation

pdf bib
Supertagged Phrase-Based Statistical Machine Translation
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Bootstrapping Word Alignment via Word Packing
Yanjun Ma | Nicolas Stroppa | Andy Way
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Contextual Bitext-Derived Paraphrases in Automatic MT Evaluation
Karolina Owczarzak | Declan Groves | Josef Van Genabith | Andy Way
Proceedings on the Workshop on Statistical Machine Translation

pdf bib
Disambiguation Strategies for Data-Oriented Translation
Mary Hearne | Andy Way
Proceedings of the 11th Annual conference of the European Association for Machine Translation

pdf bib
Hybridity in MT. Experiments on the Europarl Corpus
Declan Groves | Andy Way
Proceedings of the 11th Annual conference of the European Association for Machine Translation

pdf bib
A Syntactic Skeleton for Statistical Machine Translation
Bart Mellebeek | Karolina Owczarzak | Declan Groves | Josef Van Genabith | Andy Way
Proceedings of the 11th Annual conference of the European Association for Machine Translation

2005

pdf bib
TransBooster: boosting the performance of wide-coverage machine translation systems
Bart Mellebeek | Anna Khasin | Josef Van Genabith | Andy Way
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf bib
Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks
Ruth O’Donovan | Michael Burke | Aoife Cahill | Josef van Genabith | Andy Way
Computational Linguistics, Volume 31, Number 3, September 2005

pdf bib
Hybrid Example-Based SMT: the Best of Both Worlds?
Declan Groves | Andy Way
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

pdf bib
Example-based controlled translation
Nano Gough | Andy Way
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

pdf bib
Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations
Aoife Cahill | Michael Burke | Ruth O’Donovan | Josef van Genabith | Andy Way
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank
Ruth O’Donovan | Michael Burke | Aoife Cahill | Josef van Genabith | Andy Way
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Treebank-Based Acquisition of a Chinese Lexical-Functional Grammar
Michael Burke | Olivia Lam | Aoife Cahill | Rowena Chan | Ruth O’Donovan | Adams Bodomo | Josef van Genabith | Andy Way
Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation

pdf bib
Robust Sub-Sentential Alignment of Phrase-Structure Trees
Declan Groves | Mary Hearne | Andy Way
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
wEBMT: Developing and Validating an Example-Based Machine Translation System using the World Wide Web
Andy Way | Nano Gough
Computational Linguistics, Volume 29, Number 3, September 2003: Special Issue on the Web as Corpus

2002

pdf bib
Testing students’ understanding of complex transfer
Andy Way
Proceedings of the 6th EAMT Workshop: Teaching Machine Translation

2000

pdf bib
LFG-DOT: a probabilistic, constraint-based model for machine translation
Andy Way
Proceedings of the Fifth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+5)

pdf bib
LFG-DOT: Combining Constraint-Based and Empirical Methodologies for Robust MT
Andy Way
Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999)
