Bogdan Babych


2019

Unsupervised Induction of Ukrainian Morphological Paradigms for the New Lexicon: Extending Coverage for Named Entities and Neologisms using Inflection Tables and Unannotated Corpora
Bogdan Babych
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

The paper presents an unsupervised method for quickly extending a Ukrainian lexicon by generating paradigms and morphological feature structures for new Named Entities and neologisms, which are not covered by existing static morphological resources. This approach addresses a practical problem of modelling paradigms for entities created by the dynamic processes in the lexicon: this problem is especially serious for highly inflected languages in domains with a specialised or quickly changing lexicon. The method uses an unannotated Ukrainian corpus and a small fixed set of inflection tables, which can be found in traditional grammar textbooks. The advantage of the proposed approach is that updating the morphological lexicon does not require training or linguistic annotation, allowing fast knowledge-light extension of an existing static lexicon to improve morphological coverage on a specific corpus. The method is implemented in an open-source package on a GitHub repository. It can be applied to other low-resourced inflectional languages which have internet corpora and linguistic descriptions of their inflection system, following the example of inflection tables for Ukrainian. Evaluation results show consistent improvements in coverage for Ukrainian corpora of different corpus types.

2016

MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
Yu Yuan | Serge Sharoff | Bogdan Babych
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we introduce MoBiL, a hybrid Monolingual, Bilingual and Language modelling feature set and feature selection and evaluation framework. The set includes translation quality indicators that can be utilized to automatically predict the quality of human translations in terms of content adequacy and language fluency. We compare MoBiL with the QuEst baseline set by using them in classifiers trained with support vector machine and relevance vector machine learning algorithms on the same data set. We also report an experiment on feature selection to opt for fewer but more informative features from MoBiL. Our experiments show that classifiers trained on our feature set perform consistently better in predicting both adequacy and fluency than the classifiers trained on the baseline feature set. MoBiL also performs well when used with both support vector machine and relevance vector machine algorithms.

Graphonological Levenshtein Edit Distance: Application for Automated Cognate Identification
Bogdan Babych
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)
Patrik Lambert | Bogdan Babych | Kurt Eberle | Rafael E. Banchs | Reinhard Rapp | Marta R. Costa-jussà
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

2015

Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)
Bogdan Babych | Kurt Eberle | Patrik Lambert | Reinhard Rapp | Rafael E. Banchs | Marta R. Costa-jussà
Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)

2014

Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
Shuly Wintner | Marko Tadić | Bogdan Babych
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)
Rafael E. Banchs | Marta R. Costa-jussà | Reinhard Rapp | Patrik Lambert | Kurt Eberle | Bogdan Babych
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

Deriving de/het gender classification for Dutch nouns for rule-based MT generation tasks
Bogdan Babych | Jonathan Geiger | Mireia Ginestí Rosell | Kurt Eberle
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

2013

Proceedings of the Second Workshop on Hybrid Approaches to Translation
Marta Ruiz Costa-jussà | Reinhard Rapp | Patrik Lambert | Kurt Eberle | Rafael E. Banchs | Bogdan Babych
Proceedings of the Second Workshop on Hybrid Approaches to Translation

Workshop on Hybrid Approaches to Translation: Overview and Developments
Marta R. Costa-jussà | Rafael Banchs | Reinhard Rapp | Patrik Lambert | Kurt Eberle | Bogdan Babych
Proceedings of the Second Workshop on Hybrid Approaches to Translation

2012

ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
Mārcis Pinnis | Radu Ion | Dan Ştefănescu | Fangzhong Su | Inguna Skadiņa | Andrejs Vasiļjevs | Bogdan Babych
Proceedings of the ACL 2012 System Demonstrations

Development and Application of a Cross-language Document Comparability Metric
Fangzhong Su | Bogdan Babych
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we present a metric that measures comparability of documents across different languages. The metric is developed within the FP7 ICT ACCURAT project, as a tool for aligning comparable corpora at the document level; these aligned comparable documents are then used for phrase alignment and extraction of translation equivalents, with the aim of extending phrase tables of statistical MT systems without the need to use parallel texts. The metric uses several features, such as lexical information, document structure, keywords and named entities, which are combined in an ensemble manner. We present the results by measuring the reliability and effectiveness of the metric, and demonstrate its application and the impact for the task of parallel phrase extraction from comparable corpora.

Identifying Word Translations from Comparable Documents Without a Seed Lexicon
Reinhard Rapp | Serge Sharoff | Bogdan Babych
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The extraction of dictionaries from parallel text corpora is an established technique. However, as parallel corpora are a scarce resource, in recent years the extraction of dictionaries using comparable corpora has obtained increasing attention. In order to find a mapping between languages, almost all approaches suggested in the literature rely on a seed lexicon. The work described here achieves competitive results without requiring such a seed lexicon. Instead it presupposes mappings between comparable documents in different languages. For some common types of textual resources (e.g. encyclopedias or newspaper texts) such mappings are either readily available or can be established relatively easily. The current work is based on Wikipedias where the mappings between languages are determined by the authors of the articles. We describe a neural-network inspired algorithm which first characterizes each Wikipedia article by a number of keywords, and then considers the identification of word translations as a variant of word alignment in a noisy environment. We present results and evaluations for eight language pairs involving Germanic, Romanic, and Slavic languages as well as Chinese.

Collecting and Using Comparable Corpora for Statistical Machine Translation
Inguna Skadiņa | Ahmet Aker | Nikos Mastropavlos | Fangzhong Su | Dan Tufis | Mateja Verlic | Andrejs Vasiļjevs | Bogdan Babych | Paul Clough | Robert Gaizauskas | Nikos Glaros | Monica Lestari Paramita | Mārcis Pinnis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Lack of sufficient parallel data for many languages and domains is currently one of the major obstacles to further advancement of automated translation. The ACCURAT project is addressing this issue by researching methods for improving machine translation systems by using comparable corpora. In this paper we present tools and techniques developed in the ACCURAT project that allow additional data needed for statistical machine translation to be extracted from comparable corpora. We present methods and tools for acquisition of comparable corpora from the Web and other sources, for evaluation of the comparability of collected corpora, for multi-level alignment of comparable corpora and for extraction of lexical and terminological data for machine translation. Finally, we present initial evaluation results on the utility of collected corpora in domain-adapted machine translation and real-life applications.

Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
Marta R. Costa-jussà | Patrik Lambert | Rafael E. Banchs | Reinhard Rapp | Bogdan Babych
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

Measuring Comparability of Documents in Non-Parallel Corpora for Efficient Extraction of (Semi-)Parallel Translation Equivalents
Fangzhong Su | Bogdan Babych
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

Design of a hybrid high quality machine translation system
Bogdan Babych | Kurt Eberle | Johanna Geiß | Mireia Ginestí-Rosell | Anthony Hartley | Reinhard Rapp | Serge Sharoff | Martin Thomas
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

2009

Evaluation-Guided Pre-Editing of Source Text: Improving MT-Tractability of Light Verb Constructions
Bogdan Babych | Anthony Hartley | Serge Sharoff
Proceedings of the 13th Annual conference of the European Association for Machine Translation

2008

Generalising Lexical Translation Strategies for MT Using Comparable Corpora
Bogdan Babych | Serge Sharoff | Anthony Hartley
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We report on an ongoing research project aimed at increasing the range of translation equivalents which can be automatically discovered by MT systems. The methodology is based on semi-supervised learning of indirect translation strategies from large comparable corpora and applying them at run time to generate novel, previously unseen translation equivalents. This approach is different from methods based on parallel resources, which currently can reuse only individual translation equivalents. Instead it models translation strategies which generalise individual equivalents and can successfully generate an open class of new translation solutions. The task of the project is integration of the developed technology into open-source MT systems.

Sensitivity of Automated MT Evaluation Metrics on Higher Quality MT Output: BLEU vs Task-Based Evaluation Methods
Bogdan Babych | Anthony Hartley
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We report the results of our experiment on assessing the ability of automated MT evaluation metrics to remain sensitive to variations in MT quality as the average quality of the compared systems goes up. We compare two groups of metrics: those that measure the proximity of MT output to some reference translation, and those that evaluate the performance of some automated process on degraded MT output. The experiment shows that proximity-based metrics (such as BLEU) lose sensitivity as the scores go up, but performance-based metrics (e.g., Named Entity recognition from MT output) remain sensitive across the scale. We suggest a model for explaining this result, which attributes stable sensitivity of performance-based metrics to measuring cumulative functional effect of different language levels, while proximity-based metrics measure structural matches on a lexical level and therefore miss higher-level errors that are more typical for better MT systems. Development of new automated metrics should take into account possible decline in sensitivity on higher-quality MT, which should be tested as part of meta-evaluation of the metrics.

2007

Assisting Translators in Indirect Lexical Transfer
Bogdan Babych | Anthony Hartley | Serge Sharoff | Olga Mudraya
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Using Comparable Corpora to Solve Problems Difficult for Human Translators
Serge Sharoff | Bogdan Babych | Anthony Hartley
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Using collocations from comparable corpora to find translation equivalents
Serge Sharoff | Bogdan Babych | Anthony Hartley
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present a tool for finding appropriate translation equivalents for words from the general lexicon using comparable corpora. For a phrase in the source language the tool suggests a range of possible expressions used in similar contexts in target-language corpora. In the paper we discuss the method and present results of human evaluation of the performance of the tool.

ASSIST: Automated Semantic Assistance for Translators
Serge Sharoff | Bogdan Babych | Paul Rayson | Olga Mudraya | Scott Piao
Demonstrations

2004

Disambiguating translation strategies in MT using automatic named entity recognition
Bogdan Babych | Anthony Hartley
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

Extending the BLEU MT Evaluation Method with Frequency Weightings
Bogdan Babych | Tony Hartley
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Extending MT evaluation tools with translation complexity metrics
Bogdan Babych | Debbie Elliott | Anthony Hartley
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Calibrating Resource-light Automatic MT Evaluation: a Cheap Approach to Ranking MT Systems by the Usability of Their Output
Bogdan Babych | Debbie Elliott | Anthony Hartley
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality
Bogdan Babych | Anthony Hartley
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Improving Machine Translation Quality with Automatic Named Entity Recognition
Bogdan Babych | Anthony Hartley
Proceedings of the 7th International EAMT workshop on MT and other language technology tools, Improving MT through other language technology tools, Resource and tools for building MT at EACL 2003