Martin Popel


2019

pdf bib
English-Czech Systems in WMT19: Document-Level Transformer
Martin Popel | Dominik Macháček | Michal Auersperger | Ondřej Bojar | Pavel Pecina
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We describe our NMT systems submitted to the WMT19 shared task in English→Czech news translation. Our systems are based on the Transformer model implemented in either Tensor2Tensor (T2T) or Marian framework. We aimed at improving the adequacy and coherence of translated documents by enlarging the context of the source and target. Instead of translating each sentence independently, we split the document into possibly overlapping multi-sentence segments. In case of the T2T implementation, this “document-level”-trained system achieves a +0.6 BLEU improvement (p < 0.05) relative to the same system applied on isolated sentences. To assess the potential effect document-level models might have on lexical coherence, we performed a semi-automatic analysis, which revealed only a few sentences improved in this aspect. Thus, we cannot draw any conclusions from this week evidence.

pdf bib
CUNI System for the WMT19 Robustness Task
Jindřich Helcl | Jindřich Libovický | Martin Popel
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We present our submission to the WMT19 Robustness Task. Our baseline system is the Charles University (CUNI) Transformer system trained for the WMT18 shared task on News Translation. Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.

2018

pdf bib
CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Jan Hajič | Martin Popel | Martin Potthast | Milan Straka | Filip Ginter | Joakim Nivre | Slav Petrov
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2018, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on test input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. This shared task constitutes a 2nd edition—the first one took place in 2017 (Zeman et al., 2017); the main metric from 2017 has been kept, allowing for easy comparison, also in 2018, and two new main metrics have been used. New datasets added to the Universal Dependencies collection between mid-2017 and the spring of 2018 have contributed to increased difficulty of the task this year. In this overview paper, we define the task and the updated evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

pdf bib
CUNI Transformer Neural MT System for WMT18
Martin Popel
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We describe our NMT system submitted to the WMT2018 shared task in news translation. Our system is based on the Transformer model (Vaswani et al., 2017). We use an improved technique of backtranslation, where we iterate the process of translating monolingual data in one direction and training an NMT model for the opposite direction using synthetic parallel data. We apply a simple but effective filtering of the synthetic data. We pre-process the input sentences using coreference resolution in order to disambiguate the gender of pro-dropped personal pronouns. Finally, we apply two simple post-processing substitutions on the translated output. Our system is significantly (p < 0.05) better than all other English-Czech and Czech-English systems in WMT2018.

2017

pdf bib
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

pdf bib
Udapi: Universal API for Universal Dependencies
Martin Popel | Zdeněk Žabokrtský | Martin Vojtek
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)

2016

pdf bib
Tools and Guidelines for Principled Machine Translation Development
Nora Aranberri | Eleftherios Avramidis | Aljoscha Burchardt | Ondřej Klejch | Martin Popel | Maja Popović
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We hereby report on novel tools and practices that support various measures, developed in order to support a principled and informed approach of MT development. Our toolkit for automatic evaluation showcases quick and detailed comparison of MT system variants through automatic metrics and n-gram feedback, along with manual evaluation via edit-distance, error annotation and task-based feedback.

pdf bib
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
Arantxa Otegi | Nora Aranberri | Antonio Branco | Jan Hajič | Martin Popel | Kiril Simov | Eneko Agirre | Petya Osenova | Rita Pereira | João Silva | Steven Neale
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus on the IT domain. English is common in all parallel corpora, with translations in five languages, namely, Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.

pdf bib
TectoMT – a deep linguistic core of the combined Cimera MT system
Martin Popel | Roman Sudarikov | Ondřej Bojar | Rudolf Rosa | Jan Hajič
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

pdf bib
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

bib
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Findings of the 2016 Conference on Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Martin Popel | Matt Post | Raphael Rubino | Carolina Scarton | Lucia Specia | Marco Turchi | Karin Verspoor | Marcos Zampieri
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task
Rosa Gaudio | Gorka Labaka | Eneko Agirre | Petya Osenova | Kiril Simov | Martin Popel | Dieke Oele | Gertjan van Noord | Luís Gomes | João António Rodrigues | Steven Neale | João Silva | Andreia Querido | Nuno Rendeiro | António Branco
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Dictionary-based Domain Adaptation of MT Systems without Retraining
Rudolf Rosa | Roman Sudarikov | Michal Novák | Martin Popel | Ondřej Bojar
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Moses & Treex Hybrid MT Systems Bestiary
Rudolf Rosa | Martin Popel | Ondřej Bojar | David Mareček | Ondřej Dušek
Proceedings of the 2nd Deep Machine Translation Workshop

2015

pdf bib
Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation
Ondřej Dušek | Eva Fučíková | Jan Hajič | Martin Popel | Jana Šindlerová | Zdeňka Urešová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

pdf bib
New Language Pairs in TectoMT
Ondřej Dušek | Luís Gomes | Michal Novák | Martin Popel | Rudolf Rosa
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Translation Model Interpolation for Domain Adaptation in TectoMT
Rudolf Rosa | Ondřej Dušek | Michal Novák | Martin Popel
Proceedings of the 1st Deep Machine Translation Workshop

2014

pdf bib
CUNI in WMT14: Chimera Still Awaits Bellerophon
Aleš Tamchyna | Martin Popel | Rudolf Rosa | Ondřej Bojar
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized
Rudolf Rosa | Jan Mašek | David Mareček | Martin Popel | Daniel Zeman | Zdeněk Žabokrtský
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank). HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that became popular in recent years. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes. We describe both of the annotation styles, including adjustments that were necessary to make, and provide details about the conversion process. We also discuss the differences between the two styles, evaluating their advantages and disadvantages, and note the effects of the differences on the conversion. We regard the stanfordization as generally successful, although we admit several shortcomings, especially in the distinction between direct and indirect objects, that have to be addressed in future. We release part of HamleDT 2.0 freely; we are not allowed to redistribute the whole dataset, but we do provide the conversion pipeline.

2013

pdf bib
PhraseFix: Statistical Post-Editing of TectoMT
Petra Galuščáková | Martin Popel | Ondřej Bojar
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Coordination Structures in Dependency Treebanks
Martin Popel | David Mareček | Jan Štěpánek | Daniel Zeman | Zdeněk Žabokrtský
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
HamleDT: To Parse or Not to Parse?
Daniel Zeman | David Mareček | Martin Popel | Loganathan Ramasamy | Jan Štěpánek | Zdeněk Žabokrtský | Jan Hajič
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We propose HamleDT ― HArmonized Multi-LanguagE Dependency Treebank. HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. While the license terms prevent us from directly redistributing the corpora, most of them are easily acquirable for research purposes. What we provide instead is the software that normalizes tree structures in the data obtained by the user from their original providers.

pdf bib
The Joy of Parallelism with CzEng 1.0
Ondřej Bojar | Zdeněk Žabokrtský | Ondřej Dušek | Petra Galuščáková | Martin Majliš | David Mareček | Jiří Maršík | Michal Novák | Martin Popel | Aleš Tamchyna
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

CzEng 1.0 is an updated release of our Czech-English parallel corpus, freely available for non-commercial research or educational purposes. In this release, we approximately doubled the corpus size, reaching 15 million sentence pairs (about 200 million tokens per language). More importantly, we carefully filtered the data to reduce the amount of non-matching sentence pairs. CzEng 1.0 is automatically aligned at the level of sentences as well as words. We provide not only the plain text representation, but also automatic morphological tags, surface syntactic as well as deep syntactic dependency parse trees and automatic co-reference links in both English and Czech. This paper describes key properties of the released resource including the distribution of text domains, the corpus data formats, and a toolkit to handle the provided rich annotation. We also summarize the procedure of the rich annotation (incl. co-reference resolution) and of the automatic filtering. Finally, we provide some suggestions on exploiting such an automatically annotated sentence-parallel corpus.

pdf bib
Formemes in English-Czech Deep Syntactic MT
Ondřej Dušek | Zdeněk Žabokrtský | Martin Popel | Martin Majliš | Michal Novák | David Mareček
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors
Rudolf Rosa | Ondřej Dušek | David Mareček | Martin Popel
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

2011

pdf bib
A Grain of Salt for the WMT Manual Evaluation
Ondřej Bojar | Miloš Ercegovčević | Martin Popel | Omar Zaidan
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Influence of Parser Choice on Dependency-Based MT
Martin Popel | David Mareček | Nathan Green | Zdeněk Žabokrtský
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

pdf bib
Maximum Entropy Translation Model in Dependency-Based MT Framework
Zdeněk Žabokrtský | Martin Popel | David Mareček
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

pdf bib
English-Czech MT in 2008
Ondřej Bojar | David Mareček | Václav Novák | Martin Popel | Jan Ptáček | Jan Rouš | Zdeněk Žabokrtský
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Hidden Markov Tree Model in Dependency-based Machine Translation
Zdeněk Žabokrtský | Martin Popel
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Search
Co-authors
Venues