Ondřej Dušek


2020

Fact-based Content Weighting for Evaluating Abstractive Summarisation
Xinnuo Xu | Ondřej Dušek | Jingyi Li | Verena Rieser | Ioannis Konstas
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Abstractive summarisation is notoriously hard to evaluate since standard word-overlap-based metrics are insufficient. We introduce a new evaluation metric which is based on fact-level content weighting, i.e. relating the facts of the document to the facts of the summary. We follow the assumption that a good summary will reflect all relevant facts, i.e. the ones present in the ground truth (human-generated reference summary). We confirm this hypothesis by showing that our weightings are highly correlated to human perception and compare favourably to the recent manual highlight-based metric of Hardy et al. (2019).

Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task
Jindřich Libovický | Zdeněk Kasner | Jindřich Helcl | Ondřej Dušek
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present our submission to the Simultaneous Translation And Paraphrase for Language Education (STAPLE) challenge. We used a standard Transformer model for translation, with a crosslingual classifier predicting correct translations on the output n-best list. To increase the diversity of the outputs, we used additional data to train the translation model, and we trained a paraphrasing model based on the Levenshtein Transformer architecture to generate further synonymous translations. The paraphrasing results were again filtered using our classifier. While the use of additional data and our classifier filter were able to improve results, the paraphrasing model produced too many invalid outputs to further improve the output quality. Our model without the paraphrasing component finished in the middle of the field for the shared task, improving over the best baseline by a margin of 10-22% weighted F1 absolute.

Data-to-Text Generation with Iterative Text Editing
Zdeněk Kasner | Ondřej Dušek
Proceedings of the 13th International Conference on Natural Language Generation

We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text by a neural model trained for the sentence fusion task. The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model. We evaluate our approach on two major data-to-text datasets (WebNLG, Cleaned E2E) and analyze its caveats and benefits. Furthermore, we show that our formulation of data-to-text generation opens up the possibility for zero-shot domain adaptation using a general-domain dataset for sentence fusion.

Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference
Ondřej Dušek | Zdeněk Kasner
Proceedings of the 13th International Conference on Natural Language Generation

A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e. checking if the output text contains all and only facts supported by the input data. We propose a new metric for evaluating the semantic accuracy of D2T generation based on a neural model pretrained for natural language inference (NLI). We use the NLI model to check textual entailment between the input data and the output text in both directions, allowing us to reveal omissions or hallucinations. Input data are converted to text for NLI using trivial templates. Our experiments on two recent D2T datasets show that our metric can achieve high accuracy in identifying erroneous system outputs.
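
The two-directional entailment check described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `check_semantic_accuracy` and `toy_entails` are hypothetical names, the per-fact check for omissions is one plausible reading of "both directions", and the word-overlap test merely stands in for a real pretrained NLI model.

```python
from typing import Callable, Dict, List


def check_semantic_accuracy(
    facts: List[str],
    text: str,
    entails: Callable[[str, str], bool],
) -> Dict[str, object]:
    """Compare templated input facts with a generated text in both directions."""
    # Direction 1: the text should entail every fact; a failure suggests an omission.
    omitted = [fact for fact in facts if not entails(text, fact)]
    # Direction 2: the facts together should entail the text;
    # a failure suggests that the text contains unsupported content.
    hallucination = not entails(" ".join(facts), text)
    return {
        "omitted": omitted,
        "hallucination": hallucination,
        "ok": not omitted and not hallucination,
    }


def _words(sentence: str) -> set:
    # Lowercase the words and strip basic punctuation.
    return {word.strip(".,").lower() for word in sentence.split()}


def toy_entails(premise: str, hypothesis: str) -> bool:
    # Hypothetical stand-in for an NLI model (assumption, not a real entailment test):
    # the hypothesis counts as entailed if all of its words occur in the premise.
    return _words(hypothesis) <= _words(premise)


# Facts as they might look after conversion with trivial templates (made-up example).
facts = ["Blue Spice is a pub.", "Blue Spice is cheap."]

omission_case = check_semantic_accuracy(facts, "Blue Spice is a pub.", toy_entails)
hallucination_case = check_semantic_accuracy(
    facts, "Blue Spice is a cheap pub in Paris.", toy_entails
)
```

In this toy setup the first text leaves out the price fact, and the second adds a location that no input fact supports. A real setup would replace `toy_entails` with a call to a pretrained NLI classifier.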

2019

User Evaluation of a Multi-dimensional Statistical Dialogue System
Simon Keizer | Ondřej Dušek | Xingkun Liu | Verena Rieser
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager. This framework has been shown to substantially reduce data needs by leveraging domain-independent dimensions, such as social obligations or feedback, which (as we show) can be transferred between domains. In this paper, we conduct a user study and show that the performance of a multi-dimensional system, which can be adapted from a source domain, is equivalent to that of a one-dimensional baseline, which can only be trained from scratch.

Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)
Ondřej Dušek | Karin Sevegnani | Ioannis Konstas | Verena Rieser
Proceedings of the 12th International Conference on Natural Language Generation

We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs. The latter is trained using pairwise hinge loss over scores from two copies of the rating network. We use learning to rank and synthetic data to improve the quality of ratings assigned by our system: We synthesise training pairs of distorted system outputs and train the system to rank the less distorted one higher. This leads to a 12% increase in correlation with human ratings over the previous benchmark. We also establish the state of the art on the dataset of relative rankings from the E2E NLG Challenge (Dusek et al., 2019), where synthetic data lead to a 4% accuracy increase over the base model.
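
As an illustration of the pairwise hinge loss mentioned in this abstract, here is a minimal sketch. The margin of 1.0 and the `toy_rating` function are assumptions made for the example; in the paper the scores come from two copies of a trained rating network, which this sketch does not reproduce.

```python
def pairwise_hinge_loss(score_better: float, score_worse: float, margin: float = 1.0) -> float:
    """Return zero once the better output outscores the worse one by at least `margin`."""
    return max(0.0, margin - (score_better - score_worse))


def toy_rating(num_distortions: int) -> float:
    # Hypothetical stand-in for the shared rating network (assumption):
    # each distortion lowers the score by one point.
    return 6.0 - num_distortions


# Synthetic pair: an undistorted output versus one with two distortions.
loss_correct_order = pairwise_hinge_loss(toy_rating(0), toy_rating(2))
# The same pair with the roles swapped, i.e. the ranking is wrong.
loss_wrong_order = pairwise_hinge_loss(toy_rating(2), toy_rating(0))
```

The loss is zero when the less distorted output already wins by the margin and grows linearly with the size of the violation otherwise, which is what pushes the two scores apart during training.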

Semantic Noise Matters for Neural Natural Language Generation
Ondřej Dušek | David M. Howcroft | Verena Rieser
Proceedings of the 12th International Conference on Natural Language Generation

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.

Neural Generation for Czech: Data and Baselines
Ondřej Dušek | Filip Jurčíček
Proceedings of the 12th International Conference on Natural Language Generation

We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach. While non-English NLG is under-explored in general, Czech, as a morphologically rich language, makes the task even harder: Since Czech requires inflecting named entities, delexicalization or copy mechanisms do not work out-of-the-box and lexicalizing the generated outputs is non-trivial. In our experiments, we present two different approaches to this problem: (1) using a neural language model to select the correct inflected form while lexicalizing, (2) a two-step generation setup: our sequence-to-sequence model generates an interleaved sequence of lemmas and morphological tags, which are then inflected by a morphological generator.

2018

RankME: Reliable Human Ratings for Natural Language Generation
Jekaterina Novikova | Ondřej Dušek | Verena Rieser
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments. We show that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods. In addition, we show that it is possible to evaluate NLG systems according to multiple, distinct criteria, which is important for error analysis. Finally, we demonstrate that RankME, in combination with Bayesian estimation of system quality, is a cost-effective alternative for ranking multiple NLG systems.

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity
Xinnuo Xu | Ondřej Dušek | Ioannis Konstas | Verena Rieser
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity. Experiments on the OpenSubtitles corpus show a substantial improvement over competitive neural models in terms of BLEU score as well as metrics of coherence and diversity.
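
The coherence measure in point (1) can be sketched as follows. This is an illustration under stated assumptions: utterances are represented by averaged word vectors and compared with cosine similarity, and the three-dimensional `TOY_EMBEDDINGS` are made-up values standing in for pretrained GloVe vectors. The abstract does not specify these details.

```python
import math
from typing import List, Optional

# Made-up 3-dimensional vectors standing in for pretrained GloVe embeddings (assumption).
TOY_EMBEDDINGS = {
    "pizza": [0.9, 0.1, 0.0],
    "pasta": [0.8, 0.2, 0.0],
    "football": [0.0, 0.1, 0.9],
}


def average_vector(tokens: List[str]) -> Optional[List[float]]:
    """Average the embeddings of all known tokens; None if no token is known."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def coherence(context_tokens: List[str], response_tokens: List[str]) -> float:
    """Cosine similarity between the averaged context and response vectors."""
    c = average_vector(context_tokens)
    r = average_vector(response_tokens)
    if c is None or r is None:
        return 0.0
    dot = sum(a * b for a, b in zip(c, r))
    norm = math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in r))
    return dot / norm if norm else 0.0


on_topic = coherence(["pizza"], ["pasta"])
off_topic = coherence(["pizza"], ["football"])
```

With these toy vectors the on-topic pair scores far higher than the off-topic one. A score of this kind could then drive the corpus filtering in point (2), for example by keeping only context-response pairs above a chosen threshold.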

Neural Response Ranking for Social Conversation: A Data-Efficient Approach
Igor Shalyminov | Ondřej Dušek | Oliver Lemon
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

The overall objective of ‘social’ dialogue systems is to support engaging, entertaining, and lengthy conversations on a wide variety of topics, including social chit-chat. Apart from raw dialogue data, user-provided ratings are the most common signal used to train such systems to produce engaging responses. In this paper we show that social dialogue systems can be trained effectively from raw unannotated data. Using a dataset of real conversations collected in the 2017 Alexa Prize challenge, we developed a neural ranker for selecting ‘good’ system responses to user utterances, i.e. responses which are likely to lead to long and engaging conversations. We show that (1) our neural ranker consistently outperforms several strong baselines when trained to optimise for user ratings; (2) when trained on larger amounts of data and only using conversation length as the objective, the ranker performs better than the one trained using ratings – ultimately reaching a Precision@1 of 0.87. This advance will make data collection for social conversational agents simpler and less expensive in the future.

A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Shubham Agarwal | Ondřej Dušek | Ioannis Konstas | Verena Rieser
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversational model where an encoded knowledge base (KB) representation is appended to the decoder input. Our model substantially outperforms strong baselines in terms of text-based similarity measures (over 9 BLEU points, 3 of which are solely due to the use of additional information from the KB).

Improving Context Modelling in Multimodal Dialogue Generation
Shubham Agarwal | Ondřej Dušek | Ioannis Konstas | Verena Rieser
Proceedings of the 11th International Conference on Natural Language Generation

In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system’s output.

Findings of the E2E NLG Challenge
Ondřej Dušek | Jekaterina Novikova | Verena Rieser
Proceedings of the 11th International Conference on Natural Language Generation

This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG shared task aims to assess whether these novel approaches can generate better-quality output by learning from a dataset containing higher lexical richness, syntactic complexity and diverse discourse phenomena. We compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures – with the majority implementing sequence-to-sequence models (seq2seq) – as well as systems based on grammatical rules and templates.

2017

The E2E Dataset: New Challenges For End-to-End Generation
Jekaterina Novikova | Ondřej Dušek | Verena Rieser
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.

Why We Need New Evaluation Metrics for NLG
Jekaterina Novikova | Ondřej Dušek | Amanda Cercas Curry | Verena Rieser
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG. We also show that metric performance is data- and system-specific. Nevertheless, our results also suggest that automatic metrics perform reliably at system-level and can support system development by finding cases where a system performs poorly.

2016

Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings
Ondřej Dušek | Filip Jurčíček
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A Context-aware Natural Language Generator for Dialogue Systems
Ondřej Dušek | Filip Jurčíček
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Verb sense disambiguation in Machine Translation
Roman Sudarikov | Ondřej Dušek | Martin Holub | Ondřej Bojar | Vincent Kríž
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

We describe experiments in Machine Translation using word sense disambiguation (WSD) information. This work focuses on WSD in verbs, based on two different approaches – verbal patterns based on corpus pattern analysis and verbal word senses from valency frames. We evaluate several options of using verb senses in the source-language sentences as an additional factor for the Moses statistical machine translation system. Our results show a statistically significant translation quality improvement in terms of the BLEU metric for the valency frames approach, but in manual evaluation, both WSD methods bring improvements.

Moses & Treex Hybrid MT Systems Bestiary
Rudolf Rosa | Martin Popel | Ondřej Bojar | David Mareček | Ondřej Dušek
Proceedings of the 2nd Deep Machine Translation Workshop

2015

Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus
Zdeňka Urešová | Ondřej Dušek | Eva Fučíková | Jan Hajič | Jana Šindlerová
Proceedings of The 9th Linguistic Annotation Workshop

Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation
Ondřej Dušek | Eva Fučíková | Jan Hajič | Martin Popel | Jana Šindlerová | Zdeňka Urešová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

New Language Pairs in TectoMT
Ondřej Dušek | Luís Gomes | Michal Novák | Martin Popel | Rudolf Rosa
Proceedings of the Tenth Workshop on Statistical Machine Translation

Translation Model Interpolation for Domain Adaptation in TectoMT
Rudolf Rosa | Ondřej Dušek | Michal Novák | Martin Popel
Proceedings of the 1st Deep Machine Translation Workshop

Training a Natural Language Generator From Unaligned Data
Ondřej Dušek | Filip Jurčíček
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Verbal Valency Frame Detection and Selection in Czech and English
Ondřej Dušek | Jan Hajič | Zdeňka Urešová
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation

Machine Translation of Medical Texts in the Khresmoi Project
Ondřej Dušek | Jan Hajič | Jaroslava Hlaváčová | Michal Novák | Pavel Pecina | Rudolf Rosa | Aleš Tamchyna | Zdeňka Urešová | Daniel Zeman
Proceedings of the Ninth Workshop on Statistical Machine Translation

Alex: Bootstrapping a Spoken Dialogue System for a New Domain by Real Users
Ondřej Dušek | Ondřej Plátek | Lukáš Žilka | Filip Jurčíček
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license
Matěj Korvas | Ondřej Plátek | Ondřej Dušek | Lukáš Žilka | Filip Jurčíček
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition (ASR) in spoken dialogue systems (SDSs). The data comprise 45 hours of speech in English and over 18 hours in Czech. A large part of the data, both audio and transcriptions, was collected using crowdsourcing; the rest are transcriptions by hired transcribers. We release the data together with scripts for data pre-processing and building acoustic models using the HTK and Kaldi ASR toolkits. We also publish the trained models described in this paper. The data are released under the CC-BY-SA 3.0 license; the scripts are licensed under Apache 2.0. In the paper, we report on the methodology of collecting the data, on the size and properties of the data, and on the scripts and their use. We verify the usability of the datasets by training and evaluating acoustic models using the presented data and scripts.

Multilingual Test Sets for Machine Translation of Search Queries for Cross-Lingual Information Retrieval in the Medical Domain
Zdeňka Urešová | Jan Hajič | Pavel Pecina | Ondřej Dušek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents development and test sets for machine translation of search queries in cross-lingual information retrieval in the medical domain. The data consist of a total of 1,508 real user queries in English translated to Czech, German, and French. We describe the translation and review process involving medical professionals and present a baseline experiment where our data sets are used for tuning and evaluation of a machine translation system.

2013

Robust multilingual statistical morphological generation models
Ondřej Dušek | Filip Jurčíček
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

2012

The Joy of Parallelism with CzEng 1.0
Ondřej Bojar | Zdeněk Žabokrtský | Ondřej Dušek | Petra Galuščáková | Martin Majliš | David Mareček | Jiří Maršík | Michal Novák | Martin Popel | Aleš Tamchyna
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

CzEng 1.0 is an updated release of our Czech-English parallel corpus, freely available for non-commercial research or educational purposes. In this release, we approximately doubled the corpus size, reaching 15 million sentence pairs (about 200 million tokens per language). More importantly, we carefully filtered the data to reduce the amount of non-matching sentence pairs. CzEng 1.0 is automatically aligned at the level of sentences as well as words. We provide not only the plain text representation, but also automatic morphological tags, surface syntactic as well as deep syntactic dependency parse trees and automatic co-reference links in both English and Czech. This paper describes key properties of the released resource including the distribution of text domains, the corpus data formats, and a toolkit to handle the provided rich annotation. We also summarize the procedure of the rich annotation (incl. co-reference resolution) and of the automatic filtering. Finally, we provide some suggestions on exploiting such an automatically annotated sentence-parallel corpus.

Formemes in English-Czech Deep Syntactic MT
Ondřej Dušek | Zdeněk Žabokrtský | Martin Popel | Martin Majliš | Michal Novák | David Mareček
Proceedings of the Seventh Workshop on Statistical Machine Translation

DEPFIX: A System for Automatic Correction of Czech MT Outputs
Rudolf Rosa | David Mareček | Ondřej Dušek
Proceedings of the Seventh Workshop on Statistical Machine Translation

Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors
Rudolf Rosa | Ondřej Dušek | David Mareček | Martin Popel
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation