Gholamreza Haffari


2020

pdf bib
Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
Xuanli He | Gholamreza Haffari | Mohammad Norouzi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper introduces Dynamic Programming Encoding (DPE), a new segmentation algorithm for tokenizing sentences into subword units. We view the subword segmentation of output sentences as a latent variable that should be marginalized out for learning and inference. A mixed character-subword transformer is proposed, which enables exact log marginal likelihood estimation and exact MAP inference to find target segmentations with maximum posterior probability. DPE uses a lightweight mixed character-subword transformer as a means of pre-processing parallel data to segment output sentences using dynamic programming. Empirical results on machine translation suggest that DPE is effective for segmenting output sentences and can be combined with BPE dropout for stochastic segmentation of source sentences. DPE achieves an average improvement of 0.9 BLEU over BPE (Sennrich et al., 2016) and an average improvement of 0.55 BLEU over BPE dropout (Provilkov et al., 2019) on several WMT datasets including English <=> (German, Romanian, Estonian, Finnish, Hungarian).

pdf bib
Contextual Neural Machine Translation Improves Translation of Cataphoric Pronouns
KayYen Wong | Sameen Maruf | Gholamreza Haffari
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The advent of context-aware NMT has resulted in promising improvements in the overall translation quality and specifically in the translation of discourse phenomena such as pronouns. Previous works have mainly focused on the use of past sentences as context with a focus on anaphora translation. In this work, we investigate the effect of future sentences as context by comparing the performance of a contextual NMT model trained with the future context to the one trained with the past context. Our experiments and evaluation, using generic and pronoun-focused automatic metrics, show that the use of future context not only achieves significant improvements over the context-agnostic Transformer, but also demonstrates comparable and in some cases improved performance over its counterpart trained on past context. We also perform an evaluation on a targeted cataphora test suite and report significant gains over the context-agnostic Transformer in terms of BLEU.

pdf bib
Challenge Dataset of Cognates and False Friend Pairs from Indian Languages
Diptesh Kanojia | Malhar Kulkarni | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 12th Language Resources and Evaluation Conference

Cognates are present in multiple variants of the same text across different languages (e.g., “hund” in German and “hound” in the English language mean “dog”). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine Translation, Cross-lingual Sense Disambiguation, Computational Phylogenetics, and Information Retrieval. A possible solution to address this challenge is to identify cognates across language pairs. In this paper, we describe the creation of two cognate datasets for twelve Indian languages namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. We digitize the cognate data from an Indian language cognate dictionary and utilize linked Indian language Wordnets to generate cognate sets. Additionally, we use the Wordnet data to create a False Friends’ dataset for eleven language pairs. We also evaluate the efficacy of our dataset using previously available baseline cognate detection approaches. We also perform a manual evaluation with the help of lexicographers and release the curated gold-standard dataset with this paper.

pdf bib
Scene Graph Modification Based on Natural Language Commands
Xuanli He | Quan Hung Tran | Gholamreza Haffari | Walter Chang | Zhe Lin | Trung Bui | Franck Dernoncourt | Nhan Dam
Findings of the Association for Computational Linguistics: EMNLP 2020

Structured representations like graphs and parse trees play a crucial role in many Natural Language Processing systems. In recent years, the advancements in multi-turn user interfaces necessitate the need for controlling and updating these structured representations given new sources of information. Although there have been many efforts focusing on improving the performance of the parsers that map text to graphs or parse trees, very few have explored the problem of directly manipulating these representations. In this paper, we explore the novel problem of graph modification, where the systems need to learn how to update an existing scene graph given a new user’s command. Our novel models based on graph-based sparse transformer and cross attention information fusion outperform previous systems adapted from the machine translation and graph generation literature. We further contribute our large graph modification datasets to the research community to encourage future research for this new problem.

pdf bib
Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
Diptesh Kanojia | Raj Dabre | Shubham Dewangan | Pushpak Bhattacharyya | Gholamreza Haffari | Malhar Kulkarni
Proceedings of the 28th International Conference on Computational Linguistics

Cognates are variants of the same lexical form across different languages; for example “fonema” in Spanish and “phoneme” in English are cognates, both of which mean “a unit of sound”. The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian Languages. Our approach introduces the use of context from a knowledge graph to generate improved feature representations for cognate detection. We, then, evaluate the impact of our cognate detection mechanism on neural machine translation (NMT), as a downstream task. We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages, namely, Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. Additionally, we create evaluation datasets for two more Indian languages, Konkani and Nepali. We observe an improvement of up to 18% points, in terms of F-score, for cognate detection. Furthermore, we observe that cognates extracted using our method help improve NMT quality by up to 2.76 BLEU. We also release our code, newly constructed datasets and cross-lingual models publicly.

pdf bib
Context Dependent Semantic Parsing: A Survey
Zhuang Li | Lizhen Qu | Gholamreza Haffari
Proceedings of the 28th International Conference on Computational Linguistics

Semantic parsing is the task of translating natural language utterances into machine-readable meaning representations. Currently, most semantic parsing methods are not able to utilize the contextual information (e.g. dialogue and comments history), which has a great potential to boost the semantic parsing systems. To address this issue, context dependent semantic parsing has recently drawn a lot of attention. In this survey, we investigate progress on the methods for the context dependent semantic parsing, together with the current datasets and tasks. We then point out open problems and challenges for future research in this area.

pdf bib
Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation
Fahimeh Saleh | Wray Buntine | Gholamreza Haffari
Proceedings of the 28th International Conference on Computational Linguistics

Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios. A standard approach is transfer learning, which involves taking a model trained on a high-resource language-pair and fine-tuning it on the data of the low-resource MT condition of interest. However, it is not clear generally which high-resource language-pair offers the best transfer learning for the target MT setting. Furthermore, different transferred models may have complementary semantic and/or syntactic strengths, hence using only one model may be sub-optimal. In this paper, we tackle this problem using knowledge distillation, where we propose to distill the knowledge of ensemble of teacher models to a single student model. As the quality of these teacher models varies, we propose an effective adaptive knowledge distillation approach to dynamically adjust the contribution of the teacher models during the distillation process. Experiments on transferring from a collection of six language pairs from IWSLT to five low-resource language-pairs from TED Talks demonstrate the effectiveness of our approach, achieving up to +0.9 BLEU score improvements compared to strong baselines.

pdf bib
Leveraging Discourse Rewards for Document-Level Neural Machine Translation
Inigo Jauregi Unanue | Nazanin Esmaili | Gholamreza Haffari | Massimo Piccardi
Proceedings of the 28th International Conference on Computational Linguistics

Document-level machine translation focuses on the translation of entire documents from a source to a target language. It is widely regarded as a challenging task since the translation of the individual sentences in the document needs to retain aspects of the discourse at document level. However, document-level translation models are usually not trained to explicitly ensure discourse quality. Therefore, in this paper we propose a training approach that explicitly optimizes two established discourse metrics, lexical cohesion and coherence, by using a reinforcement learning objective. Experiments over four different language pairs and three translation domains have shown that our training approach has been able to achieve more cohesive and coherent document translations than other competitive approaches, yet without compromising the faithfulness to the reference translation. In the case of the Zh-En language pair, our method has achieved an improvement of 2.46 percentage points (pp) in LC and 1.17 pp in COH over the runner-up, while at the same time improving 0.63 pp in BLEU score and 0.47 pp in F-BERT.

pdf bib
Understanding Unnatural Questions Improves Reasoning over Text
Xiaoyu Guo | Yuan-Fang Li | Gholamreza Haffari
Proceedings of the 28th International Conference on Computational Linguistics

Complex question answering (CQA) over raw text is a challenging task. A prominent approach to this task is based on the programmer-interpreter framework, where the programmer maps the question into a sequence of reasoning actions and the interpreter then executes these actions on the raw text. Learning an effective CQA model requires large amounts of human-annotated data, consisting of the ground-truth sequence of reasoning actions, which is time-consuming and expensive to collect at scale. In this paper, we address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions which are more convenient to parse. We firstly generate synthetic (question, action sequence) pairs by a data generator, and train a semantic parser that associates synthetic questions with their corresponding action sequences. To capture the diversity when applied to natural questions, we learn a projection model to map natural questions into their most similar unnatural questions for which the parser can work well. Without any natural training data, our projection model provides high-quality action sequences for the CQA task. Experimental results show that the QA model trained exclusively with synthetic data outperforms its state-of-the-art counterpart trained on human-labeled data.

pdf bib
CosMo: Conditional Seq2Seq-based Mixture Model for Zero-Shot Commonsense Question Answering
Farhad Moghimifar | Lizhen Qu | Yue Zhuo | Mahsa Baktashmotlagh | Gholamreza Haffari
Proceedings of the 28th International Conference on Computational Linguistics

Commonsense reasoning refers to the ability of evaluating a social situation and acting accordingly. Identification of the implicit causes and effects of a social context is the driving capability which can enable machines to perform commonsense reasoning. The dynamic world of social interactions requires context-dependent on-demand systems to infer such underlying information. However, current approaches in this realm lack the ability to perform commonsense reasoning upon facing an unseen situation, mostly due to incapability of identifying a diverse range of implicit social relations. Hence they fail to estimate the correct reasoning path. In this paper, we present Conditional Seq2Seq-based Mixture model (CosMo), which provides us with the capabilities of dynamic and diverse content generation. We use CosMo to generate context-dependent clauses, which form a dynamic Knowledge Graph (KG) on-the-fly for commonsense reasoning. To show the adaptability of our model to context-dependant knowledge generation, we address the task of zero-shot commonsense question answering. The empirical results indicate an improvement of up to +5.2% over the state-of-the-art models.

pdf bib
Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning
Yuncheng Hua | Yuan-Fang Li | Gholamreza Haffari | Guilin Qi | Tongtong Wu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB). However, the conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types, harboring inherently different characteristics, e.g., difficulty level. This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions. Our method quickly and effectively adapts the meta-learned programmer to new questions based on the most similar questions retrieved from the training data. The meta-learned policy is then used to learn a good programming policy, utilizing the trial trajectories and their rewards for similar questions in the support set. Our method achieves state-of-the-art performance on the CQA dataset (Saha et al., 2018) while using only five trial trajectories for the top-5 retrieved questions in each support set, and meta-training on tasks constructed from only 1% of the training set. We have released our code at https://github.com/DevinJake/MRL-CQA.

pdf bib
Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models
Thuy-Trang Vu | Dinh Phung | Gholamreza Haffari
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of randomly masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over subsets of tokens, which we tackle efficiently through relaxation to a variational lowerbound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.

pdf bib
Personal Information Leakage Detection in Conversations
Qiongkai Xu | Lizhen Qu | Zeyu Gao | Gholamreza Haffari
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The global market size of conversational assistants (chatbots) is expected to grow to USD 9.4 billion by 2024, according to MarketsandMarkets. Despite the wide use of chatbots, leakage of personal information through chatbots poses serious privacy concerns for their users. In this work, we propose to protect personal information by warning users of detected suspicious sentences generated by conversational assistants. The detection task is formulated as an alignment optimization problem and a new dataset PERSONA-LEAKAGE is collected for evaluation. In this paper, we propose two novel constrained alignment models, which consistently outperform baseline methods on Moreover, we conduct analysis on the behavior of recently proposed personalized chit-chat dialogue systems. The empirical results show that those systems suffer more from personal information disclosure than the widely used Seq2Seq model and the language model. In those cases, a significant number of information leaking utterances can be detected by our models with high precision.

2019

pdf bib
A Pointer Network Architecture for Context-Dependent Semantic Parsing
Xuanli He | Quan Tran | Gholamreza Haffari
Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association

pdf bib
Neural Versus Non-Neural Text Simplification: A Case Study
Islam Nassar | Michelle Ananda-Rajah | Gholamreza Haffari
Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association

pdf bib
Neural Speech Translation using Lattice Transformations and Graph Networks
Daniel Beck | Trevor Cohn | Gholamreza Haffari
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

Speech translation systems usually follow a pipeline approach, using word lattices as an intermediate representation. However, previous work assume access to the original transcriptions used to train the ASR system, which can limit applicability in real scenarios. In this work we propose an approach for speech translation through lattice transformations and neural models based on graph networks. Experimental results show that our approach reaches competitive performance without relying on transcriptions, while also being orders of magnitude faster than previous work.

pdf bib
Adaptively Scheduled Multitask Learning: The Case of Low-Resource Neural Machine Translation
Poorya Zaremoodi | Gholamreza Haffari
Proceedings of the 3rd Workshop on Neural Generation and Translation

Neural Machine Translation (NMT), a data-hungry technology, suffers from the lack of bilingual data in low-resource scenarios. Multitask learning (MTL) can alleviate this issue by injecting inductive biases into NMT, using auxiliary syntactic and semantic tasks. However, an effective training schedule is required to balance the importance of tasks to get the best use of the training signal. The role of training schedule becomes even more crucial in biased-MTL where the goal is to improve one (or a subset) of tasks the most, e.g. translation quality. Current approaches for biased-MTL are based on brittle hand-engineered heuristics that require trial and error, and should be (re-)designed for each learning scenario. To the best of our knowledge, ours is the first work on adaptively and dynamically changing the training schedule in biased-MTL. We propose a rigorous approach for automatically reweighing the training data of the main and auxiliary tasks throughout the training process based on their contributions to the generalisability of the main NMT task. Our experiments on translating from English to Vietnamese/Turkish/Spanish show improvements of up to +1.2 BLEU points, compared to strong baselines. Additionally, our analyses shed light on the dynamic of needs throughout the training of NMT: from syntax to semantic.

pdf bib
Monash University’s Submissions to the WNGT 2019 Document Translation Task
Sameen Maruf | Gholamreza Haffari
Proceedings of the 3rd Workshop on Neural Generation and Translation

We describe the work of Monash University for the shared task of Rotowire document translation organised by the 3rd Workshop on Neural Generation and Translation (WNGT 2019). We submitted systems for both directions of the English-German language pair. Our main focus is on employing an established document-level neural machine translation model for this task. We achieve a BLEU score of 39.83 (41.46 BLEU per WNGT evaluation) for En-De and 45.06 (47.39 BLEU per WNGT evaluation) for De-En translation directions on the Rotowire test set. All experiments conducted in the process are also described.

pdf bib
Learning How to Active Learn by Dreaming
Thuy-Trang Vu | Ming Liu | Dinh Phung | Gholamreza Haffari
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Heuristic-based active learning (AL) methods are limited when the data distribution of the underlying learning problems vary. Recent data-driven AL policy learning methods are also restricted to learn from closely related domains. We introduce a new sample-efficient method that learns the AL policy directly on the target domain of interest by using wake and dream cycles. Our approach interleaves between querying the annotation of the selected datapoints to update the underlying student learner and improving AL policy using simulation where the current student learner acts as an imperfect annotator. We evaluate our method on cross-domain and cross-lingual text classification and named entity recognition tasks. Experimental results show that our dream-based AL policy training strategy is more effective than applying the pretrained policy without further fine-tuning and better than the existing strong baseline methods that use heuristics or reinforcement learning.

pdf bib
Selective Attention for Context-aware Neural Machine Translation
Sameen Maruf | André F. T. Martins | Gholamreza Haffari
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Despite the progress made in sentence-level NMT, current systems still fall short at achieving fluent, good quality translation for a full document. Recent works in context-aware NMT consider only a few previous sentences as context and may not scale to entire documents. To this end, we propose a novel and scalable top-down approach to hierarchical attention for context-aware NMT which uses sparse attention to selectively focus on relevant sentences in the document context and then attends to key words in those sentences. We also propose single-level attention approaches based on sentence or word-level information in the context. The document-level context representation, produced from these attention modules, is integrated into the encoder or decoder of the Transformer model depending on whether we use monolingual or bilingual context. Our experiments and evaluation on English-German datasets in different document MT settings show that our selective attention approach not only significantly outperforms context-agnostic baselines but also surpasses context-aware baselines in most cases.

2018

pdf bib
Learning to Actively Learn Neural Machine Translation
Ming Liu | Wray Buntine | Gholamreza Haffari
Proceedings of the 22nd Conference on Computational Natural Language Learning

Traditional active learning (AL) methods for machine translation (MT) rely on heuristics. However, these heuristics are limited when the characteristics of the MT problem change due to e.g. the language pair or the amount of the initial bitext. In this paper, we present a framework to learn sentence selection strategies for neural MT. We train the AL query strategy using a high-resource language-pair based on AL simulations, and then transfer it to the low-resource language-pair of interest. The learned query strategy capitalizes on the shared characteristics between the language pairs to make an effective use of the AL budget. Our experiments on three language-pairs confirms that our method is more effective than strong heuristic-based methods in various conditions, including cold-start and warm-start as well as small and extremely small data conditions.

pdf bib
Sequence to Sequence Mixture Model for Diverse Machine Translation
Xuanli He | Gholamreza Haffari | Mohammad Norouzi
Proceedings of the 22nd Conference on Computational Natural Language Learning

Sequence to sequence (SEQ2SEQ) models lack diversity in their generated translations. This can be attributed to their limitations in capturing lexical and syntactic variations in parallel corpora, due to different styles, genres, topics, or ambiguity of human translation process. In this paper, we develop a novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model. Each mixture component selects its own training dataset via optimization of the marginal log-likelihood, which leads to a soft clustering of the parallel corpus. Experiments on four language pairs demonstrate the superiority of our mixture model compared to SEQ2SEQ model with the standard and diversity encouraged beam search. Our mixture model incurs negligible additional parameters and no extra computation in the decoding time.

pdf bib
The Context-Dependent Additive Recurrent Neural Net
Quan Hung Tran | Tuan Lai | Gholamreza Haffari | Ingrid Zukerman | Trung Bui | Hung Bui
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Contextual sequence mapping is one of the fundamental problems in Natural Language Processing (NLP). Here, instead of relying solely on the information presented in the text, the learning agents have access to a strong external signal given to assist the learning process. In this paper, we propose a novel family of Recurrent Neural Network unit: the Context-dependent Additive Recurrent Neural Network (CARNN) that is designed specifically to address this type of problem. The experimental results on public datasets in the dialog problem (Babi dialog Task 6 and Frame), contextual language model (Switchboard and Penn Tree Bank) and question answering (Trec QA) show that our novel CARNN-based architectures outperform previous methods.

pdf bib
Neural Machine Translation for Bilingually Scarce Scenarios: a Deep Multi-Task Learning Approach
Poorya Zaremoodi | Gholamreza Haffari
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Neural machine translation requires large amount of parallel training text to learn a reasonable quality translation model. This is particularly inconvenient for language pairs for which enough parallel text is not available. In this paper, we use monolingual linguistic resources in the source side to address this challenging problem based on a multi-task learning approach. More specifically, we scaffold the machine translation task on auxiliary tasks including semantic parsing, syntactic parsing, and named-entity recognition. This effectively injects semantic and/or syntactic knowledge into the translation model, which would otherwise require a large amount of training bitext to learn from. We empirically analyze and show the effectiveness of our multitask learning approach on three translation tasks: English-to-French, English-to-Farsi, and English-to-Vietnamese.

pdf bib
Automatic Post-Editing of Machine Translation: A Neural Programmer-Interpreter Approach
Thuy-Trang Vu | Gholamreza Haffari
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Automated Post-Editing (PE) is the task of automatically correct common and repetitive errors found in machine translation (MT) output. In this paper, we present a neural programmer-interpreter approach to this task, resembling the way that human perform post-editing using discrete edit operations, wich we refer to as programs. Our model outperforms previous neural models for inducing PE programs on the WMT17 APE task for German-English up to +1 BLEU score and -0.7 TER scores.

pdf bib
Improved Neural Machine Translation using Side Information
Cong Duy Vu Hoang | Gholamreza Haffari | Trevor Cohn
Proceedings of the Australasian Language Technology Association Workshop 2018

In this work, we investigate whether side information is helpful in neural machine translation (NMT). We study various kinds of side information, including topical information, personal trait, then propose different ways of incorporating them into the existing NMT models. Our experimental results show the benefits of side information in improving the NMT models.

pdf bib
Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation
Xuanli He | Quan Tran | William Havard | Laurent Besacier | Ingrid Zukerman | Gholamreza Haffari
Proceedings of the Australasian Language Technology Association Workshop 2018

In spite of the recent success of Dialogue Act (DA) classification, the majority of prior works focus on text-based classification with oracle transcriptions, i.e. human transcriptions, instead of Automatic Speech Recognition (ASR)’s transcriptions. In spoken dialog systems, however, the agent would only have access to noisy ASR transcriptions, which may further suffer performance degradation due to domain shift. In this paper, we explore the effectiveness of using both acoustic and textual signals, either oracle or ASR transcriptions, and investigate speaker domain adaptation for DA classification. Our multimodal model proves to be superior to the unimodal models, particularly when the oracle transcriptions are not available. We also propose an effective method for speaker domain adaptation, which achieves competitive results.

pdf bib
Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model
Poorya Zaremoodi | Gholamreza Haffari
Proceedings of the 27th International Conference on Computational Linguistics

Incorporating syntactic information in Neural Machine Translation (NMT) can lead to better reorderings, particularly useful when the language pairs are syntactically highly divergent or when the training bitext is not large. Previous work on using syntactic information, provided by top-1 parse trees generated by (inevitably error-prone) parsers, has been promising. In this paper, we propose a forest-to-sequence NMT model to make use of exponentially many parse trees of the source sentence to compensate for the parser errors. Our method represents the collection of parse trees as a packed forest, and learns a neural transducer to translate from the input forest to the target sentence. Experiments on English to German, Chinese and Farsi translation tasks show the superiority of our approach over the sequence-to-sequence and tree-to-sequence neural translation models.

pdf bib
Graph-to-Sequence Learning using Gated Graph Neural Networks
Daniel Beck | Gholamreza Haffari | Trevor Cohn
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures on graph-to-sequence obtained promising results compared to grammar-based approaches but still rely on linearisation heuristics and/or standard recurrent networks to achieve the best performance. In this work propose a new model that encodes the full structural information contained in the graph. Our architecture couples the recently proposed Gated Graph Neural Networks with an input transformation that allows nodes and edges to have their own hidden representations, while tackling the parameter explosion problem present in previous work. Experimental results shows that our model outperforms strong baselines in generation from AMR graphs and syntax-based neural machine translation.

pdf bib
Document Context Neural Machine Translation with Memory Networks
Sameen Maruf | Gholamreza Haffari
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a document-level neural machine translation model which takes both source and target document context into account using memory networks. We model the problem as a structured prediction problem with interdependencies among the observed and hidden variables, i.e., the source sentences and their unobserved target translations in the document. The resulting structured prediction problem is tackled with a neural translation model equipped with two memory components, one each for the source and target side, to capture the documental interdependencies. We train the model end-to-end, and propose an iterative decoding algorithm based on block coordinate descent. Experimental results of English translations from French, German, and Estonian documents show that our model is effective in exploiting both source and target document context, and statistically significantly outperforms the previous work in terms of BLEU and METEOR.

pdf bib
Learning How to Actively Learn: A Deep Imitation Learning Approach
Ming Liu | Wray Buntine | Gholamreza Haffari
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Heuristic-based active learning (AL) methods are limited when the data distribution of the underlying learning problems vary. We introduce a method that learns an AL “policy” using “imitation learning” (IL). Our IL-based approach makes use of an efficient and effective “algorithmic expert”, which provides the policy learner with good actions in the encountered AL situations. The AL strategy is then learned with a feedforward network, mapping situations to most informative query datapoints. We evaluate our method on two different tasks: text classification and named entity recognition. Experimental results show that our IL-based AL strategy is more effective than strong previous methods using heuristics and reinforcement learning.

pdf bib
Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
Poorya Zaremoodi | Wray Buntine | Gholamreza Haffari
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Neural Machine Translation (NMT) is notorious for its need for large amounts of bilingual data. An effective approach to compensate for this requirement is Multi-Task Learning (MTL) to leverage different linguistic resources as a source of inductive bias. Current MTL architectures are based on the Seq2Seq transduction, and (partially) share different components of the models among the tasks. However, this MTL approach often suffers from task interference and is not able to fully capture commonalities among subsets of tasks. We address this issue by extending the recurrent units with multiple “blocks” along with a trainable “routing network”. The routing network enables adaptive collaboration by dynamic sharing of blocks conditioned on the task at hand, input, and model state. Empirical evaluation of two low-resource translation tasks, English to Vietnamese and Farsi, show +1 BLEU score improvements compared to strong baselines.

pdf bib
Iterative Back-Translation for Neural Machine Translation
Vu Cong Duy Hoang | Philipp Koehn | Gholamreza Haffari | Trevor Cohn
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

We present iterative back-translation, a method for generating increasingly better synthetic parallel data from monolingual data to train neural machine translation systems. Our proposed method is very simple yet effective and highly applicable in practice. We demonstrate improvements in neural machine translation quality in both high and low resourced scenarios, including the best reported BLEU scores for the WMT 2017 German↔English tasks.

pdf bib
Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations
Sameen Maruf | André F. T. Martins | Gholamreza Haffari
Proceedings of the Third Conference on Machine Translation: Research Papers

Recent works in neural machine translation have begun to explore document translation. However, translating online multi-speaker conversations is still an open problem. In this work, we propose the task of translating Bilingual Multi-Speaker Conversations, and explore neural architectures which exploit both source and target-side conversation histories for this task. To initiate an evaluation for this task, we introduce datasets extracted from Europarl v7 and OpenSubtitles2016. Our experiments on four language-pairs confirm the significance of leveraging conversation history, both in terms of BLEU and manual evaluation.

2017

pdf bib
A Generative Attentional Neural Network Model for Dialogue Act Classification
Quan Hung Tran | Gholamreza Haffari | Ingrid Zukerman
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a novel generative neural network architecture for Dialogue Act classification. Building upon the Recurrent Neural Network framework, our model incorporates a novel attentional technique and a label to label connection for sequence learning, akin to Hidden Markov Models. The experiments show that both of these innovations lead our model to outperform strong baselines for dialogue act classification on MapTask and Switchboard corpora. We further empirically analyse the effectiveness of each of the new innovations.

pdf bib
Efficient Benchmarking of NLP APIs using Multi-armed Bandits
Gholamreza Haffari | Tuan Dung Tran | Mark Carman
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Comparing NLP systems to select the best one for a task of interest, such as named entity recognition, is critical for practitioners and researchers. A rigorous approach involves setting up a hypothesis testing scenario using the performance of the systems on query documents. However, often the hypothesis testing approach needs to send a lot of document queries to the systems, which can be problematic. In this paper, we present an effective alternative based on the multi-armed bandit (MAB). We propose a hierarchical generative model to represent the uncertainty in the performance measures of the competing systems, to be used by Thompson Sampling to solve the resulting MAB. Experimental results on both synthetic and real data show that our approach requires significantly fewer queries compared to the standard benchmarking technique to identify the best system according to F-measure.

pdf bib
A Hierarchical Neural Model for Learning Sequences of Dialogue Acts
Quan Hung Tran | Ingrid Zukerman | Gholamreza Haffari
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We propose a novel hierarchical Recurrent Neural Network (RNN) for learning sequences of Dialogue Acts (DAs). The input in this task is a sequence of utterances (i.e., conversational contributions) comprising a sequence of tokens, and the output is a sequence of DA labels (one label per utterance). Our model leverages the hierarchical nature of dialogue data by using two nested RNNs that capture long-range dependencies at the dialogue level and the utterance level. This model is combined with an attention mechanism that focuses on salient tokens in utterances. Our experimental results show that our model outperforms strong baselines on two popular datasets, Switchboard and MapTask; and our detailed empirical analysis highlights the impact of each aspect of our model.

pdf bib
Proceedings of the Australasian Language Technology Association Workshop 2017
Jojo Sze-Meng Wong | Gholamreza Haffari
Proceedings of the Australasian Language Technology Association Workshop 2017

pdf bib
Leveraging Linguistic Resources for Improving Neural Text Classification
Ming Liu | Gholamreza Haffari | Wray Buntine | Michelle Ananda-Rajah
Proceedings of the Australasian Language Technology Association Workshop 2017

pdf bib
Persian-Spanish Low-Resource Statistical Machine Translation Through English as Pivot Language
Benyamin Ahmadnia | Javier Serrano | Gholamreza Haffari
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This paper is an attempt to exclusively focus on investigating the pivot language technique in which a bridging language is utilized to increase the quality of the Persian-Spanish low-resource Statistical Machine Translation (SMT). In this case, English is used as the bridging language, and the Persian-English SMT is combined with the English-Spanish one, where the relatively large corpora of each may be used in support of the Persian-Spanish pairing. Our results indicate that the pivot language technique outperforms the direct SMT processes currently in use between Persian and Spanish. Furthermore, we investigate the sentence translation pivot strategy and the phrase translation in turn, and demonstrate that, in the context of the Persian-Spanish SMT system, the phrase-level pivoting outperforms the sentence-level pivoting. Finally we suggest a method called combination model in which the standard direct model and the best triangulation pivoting model are blended in order to reach a high-quality translation.

pdf bib
Word Representation Models for Morphologically Rich Languages in Neural Machine Translation
Ekaterina Vylomova | Trevor Cohn | Xuanli He | Gholamreza Haffari
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Out-of-vocabulary words present a great challenge for Machine Translation. Recently various character-level compositional models were proposed to address this issue. In current research we incorporate two most popular neural architectures, namely LSTM and CNN, into hard- and soft-attentional models of translation for character-level representation of the source. We propose semantic and morphological intrinsic evaluation of encoder-level representations. Our analysis of the learned representations reveals that character-based LSTM seems to be better at capturing morphological aspects compared to character-based CNN. We also show that hard-attentional model provides better character-level representations compared to vanilla one.

pdf bib
Towards Decoding as Continuous Optimisation in Neural Machine Translation
Cong Duy Vu Hoang | Gholamreza Haffari | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a novel decoding approach for neural machine translation (NMT) based on continuous optimisation. We reformulate decoding, a discrete optimization problem, into a continuous problem, such that optimization can make use of efficient gradient-based techniques. Our powerful decoding framework allows for more accurate decoding for standard neural machine translation models, as well as enabling decoding in intractable models such as intersection of several different NMT models. Our empirical results show that our decoding framework is effective, and can leads to substantial improvements in translations, especially in situations where greedy search and beam search are not feasible. Finally, we show how the technique is highly competitive with, and complementary to, reranking.

pdf bib
Preserving Distributional Information in Dialogue Act Classification
Quan Hung Tran | Ingrid Zukerman | Gholamreza Haffari
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper introduces a novel training/decoding strategy for sequence labeling. Instead of greedily choosing a label at each time step, and using it for the next prediction, we retain the probability distribution over the current label, and pass this distribution to the next prediction. This approach allows us to avoid the effect of label bias and error propagation in sequence learning/decoding. Our experiments on dialogue act classification demonstrate the effectiveness of this approach. Even though our underlying neural network model is relatively simple, it outperforms more complex neural models, achieving state-of-the-art results on the MapTask and Switchboard corpora.

2016

pdf bib
Richer Interpolative Smoothing Based on Modified Kneser-Ney Language Modeling
Ehsan Shareghi | Trevor Cohn | Gholamreza Haffari
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Latent Variable Recurrent Neural Network for Discourse-Driven Language Models
Yangfeng Ji | Gholamreza Haffari | Jacob Eisenstein
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Inter-document Contextual Language model
Quan Hung Tran | Ingrid Zukerman | Gholamreza Haffari
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model
Trevor Cohn | Cong Duy Vu Hoang | Ekaterina Vymolova | Kaisheng Yao | Chris Dyer | Gholamreza Haffari
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Incorporating Side Information into Recurrent Neural Network Language Models
Cong Duy Vu Hoang | Trevor Cohn | Gholamreza Haffari
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Improving Word Alignment of Rare Words with Word Embeddings
Masoud Jalili Sabet | Heshaam Faili | Gholamreza Haffari
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We address the problem of inducing word alignment for language pairs by developing an unsupervised model with the capability of getting applied to other generative alignment models. We approach the task by: i)proposing a new alignment model based on the IBM alignment model 1 that uses vector representation of words, and ii)examining the use of similar source words to overcome the problem of rare source words and improving the alignments. We apply our method to English-French corpora and run the experiments with different sizes of sentence pairs. Our results show competitive performance against the baseline and in some cases improve the results up to 6.9% in terms of precision.

pdf bib
Learning cascaded latent variable models for biomedical text classification
Ming Liu | Gholamreza Haffari | Wray Buntine
Proceedings of the Australasian Language Technology Association Workshop 2016

pdf bib
Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees
Ehsan Shareghi | Matthias Petri | Gholamreza Haffari | Trevor Cohn
Transactions of the Association for Computational Linguistics, Volume 4

Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500×, despite only incurring a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package. It imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).

2015

pdf bib
Compact, Efficient and Unlimited Capacity: Language Modeling with Compressed Suffix Trees
Ehsan Shareghi | Matthias Petri | Gholamreza Haffari | Trevor Cohn
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Automated Analysis of Bangla Poetry for Classification and Poet Identification
Geetanjali Rakshit | Anupam Ghosh | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 12th International Conference on Natural Language Processing

pdf bib
Optimizing Multivariate Performance Measures for Learning Relation Extraction Models
Gholamreza Haffari | Ajay Nagesh | Ganesh Ramakrishnan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Noisy Or-based model for Relation Extraction using Distant Supervision
Ajay Nagesh | Gholamreza Haffari | Ganesh Ramakrishnan
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Scalable Variational Inference for Extracting Hierarchical Phrase-based Translation Rules
Baskaran Sankaran | Gholamreza Haffari | Anoop Sarkar
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
Kashyap Popat | Balamurali A.R | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
An Infinite Hierarchical Bayesian Model of Phrasal Translation
Trevor Cohn | Gholamreza Haffari
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Bayesian Extraction of Minimal SCFG Rules for Hierarchical Phrase-based Translation
Baskaran Sankaran | Gholamreza Haffari | Anoop Sarkar
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing
Gholamreza Haffari | Marzieh Razavi | Anoop Sarkar
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

pdf bib
Active Learning for Multilingual Statistical Machine Translation
Gholamreza Haffari | Anoop Sarkar
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Hierarchical Dirichlet Trees for Information Retrieval
Gholamreza Haffari | Yee Whye Teh
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Active Learning for Statistical Phrase-based Machine Translation
Gholamreza Haffari | Maxim Roy | Anoop Sarkar
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

pdf bib
Homotopy-Based Semi-Supervised Hidden Markov Models for Sequence Labeling
Gholamreza Haffari | Anoop Sarkar
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
Transductive learning for statistical machine translation
Nicola Ueffing | Gholamreza Haffari | Anoop Sarkar
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Tutorial on Inductive Semi-supervised Learning Methods: with Applicability to Natural Language Processing
Anoop Sarkar | Gholamreza Haffari
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts