Hany Hassan

Also published as: H. Hassan


pdf bib
Meta-Learning for Few-Shot NMT Adaptation
Amr Sharaf | Hany Hassan | Hal Daumé III
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present META-MT, a meta-learning approach to adapt Neural Machine Translation (NMT) systems in a few-shot setting. META-MT provides a new approach to make NMT models easily adaptable to many target do- mains with the minimal amount of in-domain data. We frame the adaptation of NMT systems as a meta-learning problem, where we learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. We evaluate the proposed meta-learning strategy on ten domains with general large scale NMT systems. We show that META-MT significantly outperforms classical domain adaptation when very few in- domain examples are available. Our experiments shows that META-MT can outperform classical fine-tuning by up to 2.5 BLEU points after seeing only 4, 000 translated words (300 parallel sentences).

pdf bib
FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
Young Jin Kim | Hany Hassan
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Transformer-based models are the state-of-the-art for Natural Language Understanding (NLU) applications. Models are getting bigger and better on various tasks. However, Transformer models remain computationally challenging since they are not efficient at inference-time compared to traditional approaches. In this paper, we present FastFormers, a set of recipes to achieve efficient inference-time performance for Transformer-based models on various NLU tasks. We show how carefully utilizing knowledge distillation, structured pruning and numerical optimization can lead to drastic improvements on inference efficiency. We provide effective recipes that can guide practitioners to choose the best settings for various NLU tasks and pretrained models. Applying the proposed recipes to the SuperGLUE benchmark, we achieve from 9.8x up to 233.9x speed-up compared to out-of-the-box models on CPU. On GPU, we also achieve up to 12.4x speed-up with the presented methods. We show that FastFormers can drastically reduce cost of serving 100 million requests from 4,223 USD to just 18 USD on an Azure F16s_v2 instance. This translates to a sustainable runtime by reducing energy consumption 6.9x - 125.8x according to the metrics used in the SustaiNLP 2020 shared task.

pdf bib
Multi-task Learning for Multilingual Neural Machine Translation
Yiren Wang | ChengXiang Zhai | Hany Hassan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While monolingual data has been shown to be useful in improving bilingual neural machine translation (NMT), effectively and efficiently leveraging monolingual data for Multilingual NMT (MNMT) systems is a less explored area. In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data. We conduct extensive empirical studies on MNMT systems with 10 language pairs from WMT datasets. We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages with large margin, achieving significantly better results than the individual bilingual models. We also demonstrate the efficacy of the proposed approach in the zero-shot setup for language pairs without bitext training data. Furthermore, we show the effectiveness of MTL over pre-training approaches for both NMT and cross-lingual transfer learning NLU tasks; the proposed approach outperforms massive scale models trained on single task.


pdf bib
From Research to Production and Back: Ludicrously Fast Neural Machine Translation
Young Jin Kim | Marcin Junczys-Dowmunt | Hany Hassan | Alham Fikri Aji | Kenneth Heafield | Roman Grundkiewicz | Nikolay Bogoychev
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper describes the submissions of the “Marian” team to the WNGT 2019 efficiency shared task. Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models. For efficient CPU-based decoding, we propose pre-packed 8-bit matrix products, improved batched decoding, cache-friendly student architectures with parameter sharing and light-weight RNN-based decoder architectures. GPU-based decoding benefits from the same architecture changes, from pervasive 16-bit inference and concurrent streams. These modifications together with profiler-based C++ code optimization allow us to push the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values. Our fastest CPU model is more than 4x faster than last year’s fastest submission at more than 3 points higher BLEU. Our fastest GPU model at 1.5 seconds translation time is slightly faster than last year’s fastest RNN-based submissions, but outperforms them by more than 4 BLEU and 10 BLEU points respectively.

pdf bib
Selecting, Planning, and Rewriting: A Modular Approach for Data-to-Document Generation and Translation
Lesly Miculicich | Marc Marone | Hany Hassan
Proceedings of the 3rd Workshop on Neural Generation and Translation

In this paper, we report our system submissions to all 6 tracks of the WNGT 2019 shared task on Document-Level Generation and Translation. The objective is to generate a textual document from either structured data: generation task, or a document in a different language: translation task. For the translation task, we focused on adapting a large scale system trained on WMT data by fine tuning it on the RotoWire data. For the generation task, we participated with two systems based on a selection and planning model followed by (a) a simple language model generation, and (b) a GPT-2 pre-trained language model approach. The selection and planning module chooses a subset of table records in order, and the language models produce text given such a subset.

pdf bib
Multi-Source Cross-Lingual Model Transfer: Learning What to Share
Xilun Chen | Ahmed Hassan Awadallah | Hany Hassan | Wei Wang | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Modern NLP applications have enjoyed a great boost utilizing neural networks models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting where training data in multiple source languages is leveraged to further boost target language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks including a large-scale industry dataset.

pdf bib
Morphology-aware Word-Segmentation in Dialectal Arabic Adaptation of Neural Machine Translation
Ahmed Tawfik | Mahitab Emam | Khaled Essam | Robert Nabil | Hany Hassan
Proceedings of the Fourth Arabic Natural Language Processing Workshop

Parallel corpora available for building machine translation (MT) models for dialectal Arabic (DA) are rather limited. The scarcity of resources has prompted the use of Modern Standard Arabic (MSA) abundant resources to complement the limited dialectal resource. However, dialectal clitics often differ between MSA and DA. This paper compares morphology-aware DA word segmentation to other word segmentation approaches like Byte Pair Encoding (BPE) and Sub-word Regularization (SR). A set of experiments conducted on Egyptian Arabic (EA), Levantine Arabic (LA), and Gulf Arabic (GA) show that a sufficiently accurate morphology-aware segmentation used in conjunction with BPE outperforms the other word segmentation approaches.


pdf bib
Universal Neural Machine Translation for Extremely Low Resource Languages
Jiatao Gu | Hany Hassan | Jacob Devlin | Victor O.K. Li
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multi-lingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of strong baseline system which uses multi-lingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multi-lingual system in a zero-shot setting.


pdf bib
Learning Translation Models from Monolingual Continuous Representations
Kai Zhao | Hany Hassan | Michael Auli
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


pdf bib
Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data
Avneesh Saluja | Hany Hassan | Kristina Toutanova | Chris Quirk
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


pdf bib
Social Text Normalization using Contextual Graph Random Walks
Hany Hassan | Arul Menezes
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


pdf bib
A Syntactified Direct Translation Model with Linear-time Decoding
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Lexicalized Semi-incremental Dependency Parsing
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the International Conference RANLP-2009


pdf bib
Language Independent Text Correction using Finite State Automata
Ahmed Hassan | Sara Noeman | Hany Hassan
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II


pdf bib
Arabic Cross-Document Person Name Normalization
Walid Magdy | Kareem Darwish | Ossama Emam | Hany Hassan
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources

pdf bib
BioNoculars: Extracting Protein-Protein Interactions from Biomedical Text
Amgad Madkour | Kareem Darwish | Hany Hassan | Ahmed Hassan | Ossama Emam
Biological, translational, and clinical language processing

pdf bib
Supertagged Phrase-Based Statistical Machine Translation
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics


pdf bib
Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement
Hany Hassan | Ahmed Hassan | Ossama Emam
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Graph Based Semi-Supervised Approach for Information Extraction
Hany Hassan | Ahmed Hassan | Sara Noeman
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing


pdf bib
Examining the Effect of Improved Context Sensitive Morphology on Arabic Information Retrieval
Kareem Darwish | Hany Hassan | Ossama Emam
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

pdf bib
An Integrated Approach for Arabic-English Named Entity Translation
Hany Hassan | Jeffrey Sorensen
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages


pdf bib
A Statistical Model for Multilingual Entity Detection and Tracking
R. Florian | H. Hassan | A. Ittycheriah | H. Jing | N. Kambhatla | X. Luo | N. Nicolov | S. Roukos
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004


pdf bib
Language Model Based Arabic Word Segmentation
Young-Suk Lee | Kishore Papineni | Salim Roukos | Ossama Emam | Hany Hassan
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
TIPS: A Translingual Information Processing System
Yaser Al-Onaizan | Radu Florian | Martin Franz | Hany Hassan | Young-Suk Lee | J. Scott McCarley | Kishore Papineni | Salim Roukos | Jeffrey Sorensen | Christoph Tillmann | Todd Ward | Fei Xia
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations