Madian Khabsa


To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks
Sinong Wang | Madian Khabsa | Hao Ma
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks. This paper examines the benefits of pretrained models as a function of the number of training samples used in the downstream task. On several text classification tasks, we show that as the number of training examples grows into the millions, the accuracy gap between finetuning a BERT-based model and training a vanilla LSTM from scratch narrows to within 1%. Our findings indicate that MLM-based models might reach a diminishing return point as the supervised data size increases significantly.

Language Models as Fact Checkers?
Nayeon Lee | Belinda Li | Sinong Wang | Wen-tau Yih | Hao Ma | Madian Khabsa
Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER)

Recent work has suggested that language models (LMs) store both common-sense and factual knowledge learned from pre-training data. In this paper, we leverage this implicit knowledge to create an effective end-to-end fact checker using solely a language model, without any external knowledge or explicit retrieval components. While previous work on extracting knowledge from LMs has focused on the task of open-domain question answering, to the best of our knowledge, this is the first work to examine the use of language models as fact checkers. In a closed-book setting, we show that our zero-shot LM approach outperforms a random baseline on the standard FEVER task, and that our finetuned LM compares favorably with standard baselines. Though we do not ultimately outperform methods which use explicit knowledge bases, we believe our exploration shows that this method is viable and has much room for exploration.


Keeping Notes: Conditional Natural Language Generation with a Scratchpad Encoder
Ryan Benmalek | Madian Khabsa | Suma Desu | Claire Cardie | Michele Banko
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks. By enabling the decoder at each time step to write to all of the encoder output layers, Scratchpad can employ the encoder as a “scratchpad” memory to keep track of what has been generated so far and thereby guide future generation. We evaluate Scratchpad in the context of three well-studied natural language generation tasks — Machine Translation, Question Generation, and Text Summarization — and obtain state-of-the-art or comparable performance on standard datasets for each task. Qualitative assessments in the form of human judgements (question generation), attention visualization (MT), and sample output (summarization) provide further evidence of the ability of Scratchpad to generate fluent and expressive output.