Michael Glass

Also published as: Michael R. Glass


Span Selection Pre-training for Question Answering
Michael Glass | Alfio Gliozzo | Rishav Chakravarti | Anthony Ferritto | Lin Pan | G P Shrivatsa Bhargav | Dinesh Garg | Avi Sil
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to shift pre-training from memorization toward understanding. Span Selection PreTraining (SSPT) poses cloze-like training instances, but rather than draw the answer from the model’s parameters, it is selected from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. Specifically, our proposed model obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant impact in HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point and outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.


CFO: A Framework for Building Production NLP Systems
Rishav Chakravarti | Cezar Pendus | Andrzej Sakrajda | Anthony Ferritto | Lin Pan | Michael Glass | Vittorio Castelli | J. William Murdock | Radu Florian | Salim Roukos | Avi Sil
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

This paper introduces a novel orchestration framework, called CFO (Computation Flow Orchestrator), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments. We then demonstrate a question answering system built using this framework which incorporates state-of-the-art BERT based MRC (Machine Reading Comprehension) with IR components to enable end-to-end answer retrieval. Results from the demo system are shown to be high quality in both academic and industry domain specific settings. Finally, we discuss best practices when (pre-)training BERT based MRC models for production systems. Screencast links: Short video (< 3 min): http://ibm.biz/gaama_demo; supplementary long video (< 13 min): http://ibm.biz/gaama_cfo_demo

Learning Relational Representations by Analogy using Hierarchical Siamese Networks
Gaetano Rossiello | Alfio Gliozzo | Robert Farrell | Nicolas Fauceglia | Michael Glass
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We address relation extraction as an analogy problem by proposing a novel approach to learn representations of relations expressed by their textual mentions. In our assumption, if two pairs of entities belong to the same relation, then those two pairs are analogous. Following this idea, we collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. We leverage this dataset to train a hierarchical siamese network in order to learn entity-entity embeddings which encode relational information through the different linguistic paraphrasing expressing the same relation. We evaluate our model in a one-shot learning task by showing a promising generalization capability in order to classify unseen relation types, which makes this approach suitable to perform automatic knowledge base population with minimal supervision. Moreover, the model can be used to generate pre-trained embeddings which provide a valuable signal when integrated into an existing neural-based model by outperforming the state-of-the-art methods on a downstream relation extraction task.


Discovering Implicit Knowledge with Unary Relations
Michael Glass | Alfio Gliozzo
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

State-of-the-art relation extraction approaches are only able to recognize relationships between mentions of entity arguments stated explicitly in the text and typically localized to the same sentence. However, the vast majority of relations are either implicit or not sententially localized. This is a major problem for Knowledge Base Population, severely limiting recall. In this paper we propose a new methodology to identify relations between two entities, consisting of detecting a very large number of unary relations, and using them to infer missing entities. We describe a deep learning architecture able to learn thousands of such relations very efficiently by using a common deep learning based representation. Our approach largely outperforms state-of-the-art relation extraction technology on a newly introduced web-scale knowledge base population benchmark that we release to the research community.


Lexical Substitution for the Medical Domain
Martin Riedl | Michael Glass | Alfio Gliozzo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Word Semantic Representations using Bayesian Probabilistic Tensor Factorization
Jingwei Zhang | Jeremy Salwen | Michael Glass | Alfio Gliozzo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


JoBimText Visualizer: A Graph-based Approach to Contextualizing Distributional Similarity
Chris Biemann | Bonaventura Coppola | Michael R. Glass | Alfio Gliozzo | Matthew Hatem | Martin Riedl
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing


Structured Term Recognition in Medical Text
Michael Glass | Alfio Gliozzo
Proceedings of COLING 2012


Aggregation Improves Learning: Experiments in Natural Language Generation for Intelligent Tutoring Systems
Barbara Di Eugenio | Davide Fossati | Dan Yu | Susan Haller | Michael Glass
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)


Squibs and Discussions: The Kappa Statistic: A Second Look
Barbara Di Eugenio | Michael Glass
Computational Linguistics, Volume 30, Number 1, March 2004


Latent Semantic Analysis for Dialogue Act Classification
Riccardo Serafin | Barbara Di Eugenio | Michael Glass
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers


MUP - The UIC Standoff Markup Tool
Michael Glass | Barbara Di Eugenio
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

The DIAG experiments: Natural Language Generation for Intelligent Tutoring Systems
Barbara Di Eugenio | Michael Glass | Michael Trolio
Proceedings of the International Natural Language Generation Conference

The binomial cumulative distribution function, or, is my system better than yours?
Barbara Di Eugenio | Michael Glass | Michael J. Scott
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)


System Demonstration Content Planning as the Basis for an Intelligent Tutoring System
Reva Freedman | Stefan Brandle | Michael Glass | Jung Hee Kim | Yujian Zhou | Martha W. Evens
Natural Language Generation