Alexander G. Hauptmann

Also published as: Alex Hauptmann, Alexander Hauptmann


2020

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting
Po-Yao Huang | Junjie Hu | Xiaojun Chang | Alexander Hauptmann
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only. However, it is still challenging to associate source-target sentences in the latent space. As humans speaking different languages biologically share similar visual systems, the potential of achieving better alignment through visual content is promising yet under-explored in unsupervised multimodal MT (MMT). In this paper, we investigate how to utilize visual content for disambiguation and for promoting latent space alignment in unsupervised MMT. Our model employs multimodal back-translation and features pseudo visual pivoting, in which we learn a shared multilingual visual-semantic embedding space and incorporate visually pivoted captioning as additional weak supervision. Experimental results on the widely used Multi30K dataset show that the proposed model significantly improves over state-of-the-art methods and generalizes well when images are not available at test time.
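A shared visual-semantic embedding space of the kind this abstract mentions is commonly learned with a max-margin ranking objective over matched image-sentence pairs. Below is a minimal NumPy sketch of such an objective; the function names, margin value, and toy data are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def vse_hinge_loss(img, txt, margin=0.2):
    """Max-margin ranking loss over a batch of matched (image, sentence)
    embeddings: matched pairs sit on the diagonal of the similarity matrix,
    and every mismatched pair is pushed below them by at least `margin`."""
    img, txt = l2_normalize(img), l2_normalize(txt)
    sims = img @ txt.T                       # (B, B) cosine similarities
    pos = np.diag(sims)                      # matched-pair similarities
    # hinge in both retrieval directions (image->text and text->image)
    cost_s = np.maximum(0.0, margin + sims - pos[:, None])
    cost_i = np.maximum(0.0, margin + sims - pos[None, :])
    B = sims.shape[0]
    mask = 1.0 - np.eye(B)                   # ignore the matched pairs
    return float(((cost_s + cost_i) * mask).sum() / B)
```

With perfectly separated embeddings (e.g. orthogonal rows), the loss is exactly zero; as mismatched pairs drift toward matched ones, the hinge terms activate.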

Event-Related Bias Removal for Real-time Disaster Events
Salvador Medina Maza | Evangelia Spiliopoulou | Eduard Hovy | Alexander Hauptmann
Findings of the Association for Computational Linguistics: EMNLP 2020

Social media has become an important tool for sharing information about crisis events such as natural disasters and mass attacks. Detecting actionable posts that contain useful information requires rapid analysis of huge volumes of data in real time. This is a complex problem because most posts contain no actionable information. Furthermore, classification in real-time systems requires training on out-of-domain data, as no data from a newly emerging crisis is available. Prior work focuses on models pre-trained on similar event types. However, those models capture unnecessary event-specific biases, such as the location of the event, which hurt the generalizability and performance of the classifiers on new, unseen data from an emerging event. In our work, we train an adversarial neural model to remove latent event-specific biases and improve performance on tweet importance classification.
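Adversarial bias removal of this kind can be sketched as a single minimax objective: the encoder is trained to do well on the task head while making the event identity unpredictable from the shared representation. The toy code below is a sketch under stated assumptions (a linear encoder, logistic heads, and all names are illustrative, not the paper's architecture); the minus sign plays the role a gradient-reversal layer plays in the full model:

```python
import numpy as np

def logistic_loss(w, H, y):
    """Binary cross-entropy of a linear classifier with weights w on features H."""
    p = 1.0 / (1.0 + np.exp(-(H @ w)))
    return float(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

def adversarial_objective(enc, w_task, w_event, X, y_task, y_event, lam=0.5):
    """Encoder objective: low loss on the task labels (y_task) while the
    event classifier's loss on the same features is *maximized* (y_event),
    so event-specific cues are driven out of the representation."""
    H = X @ enc                              # shared text representation
    return logistic_loss(w_task, H, y_task) - lam * logistic_loss(w_event, H, y_event)
```

In the full training loop the event head itself is updated to *minimize* its loss, while the encoder follows the combined objective above, yielding the usual adversarial game.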

2019

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao Huang | Xiaojun Chang | Alexander Hauptmann
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations. Specifically, our model attends to different types of textual semantics in two languages and to visual objects for fine-grained alignments between sentences and images. We introduce a new objective function which explicitly encourages attention diversity to learn an improved visual-semantic embedding space. We evaluate our model on the German-Image and English-Image matching tasks of the Multi30K dataset, and on the Semantic Textual Similarity task with the English descriptions of visual content. Results show that our model yields a significant performance gain over other methods in all three tasks.
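An attention-diversity objective of this flavor can be sketched as a penalty on the pairwise overlap between the heads' attention distributions, so that distinct heads are pushed toward distinct positions. The snippet below illustrates the general idea only and is not the paper's exact objective:

```python
import numpy as np

def head_overlap_penalty(A):
    """A: (heads, positions); each row is one head's attention distribution
    (non-negative, sums to 1). Returns the total pairwise dot-product
    overlap between heads; zero when heads attend to disjoint positions."""
    G = A @ A.T                  # Gram matrix of head distributions
    return float(G.sum() - np.trace(G))
```

Adding such a penalty to the matching loss encourages each head to specialize, e.g. on different object types or semantic roles.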

ExCL: Extractive Clip Localization Using Natural Language Descriptions
Soham Ghosh | Anuva Agarwal | Zarana Parekh | Alexander Hauptmann
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The task of retrieving clips within videos based on a given natural language query requires cross-modal reasoning over multiple frames. Prior approaches such as sliding window classifiers are inefficient, while text-clip similarity driven ranking-based approaches such as segment proposal networks are far more complicated. To select the most relevant video clip for a given text description, we propose a novel extractive approach that predicts the start and end frames by leveraging cross-modal interactions between the text and video; this removes the need to retrieve and re-rank multiple proposal segments. Using recurrent networks, we encode the two modalities into a joint representation, which is then used in different variants of start-end frame predictor networks. Through extensive experimentation and ablative analysis, we demonstrate that our simple and elegant approach significantly outperforms the state of the art on two datasets and has comparable performance on a third.
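The extractive decoding step of such a predictor reduces to picking the start-end frame pair with the highest combined score subject to start ≤ end, which one linear pass over the frames can do. A small sketch under that assumption (the per-frame score arrays are hypothetical inputs, not the paper's predictor outputs):

```python
import numpy as np

def best_span(start_scores, end_scores):
    """Return (s, e) with s <= e maximizing start_scores[s] + end_scores[e].
    Tracks the best start seen so far, so the search is O(n), not O(n^2)."""
    best_total = -np.inf
    best_pair = (0, 0)
    best_start = 0
    for e in range(len(end_scores)):
        if start_scores[e] > start_scores[best_start]:
            best_start = e                       # best start at or before e
        total = start_scores[best_start] + end_scores[e]
        if total > best_total:
            best_total, best_pair = total, (best_start, e)
    return best_pair
```

For example, with start scores peaking at frame 1 and end scores peaking at frame 3, the predicted clip spans frames 1 through 3.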

2008

Vox Populi Annotation: Measuring Intensity of Ideological Perspectives by Aggregating Group Judgments
Wei-Hao Lin | Alexander Hauptmann
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Polarizing discussions about political and social issues are common in mass media. Annotations on the degree to which a sentence expresses an ideological perspective can be valuable for evaluating computer programs that can automatically identify strongly biased sentences, but such annotations remain scarce. We annotated the intensity of ideological perspectives expressed in 250 sentences by aggregating judgments from 18 annotators. We proposed methods of determining the number of annotators and assessing reliability, and showed the annotations were highly consistent across different annotator groups.

2006

Are These Documents Written from Different Perspectives? A Test of Different Perspectives Based on Statistical Distribution Divergence
Wei-Hao Lin | Alexander Hauptmann
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Which Side are You on? Identifying Perspectives at the Document and Sentence Levels
Wei-Hao Lin | Theresa Wilson | Janyce Wiebe | Alexander Hauptmann
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

2002

A New Probabilistic Model for Title Generation
Rong Jin | Alexander G. Hauptmann
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Automatic Title Generation for Spoken Broadcast News
Rong Jin | Alexander G. Hauptmann
Proceedings of the First International Conference on Human Language Technology Research

1994

A Prototype Reading Coach that Listens: Summary of Project LISTEN
Alex Hauptmann | Jack Mostow | Steven F. Roth | Matthew Kane | Adam Swift
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

1990

A Comparison of Speech and Typed Input
Alexander G. Hauptmann | Alexander I. Rudnicky
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

1986

Parsing Spoken Language: a Semantic Caseframe Approach
Philip J. Hayes | Alexander G. Hauptmann | Jaime G. Carbonell | Masaru Tomita
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics