Manling Li


pdf bib
Cross-media Structured Common Space for Multimedia Event Extraction
Manling Li | Alireza Zareian | Qi Zeng | Spencer Whitehead | Di Lu | Heng Ji | Shih-Fu Chang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce a new task, MultiMedia Event Extraction, which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.

pdf bib
GAIA: A Fine-grained Multimedia Knowledge Extraction System
Manling Li | Alireza Zareian | Ying Lin | Xiaoman Pan | Spencer Whitehead | Brian Chen | Bo Wu | Heng Ji | Shih-Fu Chang | Clare Voss | Daniel Napierski | Marjorie Freedman
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology. Our system, GAIA, enables seamless search of complex graph queries, and retrieves multimedia evidence including text, images and videos. GAIA achieves top performance at the recent NIST TAC SM-KBP2019 evaluation. The system is publicly available at GitHub and DockerHub, with a narrated video that documents the system.

pdf bib
Connecting the Dots: Event Graph Schema Induction with Path Language Modeling
Manling Li | Qi Zeng | Ying Lin | Kyunghyun Cho | Heng Ji | Jonathan May | Nathanael Chambers | Clare Voss
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Event schemas can guide our understanding and ability to make predictions with respect to what might happen next. We propose a new Event Graph Schema, where two event types are connected through multiple paths involving entities that fill important roles in a coherent story. We then introduce Path Language Model, an auto-regressive language model trained on event-event paths, and select salient and coherent paths to probabilistically construct these graph schemas. We design two evaluation metrics, instance coverage and instance coherence, to evaluate the quality of graph schema induction, by checking when coherent event instances are covered by the schema graph. Intrinsic evaluations show that our approach is highly effective at inducing salient and coherent schemas. Extrinsic evaluations show the induced schema repository provides significant improvement to downstream end-to-end Information Extraction over a state-of-the-art joint neural extraction model, when used as additional global features to unfold instance graphs.


pdf bib
Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization
Manling Li | Lingyu Zhang | Heng Ji | Richard J. Radke
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Transcripts of natural, multi-person meetings differ significantly from documents like news articles, which can make Natural Language Generation models for generating summaries unfocused. We develop an abstractive meeting summarizer from both videos and audios of meeting recordings. Specifically, we propose a multi-modal hierarchical attention across three levels: segment, utterance and word. To narrow down the focus into topically-relevant segments, we jointly model topic segmentation and summarization. In addition to traditional text features, we introduce new multi-modal features derived from visual focus of attention, based on the assumption that the utterance is more important if the speaker receives more attention. Experiments show that our model significantly outperforms the state-of-the-art with both BLEU and ROUGE measures.

pdf bib
Multilingual Entity, Relation, Event and Human Value Extraction
Manling Li | Ying Lin | Joseph Hoover | Spencer Whitehead | Clare Voss | Morteza Dehghani | Heng Ji
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

This paper demonstrates a state-of-the-art end-to-end multilingual (English, Russian, and Ukrainian) knowledge extraction system that can perform entity discovery and linking, relation extraction, event extraction, and coreference. It extracts and aggregates knowledge elements across multiple languages and documents as well as provides visualizations of the results along three dimensions: temporal (as displayed in an event timeline), spatial (as displayed in an event heatmap), and relational (as displayed in entity-relation networks). For our system to further support users’ analyses of causal sequences of events in complex situations, we also integrate a wide range of human moral value measures, independently derived from region-based survey, into the event heatmap. This system is publicly available as a docker container and a live demo.