Marjorie Freedman


2020

GAIA: A Fine-grained Multimedia Knowledge Extraction System
Manling Li | Alireza Zareian | Ying Lin | Xiaoman Pan | Spencer Whitehead | Brian Chen | Bo Wu | Heng Ji | Shih-Fu Chang | Clare Voss | Daniel Napierski | Marjorie Freedman
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present the first comprehensive, open-source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input and creates a coherent, structured knowledge base that indexes entities, relations, and events, following a rich, fine-grained ontology. Our system, GAIA, enables seamless search with complex graph queries and retrieves multimedia evidence including text, images, and videos. GAIA achieved top performance in the recent NIST TAC SM-KBP2019 evaluation. The system is publicly available on GitHub and DockerHub, with a narrated video that documents the system.
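
The abstract does not include code; as a rough illustration of the kind of structured, searchable knowledge base it describes, here is a minimal Python sketch in which extracted entities, relations, and events are indexed as a typed graph and matched against a one-edge graph query. All names and the evidence format are hypothetical, not GAIA's actual API.

```python
# Hypothetical sketch of a fine-grained KB index: facts are typed edges
# carrying pointers back to multimedia evidence, queried by pattern match.
from collections import defaultdict

kb = defaultdict(list)  # subject -> list of (relation, object, evidence)

def add_fact(subj, rel, obj, evidence):
    """Index one extracted relation with its multimedia evidence."""
    kb[subj].append((rel, obj, evidence))

def query(rel=None):
    """Yield facts matching a one-edge graph pattern on the relation type."""
    for subj, facts in kb.items():
        for r, obj, ev in facts:
            if rel is None or r == rel:
                yield subj, r, obj, ev

add_fact("Person:JaneDoe", "Movement.Transport:Destination", "GPE:Ukraine",
         {"text": "doc_17.txt", "image": "img_03.jpg"})
for fact in query(rel="Movement.Transport:Destination"):
    print(fact)
```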

SEARCHER: Shared Embedding Architecture for Effective Retrieval
Joel Barry | Elizabeth Boschee | Marjorie Freedman | Scott Miller
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

We describe an approach to cross-lingual information retrieval that does not rely on explicit translation of either document or query terms. Instead, both queries and documents are mapped into a shared embedding space where retrieval is performed. We discuss potential advantages of the approach in handling polysemy and synonymy, present a method for training the model, and give details of the model implementation. We present experimental results for two cases: Somali-English and Bulgarian-English CLIR.
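
The core retrieval step is simple once a shared space exists: embed the query, embed each document, and rank by similarity. The sketch below shows only that step; `encode` is a hypothetical placeholder for the paper's trained bilingual encoder, not its implementation.

```python
# Minimal sketch of retrieval in a shared embedding space: no translation
# step, just nearest neighbors of the query embedding among documents.
import numpy as np

def encode(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in encoder: a unit-norm pseudo-embedding for the demo."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Rank documents by cosine similarity (dot product of unit vectors)."""
    q = encode(query)
    return sorted(docs, key=lambda d: -float(encode(d) @ q))[:k]

print(retrieve("election results", ["Somali document ...", "another ..."]))
```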

2019

Contextualized Cross-Lingual Event Trigger Extraction with Minimal Resources
Meryem M’hamdi | Marjorie Freedman | Jonathan May
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Event trigger extraction is an information extraction task of practical utility, yet it is challenging due to the difficulty of disambiguating word senses. Previous approaches rely extensively on hand-crafted, language-specific features and are applied mainly to English, for which annotated datasets and Natural Language Processing (NLP) tools are available. However, the availability of such resources varies from one language to another. Recently, contextualized Bidirectional Encoder Representations from Transformers (BERT) models have established state-of-the-art performance for a variety of NLP tasks. However, there has been little effort to explore language transfer using BERT for event extraction. In this work, we treat event trigger extraction as a sequence tagging problem and propose a cross-lingual framework for training it without any hand-crafted features. We experiment with different flavors of transfer learning from high-resource to low-resource languages and compare the performance of different multilingual embeddings for event trigger extraction. Our results show that training in a multilingual setting outperforms language-specific models for both English and Chinese. Our work is the first to experiment with two event architecture variants in a cross-lingual setting, to show the effectiveness of contextualized embeddings obtained using BERT, and to explore and analyze BERT's performance on Arabic.
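
As a concrete illustration of the sequence-tagging framing, the sketch below runs a multilingual BERT token classifier over a sentence using the Hugging Face transformers library. It is not the authors' code: the BIO label set is a hypothetical ACE-style example, and the classification head here is untrained (randomly initialized), so the printed tags are meaningless until the model is fine-tuned on trigger annotations.

```python
# Event trigger extraction framed as token-level sequence tagging with a
# multilingual BERT encoder (untrained head; labels are illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-Attack", "I-Attack", "B-Transport", "I-Transport"]
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels))

enc = tok("Troops were transported to the border yesterday.",
          return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits          # (1, seq_len, num_labels)
pred = logits.argmax(-1)[0]
for token, tag in zip(tok.convert_ids_to_tokens(enc["input_ids"][0]), pred):
    print(token, labels[tag])
```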

SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage
Elizabeth Boschee | Joel Barry | Jayadev Billa | Marjorie Freedman | Thamme Gowda | Constantine Lignos | Chester Palen-Michel | Michael Pust | Banriskhem Kayang Khonglah | Srikanth Madikeri | Jonathan May | Scott Miller
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

With the increasing democratization of electronic media, vast information resources are available in less-frequently-taught languages such as Swahili or Somali. That information, which may be crucially important and not available elsewhere, can be difficult for monolingual English speakers to access effectively. In this paper we present an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign-language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed. The SARAL system achieved the top end-to-end performance in the most recent IARPA MATERIAL CLIR+summarization evaluations. Our demonstration system provides end-to-end open query retrieval and summarization capability, and presents the original source text or audio, speech transcription, and machine translation, for two low-resource languages.
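
To make the pipeline shape concrete, here is a toy skeleton of how the stages named in the abstract compose: retrieval over foreign-language text and audio, translation, and query-focused summarization. Every function below is a trivial hypothetical stand-in; the paper's actual components are trained ASR, MT, retrieval, and summarization models.

```python
# Toy composition of a SARAL-style pipeline (stand-in logic only).
def transcribe(doc):                # ASR stand-in for audio documents
    return doc.get("transcript", doc.get("text", ""))

def translate(text):                # MT stand-in (identity here)
    return text

def retrieve(query, corpus):        # CLIR stand-in: English query, foreign docs
    return [d for d in corpus
            if query.lower() in translate(transcribe(d)).lower()]

def summarize(text, query):
    """Query-focused summary stand-in: first sentence mentioning the query."""
    return next((s.strip() for s in text.split(".")
                 if query.lower() in s.lower()), "")

def search(query, corpus):
    for doc in retrieve(query, corpus):
        english = translate(transcribe(doc))
        yield {"summary": summarize(english, query),
               "transcription": transcribe(doc), "translation": english}

corpus = [{"text": "The election results were announced. Turnout was high."}]
print(list(search("election", corpus)))
```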

2018

When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation
Bonan Min | Marjorie Freedman | Roger Bock | Ralph Weischedel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Probabilistic Inference for Cold Start Knowledge Base Population with Prior World Knowledge
Bonan Min | Marjorie Freedman | Talya Meltzer
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Building knowledge bases (KBs) automatically from text corpora is crucial for many applications such as question answering and web search. The problem is very challenging and has been divided into sub-problems such as mention and named entity recognition, entity linking, and relation extraction. However, combining these components has been shown to be under-constrained and often produces KBs with supersized entities and common-sense errors in relations (e.g., a person with multiple birthdates). These errors are difficult to resolve solely with IE tools but become obvious with world knowledge at the corpus level. By analyzing Freebase and a large text collection, we found that per-relation cardinality and the popularity of entities follow power-law distributions, favoring flat long tails of low-frequency instances. We present a probabilistic joint inference algorithm that incorporates this world knowledge during KB construction. Our approach yields state-of-the-art performance on the TAC Cold Start task, with 42% and 19.4% relative improvements in F1 over our baseline on Cold Start hop-1 and all-hop queries, respectively.
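
The cardinality prior can be made concrete with a toy score: under a power-law prior on how many values a relation takes, a KB asserting two birthdates for one person is sharply less plausible than one asserting a single birthdate. The alpha values and the scoring function below are hypothetical stand-ins meant only to show the shape of the prior, not the paper's joint inference algorithm.

```python
# Toy power-law cardinality prior: p(k) proportional to k^(-alpha), so
# extra values for a naturally low-cardinality relation are penalized.
import math
from collections import Counter

ALPHA = {"birthdate": 4.0, "employee_of": 1.5}  # steeper = fewer values expected

def log_prior(relation: str, cardinality: int) -> float:
    """Unnormalized log-prior of a (subject, relation) having k values."""
    return -ALPHA.get(relation, 2.0) * math.log(cardinality)

def kb_log_prior(facts):
    """Sum the cardinality log-prior over all (subject, relation) pairs."""
    counts = Counter((subj, rel) for subj, rel, _ in facts)
    return sum(log_prior(rel, k) for (_, rel), k in counts.items())

good = [("JaneDoe", "birthdate", "1970-01-01")]
bad = good + [("JaneDoe", "birthdate", "1982-05-17")]
print(kb_log_prior(good), kb_log_prior(bad))  # the second KB scores much lower
```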

Learning Transferable Representation for Bilingual Relation Extraction via Convolutional Neural Networks
Bonan Min | Zhuolin Jiang | Marjorie Freedman | Ralph Weischedel
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Typically, relation extraction models are trained to extract instances of a relation ontology using only training data from a single language. However, the concepts represented by the relation ontology (e.g., ResidesIn, EmployeeOf) are language-independent, while the number of annotated examples available for a given ontology varies between languages. For example, there are far fewer annotated examples in Spanish and Japanese than in English and Chinese. Furthermore, using only language-specific training data requires manually annotating equivalently large amounts of training data for each new language a system encounters. We propose a deep neural network that learns a transferable, discriminative bilingual representation. Experiments on the ACE 2005 multilingual training corpus demonstrate that the joint training process results in significant improvement in relation classification performance over the monolingual counterparts. The learned representation is discriminative and transferable between languages: when using 10% of the training data (25K English words, or 30K Chinese characters), our approach doubles F1 compared to a monolingual baseline, and with 50% of the training data we achieve performance comparable to a monolingual system trained with 250K English words (or 300K Chinese characters).
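
The architectural idea (one convolutional encoder shared by sentences from both languages, feeding a single relation classifier) can be sketched in a few lines of PyTorch. The layer sizes, the single merged vocabulary, and the seven-way label space below are simplifying assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of a shared CNN encoder for bilingual relation classification:
# the convolution filters and classifier are language-independent.
import torch
import torch.nn as nn

class BilingualRelationCNN(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=100, filters=150,
                 width=3, num_relations=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # covers both languages
        self.conv = nn.Conv1d(emb_dim, filters, width, padding=1)  # shared
        self.classify = nn.Linear(filters, num_relations)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)       # (batch, emb, seq)
        h = torch.relu(self.conv(x)).max(dim=2).values  # max-over-time pooling
        return self.classify(h)                         # relation logits

model = BilingualRelationCNN()
batch = torch.randint(0, 20000, (2, 40))  # two sentences from either language
print(model(batch).shape)                 # torch.Size([2, 7])
```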

2016

A Comparison of Event Representations in DEFT
Ann Bies | Zhiyi Song | Jeremy Getman | Joe Ellis | Justin Mott | Stephanie Strassel | Martha Palmer | Teruko Mitamura | Marjorie Freedman | Heng Ji | Tim O’Gorman
Proceedings of the Fourth Workshop on Events

2011

Extreme Extraction – Machine Reading in a Week
Marjorie Freedman | Lance Ramshaw | Elizabeth Boschee | Ryan Gabbard | Gary Kratkiewicz | Nicolas Ward | Ralph Weischedel
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Ryan Gabbard | Marjorie Freedman | Ralph Weischedel
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Language Use: What can it tell us?
Marjorie Freedman | Alex Baron | Vasin Punyakanok | Ralph Weischedel
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Empirical Studies in Learning to Read
Marjorie Freedman | Edward Loper | Elizabeth Boschee | Ralph Weischedel
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

2008

Who is Who and What is What: Experiments in Cross-Document Co-Reference
Alex Baron | Marjorie Freedman
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing