Kristina Toutanova


2020

Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering
Hao Cheng | Ming-Wei Chang | Kenton Lee | Kristina Toutanova
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We address the problem of extractive question answering using document-level distant supervision, pairing questions and relevant documents with answer strings. We compare previously used probability space and distant supervision assumptions (assumptions on the correspondence between the weak answer string labels and possible answer mention spans). We show that these assumptions interact, and that different configurations provide complementary benefits. We demonstrate that a multi-objective model can efficiently combine the advantages of multiple assumptions and outperform the best individual formulation. Our approach outperforms previous state-of-the-art models by 4.3 points in F1 on TriviaQA-Wiki and 1.7 points in Rouge-L on NarrativeQA summaries.
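
The abstract contrasts distant supervision assumptions about which mention spans of a weak answer string are correct. A minimal sketch, assuming two common objectives over one softmax of span scores: marginal likelihood (at least one matching mention is correct, so probability is summed over all of them) and hard EM (only the currently most probable matching mention is credited). The function names, the single probability space, and the equal-weight combination are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def span_log_probs(scores: Sequence[float]) -> List[float]:
    """Log-softmax over all candidate spans (a single probability space)."""
    z = logsumexp(scores)
    return [s - z for s in scores]


def mml_loss(log_probs: Sequence[float], gold: Sequence[int]) -> float:
    """Marginal likelihood: sum probability over every span matching the answer string."""
    return -logsumexp([log_probs[i] for i in gold])


def hard_em_loss(log_probs: Sequence[float], gold: Sequence[int]) -> float:
    """Hard EM: credit only the most probable span among those matching the answer string."""
    return -max(log_probs[i] for i in gold)


def multi_objective_loss(scores: Sequence[float], gold: Sequence[int], weight: float = 1.0) -> float:
    """Hypothetical multi-objective loss: MML plus weighted hard EM on the same distribution."""
    lp = span_log_probs(scores)
    return mml_loss(lp, gold) + weight * hard_em_loss(lp, gold)


if __name__ == "__main__":
    # Spans 0 and 2 both match the answer string; spans 1 and 3 do not.
    print(multi_objective_loss([2.0, 1.0, 0.5, -1.0], gold=[0, 2]))
```

When only one span matches the answer string, the two objectives coincide; they differ only in how credit is spread across multiple matching mentions.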

2019

Natural Questions: A Benchmark for Question Answering Research
Tom Kwiatkowski | Jennimaria Palomaki | Olivia Redfield | Michael Collins | Ankur Parikh | Chris Alberti | Danielle Epstein | Illia Polosukhin | Jacob Devlin | Kenton Lee | Kristina Toutanova | Llion Jones | Matthew Kelcey | Ming-Wei Chang | Andrew M. Dai | Jakob Uszkoreit | Quoc Le | Slav Petrov
Transactions of the Association for Computational Linguistics, Volume 7

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations sequestered as test data. We present experiments validating the quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.

Zero-Shot Entity Linking by Reading Entity Descriptions
Lajanugen Logeswaran | Ming-Wei Chang | Kenton Lee | Kristina Toutanova | Jacob Devlin | Honglak Lee
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present the zero-shot entity linking task, where mentions must be linked to unseen entities without in-domain labeled data. The goal is to enable robust transfer to highly specialized domains, and so no metadata or alias tables are assumed. In this setting, entities are only identified by text descriptions, and models must rely strictly on language understanding to resolve the new entities. First, we show that strong reading comprehension models pre-trained on large unlabeled data can be used to generalize to unseen entities. Second, we propose a simple and effective adaptive pre-training strategy, which we term domain-adaptive pre-training (DAP), to address the domain shift problem associated with linking unseen entities in a new domain. We present experiments on a new dataset that we construct for this task and show that DAP improves over strong pre-training baselines, including BERT. The data and code are available at https://github.com/lajanugen/zeshel.

Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee | Ming-Wei Chang | Kristina Toutanova
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a blackbox information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.
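
The abstract names the Inverse Cloze Task (ICT) as the retriever's pre-training step without describing it. In ICT, a sentence drawn from a passage serves as a pseudo-question and the rest of the passage serves as its pseudo-evidence, so no labeled questions are needed. A minimal sketch of building one such pair; the function name, the sentence-list input, and the default removal probability of 0.9 are assumptions for illustration.

```python
import random
from typing import List, Tuple


def make_ict_example(
    sentences: List[str], rng: random.Random, remove_prob: float = 0.9
) -> Tuple[str, str]:
    """Build one hypothetical ICT (pseudo-question, pseudo-evidence) pair from a passage."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to form a pair")
    i = rng.randrange(len(sentences))
    query = sentences[i]
    # Usually drop the query sentence from its context so the retriever cannot
    # rely on exact string overlap; occasionally keep it (assumed behaviour).
    if rng.random() < remove_prob:
        context = sentences[:i] + sentences[i + 1:]
    else:
        context = list(sentences)
    return query, " ".join(context)


if __name__ == "__main__":
    rng = random.Random(0)
    passage = ["Paris is the capital of France.", "It lies on the Seine.", "It hosts the Louvre."]
    print(make_ict_example(passage, rng))
```

A retriever would then be trained to pick the matching pseudo-evidence for each pseudo-question among several candidates; that training loop is omitted here.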

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark | Kenton Lee | Ming-Wei Chang | Tom Kwiatkowski | Michael Collins | Kristina Toutanova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper we study yes/no questions that are naturally occurring — meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension dataset, BoolQ, of such questions, and show that they are unexpectedly challenging. They often query for complex, non-factoid information, and require difficult entailment-like inference to solve. We also explore the effectiveness of a range of transfer learning baselines. We find that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that it, surprisingly, continues to be very beneficial even when starting from massive pre-trained language models such as BERT. Our best method trains BERT on MultiNLI and then re-trains it on our train set. It achieves 80.4% accuracy compared to 90% accuracy of human annotators (and 62% majority-baseline), leaving a significant gap for future work.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin | Ming-Wei Chang | Kenton Lee | Kristina Toutanova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5 (7.7 point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
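
The abstract says BERT conditions jointly on left and right context; in the BERT paper this is achieved with a masked language modeling objective, in which about 15% of input tokens are selected as prediction targets and, of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged. A minimal sketch of that input corruption step; sampling each position independently (rather than selecting an exact 15% per sequence), the function name, and the toy vocabulary are simplifying assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: Sequence[str], vocab: Sequence[str], rng: random.Random, mask_rate: float = 0.15
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a token sequence for masked-LM training.

    Returns the corrupted tokens and, per position, the original token to predict
    (None for positions that are not prediction targets).
    """
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_rate:
            continue  # not selected as a prediction target
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK  # 80%: mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(list(vocab))  # 10%: random token
        # remaining 10%: keep the original token
    return corrupted, labels


if __name__ == "__main__":
    rng = random.Random(1)
    print(mask_tokens("the cat sat on the mat".split(), ["dog", "ran", "blue"], rng))
```

Keeping some targets unchanged or randomly replaced means the model cannot assume that only [MASK] positions need predicting, since [MASK] never appears at fine-tuning time.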

2017

A Nested Attention Neural Hybrid Model for Grammatical Error Correction
Jianshu Ji | Qinlong Wang | Kristina Toutanova | Yongen Gong | Steven Truong | Jianfeng Gao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Grammatical error correction (GEC) systems strive to correct both global errors in word order and usage, and local errors in spelling and inflection. Further developing upon recent work on neural machine translation, we propose a new hybrid neural model with nested attention layers for GEC. Experiments show that the new model can effectively correct errors of both types by incorporating word and character-level information, and that the model significantly outperforms previous neural models for GEC as measured on the standard CoNLL-14 benchmark dataset. Further analysis also shows that the superiority of the proposed model can be largely attributed to the use of the nested attention mechanism, which has proven particularly effective in correcting local errors that involve small edits in orthography.

NLP for Precision Medicine
Hoifung Poon | Chris Quirk | Kristina Toutanova | Wen-tau Yih
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

We will introduce precision medicine and showcase the vast opportunities for NLP in this burgeoning field with great societal impact. We will review pressing NLP problems, state-of-the-art methods, and important applications, as well as datasets, medical resources, and practical issues. The tutorial will provide an accessible overview of biomedicine, and does not presume knowledge of biology or healthcare. The ultimate goal is to reduce the entry barrier for NLP researchers to contribute to this exciting domain.

Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Nanyun Peng | Hoifung Poon | Chris Quirk | Kristina Toutanova | Wen-tau Yih
Transactions of the Association for Computational Linguistics, Volume 5

Past work in relation extraction has focused on binary relations in single sentences. Recent NLP inroads in high-value domains have sparked interest in the more general setting of extracting n-ary relations that span multiple sentences. In this paper, we explore a general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unified way of exploring different LSTM approaches and incorporating various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier. This simplifies handling of relations with arbitrary arity, and enables multi-task learning with related relations. We evaluate this framework in two important precision medicine settings, demonstrating its effectiveness with both conventional supervised learning and distant supervision. Cross-sentence extraction produced larger knowledge bases, and multi-task learning significantly improved extraction accuracy. A thorough analysis of various LSTM approaches yielded useful insight into the impact of linguistic analysis on extraction accuracy.
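
The abstract describes a document graph that combines sequential, syntactic, and discourse dependencies, over which a graph LSTM computes entity representations. A minimal sketch of building such a typed graph over tokens numbered across sentences; the input formats, edge label names, and the choice to chain adjacent words across sentence boundaries are hypothetical, and the graph LSTM itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, str]  # (target token index, edge label)


def build_document_graph(
    sentence_lengths: Sequence[int],
    dep_arcs: Sequence[Tuple[int, int, int, str]],
    discourse_arcs: Sequence[Tuple[int, int, str]],
) -> Dict[int, List[Edge]]:
    """Build a hypothetical typed adjacency list over all tokens of a document.

    dep_arcs: (sentence index, head position, dependent position, label), sentence-local.
    discourse_arcs: (global token i, global token j, label), possibly across sentences.
    """
    offsets: List[int] = []
    total = 0
    for n in sentence_lengths:
        offsets.append(total)  # global index of the sentence's first token
        total += n
    graph: Dict[int, List[Edge]] = defaultdict(list)
    # Sequential edges: each token links to the next, including across sentences.
    for i in range(total - 1):
        graph[i].append((i + 1, "next"))
    # Syntactic edges: map sentence-local dependency arcs to global indices.
    for s, head, dep, label in dep_arcs:
        graph[offsets[s] + head].append((offsets[s] + dep, "dep:" + label))
    # Inter-sentential edges, e.g. discourse or coreference links.
    for i, j, label in discourse_arcs:
        graph[i].append((j, "disc:" + label))
    return dict(graph)
```

Because every dependency type becomes just another labeled edge, adding or removing a dependency source changes the graph, not the model code.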

2016

E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses
Yuval Marton | Kristina Toutanova
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present E-TIPSY, a search query corpus annotated with named Entities, Term Importance, POS tags, and SYntactic parses. This corpus contains crowdsourced (gold) annotations of the three most important terms in each query. In addition, it contains automatically produced annotations of named entities, part-of-speech tags, and syntactic parses for the same queries. This corpus comes in two formats: (1) Sober Subset: annotations that two or more crowd workers agreed upon, and (2) Full Glass: all annotations. We analyze the strikingly low correlation between term importance and syntactic headedness, which invites research into effective ways of combining these different signals. Our corpus can serve as a benchmark for term importance methods aimed at improving search engine quality and as an initial step toward developing a dataset of gold linguistic analysis of web search queries. In addition, it can be used as a basis for linguistic inquiries into the kind of expressions used in search.
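
The abstract distinguishes the Sober Subset (annotations that two or more crowd workers agreed upon) from the Full Glass (all annotations). A minimal sketch of deriving both views from per-worker term-importance labels; the record layout, the function names, and the reading of "agreed upon" as per-term agreement are assumptions, not the corpus's actual file format.

```python
from collections import Counter
from typing import Dict, List, Set


def sober_subset(annotations: Dict[str, List[Set[str]]], min_agree: int = 2) -> Dict[str, Set[str]]:
    """Keep, per query, only terms marked important by at least min_agree workers."""
    result: Dict[str, Set[str]] = {}
    for query, per_worker in annotations.items():
        counts = Counter(term for terms in per_worker for term in set(terms))
        result[query] = {t for t, c in counts.items() if c >= min_agree}
    return result


def full_glass(annotations: Dict[str, List[Set[str]]]) -> Dict[str, Set[str]]:
    """Keep every term that any worker marked important (the union)."""
    return {q: set().union(*per_worker) for q, per_worker in annotations.items()}


if __name__ == "__main__":
    ann = {"cheap flights to paris": [{"cheap", "flights", "paris"}, {"flights", "paris", "to"}]}
    print(sober_subset(ann), full_glass(ann))
```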

A Dataset and Evaluation Metrics for Abstractive Compression of Sentences and Short Paragraphs
Kristina Toutanova | Chris Brockett | Ke M. Tran | Saleema Amershi
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Compositional Learning of Embeddings for Relation Paths in Knowledge Base and Text
Kristina Toutanova | Victoria Lin | Wen-tau Yih | Hoifung Poon | Chris Quirk
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Representing Text for Joint Embedding of Text and Knowledge Bases
Kristina Toutanova | Danqi Chen | Patrick Pantel | Hoifung Poon | Pallavi Choudhury | Michael Gamon
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Model Selection for Type-Supervised Learning with Application to POS Tagging
Kristina Toutanova | Waleed Ammar | Pallavi Choudhury | Hoifung Poon
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Observed versus latent features for knowledge base and text inference
Kristina Toutanova | Danqi Chen
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality

Grounded Semantic Parsing for Complex Knowledge Extraction
Ankur P. Parikh | Hoifung Poon | Kristina Toutanova
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Detecting Translation Direction: A Cross-Domain Study
Sauleh Eetemadi | Kristina Toutanova
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

2014

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Kristina Toutanova | Hua Wu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data
Avneesh Saluja | Hany Hassan | Kristina Toutanova | Chris Quirk
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Kristina Toutanova | Hua Wu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Asymmetric Features Of Human Generated Translation
Sauleh Eetemadi | Kristina Toutanova
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Regularized Minimum Error Rate Training
Michel Galley | Chris Quirk | Colin Cherry | Kristina Toutanova
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines
Kristina Toutanova | Byung-Gyu Ahn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Beyond Left-to-Right: Multiple Decomposition Structures for SMT
Hui Zhang | Kristina Toutanova | Chris Quirk | Jianfeng Gao
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

MSR SPLAT, a language analysis toolkit
Chris Quirk | Pallavi Choudhury | Jianfeng Gao | Hisami Suzuki | Kristina Toutanova | Michael Gamon | Wen-tau Yih | Colin Cherry | Lucy Vanderwende
Proceedings of the Demonstration Session at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
Sungchul Kim | Kristina Toutanova | Hwanjo Yu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Learning Discriminative Projections for Text Similarity Measures
Wen-tau Yih | Kristina Toutanova | John C. Platt | Christopher Meek
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
Jason Naradowsky | Kristina Toutanova
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
Kristina Toutanova | Michel Galley
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
Jason R. Smith | Chris Quirk | Kristina Toutanova
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Translingual Document Representations from Discriminative Projections
John Platt | Kristina Toutanova | Wen-tau Yih
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Joint Optimization for Machine Translation System Combination
Xiaodong He | Kristina Toutanova
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

A global model for joint lemmatization and part-of-speech prediction
Kristina Toutanova | Colin Cherry
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Unsupervised Morphological Segmentation with Log-Linear Models
Hoifung Poon | Colin Cherry | Kristina Toutanova
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Applying Morphology Generation Models to Machine Translation
Kristina Toutanova | Hisami Suzuki | Achim Ruopp
Proceedings of ACL-08: HLT

A Global Joint Model for Semantic Role Labeling
Kristina Toutanova | Aria Haghighi | Christopher D. Manning
Computational Linguistics, Volume 34, Number 2, June 2008 - Special Issue on Semantic Role Labeling

CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning
Alexander Clark | Kristina Toutanova
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation
Jia Xu | Jianfeng Gao | Kristina Toutanova | Hermann Ney
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Generating Case Markers in Machine Translation
Kristina Toutanova | Hisami Suzuki
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

A Discriminative Syntactic Word Order Model for Machine Translation
Pi-Chuan Chang | Kristina Toutanova
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing
Jianfeng Gao | Galen Andrew | Mark Johnson | Kristina Toutanova
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Generating Complex Morphology for Machine Translation
Einat Minkov | Kristina Toutanova | Hisami Suzuki
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Learning to Predict Case Markers in Japanese
Hisami Suzuki | Kristina Toutanova
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Competitive generative models with structure learning for NLP classification tasks
Kristina Toutanova
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation
Arul Menezes | Kristina Toutanova | Chris Quirk
Proceedings on the Workshop on Statistical Machine Translation

Automatic Semantic Role Labeling
Scott Wen-tau Yih | Kristina Toutanova
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2005

Joint Learning Improves Semantic Role Labeling
Kristina Toutanova | Aria Haghighi | Christopher Manning
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

A Joint Model for Semantic Role Labeling
Aria Haghighi | Kristina Toutanova | Christopher Manning
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

The Leaf Path Projection View of Parse Trees: Exploring String Kernels for HPSG Parse Selection
Kristina Toutanova | Penka Markova | Christopher Manning
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
Kristina Toutanova | Dan Klein | Christopher D. Manning | Yoram Singer
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

2002

Combining Heterogeneous Classifiers for Word Sense Disambiguation
Dan Klein | Kristina Toutanova | H. Tolga Ilhan | Sepandar D. Kamvar | Christopher D. Manning
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

Extentions to HMM-based Statistical Word Alignment Models
Kristina Toutanova | H. Tolga Ilhan | Christopher Manning
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

Feature Selection for a Rich HPSG Grammar Using Decision Trees
Kristina Toutanova | Christopher D. Manning
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

Pronunciation Modeling for Improved Spelling Correction
Kristina Toutanova | Robert Moore
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

The LinGO Redwoods Treebank: Motivation and Preliminary Applications
Stephan Oepen | Kristina Toutanova | Stuart Shieber | Christopher Manning | Dan Flickinger | Thorsten Brants
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

2001

Combining Heterogeneous Classifiers for Word-Sense Disambiguation
H. Tolga Ilhan | Sepandar D. Kamvar | Dan Klein | Christopher D. Manning | Kristina Toutanova
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems