David Blei

Also published as: David M. Blei


2020

Text-Based Ideal Points
Keyon Vafa | Suresh Naidu | David Blei
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Ideal point models analyze lawmakers’ votes to quantify their political positions, or ideal points. But votes are not the only way to express a political position. Lawmakers also give speeches, release press statements, and post tweets. In this paper, we introduce the text-based ideal point model (TBIP), an unsupervised probabilistic topic model that analyzes texts to quantify the political positions of its authors. We demonstrate the TBIP with two types of politicized text data: U.S. Senate speeches and senator tweets. Though the model does not analyze their votes or political affiliations, the TBIP separates lawmakers by party, learns interpretable politicized topics, and infers ideal points close to the classical vote-based ideal points. One benefit of analyzing texts, as opposed to votes, is that the TBIP can estimate ideal points of anyone who authors political texts, including non-voting actors. To this end, we use it to study tweets from the 2020 Democratic presidential candidates. Using only the texts of their tweets, it identifies them along an interpretable progressive-to-moderate spectrum.

Topic Modeling in Embedding Spaces
Adji B. Dieng | Francisco J. R. Ruiz | David M. Blei
Transactions of the Association for Computational Linguistics, Volume 8

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the embedded topic model (ETM), a generative model of documents that marries traditional topic models with word embeddings. More specifically, the ETM models each word with a categorical distribution whose natural parameter is the inner product between the word’s embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation, in terms of both topic quality and predictive performance.
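
The word distribution described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names `rho` (word embeddings), `alpha` (topic embeddings), and `topic_word_distributions`, as well as the toy sizes and random values, are assumptions made for illustration. It shows only how taking the inner product as the natural parameter of a categorical distribution yields a softmax over the vocabulary for each topic; the variational inference step is not covered.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def topic_word_distributions(rho, alpha):
    """Per-topic categorical distributions over the vocabulary.

    rho:   (V, L) array, one L-dimensional embedding per vocabulary word
           (hypothetical name).
    alpha: (K, L) array, one L-dimensional embedding per topic
           (hypothetical name).
    Returns a (K, V) array whose k-th row is the word distribution of topic k.
    """
    # Natural parameter for word v under topic k: inner product rho[v] . alpha[k].
    logits = alpha @ rho.T
    # A categorical distribution with these natural parameters is their softmax.
    return softmax(logits, axis=1)


# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
V, L, K = 6, 4, 2
rho = rng.normal(size=(V, L))
alpha = rng.normal(size=(K, L))

beta = topic_word_distributions(rho, alpha)
```

Each row of `beta` sums to one, and within a topic the words whose embeddings have the largest inner product with the topic embedding receive the highest probability.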

2016

Detecting and Characterizing Events
Allison Chaney | Hanna Wallach | Matthew Connelly | David Blei
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2011

Bayesian Checking for Topic Models
David Mimno | David Blei
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Variational Inference for Adaptor Grammars
Shay B. Cohen | David M. Blei | Noah A. Smith
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2007

PU-BCD: Exponential Family Models for the Coarse- and Fine-Grained All-Words Tasks
Jonathan Chang | Miroslav Dudík | David Blei
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

PUTOP: Turning Predominant Senses into a Topic Model for Word Sense Disambiguation
Jordan Boyd-Graber | David Blei
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

A Topic Model for Word Sense Disambiguation
Jordan Boyd-Graber | David Blei | Xiaojin Zhu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)