Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models

Karl Stratos, Michael Collins, Daniel Hsu


Abstract
We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem. These HMMs, which we call anchor HMMs, assume that each tag is associated with at least one word that can have no other tag, which is a relatively benign condition for POS tagging (e.g., “the” is a word that appears only under the determiner tag). We exploit this assumption and extend the non-negative matrix factorization framework of Arora et al. (2013) to design a consistent estimator for anchor HMMs. In experiments, our algorithm is competitive with strong baselines such as the clustering method of Brown et al. (1992) and the log-linear model of Berg-Kirkpatrick et al. (2010). Furthermore, it produces an interpretable model in which hidden states are automatically lexicalized by words.
Anthology ID:
Q16-1018
Volume:
Transactions of the Association for Computational Linguistics, Volume 4
Month:
Year:
2016
Address:
Venue:
TACL
SIG:
Publisher:
Note:
Pages:
245–257
Language:
URL:
https://www.aclweb.org/anthology/Q16-1018
DOI:
10.1162/tacl_a_00096
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/Q16-1018.pdf