Distilling weighted finite automata from arbitrary probabilistic models

Ananda Theertha Suresh, Brian Roark, Michael Riley, Vlad Schogol


Abstract
Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient in time and space for recognition tasks. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization, both of which can be performed efficiently. We demonstrate the usefulness of our approach on several tasks, including distilling n-gram models from neural models.
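To give a feel for the counting step, here is a minimal toy sketch (not the paper's implementation): for a fixed bigram topology, the KL-minimizing arc weights can be read off from expected n-gram counts under the source distribution, normalized per history. The toy source distribution, the `<s>`/`</s>` boundary symbols, and all variable names below are illustrative assumptions.

```python
import math
from collections import defaultdict

# Hypothetical toy source distribution over strings from alphabet {a, b}.
# In the paper's setting the source could be any model, e.g. a neural LM.
source = {"ab": 0.4, "aab": 0.3, "bb": 0.2, "ba": 0.1}

BOS, EOS = "<s>", "</s>"

def bigrams(s):
    # Pad with boundary symbols and yield adjacent symbol pairs.
    syms = [BOS] + list(s) + [EOS]
    return zip(syms, syms[1:])

# Counting step: accumulate expected bigram counts under the source.
counts = defaultdict(float)
for s, p in source.items():
    for h, x in bigrams(s):
        counts[(h, x)] += p

# Normalize per history: this yields the KL-optimal bigram approximation
# for this fixed topology (conditional relative frequencies).
totals = defaultdict(float)
for (h, _), c in counts.items():
    totals[h] += c
bigram_prob = {(h, x): c / totals[h] for (h, x), c in counts.items()}

def model_prob(s):
    # Probability assigned to string s by the distilled bigram model.
    p = 1.0
    for h, x in bigrams(s):
        p *= bigram_prob.get((h, x), 0.0)
    return p

# KL(source || bigram model); finite here since every source bigram is seen.
kl = sum(p * math.log(p / model_prob(s)) for s, p in source.items())
print(round(kl, 4))
```

The normalized bigram probabilities map directly onto WFA arc weights: each history is a state, each symbol an arc, and `</s>` mass becomes the final weight. The difference-of-convex step in the paper handles the harder case where the WFA topology induces further approximation.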
Anthology ID: W19-3112
Volume: Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing
Month: September
Year: 2019
Address: Dresden, Germany
Venues: FSMNLP | WS
SIG: SIGFSM
Publisher: Association for Computational Linguistics
Pages: 87–97
URL: https://www.aclweb.org/anthology/W19-3112
DOI: 10.18653/v1/W19-3112
PDF: http://aclanthology.lst.uni-saarland.de/W19-3112.pdf