Attentive Language Models
Giancarlo Salton, Robert Ross, John Kelleher
Abstract
In this paper, we extend Recurrent Neural Network Language Models (RNN-LMs) with an attention mechanism. We show that an “attentive” RNN-LM (with 11M parameters) achieves a better perplexity than larger RNN-LMs (with 66M parameters) and achieves performance comparable to an ensemble of 10 similarly sized RNN-LMs. We also show that an “attentive” RNN-LM needs less contextual information to achieve results similar to the state-of-the-art on the wikitext2 dataset.
- Anthology ID: I17-1045
- Volume: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month: November
- Year: 2017
- Address: Taipei, Taiwan
- Venue: IJCNLP
- Publisher: Asian Federation of Natural Language Processing
- Pages: 441–450
- URL: https://www.aclweb.org/anthology/I17-1045
- PDF: http://aclanthology.lst.uni-saarland.de/I17-1045.pdf
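To make the idea in the abstract concrete, below is a minimal sketch of an RNN language model augmented with attention over its own previous hidden states. It assumes dot-product attention and a tanh mixing layer; the class name `AttentiveRNNLM`, the dimensions, and these modelling choices are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Hypothetical sketch: an RNN-LM that attends over previously produced
# hidden states before predicting the next token. Dot-product attention
# is an assumption; the paper's exact attention formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Mix the current hidden state with the attention context
        # before projecting onto the vocabulary.
        self.combine = nn.Linear(2 * hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        emb = self.embed(tokens)
        states, _ = self.rnn(emb)              # (batch, seq_len, hid)
        batch, seq_len, hid = states.shape

        logits = []
        for t in range(seq_len):
            h_t = states[:, t, :]              # current hidden state
            if t == 0:
                context = torch.zeros_like(h_t)
            else:
                past = states[:, :t, :]        # earlier hidden states
                # Dot-product scores of h_t against each past state.
                scores = torch.bmm(past, h_t.unsqueeze(2)).squeeze(2)
                alpha = F.softmax(scores, dim=1)
                # Weighted sum of past states = attention context.
                context = torch.bmm(alpha.unsqueeze(1), past).squeeze(1)
            mixed = torch.tanh(self.combine(torch.cat([h_t, context], dim=1)))
            logits.append(self.out(mixed))
        return torch.stack(logits, dim=1)      # (batch, seq_len, vocab)


# Usage sketch: score a toy batch of token ids.
model = AttentiveRNNLM(vocab_size=10000)
x = torch.randint(0, 10000, (2, 12))
print(model(x).shape)  # torch.Size([2, 12, 10000])
```

The intuition matching the abstract is that the attention context gives the model access to earlier parts of the sequence without forcing all of that information through the recurrent state, which is one way a smaller attentive model could match larger plain RNN-LMs.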