AbstractEnsemble methods, which combine multiple models at decoding time, are now widely known to be effective for text-generation tasks. However, they generally increase computational costs, and thus, there have been many studies on compressing or distilling ensemble models. In this paper, we propose an alternative, simple but effective unsupervised ensemble method, post-ensemble, that combines multiple models by selecting a majority-like output in post-processing. We theoretically prove that our method is closely related to kernel density estimation based on the von Mises-Fisher kernel. Experimental results on a news-headline-generation task show that the proposed method performs better than the current ensemble methods.