Dialogue response generation models that use template ranking rather than direct sequence generation allow model developers to limit generated responses to pre-approved messages. However, manually creating templates is time-consuming and requires domain expertise. To alleviate this problem, we explore automating the process of creating dialogue templates by using unsupervised methods to cluster historical utterances and selecting representative utterances from each cluster. Specifically, we propose an end-to-end model called Deep Sentence Encoder Clustering (DSEC) that uses an auto-encoder structure to jointly learn the utterance representation and construct template clusters. We compare this method to a random baseline that randomly assigns templates to clusters as well as a strong baseline that performs the sentence encoding and the utterance clustering sequentially. To evaluate the performance of the proposed method, we perform an automatic evaluation with two annotated customer service datasets to assess clustering effectiveness, and a human-in-the-loop experiment using a live customer service application to measure the acceptance rate of the generated templates. DSEC performs best in the automatic evaluation, beats both the sequential and random baselines on most metrics in the human-in-the-loop experiment, and shows promising results when compared to gold/manually created templates.