Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics

Martin Boyanov, Preslav Nakov, Alessandro Moschitti, Giovanni Da San Martino, Ivan Koychev


Abstract
We propose to use question answering (QA) data from Web forums to train chatbots from scratch, i.e., without dialog data. First, we extract pairs of question and answer sentences from the typically much longer question and answer texts in a forum. We then use these shorter texts to train seq2seq models more efficiently. We further improve the parameter optimization using a new model selection strategy based on QA measures. Finally, we propose extrinsic evaluation with respect to a QA task as an automatic evaluation method for chatbot systems. The evaluation shows that the model achieves a MAP of 63.5% on the extrinsic task. Moreover, our manual evaluation demonstrates that the model correctly answers 49.5% of the questions when they are similar in style to questions asked in the forum, and 47.3% of the questions when they are more conversational in style.
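For readers unfamiliar with the MAP figure quoted in the abstract, the following is a minimal sketch of how mean average precision is commonly computed over ranked answer lists. It is a generic illustration, not the authors' evaluation script: the function names and the boolean relevance encoding are assumptions made here, and the abstract does not specify how questions without any relevant answer are treated (this sketch scores them as 0).

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list of candidate answers.

    relevance[i] is True if the answer at rank i + 1 is relevant.
    (Hypothetical encoding chosen for this illustration.)
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant answer.
            precision_sum += hits / rank
    # Assumption: a list with no relevant answer contributes 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two toy questions with ranked candidate answers.
    toy_rankings = [[True, False, True], [False, True]]
    print(f"MAP = {mean_average_precision(toy_rankings):.4f}")
```

In the toy example, the first question has average precision (1/1 + 2/3) / 2 and the second has 1/2, so the mean is 2/3. Some evaluation tools instead skip questions that have no relevant answer, which would change the result.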
Anthology ID:
R17-1018
Volume:
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
121–129
URL:
https://doi.org/10.26615/978-954-452-049-6_018
DOI:
10.26615/978-954-452-049-6_018