Huang Hu


pdf bib
Neural Response Generation with Meta-words
Can Xu | Wei Wu | Chongyang Tao | Huang Hu | Matt Schuerman | Ying Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present open domain dialogue generation with meta-words. A meta-word is a structured record that describes attributes of a response, and thus allows us to explicitly model the one-to-many relationship within open domain dialogues and perform response generation in an explainable and controllable manner. To incorporate meta-words into generation, we propose a novel goal-tracking memory network that formalizes meta-word expression as a goal in response generation and manages the generation process to achieve the goal with a state memory panel and a state controller. Experimental results from both automatic evaluation and human judgment on two large-scale data sets indicate that our model can significantly outperform state-of-the-art generation models in terms of response relevance, response diversity, and accuracy of meta-word expression.


pdf bib
Playing 20 Question Game with Policy-Based Reinforcement Learning
Huang Hu | Xianchao Wu | Bingfeng Luo | Chongyang Tao | Can Xu | Wei Wu | Zhan Chen
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The 20 Questions (Q20) game is a well known game which encourages deductive reasoning and creativity. In the game, the answerer first thinks of an object such as a famous person or a kind of animal. Then the questioner tries to guess the object by asking 20 questions. In a Q20 game system, the user is considered as the answerer while the system itself acts as the questioner which requires a good strategy of question selection to figure out the correct object and win the game. However, the optimal policy of question selection is hard to be derived due to the complexity and volatility of the game environment. In this paper, we propose a novel policy-based Reinforcement Learning (RL) method, which enables the questioner agent to learn the optimal policy of question selection through continuous interactions with users. To facilitate training, we also propose to use a reward network to estimate the more informative reward. Compared to previous methods, our RL method is robust to noisy answers and does not rely on the Knowledge Base of objects. Experimental results show that our RL method clearly outperforms an entropy-based engineering system and has competitive performance in a noisy-free simulation environment.