A Practice of Tourism Knowledge Graph Construction based on Heterogeneous Information

Dinghe Xiao, Nannan Wang, Jiangang Yu, Chunhong Zhang, Jiaqi Wu


Abstract
The increasing amount of semi-structured and unstructured data on tourism websites brings a need for information extraction (IE) so as to construct a Tourism-domain Knowledge Graph (TKG), which is helpful to manage tourism information and develop downstream applications such as tourism search engine, recommendation and Q & A. However, the existing TKG is deficient, and there are few open methods to promote the construction and widespread application of TKG. In this paper, we present a systematic framework to build a TKG for Hainan, collecting data from popular tourism websites and structuring it into triples. The data is multi-source and heterogeneous, which raises a great challenge for processing it. So we develop two pipelines of processing methods for semi-structured data and unstructured data respectively. We refer to tourism InfoBox for semi-structured knowledge extraction and leverage deep learning algorithms to extract entities and relations from unstructured travel notes, which are colloquial and high-noise, and then we fuse the extracted knowledge from two sources. Finally, a TKG with 13 entity types and 46 relation types is established, which totally contains 34,079 entities and 441,371 triples. The systematic procedure proposed by this paper can construct a TKG from tourism websites, which can further applied to many scenarios and provide detailed reference for the construction of other domain-specific knowledge graphs.
Anthology ID:
2020.ccl-1.87
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Venue:
CCL
SIG:
Publisher:
Chinese Information Processing Society of China
Note:
Pages:
939–949
Language:
English
URL:
https://www.aclweb.org/anthology/2020.ccl-1.87
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.ccl-1.87.pdf