Tempo-Lexical Context Driven Word Embedding for Cross-Session Search Task Extraction

Procheta Sen, Debasis Ganguly, Gareth Jones


Abstract
Task extraction is the process of identifying search intents over a set of queries potentially spanning multiple search sessions. Most existing research on task extraction has focused on identifying tasks within a single session, where the notion of a session is defined by a fixed length time window. By contrast, in this work we seek to identify tasks that span across multiple sessions. To identify tasks, we conduct a global analysis of a query log in its entirety without restricting analysis to individual temporal windows. To capture inherent task semantics, we represent queries as vectors in an abstract space. We learn the embedding of query words in this space by leveraging the temporal and lexical contexts of queries. Embedded query vectors are then clustered into tasks. Experiments demonstrate that task extraction effectiveness is improved significantly with our proposed method of query vector embedding in comparison to existing approaches that make use of documents retrieved from a collection to estimate semantic similarities between queries.
Anthology ID:
N18-1026
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
283–292
Language:
URL:
https://www.aclweb.org/anthology/N18-1026
DOI:
10.18653/v1/N18-1026
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/N18-1026.pdf