Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings

Hrishikesh Ganu, Viswa Datha P.


Abstract
We present early results from a system under development that uses sub-word embeddings for query expansion in the presence of misspelled words and other aberrations. We work for a company that creates accounting software, and the end goal is to improve the experience of customers who search for help on our “Customer Care” portal. Our customers use colloquial language and non-standard acronyms, and sometimes misspell words, when they use our Search portal or interact over other channels. Our Knowledge Base, however, has curated content that uses technical terms and is written in quite formal language. As a result, the answer may not be retrieved even though it is actually present in the documentation (as assessed by a human). We address this problem by using sub-word embeddings to create equivalence classes of words with similar meanings, with the additional property that the mappings to these equivalence classes are robust to misspellings, and then use these classes to fine-tune an Elasticsearch index to improve recall. We demonstrate through an end-to-end system that using sub-word embeddings leads to a significant lift in correct answers retrieved for an accounting corpus available in the public domain.
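The idea of mapping noisy query terms to equivalence classes can be illustrated with a minimal sketch. This is not the system described above: it swaps trained sub-word embeddings for untrained character n-gram count vectors, so it captures only spelling similarity, not similarity of meaning. The function names, the n-gram range, the 0.3 threshold, and the toy vocabulary are all assumptions made for illustration. The output lines follow the `a, b => c` synonym-rule style that Elasticsearch synonym filters accept.

```python
import math
from collections import Counter


def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word.lower()}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def subword_vector(word):
    """Sparse n-gram count vector; an untrained stand-in for a learned embedding."""
    return Counter(char_ngrams(word))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_canonical(term, vocabulary, threshold=0.3):
    """Return the closest vocabulary word, or None if nothing is similar enough."""
    if not vocabulary:
        return None
    vec = subword_vector(term)
    best, score = max(
        ((w, cosine(vec, subword_vector(w))) for w in vocabulary),
        key=lambda pair: pair[1],
    )
    return best if score >= threshold else None


def synonym_rules(query_terms, vocabulary):
    """Group noisy query terms under canonical words as 'a, b => c' rules."""
    classes = {}
    for term in query_terms:
        canonical = nearest_canonical(term, vocabulary)
        if canonical is not None and canonical != term.lower():
            classes.setdefault(canonical, []).append(term)
    return [f"{', '.join(terms)} => {canon}" for canon, terms in classes.items()]


# Hypothetical knowledge-base vocabulary and user query terms.
vocabulary = ["invoice", "payroll", "depreciation"]
rules = synonym_rules(["invoce", "payrol", "Invoice", "xyz"], vocabulary)
```

With trained sub-word embeddings in place of the count vectors, the same nearest-neighbour step could also group colloquial terms and acronyms with their formal equivalents, which is the behaviour the abstract describes.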
Anthology ID: W18-1208
Volume: Proceedings of the Second Workshop on Subword/Character LEvel Models
Month: June
Year: 2018
Address: New Orleans
Venues: NAACL | SCLeM | WS
Publisher: Association for Computational Linguistics
Pages: 61–65
URL: https://www.aclweb.org/anthology/W18-1208
DOI: 10.18653/v1/W18-1208
PDF: http://aclanthology.lst.uni-saarland.de/W18-1208.pdf