Halid Ziya Yerebakan


Document Representation Learning for Patient History Visualization
Halid Ziya Yerebakan | Yoshihisa Shinagawa | Parmeet Bhatia | Yiqiang Zhan
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We tackle the problem of generating a diagrammatic summary of a set of documents, each of which pertains to loosely related topics. In particular, we aim to visualize the medical histories of patients. In medicine, choosing relevant reports from a patient’s past exams for comparison provides valuable information for precise treatment planning. Manually finding the relevant reports for comparison studies in a large database is time-consuming, which can result in critical information being overlooked. This task can be automated by defining a similarity among documents, which is nontrivial because these documents are often stored in an unstructured text format. To facilitate this, we use a representation learning algorithm that creates a semantic representation space for documents in which clinically related documents lie close to each other. We use referral information to weakly supervise an LSTM network to learn this semantic space. The abstract representations within this semantic space are useful not only for visualizing disease progressions corresponding to the relevant report groups of a patient, but also for analyzing diseases at the population level. The key tool proposed here is the clustering of documents based on a document similarity whose metric is learned from corpora.
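
The final step described in the abstract, grouping reports that lie close together in a learned space, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy vectors stand in for document embeddings that an LSTM encoder would produce, and the function names, the cosine threshold, and the connected-component grouping are hypothetical choices, not the clustering method used in the paper.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def group_reports(embeddings, threshold=0.8):
    """Assign a group label to each report (hypothetical helper).

    Two reports are linked when their cosine similarity is at least
    `threshold`; groups are the connected components of that link graph.
    """
    sim = cosine_similarity_matrix(np.asarray(embeddings, dtype=float))
    n = sim.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already placed in an earlier group
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy 2-D vectors standing in for learned report embeddings (made-up values).
toy = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]])
labels = group_reports(toy, threshold=0.8)
print(labels)
```

With these toy vectors the first two reports fall into one group and the last two into another. In practice the threshold would have to be tuned on real embeddings, and a standard clustering algorithm could replace the simple linking rule.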