Keiji Shinzato


2017

pdf bib
Large-Scale Categorization of Japanese Product Titles Using Neural Attention Models
Yandi Xia | Aaron Levine | Pradipto Das | Giuseppe Di Fabbrizio | Keiji Shinzato | Ankur Datta
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We propose a variant of Convolutional Neural Network (CNN) models, the Attention CNN (ACNN); for large-scale categorization of millions of Japanese items into thirty-five product categories. Compared to a state-of-the-art Gradient Boosted Tree (GBT) classifier, the proposed model reduces training time from three weeks to three days while maintaining more than 96% accuracy. Additionally, our proposed model characterizes products by imputing attentive focus on word tokens in a language agnostic way. The attention words have been observed to be semantically highly correlated with the predicted categories and give us a choice of automatic feature extraction for downstream processing.

2013

pdf bib
Precise Information Retrieval Exploiting Predicate-Argument Structures
Daisuke Kawahara | Keiji Shinzato | Tomohide Shibata | Sadao Kurohashi
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Unsupervised Extraction of Attributes and Their Values from Product Description
Keiji Shinzato | Satoshi Sekine
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2010

pdf bib
Exploiting Term Importance Categories and Dependency Relations for Natural Language Search
Keiji Shinzato | Sadao Kurohashi
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)

2008

pdf bib
A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure
Keiji Shinzato | Daisuke Kawahara | Chikara Hashimoto | Sadao Kurohashi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In recent years, language resources acquired from theWeb are released, and these data improve the performance of applications in several NLP tasks. Although the language resources based on the web page unit are useful in NLP tasks and applications such as knowledge acquisition, document retrieval and document summarization, such language resources are not released so far. In this paper, we propose a data format for results of web page processing, and a search engine infrastructure which makes it possible to share approximately 100 million Japanese web data. By obtaining the web data, NLP researchers are enabled to begin their own processing immediately without analyzing web pages by themselves.

pdf bib
TSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology
Keiji Shinzato | Tomohide Shibata | Daisuke Kawahara | Chikara Hashimoto | Sadao Kurohashi
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2004

pdf bib
Extracting Hyponyms of Prespecified Hypernyms from Itemizations and Headings in Web Documents
Keiji Shinzato | Kentaro Torisawa
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Acquiring Hyponymy Relations from Web Documents
Keiji Shinzato | Kentaro Torisawa
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004