Kiyotaka Uchimoto


2018

Extending Search System based on Interactive Visualization for Speech Corpora
Tomoko Ohsuga | Yuichi Ishimoto | Tomoko Kajiyama | Shunsuke Kozawa | Kiyotaka Uchimoto | Shuichi Itahashi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

ASPEC: Asian Scientific Paper Excerpt Corpus
Toshiaki Nakazawa | Manabu Yaguchi | Kiyotaka Uchimoto | Masao Utiyama | Eiichiro Sumita | Sadao Kurohashi | Hitoshi Isahara
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we describe the details of ASPEC (Asian Scientific Paper Excerpt Corpus), the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed in a Japanese-Chinese machine translation project conducted between 2006 and 2010 with support from the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English corpus of scientific paper abstracts with approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese corpus of scientific paper excerpts with approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).

2010

Adapting Chinese Word Segmentation for Machine Translation Based on Short Units
Yiou Wang | Kiyotaka Uchimoto | Jun’ichi Kazama | Canasai Kruengkrai | Kentaro Torisawa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In Chinese texts, words composed of single or multiple characters are not separated by spaces, unlike in most Western languages. Therefore, Chinese word segmentation is considered an important first step in machine translation (MT), and its performance affects MT results. Many factors affect Chinese word segmentation, including segmentation standards and segmentation strategies. The performance of a corpus-based word segmentation model depends heavily on the quality and the segmentation standard of the training corpora. However, we observed that existing manually annotated Chinese corpora tend to have coarse segmentation granularity and provide poor morphological information because of the segmentation standards currently in use. In this paper, we introduce a short-unit standard for Chinese word segmentation that is particularly suitable for machine translation, and propose a semi-automatic method for transforming existing corpora into ones that satisfy this standard. We evaluate the usefulness of our approach on translation tasks in the technology newswire and scientific paper domains, and demonstrate that it significantly improves the performance of Chinese-Japanese machine translation (an increase of over 1.0 BLEU).

Collection of Usage Information for Language Resources from Academic Articles
Shunsuke Kozawa | Hitomi Tohyama | Kiyotaka Uchimoto | Shigeki Matsubara
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Language resources (LRs) have recently become indispensable for linguistic research. However, existing LRs are often not fully utilized because the variety of their uses is not well known, which indicates that their intrinsic value is not well recognized either. Lists of usage information might therefore improve LR searches and lead to more efficient use of LRs. In this research, we collect a list of usage information for each LR from academic articles to promote the efficient utilization of LRs. This paper proposes constructing a text corpus annotated with usage information (UI corpus). Specifically, we automatically extract sentences containing LR names from academic articles, and two annotators then annotate the extracted sentences with usage information in a cascaded manner. We show that the UI corpus contributes to efficient LR searches by combining it with a metadata database of LRs and comparing the number of LRs retrieved with and without the UI corpus.
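The automatic step described above (finding sentences that mention an LR by name) can be sketched as follows. This is a minimal illustration, not the authors' extractor: the name list, the naive regex sentence splitter, and the function names are hypothetical, and the subsequent manual annotation by two annotators is not modelled.

```python
import re

# Hypothetical name list; a real system would draw names from an LR catalogue.
LR_NAMES = ["Penn Treebank", "WordNet", "Kyoto University Corpus"]


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_lr_sentences(text: str, names: list[str]) -> list[tuple[str, list[str]]]:
    """Return (sentence, matched LR names) for every sentence that mentions an LR."""
    # Whole-phrase matching so that e.g. "WordNet" does not match "WordNets".
    patterns = {name: re.compile(r"\b" + re.escape(name) + r"\b") for name in names}
    hits = []
    for sentence in split_sentences(text):
        found = [name for name, pat in patterns.items() if pat.search(sentence)]
        if found:
            hits.append((sentence, found))
    return hits


article = ("We trained our parser on the Penn Treebank. "
           "Results are shown in Table 2. "
           "Synonyms were taken from WordNet.")
for sentence, names in extract_lr_sentences(article, LR_NAMES):
    print(names, "->", sentence)
```

The extracted sentences would then be the input that annotators label with usage information.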

2009

Improving Dependency Parsing with Subtrees from Auto-Parsed Data
Wenliang Chen | Jun’ichi Kazama | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Can Chinese Phonemes Improve Machine Transliteration?: A Comparative Study of English-to-Chinese Transliteration Models
Jong-Hoon Oh | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Multilingual Dependency Learning: Exploiting Rich Features for Tagging Syntactic and Semantic Dependencies
Hai Zhao | Wenliang Chen | Jun’ichi Kazama | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

Enhancing the Japanese WordNet
Francis Bond | Hitoshi Isahara | Sanae Fujita | Kiyotaka Uchimoto | Takayuki Kuribayashi | Kyoko Kanzaki
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

Machine Transliteration using Target-Language Grapheme and Phoneme: Multi-engine Transliteration Approach
Jong-Hoon Oh | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

Bilingual Co-Training for Monolingual Hyponymy-Relation Acquisition
Jong-Hoon Oh | Kiyotaka Uchimoto | Kentaro Torisawa
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging
Canasai Kruengkrai | Kiyotaka Uchimoto | Jun’ichi Kazama | Yiou Wang | Kentaro Torisawa | Hitoshi Isahara
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Boot-Strapping a WordNet Using Multiple Existing WordNets
Francis Bond | Hitoshi Isahara | Kyoko Kanzaki | Kiyotaka Uchimoto
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe the construction of an illustrated Japanese Wordnet. We bootstrap the Wordnet using multiple existing wordnets in order to deal with the ambiguity inherent in translation. We illustrate it with pictures from the Open Clip Art Library.

A Method for Automatically Constructing Case Frames for English
Daisuke Kawahara | Kiyotaka Uchimoto
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Case frames are an important knowledge base for a variety of natural language processing (NLP) systems. For these systems to be used practically in the real world, wide-coverage case frames are required. To acquire such large-scale case frames, this paper automatically compiles case frames from a large corpus. The resulting case frames, compiled from the English Gigaword corpus, contain 9,300 verb entries. They include most examples of normal usage and are ready to be used in numerous NLP analyzers and applications.
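As a rough picture of what "compiling case frames from a corpus" involves, the sketch below aggregates (verb, case slot, argument head) triples into per-verb frames with argument frequencies and prunes rare arguments. It assumes the triples have already been extracted by a parser; the triple format, the `min_count` threshold, and the example data are hypothetical. A real system must also separate different usages of the same verb into distinct frames, which this toy version does not attempt.

```python
from collections import Counter, defaultdict


def compile_case_frames(triples, min_count=1):
    """Aggregate (verb, case_slot, argument_head) triples into case frames.

    Returns {verb: {case_slot: Counter(argument_head -> frequency)}}, keeping
    only arguments seen at least `min_count` times.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for verb, slot, arg in triples:
        counts[verb][slot][arg] += 1

    frames = {}
    for verb, slots in counts.items():
        kept = {}
        for slot, args in slots.items():
            frequent = Counter({a: n for a, n in args.items() if n >= min_count})
            if frequent:  # drop slots whose arguments were all pruned
                kept[slot] = frequent
        if kept:  # drop verbs left with no slots
            frames[verb] = kept
    return frames


# Hypothetical triples, as a dependency parser might extract them from raw text.
triples = [("eat", "subj", "dog"), ("eat", "obj", "bone"), ("eat", "obj", "bone"),
           ("eat", "obj", "meat"), ("read", "subj", "student"), ("read", "obj", "book")]
print(compile_case_frames(triples, min_count=2))
```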

Word-level Dependency-structure Annotation to Corpus of Spontaneous Japanese and its Application
Kiyotaka Uchimoto | Yasuharu Den
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In Japanese, the syntactic structure of a sentence is generally represented by the relationships between phrasal units, called bunsetsus in Japanese, based on a dependency grammar. In many cases, the internal syntactic structure of a bunsetsu is not considered in syntactic structure annotation. This paper gives the criteria and definitions of dependency relationships between words within a bunsetsu, and describes their applications. The target corpus for the word-level dependency annotation is a large corpus of spontaneous Japanese speech, the Corpus of Spontaneous Japanese (CSJ). One application of word-level dependency relationships is finding basic units for constructing accent phrases.

Development of the Japanese WordNet
Hitoshi Isahara | Francis Bond | Kiyotaka Uchimoto | Masao Utiyama | Kyoko Kanzaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

After a long history of compiling our own lexical resources, including the EDR Japanese/English Electronic Dictionary, and of discussions with major players in the development of various WordNets, the Japanese National Institute of Information and Communications Technology started developing the Japanese WordNet in 2006 and will publicly release the first version, which includes both the Japanese synsets and an annotated Japanese version of the SemCor corpus, in June 2008. As the first step in compiling the Japanese WordNet, we added Japanese equivalents to the synsets of the Princeton WordNet. We must also add some synsets that do not exist in the Princeton WordNet and modify some Princeton synsets so that the hierarchical structure represents thesaurus-like information found in the Japanese language; we will address these tasks in a future study. We then translated the English sentences used in the SemCor annotation into Japanese and annotated them using our Japanese WordNet. This article gives an overview of our project to compile the Japanese WordNet and of other resources related to it.

Automatic Acquisition of Usage Information for Language Resources
Shunsuke Kozawa | Hitomi Tohyama | Kiyotaka Uchimoto | Shigeki Matsubara
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Language resources (LRs) have recently become indispensable for linguistic research. Unfortunately, it is not easy to find out how they are used by searching the web, even though such usage is presumably described on the Internet or in academic articles. This indicates that the intrinsic value of LRs is not well recognized. In this research, therefore, we extract a list of usage information for each LR to promote the efficient utilization of LRs. In this paper, we propose a method for extracting a list of usage information from academic articles by using rules based on syntactic information. The rules are generated by focusing on the syntactic features observed in sentences that describe usage information. In our experiments, we achieved a recall of 72.9% and a precision of 78.4% on the closed test, and a recall of 60.9% and a precision of 72.7% on the open test.
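The abstract reports recall and precision separately. As a small arithmetic aid, the sketch below combines them into the standard F-measure (the weighted harmonic mean of precision and recall); any F-values it prints are derived here from the reported figures and are not results stated in the abstract.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (recall, precision) pairs as reported in the abstract above.
reported = {"closed": (0.729, 0.784), "open": (0.609, 0.727)}
for test, (recall, precision) in reported.items():
    print(f"{test} test: F1 = {f_measure(precision, recall):.3f}")
```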

Construction of a Metadata Database for Efficient Development and Use of Language Resources
Hitomi Tohyama | Shunsuke Kozawa | Kiyotaka Uchimoto | Shigeki Matsubara | Hitoshi Isahara
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The National Institute of Information and Communications Technology (NICT) and Nagoya University have been jointly constructing a large-scale database named SHACHI by collecting detailed meta-information on language resources (LRs) in Asia and Western countries, for the purpose of effectively combining LRs. The project aims to investigate the languages, tag sets, and formats used in LRs throughout the world, to systematically store LR metadata, to provide a search function for this information, and ultimately to use all of this for more efficient development of LRs. The database contains metadata for more than 2,000 LRs, such as corpora, dictionaries, thesauri, and lexicons, forming a large-scale archive of LR metadata. Its metadata, which follow an extended version of the OLAC metadata set conforming to Dublin Core and contain detailed meta-information, have been collected semi-automatically. This paper explains the design and structure of the metadata database, as well as the implementation of the catalogue search tool. The website of this database is also open to the public and accessible to all Internet users.
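To make the idea of a Dublin Core-based record plus catalogue search concrete, here is a minimal sketch. It uses a few real Dublin Core element names (title, type, language, format), but the `tagset` field stands in for SHACHI's richer extensions and is hypothetical, as are the record class, the search function, and the example entries; it is not SHACHI's actual schema or search tool.

```python
from dataclasses import dataclass


@dataclass
class LRRecord:
    # A few Dublin Core elements; SHACHI's real schema is richer.
    title: str
    type: str              # e.g. "corpus", "dictionary", "thesaurus"
    language: list[str]    # language codes, e.g. ISO 639-3
    format: str = ""
    # Hypothetical extension field standing in for extended metadata.
    tagset: str = ""


def search(records, **criteria):
    """Return records matching every criterion.

    List-valued fields (such as language) match if they contain the value;
    other fields must be equal to it.
    """
    def matches(rec):
        for key, want in criteria.items():
            have = getattr(rec, key)
            if isinstance(have, list):
                if want not in have:
                    return False
            elif have != want:
                return False
        return True

    return [r for r in records if matches(r)]


# Hypothetical catalogue entries.
catalog = [
    LRRecord("Example JA Corpus", "corpus", ["jpn"], "XML", "IPA"),
    LRRecord("Example JA-EN Dictionary", "dictionary", ["jpn", "eng"], "CSV"),
    LRRecord("Example EN Corpus", "corpus", ["eng"], "text"),
]
print([r.title for r in search(catalog, type="corpus", language="jpn")])
```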

Word Alignment Annotation in a Japanese-Chinese Parallel Corpus
Yujie Zhang | Zhulong Wang | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Parallel corpora are critical resources for machine translation research and development because they contain translation equivalences of various granularities. Manual annotation of word and phrase alignments is important because it provides a gold standard for developing and evaluating both example-based and statistical machine translation models. This paper presents word and phrase alignment annotation of the NICT Japanese-Chinese parallel corpus, which was constructed at the National Institute of Information and Communications Technology (NICT). We describe the specification of the word alignment annotation and the tools developed specifically for the manual annotation. Manual annotation of 17,000 sentence pairs has been completed. We examined the manually annotated word alignment data and extracted translation knowledge from the word- and phrase-aligned corpus.
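One simple form of "translation knowledge" that can be read off a word-aligned corpus is a table of aligned word pairs with relative frequencies. The sketch below assumes alignments are stored as sets of (source index, target index) links; that representation, the function names, and the tiny example corpus are hypothetical and are not the annotation format of the NICT corpus.

```python
from collections import Counter


def extract_word_pairs(corpus):
    """Count (source word, target word) pairs linked by manual alignments.

    corpus: iterable of (src_tokens, tgt_tokens, links), where links is a set
    of (i, j) pairs meaning src_tokens[i] is aligned to tgt_tokens[j].
    """
    pairs = Counter()
    for src, tgt, links in corpus:
        for i, j in links:
            pairs[(src[i], tgt[j])] += 1
    return pairs


def translation_table(pairs):
    """Relative frequency p(target | source) estimated from aligned pair counts."""
    totals = Counter()
    for (s, _), n in pairs.items():
        totals[s] += n
    return {(s, t): n / totals[s] for (s, t), n in pairs.items()}


# Hypothetical Japanese-Chinese sentence pairs with alignment links.
corpus = [
    (["私", "は", "学生", "です"], ["我", "是", "学生"], {(0, 0), (2, 2), (3, 1)}),
    (["彼", "は", "学生", "です"], ["他", "是", "学生"], {(0, 0), (2, 2), (3, 1)}),
]
pairs = extract_word_pairs(corpus)
print(pairs.most_common(2))
print(translation_table(pairs)[("です", "是")])
```

Unaligned words (here, the particle は) simply contribute no pairs.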

Dependency Parsing with Short Dependency Relations in Unlabeled Data
Wenliang Chen | Daisuke Kawahara | Kiyotaka Uchimoto | Yujie Zhang | Hitoshi Isahara
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Learning Reliability of Parses for Domain Adaptation of Dependency Parsing
Daisuke Kawahara | Kiyotaka Uchimoto
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Construction of an Infrastructure for Providing Users with Suitable Language Resources
Hitomi Tohyama | Shunsuke Kozawa | Kiyotaka Uchimoto | Shigeki Matsubara | Hitoshi Isahara
Coling 2008: Companion volume: Posters

2007

Automatic Evaluation of Machine Translation Based on Rate of Accomplishment of Sub-Goals
Kiyotaka Uchimoto | Katsunori Kotani | Yujie Zhang | Hitoshi Isahara
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Minimally Lexicalized Dependency Parsing
Daisuke Kawahara | Kiyotaka Uchimoto
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

A Hybrid Approach to Word Segmentation and POS Tagging
Tetsuji Nakagawa | Kiyotaka Uchimoto
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

Detection of Quotations and Inserted Clauses and Its Application to Dependency Structure Analysis in Spontaneous Japanese
Ryoji Hamabe | Kiyotaka Uchimoto | Tatsuya Kawahara | Hitoshi Isahara
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Dependency-structure Annotation to Corpus of Spontaneous Japanese
Kiyotaka Uchimoto | Ryoji Hamabe | Takehiko Maruyama | Katsuya Takanashi | Tatsuya Kawahara | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In Japanese, the syntactic structure of a sentence is generally represented by the relationships between phrasal units, called bunsetsus in Japanese, based on a dependency grammar. In the same way, the syntactic structure of a sentence in a large corpus of spontaneous Japanese speech, the Corpus of Spontaneous Japanese (CSJ), is represented by dependency relationships between bunsetsus. This paper describes the criteria and definitions of dependency relationships between bunsetsus in the CSJ. The dependency structure of the CSJ is investigated, and the differences between the dependency structures of written text and spontaneous speech are discussed in terms of the dependency accuracies obtained with a corpus-based model. It is shown that the accuracy of automatic dependency-structure analysis can be improved if characteristic phenomena of spontaneous speech, such as self-corrections, basic utterance units, and bunsetsus that have no modifiee, are detected and used for dependency-structure analysis.
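Dependency accuracy, as discussed above, is taken here to mean the fraction of bunsetsus whose modifiee is predicted correctly. The sketch below computes it from per-sentence lists of head indices, with `None` for a bunsetsu that has no modifiee. Skipping the sentence-final bunsetsu is a common convention in Japanese dependency evaluation and is an assumption here, not a detail taken from the paper; the example data are hypothetical.

```python
def dependency_accuracy(gold_heads, pred_heads, skip_last=True):
    """Fraction of bunsetsus whose predicted modifiee equals the gold modifiee.

    Each element of gold_heads / pred_heads is one sentence: a list where
    heads[i] is the index of the bunsetsu that bunsetsu i modifies, or None
    if it has no modifiee. With skip_last=True the sentence-final bunsetsu
    is not scored.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted data differ in sentence count")
    correct = total = 0
    for gold, pred in zip(gold_heads, pred_heads):
        if len(gold) != len(pred):
            raise ValueError("sentence length mismatch")
        scored = len(gold) - 1 if skip_last else len(gold)
        for i in range(scored):
            total += 1
            correct += gold[i] == pred[i]
    return correct / total if total else 0.0


# Hypothetical two-sentence example: one wrong modifiee out of five scored bunsetsus.
gold = [[1, 3, 3, None], [2, 2, None]]
pred = [[1, 2, 3, None], [2, 2, None]]
print(dependency_accuracy(gold, pred))
```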

Automatic Detection and Semi-Automatic Revision of Non-Machine-Translatable Parts of a Sentence
Kiyotaka Uchimoto | Naoko Hayashida | Toru Ishida | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We developed a method for automatically distinguishing the machine-translatable and non-machine-translatable parts of a given sentence for a particular machine translation (MT) system. They can be distinguished by calculating the similarity between each part of a source-language sentence and the corresponding part of its back translation. Parts with low similarity are highly likely to be non-machine-translatable. We showed that the parts of a sentence that are automatically identified as non-machine-translatable provide useful information for paraphrasing or revising the sentence in the source language to improve the quality of its translation by the MT system. We also developed a method of providing knowledge useful for effectively paraphrasing or revising the detected non-machine-translatable parts. Two types of knowledge were extracted from the EDR dictionary: one for transforming a lexical entry into an expression used in its definition, and the other for the reverse paraphrasing, which transforms an expression found in a definition into the lexical entry. We found that the information provided by these methods helped improve the machine translatability of the originally input sentences.
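The detection idea (compare each part of the source sentence with the corresponding part of its back translation and flag low-similarity parts) can be sketched as below. The abstract does not specify the similarity measure, so Python's `difflib.SequenceMatcher.ratio()` is used as a stand-in; the 0.5 threshold, the assumption that parts are already segmented and aligned, and the example back translation are all hypothetical.

```python
from difflib import SequenceMatcher


def flag_untranslatable_parts(source_parts, back_translated_parts, threshold=0.5):
    """Return (part, similarity) for parts whose back translation diverges.

    source_parts[k] and back_translated_parts[k] are assumed to be aligned
    segments of the source sentence and of its back translation.
    """
    if len(source_parts) != len(back_translated_parts):
        raise ValueError("parts must be aligned one-to-one")
    flagged = []
    for src, back in zip(source_parts, back_translated_parts):
        # Character-level similarity in [0, 1]; stand-in for the paper's measure.
        sim = SequenceMatcher(None, src.lower(), back.lower()).ratio()
        if sim < threshold:
            flagged.append((src, round(sim, 2)))
    return flagged


# Hypothetical example: the idiom "put off" does not survive the round trip.
source = ["the meeting", "was put off", "until next week"]
back = ["the meeting", "was placed away", "until next week"]
print(flag_untranslatable_parts(source, back))
```

Flagged parts would then be the candidates for paraphrasing or revision before re-translation.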

Chunking Japanese Compound Functional Expressions by Machine Learning
Masatoshi Tsuchiya | Takao Shime | Toshihiro Takagi | Takehito Utsuro | Kiyotaka Uchimoto | Suguru Matsuyoshi | Satoshi Sato | Seiichi Nakagawa
Proceedings of the Workshop on Multi-word-expressions in a multilingual context

2005

Building an Annotated Japanese-Chinese Parallel Corpus - A Part of NICT Multilingual Corpora
Yujie Zhang | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

Error Annotation for Corpus of Japanese Learner English
Emi Izumi | Kiyotaka Uchimoto | Hitoshi Isahara
Proceedings of the Sixth International Workshop on Linguistically Interpreted Corpora (LINC-2005)

Analysis of Machine Translation Systems’ Errors in Tense, Aspect, and Modality
Masaki Murata | Kiyotaka Uchimoto | Qing Ma | Toshiyuki Kanamaru | Hitoshi Isahara
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation

2004

Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese
Kazuya Shitaoka | Kiyotaka Uchimoto | Tatsuya Kawahara | Hitoshi Isahara
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

The Overview of the SST Speech Corpus of Japanese Learner English and Evaluation Through the Experiment on Automatic Detection of Learners’ Errors
Emi Izumi | Kiyotaka Uchimoto | Hitoshi Isahara
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Multilingual Aligned Parallel Treebank Corpus Reflecting Contextual Information and Its Applications
Kiyotaka Uchimoto | Yujie Zhang | Kiyoshi Sudo | Masaki Murata | Satoshi Sekine | Hitoshi Isahara
Proceedings of the Workshop on Multilingual Linguistic Resources

2003

Morphological Analysis of a Large Spontaneous Speech Corpus in Japanese
Kiyotaka Uchimoto | Chikashi Nobata | Atsushi Yamada | Satoshi Sekine | Hitoshi Isahara
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Automatic Error Detection in the Japanese Learners’ English Spoken Data
Emi Izumi | Kiyotaka Uchimoto | Toyomi Saiga | Thepchai Supnithi | Hitoshi Isahara
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

Combining Outputs of Multiple Japanese Named Entity Chunkers by Stacking
Takehito Utsuro | Manabu Sassano | Kiyotaka Uchimoto
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

Text Generation from Keywords
Kiyotaka Uchimoto | Satoshi Sekine | Hitoshi Isahara
COLING 2002: The 19th International Conference on Computational Linguistics

Morphological Analysis of the Spontaneous Speech Corpus
Kiyotaka Uchimoto | Chikashi Nobata | Atsushi Yamada | Satoshi Sekine | Hitoshi Isahara
COLING 2002: The 19th International Conference on Computational Linguistics: Project Notes

2001

Japanese Word Sense Disambiguation using the Simple Bayes and Support Vector Machine Methods
Masaki Murata | Masao Utiyama | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

Word Translation Based on Machine Learning Models Using Translation Memory and Corpora
Kiyotaka Uchimoto | Satoshi Sekine | Masaki Murata | Hitoshi Isahara
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

The Unknown Word Problem: a Morphological Analysis of Japanese Using Maximum Entropy Aided by a Dictionary
Kiyotaka Uchimoto | Satoshi Sekine | Hitoshi Isahara
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing

Using a Support-Vector Machine for Japanese-to-English Translation of Tense, Aspect, and Modality
Masaki Murata | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

2000

Hybrid Neuro and Rule-Based Part of Speech Taggers
Qing Ma | Masaki Murata | Kiyotaka Uchimoto | Hitoshi Isahara
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Bunsetsu Identification Using Category-Exclusive Rules
Masaki Murata | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Backward Beam Search Algorithm for Dependency Analysis of Japanese
Satoshi Sekine | Kiyotaka Uchimoto | Hitoshi Isahara
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Word Order Acquisition from Corpora
Kiyotaka Uchimoto | Masaki Murata | Qing Ma | Satoshi Sekine | Hitoshi Isahara
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Dependency Model using Posterior Context
Kiyotaka Uchimoto | Masaki Murata | Satoshi Sekine | Hitoshi Isahara
Proceedings of the Sixth International Workshop on Parsing Technologies

We describe a new model for dependency structure analysis. This model classifies the relationship between two phrasal units, called bunsetsus, into one of three categories, ‘between’, ‘dependent’, and ‘beyond’, and estimates the dependency likelihood by considering not only the relationship between the two bunsetsus but also the relationships between the left bunsetsu and all of the bunsetsus to its right. We implemented this model based on the maximum entropy model. On the Kyoto University corpus, our model achieved a dependency accuracy of 88%, about 1% higher than that of the conventional model using exactly the same features.
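The abstract does not spell out how the three category probabilities are combined. One reading consistent with it: if the left bunsetsu modifies the j-th bunsetsu to its right, then each bunsetsu before j should be ‘beyond’ (the modifiee lies further right), the j-th should be ‘dependent’, and each bunsetsu after j should be ‘between’ (the modifiee lies between the two). The sketch below scores each candidate modifiee that way. The per-pair probabilities would come from a maximum entropy model in the paper; here they are hypothetical numbers, and the final normalization is an added assumption.

```python
def head_distribution(probs):
    """Combine per-pair category probabilities into P(modifiee = j).

    probs[j] = (p_between, p_dependent, p_beyond) for the left bunsetsu and the
    j-th bunsetsu to its right (j = 0 is the nearest one). The modifiee is j
    only if every k < j is 'beyond', j is 'dependent', and every k > j is
    'between'.
    """
    scores = []
    for j in range(len(probs)):
        score = probs[j][1]                  # 'dependent' at the candidate
        for k in range(j):
            score *= probs[k][2]             # 'beyond' for nearer bunsetsus
        for k in range(j + 1, len(probs)):
            score *= probs[k][0]             # 'between' for farther bunsetsus
        scores.append(score)
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


# Hypothetical probabilities for three candidate modifiees to the right.
probs = [(0.1, 0.2, 0.7), (0.6, 0.3, 0.1), (0.9, 0.05, 0.05)]
dist = head_distribution(probs)
print([round(p, 3) for p in dist])
```

Here the second candidate wins even though the nearest one has a non-trivial ‘dependent’ probability, because the model also weighs how plausible ‘beyond’ is for the nearest bunsetsu.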

Named Entity Extraction Based on A Maximum Entropy Model and Transformation Rules
Kiyotaka Uchimoto | Qing Ma | Masaki Murata | Hiromi Ozaku | Hitoshi Isahara
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

Japanese Dependency Structure Analysis Based on Maximum Entropy Models
Kiyotaka Uchimoto | Satoshi Sekine | Hitoshi Isahara
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

Intelligent Network News Reader with Visual User Interface
Hitoshi Isahara | Kiyotaka Uchimoto | Hiromi Ozaku
Content Visualization and Intermedia Representations (CVIR’98)

1994

Thesaurus-based Efficient Example Retrieval by Generating Retrieval Queries from Similarities
Takehito Utsuro | Kiyotaka Uchimoto | Mitsutaka Matsumoto | Makoto Nagao
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics