Masayasu Muraoka


2020

pdf bib
Image Position Prediction in Multimodal Documents
Masayasu Muraoka | Ryosuke Kohita | Etsuko Ishii
Proceedings of the 12th Language Resources and Evaluation Conference

Conventional multimodal tasks, such as caption generation and visual question answering, have allowed machines to understand an image by describing or being asked about it in natural language, often via a sentence. Datasets for these tasks contain a large number of pairs of an image and the corresponding sentence as an instance. However, a real multimodal document such as a news article or Wikipedia page consists of multiple sentences with multiple images. Such documents require an advanced skill of jointly considering the multiple texts and multiple images, beyond a single sentence and image, for the interpretation. Therefore, aiming at building a system that can understand multimodal documents, we propose a task called image position prediction (IPP). In this task, a system learns plausible positions of images in a given document. To study this task, we automatically constructed a dataset of 66K multimodal documents with 320K images from Wikipedia articles. We conducted a preliminary experiment to evaluate the performance of a current multimodal system on our task. The experimental results show that the system outperformed simple baselines while the performance is still far from human performance, which thus poses new challenges in multimodal research.

pdf bib
Visual Objects As Context: Exploiting Visual Objects for Lexical Entailment
Masayasu Muraoka | Tetsuya Nasukawa | Bishwaranjan Bhattacharjee
Findings of the Association for Computational Linguistics: EMNLP 2020

We propose a new word representation method derived from visual objects in associated images to tackle the lexical entailment task. Although it has been shown that the Distributional Informativeness Hypothesis (DIH) holds on text, in which the DIH assumes that a context surrounding a hyponym is more informative than that of a hypernym, it has never been tested on visual objects. Since our perception is tightly associated with language, it is meaningful to explore whether the DIH holds on visual objects. To this end, we consider visual objects as the context of a word and represent a word as a bag of visual objects found in images associated with the word. This allows us to test the feasibility of the visual DIH. To better distinguish word pairs in a hypernym relation from other relations such as co-hypernyms, we also propose a new measurable function that takes into account both the difference in the generality of meaning and similarity of meaning between words. Our experimental results show that the DIH holds on visual objects and that the proposed method combined with the proposed function outperforms existing unsupervised representation methods.

2018

pdf bib
A neural parser as a direct classifier for head-final languages
Hiroshi Kanayama | Masayasu Muraoka | Ryosuke Kohita
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

This paper demonstrates a neural parser implementation suitable for consistently head-final languages such as Japanese. Unlike the transition- and graph-based algorithms in most state-of-the-art parsers, our parser directly selects the head word of a dependent from a limited number of candidates. This method drastically simplifies the model so that we can easily interpret the output of the neural model. Moreover, by exploiting grammatical knowledge to restrict possible modification types, we can control the output of the parser to reduce specific errors without adding annotated corpora. The neural parser performed well both on conventional Japanese corpora and the Japanese version of Universal Dependency corpus, and the advantages of distributed representations were observed in the comparison with the non-neural conventional model.

2017

pdf bib
A Semi-universal Pipelined Approach to the CoNLL 2017 UD Shared Task
Hiroshi Kanayama | Masayasu Muraoka | Katsumasa Yoshikawa
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper presents our system submitted for the CoNLL 2017 Shared Task, “Multilingual Parsing from Raw Text to Universal Dependencies.” We ran the system for all languages with our own fully pipelined components without relying on re-trained baseline systems. To train the dependency parser, we used only the universal part-of-speech tags and distance between words, and applied deterministic rules to assign dependency labels. The simple and delexicalized models are suitable for cross-lingual transfer approaches and a universal language model. Experimental results show that our model performed well in some metrics and leads discussion on topics such as contribution of each component and on syntactic similarities among languages.

2016

pdf bib
Recognizing Open-Vocabulary Relations between Objects in Images
Masayasu Muraoka | Sumit Maharjan | Masaki Saito | Kota Yamaguchi | Naoaki Okazaki | Takayuki Okatani | Kentaro Inui
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

2014

pdf bib
Finding The Best Model Among Representative Compositional Models
Masayasu Muraoka | Sonse Shimaoka | Kazeto Yamamoto | Yotaro Watanabe | Naoaki Okazaki | Kentaro Inui
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing