Atsushi Hashimoto


pdf bib
Visual Grounding Annotation of Recipe Flow Graph
Taichi Nishimura | Suzushi Tomori | Hayato Hashimoto | Atsushi Hashimoto | Yoko Yamakata | Jun Harashima | Yoshitaka Ushiku | Shinsuke Mori
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we provide a dataset that gives visual grounding annotations to recipe flow graphs. A recipe flow graph is a representation of the cooking workflow, which is designed with the aim of understanding the workflow from natural language processing. Such a workflow will increase its value when grounded to real-world activities, and visual grounding is a way to do so. Visual grounding is provided as bounding boxes to image sequences of recipes, and each bounding box is linked to an element of the workflow. Because the workflows are also linked to the text, this annotation gives visual grounding with workflow’s contextual information between procedural text and visual observation in an indirect manner. We subsidiarily annotated two types of event attributes with each bounding box: “doing-the-action,” or “done-the-action”. As a result of the annotation, we got 2,300 bounding boxes in 272 flow graph recipes. Various experiments showed that the proposed dataset enables us to estimate contextual information described in recipe flow graphs from an image sequence.


pdf bib
Procedural Text Generation from a Photo Sequence
Taichi Nishimura | Atsushi Hashimoto | Shinsuke Mori
Proceedings of the 12th International Conference on Natural Language Generation

Multimedia procedural texts, such as instructions and manuals with pictures, support people to share how-to knowledge. In this paper, we propose a method for generating a procedural text given a photo sequence allowing users to obtain a multimedia procedural text. We propose a single embedding space both for image and text enabling to interconnect them and to select appropriate words to describe a photo. We implemented our method and tested it on cooking instructions, i.e., recipes. Various experimental results showed that our method outperforms standard baselines.


pdf bib
Procedural Text Generation from an Execution Video
Atsushi Ushiku | Hayato Hashimoto | Atsushi Hashimoto | Shinsuke Mori
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In recent years, there has been a surge of interest in automatically describing images or videos in a natural language. These descriptions are useful for image/video search, etc. In this paper, we focus on procedure execution videos, in which a human makes or repairs something and propose a method for generating procedural texts from them. Since video/text pairs available are limited in size, the direct application of end-to-end deep learning is not feasible. Thus we propose to train Faster R-CNN network for object recognition and LSTM for text generation and combine them at run time. We took pairs of recipe and cooking video, generated a recipe from a video, and compared it with the original recipe. The experimental results showed that our method can produce a recipe as accurate as the state-of-the-art scene descriptions.


pdf bib
FlowGraph2Text: Automatic Sentence Skeleton Compilation for Procedural Text Generation
Shinsuke Mori | Hirokuni Maeta | Tetsuro Sasada | Koichiro Yoshino | Atsushi Hashimoto | Takuya Funatomi | Yoko Yamakata
Proceedings of the 8th International Natural Language Generation Conference (INLG)